New retention policy deletes old backups in a smart way

Hello everyone!

Smart backup retention is a feature that has been discussed lately on the forum and on GitHub. It deletes old backups in an intelligent way. If you run your backup every day, you end up with 365 backups per year, which can occupy a lot of storage space. Backup retention thins out old backups so that you keep fewer of them the older they get. For instance, you can keep 7 backups for the last 7 days, 4 backups for the last month, and 12 backups for the last year. All of this happens automatically.

TekkiWuff implemented this feature in Duplicati’s server in October. We have since added a standard retention policy to the UI. You can choose to keep a specific number of backups, or all backups for a specific time frame, or you can let Duplicati apply a smart retention policy. This feature is available in the latest canary version 2.0.2.17. Please let us know how this feature works for you.

The default policy is to store one backup for each of the last 7 days, each of the last 4 weeks, and each of the last 12 months. You can find it under “general options” of your backup configuration. You can also configure the policy yourself.
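To make the thinning idea concrete, here is a minimal Python sketch, not Duplicati's actual code. It assumes the default policy can be written as three (time frame, interval) pairs, that a month is roughly 30 days and a year 365 days, and that within each interval the oldest backup is the one that is kept. The names `DEFAULT_POLICY` and `thin` are made up for this illustration.

```python
from datetime import datetime, timedelta

# Assumed (time frame, interval) pairs for the default policy.
# A month is approximated as 30 days and a year as 365 days.
DEFAULT_POLICY = [
    (timedelta(days=7), timedelta(days=1)),     # last 7 days: one per day
    (timedelta(weeks=4), timedelta(weeks=1)),   # last 4 weeks: one per week
    (timedelta(days=365), timedelta(days=30)),  # last 12 months: one per month
]


def thin(backups, policy, now):
    """Return the backup timestamps that survive thinning, oldest first."""
    ordered = sorted(backups)  # oldest first
    kept = []
    frame_start = timedelta(0)
    for frame_end, interval in sorted(policy):
        last_kept = None
        for ts in ordered:
            age = now - ts
            if not (frame_start <= age < frame_end):
                continue  # this backup belongs to another time frame
            # Assumption: the oldest backup of each interval is kept.
            if last_kept is None or ts - last_kept >= interval:
                kept.append(ts)
                last_kept = ts
        frame_start = frame_end
    # Never delete the last remaining backup, even if it is too old.
    if not kept and ordered:
        kept.append(ordered[-1])
    return sorted(kept)


now = datetime(2018, 1, 23, 12, 0)
daily = [now - timedelta(days=d) for d in range(365)]  # one backup per day
survivors = thin(daily, DEFAULT_POLICY, now)
print(len(daily), "backups before,", len(survivors), "after thinning")
```

In this sketch the daily frame already covers the first of the four weeks, so the number of survivors comes out a little lower than 7 + 4 + 12.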

Backup retention is one of our first “paid development” projects. The issue on GitHub was created in November 2016 and a $50 bug bounty was put on it via BountySource. Now that the UI is available, we are going to close the issue and give the bug bounty to TekkiWuff, who did most of the work.

René

Congrats to TekkiWuff. :+1:

Another nice feature in the Canary stream – is there an upgrade path from 2.0.2.1_beta_2017-08-01 to Canary?

Will there be a way to customize the policy from the UI, if the “7 days”, “4 weeks”, “12 months” periods are not appropriate?

Yes, just choose “custom” and you can type in the retention policy like you would on the commandline.

That should be a matter of going to settings and choosing the Canary channel, then on the About page choose “check for updates” and install the update.

Awesome - thanks for adding this to the GUI!

Can you confirm a few things for me?

  1. Does “There will always be at least one remaining backup” mean files deleted from the source will NOT be deleted from the backup, even after 12 months?

  2. Am I correct in assuming the NEWEST version of each “date range” will be kept? For example, with hourly backups on the 1st minute of the hour I’d have:

    • 24 backups for “today”
    • 7 backups for “this week” (ONLY the 11:59 PM backup from each of the 7 previous days, includes 1 backup also counted with “today”)
    • 4 backups for “this month” (ONLY the 11:59 PM backup for the most recent day of “the week” for each of the 4 previous weeks, includes 1 backup also counted with “this week”)
    • 12 backups for “this year” (ONLY the 11:59 backup for the last day of “the month” for the previous 12 months, includes 1 backup also counted with “this month”)

This is a bit of a generic question, but what happens if I use this UI setting and manually add --retention-policy in the advanced options?

Hi,

I’m not sure if I understand this completely… If I set the retention to “Smart backup retention”, will Duplicati then upload (or keep) the whole backup once every 7 days, every 4 weeks and every 12 months and then reupload it again (completely)? That would be overkill for my slow upload speed because I’m backing up several Gigabytes and am therefore a big friend of incremental backups.

At the moment I use “Keep all backups”.

Greets
Torsten

Not at all. This setting is basically a way of scheduling deletes of old backup versions based on age.

If you were to use this feature and look at your versions (like you can see in the select box when doing a restore) you’d see a lot of backups for today / the last 7 days, 1 backup a week going back a month, 1 backup a month going back a year, and (I believe) NO backups older than a year old.

Again, this just flags older versions for deletion. I expect the actual deletion will happen during normal compacting maintenance, so you can expect a small increase in bandwidth due to multiple archive files being downloaded and re-archived without the flagged-for-deletion content in them.

But you shouldn’t see anything like the bandwidth needed for a “full backup” as if starting from scratch.

It means that if there is only one backup and it is 13 months old, it will not be deleted, although it is older than 12 months.

Got it - so if a file hasn’t changed in a year and still exists in the source, it will NOT be deleted by the smart retention policy.

I assume if the file HAS been deleted from the source, then it WILL be deleted from the backup once it “ages out” at the 13 month point.

Thanks!

OK, that sounds great! Thanks.

I have moved from the experimental track to Canary, then downloaded and activated the update. The About page says:

You are currently running Duplicati - 2.0.2.17_canary_2018-01-23

However, I am not seeing the new retention options when modifying a backup. Will this only work on NEW backup configurations?

Nope, doesn’t show up for new backup configurations either.

So what happens to the --retention-policy option? I currently have this:

--retention-policy=6W:0s,16W:1W,2Y:1M

and

“Keep this number of backups”=“A specific number”=40
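For anyone trying to read a policy string like the one above, here is a small Python sketch that splits it into its comma-separated rules and prints each `timeframe:interval` pair in words. It is only an illustration: `describe_policy` is a made-up helper, and reading an interval of `0s` as "keep every backup in that time frame" is my assumption.

```python
import re


def describe_policy(policy):
    """Turn 'frame:interval,frame:interval,...' into one sentence per rule."""
    sentences = []
    for rule in policy.split(","):
        frame, interval = rule.strip().split(":")
        # Assumption: an interval made up of zeros only (e.g. "0s")
        # means that no thinning happens inside this time frame.
        numbers = re.findall(r"\d+", interval)
        if numbers and all(int(n) == 0 for n in numbers):
            sentences.append(f"within {frame}: keep every backup")
        else:
            sentences.append(f"within {frame}: keep one backup per {interval}")
    return sentences


for line in describe_policy("6W:0s,16W:1W,2Y:1M"):
    print(line)
```

This only describes the retention policy itself. How it interacts with a separate “Keep this number of backups” setting is exactly the open question here.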

It’s possible your browser is caching the UI - does an F5 fix it?

Thanks! Worked with no issues. Restore performance has improved to slow (30 seconds per click) from almost unusable (20 minutes per click).

Took a combination of doing F5’s and explicitly clearing the cache and a little more folderol, but all of the computers are working right now.

Chrome? I also had to use Shift/Ctrl+F5 and everything else to make it work. A simple F5 wasn’t enough.
But even after every variation of F5, the UI tends to show me the previous settings (e.g. “Keep 1 file”) in EDIT Configuration, (5) Settings.
If I look at the exported command line, I can see that the new retention setting is active.
UI refresh is an issue, at least with Chrome.

EDIT: Does it mix the old setting and the new setting in the same command line? The UI displays “Keep specific number: 3”, but the saved command line is:
--keep-versions=3 --retention-policy="8D:2D,4W:1W,12M:1M"
In another backup I have: --keep-time=12M --retention-policy="7D:1D,4W:1W,60M:6M"
This seems like a mess. It looks as if the UI does not save the new settings cleanly.
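A quick way to check an exported command line for this kind of mix is to list every retention-related option it contains. Below is a rough Python sketch. It only looks for the three option names mentioned in this thread, and `retention_options` is a made-up helper, not part of Duplicati.

```python
import shlex

# The retention-related options that appear in this thread.
RETENTION_OPTIONS = ("--keep-versions", "--keep-time", "--retention-policy")


def retention_options(command_line):
    """Return the retention-related options found in a command line."""
    found = {}
    for token in shlex.split(command_line):  # shlex removes the quotes
        name, _, value = token.partition("=")
        if name in RETENTION_OPTIONS:
            found[name] = value
    return found


exported = '--keep-versions=3 --retention-policy="8D:2D,4W:1W,12M:1M"'
found = retention_options(exported)
if len(found) > 1:
    print("More than one retention option is set:", ", ".join(found))
```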

Awesome feature. Can I use more or fewer than 3 blocks, and crazy mixes of syntax? Hypothetical examples:

a) 7D:1D,60D:5D,2Y:1W,60M:10M,1200M:3650D
b) 1M:3D

This way we’d be really free to express our backup retention ideas. Does it understand only integers, or also float values (just asking, not expecting)?

Hm … I haven’t tried this, but at the moment there might actually be situations where you’d end up with fewer than the specified 40 versions, since the “Keep this number of backups” option doesn’t take into account that the “Retention Policy” will already delete some backups.

So even if the GUI might prevent having multiple backup retention options active at the same time in the future, I’ll still try to have a look at the code for combining these options soon-ish. If it really turns out to be a problem, I’ll try to change it so that “Keep this number of backups” is taken into account only after both the “Retention Policy” and the “Delete backups that are older than” options have been applied.

Yep, you can use more or fewer than three blocks.
It only accepts integers, but besides days (D), weeks (W), months (M) and years (Y), you can also use seconds (s), minutes (m) and hours (h). You can also combine these, for example 2D22h5m2s for a period of 2 days, 22 hours, 5 minutes and 2 seconds.
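To illustrate the combined notation, here is a short Python sketch of a parser for such time spans. It is not Duplicati's parser: `parse_timespan` is a made-up name, and months and years are approximated as 30 and 365 days here, which is an assumption made only for this example. Note that the unit letters are case-sensitive (m is minutes, M is months).

```python
import re
from datetime import timedelta

# Seconds per unit. Months and years have no fixed length, so
# 30 and 365 days are approximations used only in this sketch.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": 86400,
    "W": 7 * 86400,
    "M": 30 * 86400,
    "Y": 365 * 86400,
}

TOKEN = re.compile(r"(\d+)([smhDWMY])")


def parse_timespan(text):
    """Parse a string such as '2D22h5m2s' into a timedelta."""
    position = 0
    total = 0
    for match in TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"unexpected characters in {text!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        position = match.end()
    # Reject empty input and trailing characters; floats such as
    # '1.5D' are rejected because only integers are accepted.
    if position == 0 or position != len(text):
        raise ValueError(f"cannot parse {text!r}")
    return timedelta(seconds=total)


print(parse_timespan("2D22h5m2s"))
```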

Tekki