New retention policy deletes old backups in a smart way


I updated it after the PR: Added minor string fixes to the retention policy descriptions · duplicati/duplicati@0eecdd4 · GitHub


My hat goes off to TekkiWuff. I automated this feature for a client once (for SQL backups), and the smart feature turned out to be much more complicated than I had expected. I haven't used it yet in Duplicati, but I definitely will. Thank you!


How does the Smart Retention feature decide which backup to keep for a weekly or monthly interval? Does it just default to the last job that ran in those time periods? For example, if a backup is run daily, would it keep the Sunday backup?

If I recall correctly, the most recent backup in a particular time bucket is what's kept.

That said, while Day, Month and Year are pretty commonly defined time frames (each starts at midnight or on the 1st), a Week could be Mon-Sun or Sun-Sat (and that doesn't take into account places like Saudi Arabia, where the weekend was Thu. & Fri. until 2013, when it was changed to Fri. & Sat.!).

I'm not sure which one @Tekki decided to implement, which is why I tend to just use 7d instead of 1w.

If you're really curious you could read through the feature discussion on GitHub - it's actually pretty crazy how complex what seems to be such a simple idea becomes. :open_mouth:

Thanks @JonMikelV. I'll skim through that Git discussion.

The online manual doesn't seem to have been updated to include this. Under "creating a new backup job", it describes retention as:

The retention can be set in 3 ways:

Unlimited:
Backups will never be deleted. This is the most safe option, but remote storage capacity will keep increasing.
Until they are older than:
Backups older than a specified number of days, weeks, months or years will be deleted.
A specific number:
The specified number of backup versions will be kept, all older backups will be deleted.

In the end, what was the final syntax? In particular, all of the examples discuss days, weeks, months and years (D, W, M and Y). Is there support for hours and minutes as well? And were any other letters added (e.g., minimum number of backups to retain, etc.) or are all letters just times?

Thanks for clarifying!

The in-GUI docs say:

Enter a retention strategy manually. Placeholders are D/W/Y for days/weeks/years and U for unlimited. The syntax is: 7D:1D,4W:1W,36M:1M. This example keeps one backup for each of the next 7 days, one for each of the next 4 weeks, and one for each of the next 36 months. This can also be written as 1W:1D,1M:1W,3Y:1M.

But here's some more detail:

Note that it's important to understand that these are timeframes, not specific periods. So if you say 1W, that means 1 week between backup versions, NOT any specific Mon-Sun time period. (More specifically, the timeframes are converted to seconds and that's what's used when comparing how far apart two backup versions are.)
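To make the "converted to seconds" point concrete, here is a minimal Python sketch of how a policy string could be split into (timeframe, interval) pairs measured in seconds. This is not Duplicati's actual parser: the function names are made up for illustration, and treating a month as 30 days and a year as 365 days is an assumption.

```python
import math

# Assumed unit lengths in seconds. Months and years are approximated as
# 30 and 365 days here; Duplicati's own conversion may differ.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": 86400,
    "W": 7 * 86400,
    "M": 30 * 86400,
    "Y": 365 * 86400,
}


def to_seconds(span: str) -> float:
    """Convert a span such as '7D' or '15m' to seconds; 'U' means unlimited."""
    if span == "U":
        return math.inf
    number, unit = span[:-1], span[-1]
    return int(number) * UNIT_SECONDS[unit]


def parse_policy(policy: str) -> list[tuple[float, float]]:
    """Split 'timeframe:interval' pairs and convert both sides to seconds."""
    rules = []
    for pair in policy.split(","):
        timeframe, interval = pair.strip().split(":")
        rules.append((to_seconds(timeframe), to_seconds(interval)))
    return rules


print(parse_policy("7D:1D,4W:1W,36M:1M"))
# [(604800, 86400), (2419200, 604800), (93312000, 2592000)]
```

Note that the unit letters are case-sensitive in this sketch: lowercase m is minutes, uppercase M is months.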

Thanks for the info - the s, m and h arguments are not documented in the in-GUI example, and they are exactly what I needed.

So if I wanted to keep a backup set every 15 minutes for the first 3 hours, every hour for the next 21 hours, every day for the rest of the week, every week for the rest of the month, every month for the rest of the year, and every three months forever, I'd use:

3h:15m,1D:1h,1W:1D,1M:1W,1M:1Y,U:3M

Or is this instead keeping a backup set every 15 minutes for 3 hours, then after that keeping every hour for an additional 24 hours (thereby saving 12+24 backups vs. 12+21 if my first understanding is correct), then after that keeping a backup set every day for an additional 7 days (12+24+7 vs. 12+21+6), etc.?

In other words, when specifying multiple first arguments - that is, multiple timeframes - do they nest within each other (as in my first explanation) or are they additive (as in my second explanation)?

Sorry for being dense, but having read through this discussion I'm slightly uncertain on this one point.

Thanks again for your help!

No need to apologize. This is one of those things that at first glance "should be simple" but once you dig into it turns out to be quite complex.

As I said, this is confusing so please forgive me if I get one of these wrong...

Time periods "nest", so assuming it's currently noon on January 1st, your example of 3h:15m,1D:1h,1W:1D,1M:1W,1M:1Y,U:3M would break down to:

  • 3h:15m = for the next 3 hours (noon to 3 PM), keep no more than 1 backup every 15 min
  • 1D:1h = for the next day (noon today to noon tomorrow), keep no more than 1 backup every hour
  • 1W:1D = for the next week (noon today to noon 7 days from now), keep no more than 1 backup every day
  • 1M:1W = for the next month (not sure if that's converted into days or uses actual month breaks), keep no more than 1 backup every week
  • 1M:1Y = for the next month, keep no more than 1 backup every year (I'm pretty sure you meant 1Y:1M, which would be "for the next year, keep no more than 1 backup every month")
  • U:3M = until forever, keep no more than 1 backup every 3 months

One thing to keep in mind is that this is a RETENTION (cleanup) rule, not a scheduling one. So if you schedule backups only once a day, a rule such as 3h:15m isn't going to do much.

The rules are applied after a backup is completed. Basically, Duplicati will look at all the versions it has and try to fit each version into a rule "bucket".

Let's say you do hourly backups at the top of the hour, have 24 versions (every hour for the last day), and have a 1D:1h rule. Then nothing will happen to those backups because they all fit in the 1D:1h bucket.

HOWEVER, if you do a manual backup at the BOTTOM of an hour, then when that backup finishes Duplicati will look at the retention rules and realize you've got a 25th backup in the last 24 hours (the oldest one) that does NOT fit in the 1D:1h bucket. Or more precisely, there's an hour block that has TWO backups, and only the most recent goes in the bucket, meaning the older one gets flagged for removal.
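The hourly example can be played through in a few lines of Python. This is only a sketch of the "one backup per block, newest wins" idea as described in this post, not Duplicati's real implementation; the function name and the dates are arbitrary.

```python
from datetime import datetime, timedelta


def flag_for_removal(versions, interval, now):
    """Group versions into interval-sized blocks counted back from `now`
    and return every version except the most recent one in each block."""
    newest_in_block = {}
    for version in sorted(versions, reverse=True):  # newest first
        block = (now - version) // interval
        newest_in_block.setdefault(block, version)
    kept = set(newest_in_block.values())
    return sorted(v for v in versions if v not in kept)


# 24 hourly backups taken at the top of each hour (arbitrary example date).
top_of_hour = datetime(2024, 1, 1, 12, 0)
hourly = [top_of_hour - timedelta(hours=h) for h in range(24)]

# With only the hourly backups, every block holds one version: nothing to remove.
print(flag_for_removal(hourly, timedelta(hours=1), now=top_of_hour))

# A manual backup at the bottom of the hour shares a block with the 12:00 one.
manual = top_of_hour + timedelta(minutes=30)
removed = flag_for_removal(hourly + [manual], timedelta(hours=1), now=manual)
print(removed)
```

In this model the manual 12:30 backup and the 12:00 backup land in the same one-hour block, so the older of the two (12:00) is the one flagged.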


Does that help or just make it more confusing?

So if my retention policy is 3h:15m,1D:1h,1W:1D,1M:1W,1Y:1M,U:3M, I've been backing up every 15 minutes, and timeframes nest, then at noon on Dec 31st, I should have the following:

12/31 12:00
12/31 11:45
12/31 11:30
...
12/31 9:00
12/31 8:00
12/31 7:00
...
12/30 12:00
12/29 12:00
...
12/26 12:00
12/19 12:00
12/18 12:00
and so on.

Do I have this correct (3 hours every 15 minutes, followed by 21 hours every hour, followed by 6 days of daily, etc.), or do I get three hours every fifteen minutes, followed by a full additional 24 hours every hour, etc.?

Thanks for your patience in making this clear.

So is there a way to globally set the smart retention policy?

The backup retention policy is set on a per-backup basis, with 'smart backup retention' being the only way to set a complex backup retention policy globally (i.e. where I don't have to edit every backup and set the same manual policy).

So is there a way to define what 'smart backup retention' means? It would be great to have a global option like 'smart-backup-policy-definition', which defaults to '1W:1D, 1M:1W, 1Y:1M' (which I believe is the default), but which I could edit and which would apply to all backups that use the smart backup retention policy (i.e. I can globally change my retention policy for all backups in one fell swoop).

So if this does not exist now, how do I request it as a feature request? :slight_smile:

Yes, that's it. For me, the main things I remember to keep it straight are:

  1. every "for the next XXX" time period starts with the most recent backup (so 3h:15m and 1Y:1M both start counting their 3h or 1Y periods from the most recent backup)
  2. versions already counted in a smaller "bucket" don't count towards longer ones (so if a version is being kept due to the 3h:15m rule, it would be ignored when counting versions for the 1Y:1M rule)
  3. retention policy is just that, retention of existing versions and NOT scheduling of actual backups (so if you schedule daily backups, the 3h:15m rule likely won't do much)
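Those three points can be put together in a short Python sketch. It is a simplified model built only from the description in this thread (count from the most recent backup, let the smallest timeframe claim a version first, keep the newest version per interval block), so treat it as an illustration rather than Duplicati's algorithm. The function name and the test dates are hypothetical.

```python
from datetime import datetime, timedelta


def versions_to_keep(versions, rules):
    """rules: list of (timeframe, interval) timedeltas.
    Use timedelta.max as the timeframe for an unlimited ('U') rule."""
    if not versions:
        return []
    newest = max(versions)  # point 1: every timeframe counts from the newest backup
    kept = {}
    for version in sorted(versions, reverse=True):  # newest first
        age = newest - version
        for timeframe, interval in sorted(rules):   # smallest timeframe first
            if age < timeframe:
                # point 2: the first (smallest) matching timeframe claims the version
                block = (timeframe, age // interval)
                kept.setdefault(block, version)     # newest version per block wins
                break
    return sorted(kept.values(), reverse=True)


# One week of backups taken every 15 minutes, ending at noon on Dec 31st.
latest = datetime(2024, 12, 31, 12, 0)
versions = [latest - timedelta(minutes=15 * i) for i in range(7 * 24 * 4)]
rules = [
    (timedelta(hours=3), timedelta(minutes=15)),  # 3h:15m
    (timedelta(days=1), timedelta(hours=1)),      # 1D:1h
    (timedelta(days=7), timedelta(days=1)),       # 1W:1D
]
kept = versions_to_keep(versions, rules)
print(len(kept))  # 39 = 12 quarter-hourly + 21 hourly + 6 daily
```

Under these assumptions the count comes out as the nested 12+21+6 rather than the additive 12+24+7.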

Sure! Go to the global "Settings" page and select "retention-policy" from the "Add advanced option" selector.


@JonMikelV Thanks! I didn't know about that option :slight_smile:

Please excuse me if I somehow missed this above...

Can we nest similar timeframe groupings like Y (years) to achieve a tiered effect? An example would be 1Y:4M,7Y:1Y,99Y:10Y, the effect being that for year 1 we keep every 4th month, for years 2-7 we keep one per year, and for years 8-99 we keep every 10th year... and everything from year 100 on gets removed...

Additionally, once you have applied this 'global' setting... what needs to be set within the individual backup jobs so that this global setting does not get overridden?

Thanks...

Yes - I believe that is correct. I think of it this way - once a backup is "claimed" by a timeframe, it is excluded from further checks in other (longer) timeframes.
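As a rough illustration of that "claimed" idea for the tiered 1Y:4M,7Y:1Y,99Y:10Y example, the following Python sketch reports which rule would claim a backup of a given age. The helper name is made up, and it only understands timeframes written in years.

```python
def claiming_rule(age_years, policy):
    """Return the 'timeframe:interval' pair that claims a backup of this age,
    or None if it is older than every timeframe (and so would be removed).
    Simplification: timeframes must be given in years, e.g. '7Y'."""
    pairs = [pair.split(":") for pair in policy.split(",")]
    # Check the shortest timeframe first so shorter tiers claim backups first.
    for timeframe, interval in sorted(pairs, key=lambda p: int(p[0][:-1])):
        if age_years < int(timeframe[:-1]):
            return f"{timeframe}:{interval}"
    return None


policy = "1Y:4M,7Y:1Y,99Y:10Y"
for age in (0.5, 3, 50, 100):
    print(age, claiming_rule(age, policy))
```

A backup half a year old is claimed by 1Y:4M, a three-year-old one by 7Y:1Y, a fifty-year-old one by 99Y:10Y, and anything aged 100 years or more matches no rule.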

As for using a global setting - setting the job to "keep all versions" will let the global setting be applied. You can verify this by using the job "Export" → "As Command-line" menu item and checking that the --retention-policy parameter shown is what you expect.
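If you paste the exported command line into a script, the check can be automated along these lines. The command line below is a made-up placeholder, not real export output, and find_option is a hypothetical helper; only the --retention-policy option name comes from this thread.

```python
import shlex


def find_option(command_line, name):
    """Return the value of --name=value from a command line, or None."""
    for token in shlex.split(command_line):
        if token.startswith(f"--{name}="):
            return token.split("=", 1)[1]
    return None


# Placeholder text standing in for whatever "As Command-line" shows for your job.
exported = (
    'duplicati-cli backup "file:///mnt/target" "/home/user" '
    '--retention-policy="1W:1D,4W:1W,12M:1M"'
)
print(find_option(exported, "retention-policy"))
```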

Is there a utility where you can enter backup retention times and which gives the string to be inserted into Duplicati?

I come from the Cobian Backup program, and now I'm trying Duplicati, but I don't have a clear idea of how to keep backups.
In Cobian, my schedule works like this:
daily backup (for 7 days - duration 1 week)
weekly backup (1 time on Saturday for 4 weeks - duration 1 month)
monthly backup (1 on the first day of the month for 12 months - duration 3 years)
How should I set up Duplicati to get the same result?

thank you all

I don't know of any tool for generating retention policy strings.

But even so, Duplicati doesn't currently support specific day (of week or month) retention settings.

So you can do daily for 7 days (7d:1d, not 1d:7d), weekly for a month (1M:1w, not 1w:1m) and monthly for three years (3y:1M, not 1m:3y), but you can't control that Saturdays or the 1st of the month are the ones that are kept.
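In the absence of a dedicated tool, a tiny helper is enough to assemble such a string. This is a hypothetical sketch, not part of Duplicati; the accepted unit letters (s, m, h, D, W, M, Y and U) are taken from this thread.

```python
import re

# A span is either 'U' (unlimited) or a number followed by one unit letter.
SPAN = re.compile(r"U|\d+[smhDWMY]")


def build_policy(tiers):
    """tiers: list of (timeframe, interval) strings such as ("7D", "1D")."""
    parts = []
    for timeframe, interval in tiers:
        for span in (timeframe, interval):
            if not SPAN.fullmatch(span):
                raise ValueError(f"unrecognised span: {span!r}")
        parts.append(f"{timeframe}:{interval}")
    return ",".join(parts)


# Daily for a week, weekly for a month, monthly for three years.
print(build_policy([("7D", "1D"), ("1M", "1W"), ("3Y", "1M")]))
# 7D:1D,1M:1W,3Y:1M
```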

It might be a nice feature to add / request, just be sure to consider how to handle situations where the desired backup (say 1st of the month) doesn't exist.

Remember, this is a policy of how to delete existing backups, so if we were able to say "delete every backup for the week EXCEPT Saturday", what should be done if there is no Saturday backup for one of the weeks (maybe the computer was turned off for the weekend)?

I think you might have swapped duration:interval for interval:duration in your examples :slight_smile:

I gather that the backup runs daily, and that I only have to think about how the backups are retained.
But if, for example, I want to create full backups every month and keep them for 1 year or more, how can I do that?

EDIT: otherwise I would have to create more backup jobs, e.g.:
1 backup to run once a week and save only the last month
1 backup to be performed once a month and stored for 3 years