I updated it after the PR: Added minor string fixes to the retention policy descriptions · duplicati/duplicati@0eecdd4 · GitHub
My hat goes off to TekkiWuff. I automated this feature for a client once (for SQL backups), and the smart feature turned out to be much more complicated than I had expected. I haven't used it yet in Duplicati, but definitely will. Thank you!
How does the Smart Retention feature decide which backup to keep for a weekly or monthly interval? Does it just default to the last job that ran in those time periods? For example, if a backup is run daily, would it keep the Sunday backup?
If I recall correctly, the most recent backup in a particular time bucket is what's kept.
That said, while Day, Month and Year are pretty commonly defined time frames (each starts at midnight or on the 1st), a Week could be Mon-Sun or Sun-Sat (and that doesn't take into account places like Saudi Arabia, where the weekend was Thu. & Fri. until 2013, when it was changed to Fri. & Sat.!).
I'm not sure which one @Tekki decided to implement, which is why I tend to just use 7d instead of 1w.
If you're really curious you could read through the feature discussion at GitHub - it's actually pretty crazy how complex what seems to be such a simple idea becomes.
Thanks @JonMikelV. I'll skim through that Git discussion.
The online manual doesn't seem to have been updated to include this. Under "creating a new backup job", it describes retention as:
The retention can be set in 3 ways:
- Unlimited: Backups will never be deleted. This is the most safe option, but remote storage capacity will keep increasing.
- Until they are older than: Backups older than a specified number of days, weeks, months or years will be deleted.
- A specific number: The specified number of backup versions will be kept, all older backups will be deleted.
In the end, what was the final syntax? In particular, all of the examples discuss days, weeks, months and years (D, W, M and Y). Is there support for hours and minutes as well? And were any other letters added (e.g., minimum number of backups to retain, etc.) or are all letters just times?
Thanks for clarifying!
The in-GUI docs say:
Enter a retention strategy manually. Placeholders are D/W/Y for days/weeks/years and U for unlimited. The syntax is: 7D:1D,4W:1W,36M:1M. This example keeps one backup for each of the next 7 days, one for each of the next 4 weeks, and one for each of the next 36 months. This can also be written as 1W:1D,1M:1W,3Y:1M.
But here's some more detail:
Note that it's important to understand that these are timeframes, not specific periods. So if you say 1W, that means 1 week between backup versions, NOT any specific Mon-Sun time period. (More specifically, the timeframes are converted to seconds and that's what's used when comparing how far apart two backup versions are.)
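To make the "converted to seconds" idea concrete, here is a minimal Python sketch of how a policy string could be split into (timeframe, interval) pairs. It is not Duplicati's own parser. The function names are made up for illustration, the fixed 30-day month and 365-day year are assumptions, and so is the rule that only m (minutes) and M (months) are case-sensitive.

```python
import re

# Assumed unit lengths in seconds. The 30-day month and 365-day year are
# simplifications for this sketch, not necessarily what Duplicati uses.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": 86400,
    "W": 7 * 86400,
    "M": 30 * 86400,
    "Y": 365 * 86400,
}

def to_seconds(span):
    """Convert a span such as '15m' or '1W' to seconds; 'U' means unlimited."""
    if span.upper() == "U":
        return float("inf")
    match = re.fullmatch(r"(\d+)([A-Za-z])", span)
    if match is None:
        raise ValueError(f"unrecognized time span: {span!r}")
    count, unit = match.groups()
    if unit not in "mM":
        # Assumption: only m (minutes) vs M (months) depends on case.
        unit = unit.lower() if unit.lower() in "sh" else unit.upper()
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit in time span: {span!r}")
    return int(count) * UNIT_SECONDS[unit]

def parse_policy(policy):
    """Split 'timeframe:interval,...' into (timeframe_s, interval_s) pairs."""
    rules = []
    for rule in policy.split(","):
        timeframe, interval = rule.strip().split(":")
        rules.append((to_seconds(timeframe), to_seconds(interval)))
    return rules

print(parse_policy("7D:1D,4W:1W,36M:1M"))
```

Under these assumptions 7D and 1W parse to the same number of seconds, which is why the two spellings of the GUI example are interchangeable.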
Thanks for the info - the s, m and h arguments are not documented in the in-GUI example, and they are exactly what I needed.
So if I wanted to keep a backup set every 15 minutes for the first 3 hours, every hour for the next 21 hours, every day for the rest of the week, every week for the rest of the month, every month for the rest of the year, and every three months forever, I'd use:
3h:15m,1D:1h,1W:1D,1M:1W,1M:1Y,U:3M
Or is this instead keeping a backup set every 15 minutes for 3 hours, then after that keeping one every hour for an additional 24 hours (thereby saving 12+24 backups vs. 12+21 if my first understanding is correct), then after that keeping a backup set every day for an additional 7 days (12+24+7 vs. 12+21+6), etc.?
In other words, when specifying multiple first arguments - that is, multiple timeframes - do they nest within each other (as in my first explanation) or are they additive (as in my second explanation)?
Sorry for being dense, but having read through this discussion I'm slightly uncertain on this one point.
Thanks again for your help!
No need to apologize. This is one of those things that at first glance "should be simple" but once you dig into it turns out to be quite complex.
As I said, this is confusing, so please forgive me if I get one of these wrong…
Time periods "nest", so assuming it's currently noon on January 1st, your example of 3h:15m,1D:1h,1W:1D,1M:1W,1M:1Y,U:3M would break down to:
- 3h:15m = for the next 3 hours (noon to 3 PM), keep no more than 1 backup every 15 min
- 1D:1h = for the next day (noon today to noon tomorrow), keep no more than 1 backup every hour
- 1W:1D = for the next week (noon today to noon 7 days from now), keep no more than 1 backup every day
- 1M:1W = for the next month (not sure if that's converted into days or uses actual month breaks), keep no more than 1 backup every week
- 1M:1Y = for the next month, keep no more than 1 backup every year (I'm pretty sure you meant 1Y:1M, which would be "for the next year, keep no more than 1 backup every month")
- U:3M = until forever, keep no more than 1 backup every 3 months
One thing to keep in mind is that this is a RETENTION (cleanup) rule, not a scheduling one. So if you schedule backups only once a day, a rule such as 3h:15m isn't going to do much.
The rules are applied after a backup is completed. Basically, Duplicati will look at all the versions it has and try to fit each version into a rule "bucket".
Let's say you do hourly backups at the top of the hour, have 24 versions (every hour for the last day), and have a 1D:1h rule. Then nothing will happen to those backups because they fit in the 1D:1h bucket.
HOWEVER, if you do a manual backup at the BOTTOM of an hour, then when that backup finishes Duplicati will look at the retention rules and realize you've got a 25th backup in the last 24 hours (the oldest one) that does NOT fit in the 1D:1h bucket. Or more precisely, there's an hour block that has TWO backups, and only the most recent goes in the bucket, meaning the older one gets flagged for removal.
Does that help or just make it more confusing?
So if my retention policy is 3h:15m,1D:1h,1W:1D,1M:1W,1Y:1M,U:3M, I've been backing up every 15 minutes, and timeframes nest, then at noon on Dec 31st, I should have the following:
12/31 12:00
12/31 11:45
12/31 11:30
…
12/31 9:00
12/31 8:00
12/31 7:00
…
12/30 12:00
12/29 12:00
…
12/26 12:00
12/19 12:00
12/18 12:00
and so on.
Do I have this correct (3 hours every 15 minutes, followed by 21 hours every hour, followed by 6 days of daily, etc.), or do I get three hours every fifteen minutes, followed by a full additional 24 hours every hour, etc.?
Thanks for your patience in making this clear.
So is there a way to globally set the smart retention policy?
The backup retention policy is set on a per-backup basis, with "smart backup retention" being the only way to set a complex backup retention policy globally (i.e. where I don't have to edit every backup and set the same manual policy).
So is there a way to define what "smart backup retention" means? It would be great to have a global option like "smart-backup-policy-definition", which defaults to "1W:1D, 1M:1W, 1Y:1M" (which I believe is the default), but which I could edit and which would apply to all backups that use the smart backup retention policy (i.e. I can globally change my retention policy for all backups in one fell swoop).
So if this does not exist now, how do I request it as a feature request?
Yes, that's it. For me, the main things I remember to keep it straight are:
- every "for the next XXX" time period starts with the most recent backup (so 3h:15m and 1Y:1M both start counting their 3h or 1Y periods from the most recent backup)
- versions already counted in a smaller "bucket" don't count towards longer ones (so if a version is being kept due to the 3h:15m rule, it would be ignored when counting versions for the 1Y:1M rule)
- retention policy is just that, retention of existing versions and NOT scheduling of actual backups (so if you schedule daily backups, the 3h:15m rule likely won't do much)
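For anyone who wants to check the nesting arithmetic, here is a rough Python simulation of the behaviour as described in this thread. It is a sketch under stated assumptions, not Duplicati's actual code. The function name is made up, every timeframe is measured back from the most recent version, each version is claimed by the smallest timeframe it fits in, and within a bucket the newest version of each interval survives. The real implementation may differ in any of these details.

```python
from datetime import datetime, timedelta

def versions_to_keep(versions, rules):
    """Return the versions retained under `rules`, newest first.

    versions: iterable of datetime objects, one per backup version.
    rules: list of (timeframe, interval) timedelta pairs; a timeframe of
           None stands for 'U' (unlimited).
    """
    remaining = sorted(versions, reverse=True)  # newest first
    if not remaining:
        return []
    newest = remaining[0]
    # Smallest timeframe first, unlimited last.
    ordered = sorted(rules, key=lambda r: (r[0] is None, r[0] or timedelta(0)))
    kept = []
    for timeframe, interval in ordered:
        # Claim every not-yet-claimed version that falls in this timeframe.
        bucket = [v for v in remaining
                  if timeframe is None or newest - v <= timeframe]
        remaining = remaining[len(bucket):]  # bucket is a prefix of remaining
        last_kept = None
        for v in bucket:
            # Keep a version only if it is at least one interval older
            # than the previously kept version in this bucket.
            if last_kept is None or last_kept - v >= interval:
                kept.append(v)
                last_kept = v
    # Anything still in `remaining` is older than every timeframe: deleted.
    return kept

# Backups every 15 minutes for 8 days, ending at noon on Dec 31st.
newest = datetime(2023, 12, 31, 12, 0)
versions = [newest - timedelta(minutes=15 * i) for i in range(8 * 96 + 1)]
rules = [
    (timedelta(hours=3), timedelta(minutes=15)),  # 3h:15m
    (timedelta(days=1), timedelta(hours=1)),      # 1D:1h
    (timedelta(days=7), timedelta(days=1)),       # 1W:1D
]
kept = versions_to_keep(versions, rules)
print(len(kept))
```

With these inputs the sketch keeps 13 quarter-hourly versions (12:00 back to 9:00), then 21 hourly ones, then 6 daily ones, for 40 in total. That matches the nested reading (3 hours, then the remaining 21 hours, then the remaining 6 days) rather than the additive one.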
Sure! Go to the global "Settings" page and select "retention-policy" from the "Add advanced option" selector.
@JonMikelV Thanks! I didn't know about that option
Please excuse me if I somehow missed this above…
Can we nest similar timeframe groupings like Y (years) to achieve a tiered effect? An example would be 1Y:4M,7Y:1Y,99Y:10Y, the effect being that for year 1 we keep every 4th month, for years 2-7 we keep one per year, and for years 8-99 we keep every 10th year… and everything from year 100 on gets removed…
Additionally, once you have applied this "global" setting… what setting is set within the individual backup jobs so that this global setting does not get overridden?
Thanks…
Yes - I believe that is correct. I think of it this way - once a backup is "claimed" by a timeframe, it is excluded from further checks in other (longer) timeframes.
As for using a global setting - setting the job to "keep all versions" will let the global setting be applied. You can verify this by using the job "Export" → "As Command-line" menu item and checking that the --retention-policy parameter shown is what you expect.
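If you want to check several exported jobs at once, a small script can pull the value out for you. This is only a sketch. The helper name is made up, and the sample command line below is a hypothetical placeholder rather than real export output. The only thing taken from this thread is the --retention-policy option name.

```python
import shlex

def retention_policy_from_command(command_line):
    """Return the value of --retention-policy in a command line, or None."""
    prefix = "--retention-policy="
    for arg in shlex.split(command_line):
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None

# Hypothetical exported command; the storage URL and source path are
# placeholders, not output copied from Duplicati.
exported = (
    'Duplicati.CommandLine.exe backup "file:///mnt/backups" "/home/user/docs" '
    '--retention-policy="1W:1D,4W:1W,12M:1M"'
)
print(retention_policy_from_command(exported))
```

A result of None would mean the job's command line carries no retention policy at all, which is worth checking before relying on the global setting.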
Is there a utility where you can enter backup retention times and which gives you the string to enter in Duplicati?
I come from the Cobian Backup program and am now trying Duplicati, but I don't have a clear idea of how to keep backups.
In Cobian, my schedule works like this:
daily backup (for 7 days - duration 1 week)
weekly backup (1 time on Saturday for 4 weeks - duration 1 month)
monthly backup (1 on the first day of the month for 12 months - duration 3 years)
How should I set up Duplicati to get the same result?
Thank you all
I don't know of any tool for generating retention policy strings.
But even so, Duplicati doesn't currently support specific day (of week or month) retention settings.
So you can do daily for 7 days (7d:1d), weekly for a month (1M:1w) and monthly for three years (3y:1M), but you can't control that Saturdays or the 1st of the month are the ones that are kept. (Edited: these originally read 1d:7d, 1w:1m and 1m:3y, with the two parts swapped.)
It might be a nice feature to add / request, just be sure to consider how to handle situations where the desired backup (say, the 1st of the month) doesn't exist.
Remember, this is a policy of how to delete existing backups, so if we were able to say "delete every backup for the week EXCEPT Saturday", what should be done if there is no Saturday backup for one of the weeks (maybe the computer was turned off for the weekend)?
I think you might have swapped duration:interval for interval:duration in your examples
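Since the two halves of each rule are easy to swap, here is a small hypothetical helper (not an existing Duplicati tool) that builds a policy string from (timeframe, interval) pairs and refuses any pair where the timeframe is shorter than the interval. The unit lengths, including the 30-day month and 365-day year, are assumptions of the sketch.

```python
# Assumed unit lengths in seconds; month and year lengths are simplifications.
UNIT_SECONDS = {
    "s": 1, "m": 60, "h": 3600,
    "D": 86400, "W": 7 * 86400, "M": 30 * 86400, "Y": 365 * 86400,
}

def span_seconds(span):
    """Convert a span such as '7D' to seconds (units: s, m, h, D, W, M, Y)."""
    return int(span[:-1]) * UNIT_SECONDS[span[-1]]

def build_policy(tiers):
    """Join (timeframe, interval) pairs into 'timeframe:interval,...'.

    Raises ValueError when a pair looks swapped, i.e. the timeframe is
    shorter than the interval. A timeframe of 'U' (unlimited) is accepted.
    """
    parts = []
    for timeframe, interval in tiers:
        if timeframe != "U" and span_seconds(timeframe) < span_seconds(interval):
            raise ValueError(
                f"{timeframe}:{interval} looks swapped; "
                f"did you mean {interval}:{timeframe}?"
            )
        parts.append(f"{timeframe}:{interval}")
    return ",".join(parts)

# Daily for a week, weekly for a month, monthly for three years.
print(build_policy([("7D", "1D"), ("1M", "1W"), ("3Y", "1M")]))
```

This only covers the string format. As noted above, it cannot express "keep the Saturday backup" or "keep the one from the 1st of the month".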
I deduce that the backup is duplicated daily, and I have to think only about the method of storing backups.
But if, for example, I want to create full backups every month and keep them for 1 year or more, how can I do that?
EDIT: otherwise I would have to create more backup jobs, e.g.:
1 backup to run once a week and save only the last month
1 backup to be performed once a month and stored for 3 years