If I choose unlimited retention and back up once per hour, is this going to cause any problems?
I understand back end storage requirements will continually increase, but besides that are there any problems I need to think about? Running it this way will create ~8760 backup versions per year.
There is a parameter (--upload-unchanged-backups=true) to force a backup to be recorded even if no files changed. Other than the mentioned (eventual) space issues the most likely side effects of LOTS of backups (regardless of frequency) would include:
longer backup runs due to more time necessary for block hash lookups in the sqlite database
slower restore UI performance due to sqlite lookups on more records and browser handling of very long select lists
There are probably more effects but I can't think of any FAILURE scenarios (unless you're talking hundreds of millions of backups) and expect the items above are the most noticeable.
Note that there are one or two people working on historical thinning so the frequency of kept backups can decrease over time (hourly for the last week's backups, daily for the next set out to a month, weekly to the next quarter, etc.) None of them are finished yet, though.
The limit I suppose you would hit first is the size of the SQLite database, and that can be really huge. But it will probably become too slow to use long before you hit that.
That actually sneaked into 2.0.2.10 as the advanced option --retention-policy:
Use this option to reduce the number of versions that are kept with increasing version age by deleting most of the old backups. The expected format is a comma separated list of colon separated time frame and interval pairs. For example the value "7D:0s,3M:1D,10Y:2M" means "For 7 days keep all backups, for 3 months keep one backup per day and for 10 years one backup every 2nd month".
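To make the format concrete, here is a minimal Python sketch of how such a value could be split into (time frame, interval) pairs. This is not Duplicati's own parser: the function names are made up for illustration, the unit letters are the ones used by the keep-time option, and treating a month as 30 days and a year as 365 days is an assumption of this sketch.

```python
import re
from datetime import timedelta

# Unit letters from smallest to biggest. The lengths used for "M" (30 days)
# and "Y" (365 days) are assumptions of this sketch, not Duplicati's rules.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": 86400,
    "W": 7 * 86400,
    "M": 30 * 86400,
    "Y": 365 * 86400,
}


def parse_timespan(text):
    """Turn a string such as '7D' or '0s' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhDWMY])", text.strip())
    if match is None:
        raise ValueError(f"not a valid time span: {text!r}")
    count, unit = match.groups()
    return timedelta(seconds=int(count) * UNIT_SECONDS[unit])


def parse_retention_policy(value):
    """Split 'frame:interval,frame:interval,...' into sorted timedelta pairs."""
    pairs = []
    for item in value.split(","):
        frame_text, separator, interval_text = item.partition(":")
        if not separator:
            raise ValueError(f"missing ':' in {item!r}")
        pairs.append((parse_timespan(frame_text), parse_timespan(interval_text)))
    # Smallest time frame first, since smaller frames take priority.
    return sorted(pairs, key=lambda pair: pair[0])
```

Calling parse_retention_policy("7D:0s,3M:1D,10Y:2M") gives three pairs; any number of pairs works the same way.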
But my main question - Is there any further instruction for how to use this? I.e. what the acceptable parameters are, how many are allowed (unlimited? just 3? unsure), etc? I like this feature and it seems like it will give a perfect amount of flexibility, hopefully some GUI can be built around it soon but text-based is ok with me for now as long as I understand what I can and can't do with it
Also: when using Retention Policy, does it simply override the old "Keep this number of backups" setting, or do I need to set it to "unlimited" (or rather a value further out than the longest timespan in the Retention Policy setting)?
Chiming in here, since this is my code (and my spelling mistakes *cough*):
Is there any further instruction for how to use this
Sadly not. I thought keeping the description somewhat brief might be good, as I was afraid it might scare off users when they see a huge wall of text ^^
I tried to explain the feature and how it works in a comment in the corresponding ticket Issue 2084 (comment)
I think the most important points are:
the letters for seconds, minutes, hours, etc are the same as in the keep-time option. So from smallest to biggest: "s", "m", "h", "D", "W", "M", "Y"
As explained in the example in the option description: The interval of 0 (no matter if seconds, minutes, etc) basically means "keep all versions in that time frame" since the distance between two backups will always be bigger than that, thus never deleting anything
The time frames do not stack but rather overlap, with the smaller time frames taking priority:
So if you configured 1 week with keeping all backups and 2 weeks with keeping only daily backups, then it will effectively result in all backups being kept for the first week (as configured) and after that 1 (sic!) week with daily backups (2 weeks minus the one week of overlap)
As kenkendk also mentioned in his review of my pull request: The option might cause some confusion with regard to the already existing keep-time option.
The way I built it, this feature will never touch any backups that are older than what you configured. So if your biggest time frame spans 2 years, then everything after that will be ignored. If you ultimately want to remove all backups older than a certain age, you ALSO have to add the keep-time option.
I'd say it's still open to debate what makes the most sense here.
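The overlap rule and the "never touches older backups" rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the actual Duplicati code; interval_for_age is a hypothetical helper and the policy is given as already-parsed (time frame, interval) pairs.

```python
from datetime import timedelta


def interval_for_age(age, policy):
    """Return the thinning interval that applies to a backup of this age.

    policy is a list of (time_frame, interval) timedelta pairs. The smallest
    time frame that still covers the backup wins. None means the backup is
    older than every time frame, so the retention policy leaves it alone.
    """
    for frame, interval in sorted(policy, key=lambda pair: pair[0]):
        if age <= frame:
            return interval
    return None


# 1 week keeping everything, 2 weeks keeping daily backups.
policy = [
    (timedelta(weeks=1), timedelta(0)),
    (timedelta(weeks=2), timedelta(days=1)),
]
```

With this policy a 3 day old backup falls under the 0 interval, a 10 day old backup under the daily interval, and a 20 day old backup gets None, which is where the keep-time option would have to take over.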
what the acceptable parameters are, how many are allowed (unlimited? just 3? unsure)
There is no specific limit to the number of parameters. I'm using it with 4 time frames: 7D:0s,1M:1D,6M:1W,10Y:1M plus the keep-time option with 10Y
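As a rough back-of-the-envelope check of what such a policy does to the version count, here is a small Python estimate. It assumes hourly backups, a month of 30 days and a year of 365 days, and it ignores edge effects at the time frame boundaries, so treat the result as an order of magnitude rather than an exact figure. estimate_versions is a made-up helper, not part of Duplicati.

```python
from datetime import timedelta


def estimate_versions(policy, backup_every):
    """Estimate how many versions remain once the policy is fully applied."""
    total = 0
    previous_frame = timedelta(0)
    for frame, interval in sorted(policy, key=lambda pair: pair[0]):
        # Only the part of the frame not covered by a smaller frame counts.
        span = frame - previous_frame
        # Backups cannot be closer together than the backup schedule itself.
        step = max(interval, backup_every)
        total += span // step
        previous_frame = frame
    return total


# 7D:0s,1M:1D,6M:1W,10Y:1M with M = 30 days and Y = 365 days (assumed).
policy = [
    (timedelta(days=7), timedelta(0)),
    (timedelta(days=30), timedelta(days=1)),
    (timedelta(days=180), timedelta(days=7)),
    (timedelta(days=3650), timedelta(days=30)),
]
hourly = timedelta(hours=1)
thinned = estimate_versions(policy, hourly)
unthinned = timedelta(days=3650) // hourly
```

Under these assumptions the thinned count is a few hundred versions, compared with roughly 87,600 for ten years of hourly backups that are never thinned.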
hopefully some GUI can be built around it soon but text-based is ok with me for now as long as I understand what I can and can't do with it
Yeah that'd be nice indeed. There were some people already asking if they can help building the GUI for it, since I have no experience with that in C# yet.
I'll update the ticket with the information that this feature made it into Duplicati, so others might be encouraged to work on it
I suppose a good way to word it might be, just to point out that keep-time will take precedence so make sure it's set longer than the longest timespan in Retention Policy (if desired, I guess).
Also just for clarification, when a subsequent timeframe is reached, how does it decide which version to keep? I assume it just keeps the newest one and removes the older one at that point, but wasn't sure.
Is the retention policy feature ready for prime-time? Or does it need more testing?
Well ... the feature is only in the Canary Build so ... see it as an experimental feature for now
That said, I've been using it for my regular backups for a few weeks now and so far haven't had any problems.
Is there a way to run a "what if" analysis with a certain retention policy to see what snapshots would be deleted and what would be retained?
I have added extensive logging of which backups will be kept and which will be deleted. The log entries all begin with [Retention Policy]. Most of them require the Profiling Log Level though.
So you should be able to use the existing dry-run option in Duplicati to run it via the command line without making any actual changes and then check the log to see what gets deleted and why.
when a subsequent timeframe is reached, how does it decide which version to keep? I assume it just keeps the newest one and removes the older one at that point, but wasn't sure.
When a backup spills over into the next time frame, the algorithm checks if the difference between that backup and the next older backup is at least as large as the specified interval. If not, then the just spilled over backup gets deleted.
In my tests it didn't work well to always keep the newer one of the two backups. If there is demand I can draw another example that explains why.
I added my example that I already posted in the Github Issue here as well:
Edit (further explanation):
In this example the backup is run every day, so there is one backup per day. The configured time frames and intervals are at the top in the coloured areas.
Looking at the line that starts with "S": There are 7 days worth of daily backups (green area). On the next day (line below starting with the new backup "T") the backup "M" spills over into the bucket which keeps backups only every second day (yellow area). Since it is more than two days away from "K", "M" is kept.
Again a day later the backup "U" gets created while backup "N" spills over into the yellow bucket. But since it is less than two days away from "M", "N" gets deleted.
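For anyone who wants to play with the idea, here is a minimal Python sketch of the rule as described in this thread: within each time frame, go from the oldest backup to the newest and keep a backup only if it is at least one interval away from the last one kept. It is a simplified illustration and not the real Duplicati implementation; apply_retention_policy and the sample policy are made up for this example.

```python
from datetime import datetime, timedelta


def apply_retention_policy(backups, policy, now):
    """Return (kept, deleted) lists of backup timestamps.

    backups: datetimes of existing backups.
    policy:  list of (time_frame, interval) timedelta pairs.
    """
    remaining = sorted(backups)
    kept, deleted = [], []
    for frame, interval in sorted(policy, key=lambda pair: pair[0]):
        cutoff = now - frame
        # Backups not claimed by a smaller frame that fall inside this one.
        in_frame = [b for b in remaining if b >= cutoff]
        remaining = [b for b in remaining if b < cutoff]
        last_kept = None
        for backup in in_frame:  # oldest first
            if last_kept is None or backup - last_kept >= interval:
                kept.append(backup)
                last_kept = backup
            else:
                deleted.append(backup)
    # Anything older than the biggest time frame is never touched.
    kept.extend(remaining)
    return sorted(kept), sorted(deleted)


# One backup per day for 14 days; keep all for 7 days, then every 2nd day.
now = datetime(2018, 1, 15)
daily_backups = [datetime(2018, 1, day) for day in range(1, 15)]
sample_policy = [
    (timedelta(days=7), timedelta(0)),
    (timedelta(days=14), timedelta(days=2)),
]
kept, deleted = apply_retention_policy(daily_backups, sample_policy, now)
```

In this run the seven newest backups are all kept and every second backup in the older week is deleted. Note that the sketch applies the policy once to a full set of backups, whereas in practice the deletion happens run by run as backups spill over.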
That makes sense now (and ironically I just got to the part in the Git thread where you posted this graphic too, lol) - thanks for the further explanation
I think this was touched on at GitHub so I apologize if this has already been covered, but when you say "the backup gets deleted" do you mean the backup for that day (let's say "N") or just revisions inside "N" (and of course "N" if it ends up empty)?
In either case, if --keep-time is unlimited, are you not deleting the most recent "orphaned" revision (last version of a file backed up before the source was deleted)?
when you say "the backup gets deleted" do you mean the backup for that day (let's say "N") or just revisions inside "N" (and of course "N" if it ends up empty)?
Yeah ... I mean the whole backup gets deleted.
While the idea of deleting individual file versions from within a backup sounds great for the future, it was too big a change for me to introduce. See the second half of this comment: Issue 2084 (comment)
Thanks for the confirmation - I wasn't sure if that (the linked) process was what ended up getting implemented.
You might want to include a comment in the final description along the lines of:
Note that as a side effect of taking precedence (within its time frame) over --keep-time it is possible for source-deleted files to be completely removed from the backup.
As a volunteer open-source project, giving time estimates is very difficult since we have "day jobs" to keep us fed. However, we are working to finalize the features for the next stable release and I believe this (and hopefully these #planned tagged items) will be included when it is released.
I have a rather big backup, 275 GB with 500 versions.
Listing files is extremely slow. It takes several hours if something needs to be restored.
Getting the initial list takes about 40 minutes. Browsing the next folder takes just as long ...
Is it possible to apply retention policy to this backup in order to reduce the versions and then run compact to make it process faster?
This is a known problem. Fixing it will require a major change to the internal database structure and a rewrite of many parts of the source code.
Unfortunately there is no fix or workaround for this issue at the moment. Fixing this problem is on the To-Do list, but a release date is not known.
See this discussion for more info: