Maximum backup versions?

If I choose unlimited retention and back up once per hour, is this going to cause any problems?

I understand back end storage requirements will continually increase, but besides that are there any problems I need to think about? Running it this way will create ~8760 backup versions per year.

Are your files changing that consistently? If nothing changes it won’t actually create a new version (I found this out a little while back).

Curious to hear an official answer. I’m sure there is a limit on the feasible number of versions, but I bet it’s quite a bit higher than 9K.

There is a parameter (--upload-unchanged-backups=true) to force a backup to be recorded even if no files changed. Other than the mentioned (eventual) space issues the most likely side effects of LOTS of backups (regardless of frequency) would include:

  • longer backup runs due to more time necessary for block hash lookups in the SQLite database
  • slower restore UI performance due to SQLite lookups on more records and browser handling of very long select lists

There are probably more effects but I can’t think of any FAILURE scenarios (unless you’re talking hundreds of millions of backups) and expect the items above are the most noticeable.

Note that one or two people are working on historical thinning, so the frequency of kept backups can decrease over time (hourly for the last week’s backups, daily for the next set out to a month, weekly for the next quarter, etc.). None of these efforts are finished yet, though.


The limit I suppose you would hit first is the size of the SQLite database, and that can be really huge. But it will probably be too slow to use long before you hit that.

That actually sneaked into 2.0.2.10 as the advanced option --retention-policy


Nice!

BTW I noticed a few typos (screenshot)

Use this option to reduce the number of versions that are kept with increasing version age by deleting most of the old backups. The expected format is a comma ~~seperated~~ separated list of ~~collon~~ colon ~~sperated~~ separated time frame and interval pairs. For example the value “7D:0s,3M:1D,10Y:2M” means "For 7 day keep all backups, for 3 months keep one backup per day and for 10 years one backup every 2nd month " [note missing end quote here]

But my main question: is there any further instruction for how to use this? I.e. what the acceptable parameters are, how many are allowed (unlimited? just 3? unsure), etc.? I like this feature and it seems like it will give a perfect amount of flexibility. Hopefully some GUI can be built around it soon, but text-based is OK with me for now as long as I understand what I can and can’t do with it :wink:

Also: when using Retention Policy, does it simply override the old “Keep this number of backups” setting, or do I need to set it to “unlimited” (or rather a value further out than the longest timespan in the Retention Policy setting)?


Cool! Does it work well? This should help prevent me from having gazillions of recovery points.

Chiming in here, since this is my code (and my spelling mistakes *cough*):

Is there any further instruction for how to use this

Sadly not. I thought keeping the description somewhat brief might be good, as I was afraid it might scare off users when they see a huge wall of text ^^
I tried to explain the feature and how it works in a comment in the corresponding ticket Issue 2084 (comment)

I think the most important points are:

  • The letters for seconds, minutes, hours, etc. are the same as in the keep-time option. So from smallest to biggest: ‘s’, ‘m’, ‘h’, ‘D’, ‘W’, ‘M’, ‘Y’
  • As explained in the example in the option description: an interval of 0 (no matter if seconds, minutes, etc.) basically means “keep all versions in that time frame”, since the distance between two backups will always be bigger than that, thus never deleting anything
  • The time frames do not stack but rather overlap, with the smaller time frames taking priority:
    So if you configured 1 week with keeping all backups and 2 weeks with keeping only daily backups, then it will effectively result in all backups being kept for the first week (as configured) and after that 1 (sic!) week with daily backups (2 weeks minus the one week of overlap)
  • As kenkendk also mentioned in his review of my pull request: the option might cause some confusion in regards to the already existing keep-time option.
    The way I built it, this feature will never touch any backups that are older than what you configured. So if your biggest time frame spans 2 years, then everything after that will be ignored. If you ultimately want to remove all backups older than a certain age, you ALSO have to add the keep-time option.
    I’d say it’s still open to debate what makes the most sense here.

what the acceptable parameters are, how many are allowed (unlimited? just 3? unsure)

There is no specific limit to the number of time frame/interval pairs. I’m using it with 4 time frames: 7D:0s,1M:1D,6M:1W,10Y:1M, plus the keep-time option set to 10Y.
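For illustration, here is a rough sketch (in Python, not the actual Duplicati code) of how a policy string like the one above breaks down into time frame / interval pairs. The unit conversions are approximations just for the example:

```python
# Rough illustration of the --retention-policy format, not Duplicati's real parser.
# Unit letters follow the keep-time option: s, m, h, D, W, M, Y.
UNIT_SECONDS = {
    "s": 1, "m": 60, "h": 3600,
    "D": 86400, "W": 7 * 86400,
    "M": 30 * 86400, "Y": 365 * 86400,  # approximate month/year lengths, illustration only
}

def parse_duration(text):
    """Turn e.g. '7D' or '0s' into a number of seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]

def parse_retention_policy(policy):
    """Turn '7D:0s,1M:1D,...' into (time_frame_seconds, interval_seconds) pairs."""
    pairs = []
    for part in policy.split(","):
        frame, interval = part.split(":")
        pairs.append((parse_duration(frame), parse_duration(interval)))
    return pairs

# Four time frames: keep everything for 7 days, dailies for a month,
# weeklies for 6 months, monthlies for 10 years.
print(parse_retention_policy("7D:0s,1M:1D,6M:1W,10Y:1M"))
```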

hopefully some GUI can be built around it soon but text-based is ok with me for now as long as I understand what I can and can’t do with it

Yeah, that’d be nice indeed. There were already some people asking if they can help build the GUI for it, since I have no experience with that in C# yet.
I’ll update the ticket with the information that this feature made it into Duplicati, so others might be encouraged to work on it.


Thank you for the clarifications!

Is the retention policy feature ready for prime-time? Or does it need more testing?

Is there a way to run a “what if” analysis with certain retention policy to see what snapshots would be deleted and what would be retained?

Sweet, thanks, that all makes sense.

I suppose a good way to word it might be to just point out that keep-time will take precedence, so make sure it’s set longer than the longest timespan in the Retention Policy (if desired, I guess).

Also just for clarification, when a subsequent timeframe is reached, how does it decide which version to keep? I assume it just keeps the newest one and removes the older one at that point, but wasn’t sure.

Is the retention policy feature ready for prime-time? Or does it need more testing?

Well … the feature is only in the Canary build so far … see it as an experimental feature for now :wink:
That said, I’ve been using it for my regular backups for a few weeks now and so far haven’t had any problems.

Is there a way to run a “what if” analysis with certain retention policy to see what snapshots would be deleted and what would be retained?

I have added extensive logging of which backups will be kept and which will be deleted. The log entries all begin with [Retention Policy]. Most of them require the Profiling Log Level though.
So you should be able to use the existing dry-run option in Duplicati to run it via the command line without making any actual changes, and then check the log to see what gets deleted and why.
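A hypothetical, untested invocation could look something like this; the exact command and option syntax may differ slightly between versions, so please double-check against the command line help:

```
Duplicati.CommandLine.exe backup <target-url> <source-folder> --retention-policy="7D:0s,1M:1D,1Y:1M" --dry-run --log-level=Profiling
```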


when a subsequent timeframe is reached, how does it decide which version to keep? I assume it just keeps the newest one and removes the older one at that point, but wasn’t sure.

When a backup spills over into the next time frame, the algorithm checks if the difference between that backup and the next older backup is at least as much as the specified interval. If not, then the just-spilled-over backup gets deleted.

In my tests it didn’t work well to always keep the newer one of the two backups. If there is demand I can draw another example that explains why.

I added my example that I already posted in the GitHub issue here as well:

Edit (further explanation):
In this example the backup is run every day, so there is one backup per day. The configured time frames and intervals are at the top in the coloured areas.

Looking at the line that starts with “S”: There are 7 days’ worth of daily backups (green area). On the next day (line below, starting with the new backup “T”) the backup “M” spills over into the bucket that keeps backups only every second day (yellow area). Since it is more than two days away from “K”, “M” is kept.
Again a day later, the backup “U” gets created while backup “N” spills over into the yellow bucket. But since it is less than two days away from “M”, “N” gets deleted.
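If it helps, here is a rough simulation (my own illustration in Python, not the actual implementation) of that rule: walking from oldest to newest inside the “every 2 days” bucket, a backup is only kept if it is at least the interval away from the last kept one, which matches the example above where “M” is kept and “N” is deleted:

```python
from datetime import datetime, timedelta

# Rough illustration of the thinning rule described above, not Duplicati's real code.
def thin(backups, interval):
    """backups: timestamps sorted oldest -> newest; returns the ones that are kept."""
    kept = []
    for ts in backups:
        if not kept or ts - kept[-1] >= interval:
            kept.append(ts)
        # otherwise this backup is deleted: it is closer than `interval`
        # to the next older backup that was kept
    return kept

# One backup per day, thinned to one backup every second day.
daily = [datetime(2017, 11, 1) + timedelta(days=i) for i in range(8)]
for ts in thin(daily, timedelta(days=2)):
    print(ts.date())
```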


That makes sense now (and ironically I just got to the part in the Git thread where you posted this graphic too, lol) - thanks for the further explanation :slight_smile:

I think this was touched on at GitHub so I apologize if this has already been covered, but when you say “the backup gets deleted” do you mean the backup for that day (let’s say “N”) or just revisions inside “N” (and of course “N” if it ends up empty)?

In either case, if --keep-time is unlimited, are you not deleting the most recent “orphaned” revision (last version of a file backed up before the source was deleted)?


when you say “the backup gets deleted” do you mean the backup for that day (let’s say “N”) or just revisions inside “N” (and of course “N” if it ends up empty)?

Yeah … I mean the whole backup gets deleted.
While the idea of deleting individual file versions from within a backup sounds great for the future, it was too big a change for me to introduce. See the second half of this comment: Issue 2084 (comment)

Thanks for the confirmation - I wasn’t sure if that (the link) process was what ended up getting implemented.

You might want to include a comment in the final description along the lines of:

Note that as a side effect of taking precedence (within its time frame) over --keep-time, it is possible for source-deleted files to be completely removed from the backup.


Hi Tekki,

this option really sounds great. Any idea when this will be in the stable version?

I was not able to find it. But I am still a newbie (trying to move from CrashPlan…)

Kind regards

As a volunteer open-source project, giving time estimates is very difficult since we have “day jobs” to keep us fed (:taco:). However, we are working to finalize the features for the next stable release, and I believe this (and hopefully the other #planned tagged items) will be included when it is released.

I have a rather big backup, 275 GB with 500 versions.
It is extremely slow at listing files. It takes several hours if something needs to be restored.
Getting the initial list takes about 40 minutes, and browsing into the next folder takes just as long …

Is it possible to apply a retention policy to this backup in order to reduce the number of versions, and then run compact to make it faster to work with?

This is a known problem. Fixing it will require a major change to the internal database structure and a rewrite of many parts of the source code.
Unfortunately there is no fix or workaround for this issue at the moment. Fixing this problem is on the To-Do list, but a release date is not known.
See this discussion for more info:

Hello, a workaround that worked for me is using the Duplicati.CommandLine list and restore commands.
You can find help here: duplicati/help.txt at master · duplicati/duplicati · GitHub
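For example, something roughly like the following (a sketch only; option names and exact syntax may vary by version, so check help.txt for the authoritative details):

```
Duplicati.CommandLine.exe list <target-url> "*" --passphrase=<passphrase>
Duplicati.CommandLine.exe restore <target-url> "<path-to-file>" --restore-path=<destination-folder> --passphrase=<passphrase>
```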