Maximum backup versions?

If I choose unlimited retention and back up once per hour, is this going to cause any problems?

I understand back end storage requirements will continually increase, but besides that are there any problems I need to think about? Running it this way will create ~8760 backup versions per year.

Are your files changing that consistently? If nothing changes it won’t actually create a new version (I found this out a little while back).

Curious to hear an official answer. I’m sure there is a limit on the feasible number of versions, but I bet it’s quite a bit higher than 9K.

There is a parameter (--upload-unchanged-backups=true) to force a backup to be recorded even if no files changed. Other than the mentioned (eventual) space issues the most likely side effects of LOTS of backups (regardless of frequency) would include:

  • longer backup runs due to more time necessary for block hash lookups in the SQLite database
  • slower restore UI performance due to SQLite lookups on more records and browser handling of very long select lists

There are probably more effects but I can’t think of any FAILURE scenarios (unless you’re talking hundreds of millions of backups) and expect the items above are the most noticeable.

Note that one or two people are working on historical thinning, so the frequency of kept backups can decrease over time (hourly for the last week’s backups, daily for the next set out to a month, weekly for the next quarter, etc.). None of these efforts are finished yet, though.


The limit I suppose you would hit first is the size of the SQLite database, and that can be really huge. But it will probably be too slow to use long before you hit that.

That actually sneaked into 2.0.2.10 as the advanced option --retention-policy


Nice!

BTW I noticed a few typos (screenshot)

Use this option to reduce the number of versions that are kept with increasing version age by deleting most of the old backups. The expected format is a comma ~~seperated~~ separated list of ~~collon~~ colon ~~sperated~~ separated time frame and interval pairs. For example the value “7D:0s,3M:1D,10Y:2M” means "For 7 day keep all backups, for 3 months keep one backup per day and for 10 years one backup every 2nd month " [note missing end quote here]

But my main question: is there any further instruction for how to use this? I.e. what the acceptable parameters are, how many are allowed (unlimited? just 3? unsure), etc.? I like this feature and it seems like it will give a perfect amount of flexibility. Hopefully some GUI can be built around it soon, but text-based is OK with me for now as long as I understand what I can and can’t do with it :wink:

Also: when using Retention Policy, does it simply override the old “Keep this number of backups” setting, or do I need to set it to “unlimited” (or rather a value further out than the longest timespan in the Retention Policy setting)?


Cool! Does it work well? This should help prevent me from having gazillions of recovery points.

Chiming in here, since this is my code (and my spelling mistakes *cough*):

Is there any further instruction for how to use this

Sadly not. I thought keeping the description somewhat brief might be good, as I was afraid it might scare off users when they see a huge wall of text ^^
I tried to explain the feature and how it works in a comment in the corresponding ticket Issue 2084 (comment)

I think the most important points are:

  • The letters for seconds, minutes, hours, etc. are the same as in the keep-time option. So from smallest to biggest: ‘s’, ‘m’, ‘h’, ‘D’, ‘W’, ‘M’, ‘Y’
  • As explained in the example in the option description: an interval of 0 (no matter if seconds, minutes, etc.) basically means “keep all versions in that time frame”, since the distance between two backups will always be bigger than that, thus never deleting anything
  • The time frames do not stack but rather overlap, with the smaller time frames taking priority:
    So if you configured 1 week with keeping all backups and 2 weeks with keeping only daily backups, then it will effectively result in all backups being kept for the first week (as configured) and after that 1 (sic!) week with daily backups (2 weeks minus the one week of overlap)
  • As kenkendk also mentioned in his review of my pull request: the option might cause some confusion in regards to the already existing keep-time option.
    The way I built it, this feature will never touch any backups that are older than what you configured. So if your biggest time frame spans 2 years, then everything after that will be ignored. If you ultimately want to remove all backups older than a certain age, you ALSO have to add the keep-time option.
    I’d say it’s still open to debate what makes the most sense here.

what the acceptable parameters are, how many are allowed (unlimited? just 3? unsure)

There is no specific limit to the number of time frame/interval pairs. I’m using it with 4 time frames: 7D:0s,1M:1D,6M:1W,10Y:1M, plus the keep-time option set to 10Y.
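For illustration, here is a rough sketch (in Python, not the actual Duplicati code) of how a policy string like the one above breaks down into time frame / interval pairs. The unit conversions are approximations just for the example:

```python
# Rough illustration of the --retention-policy format, not Duplicati's real parser.
# Unit letters follow the keep-time option: s, m, h, D, W, M, Y.
UNIT_SECONDS = {
    "s": 1, "m": 60, "h": 3600,
    "D": 86400, "W": 7 * 86400,
    "M": 30 * 86400, "Y": 365 * 86400,  # approximate month/year lengths, illustration only
}

def parse_duration(text):
    """Turn e.g. '7D' or '0s' into a number of seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]

def parse_retention_policy(policy):
    """Turn '7D:0s,1M:1D,...' into (time_frame_seconds, interval_seconds) pairs."""
    pairs = []
    for part in policy.split(","):
        frame, interval = part.split(":")
        pairs.append((parse_duration(frame), parse_duration(interval)))
    return pairs

# Four time frames: keep everything for 7 days, dailies for a month,
# weeklies for 6 months, monthlies for 10 years.
print(parse_retention_policy("7D:0s,1M:1D,6M:1W,10Y:1M"))
```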

hopefully some GUI can be built around it soon but text-based is ok with me for now as long as I understand what I can and can’t do with it

Yeah, that’d be nice indeed. There were already some people asking if they can help build the GUI for it, since I have no experience with that in C# yet.
I’ll update the ticket with the information that this feature made it into Duplicati, so others might be encouraged to work on it.


Thank you for the clarifications!

Is the retention policy feature ready for prime-time? Or does it need more testing?

Is there a way to run a “what if” analysis with certain retention policy to see what snapshots would be deleted and what would be retained?

Sweet, thanks, that all makes sense.

I suppose a good way to word it might be to just point out that keep-time will take precedence, so make sure it’s set longer than the longest timespan in the Retention Policy (if desired, I guess).

Also just for clarification, when a subsequent timeframe is reached, how does it decide which version to keep? I assume it just keeps the newest one and removes the older one at that point, but wasn’t sure.

Is the retention policy feature ready for prime-time? Or does it need more testing?

Well … the feature is only in the Canary build so far … see it as an experimental feature for now :wink:
That said, I’ve been using it for my regular backups for a few weeks now and so far haven’t had any problems.

Is there a way to run a “what if” analysis with certain retention policy to see what snapshots would be deleted and what would be retained?

I have added extensive logging of which backups will be kept and which will be deleted. The log entries all begin with [Retention Policy]. Most of them require the Profiling Log Level though.
So you should be able to use the existing dry-run option in Duplicati to run it via the command line without making any actual changes, and then check the log to see what gets deleted and why.
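A hypothetical, untested invocation could look something like this; the exact command and option syntax may differ slightly between versions, so please double-check against the command line help:

```
Duplicati.CommandLine.exe backup <target-url> <source-folder> --retention-policy="7D:0s,1M:1D,1Y:1M" --dry-run --log-level=Profiling
```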


when a subsequent timeframe is reached, how does it decide which version to keep? I assume it just keeps the newest one and removes the older one at that point, but wasn’t sure.

When a backup spills over into the next time frame, the algorithm checks if the difference between that backup and the next older backup is at least as much as the specified interval. If not, then the just-spilled-over backup gets deleted.

In my tests it didn’t work well to always keep the newer one of the two backups. If there is demand I can draw another example that explains why.

I added my example that I already posted in the GitHub issue here as well:

Edit (further explanation):
In this example the backup is run every day, so there is one backup per day. The configured time frames and intervals are at the top in the coloured areas.

Looking at the line that starts with “S”: There are 7 days’ worth of daily backups (green area). On the next day (line below, starting with the new backup “T”) the backup “M” spills over into the bucket that keeps backups only every second day (yellow area). Since it is more than two days away from “K”, “M” is kept.
Again a day later, the backup “U” gets created while backup “N” spills over into the yellow bucket. But since it is less than two days away from “M”, “N” gets deleted.
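If it helps, here is a rough simulation (my own illustration in Python, not the actual implementation) of that rule: walking from oldest to newest inside the “every 2 days” bucket, a backup is only kept if it is at least the interval away from the last kept one, which matches the example above where “M” is kept and “N” is deleted:

```python
from datetime import datetime, timedelta

# Rough illustration of the thinning rule described above, not Duplicati's real code.
def thin(backups, interval):
    """backups: timestamps sorted oldest -> newest; returns the ones that are kept."""
    kept = []
    for ts in backups:
        if not kept or ts - kept[-1] >= interval:
            kept.append(ts)
        # otherwise this backup is deleted: it is closer than `interval`
        # to the next older backup that was kept
    return kept

# One backup per day, thinned to one backup every second day.
daily = [datetime(2017, 11, 1) + timedelta(days=i) for i in range(8)]
for ts in thin(daily, timedelta(days=2)):
    print(ts.date())
```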


That makes sense now (and ironically I just got to the part in the Git thread where you posted this graphic too, lol) - thanks for the further explanation :slight_smile:

I think this was touched on at GitHub so I apologize if this has already been covered, but when you say “the backup gets deleted” do you mean the backup for that day (let’s say “N”) or just revisions inside “N” (and of course “N” if it ends up empty)?

In either case, if --keep-time is unlimited, are you not deleting the most recent “orphaned” revision (last version of a file backed up before the source was deleted)?


when you say “the backup gets deleted” do you mean the backup for that day (let’s say “N”) or just revisions inside “N” (and of course “N” if it ends up empty)?

Yeah … I mean the whole backup gets deleted.
While the idea of deleting individual file versions from within a backup sounds great for the future, it was too big a change for me to introduce. See the second half of this comment: Issue 2084 (comment)

Thanks for the confirmation - I wasn’t sure if that (the link) process was what ended up getting implemented.

You might want to include a comment in the final description along the lines of:

Note that as a side effect of taking precedence (within its time frame) over --keep-time, it is possible for source-deleted files to be completely removed from the backup.


Hi Tekki,

this option really sounds great. Any idea when this will be in the stable version?

I was not able to find it. But I am still a newbie (trying to move from CrashPlan…)

Kind regards

As a volunteer open-source project, giving time estimates is very difficult since we have “day jobs” to keep us fed (:taco:). However, we are working to finalize the features for the next stable release, and I believe this (and hopefully the other #planned tagged items) will be included when it is released.

I have a rather big backup, 275 GB with 500 versions.
It is extremely slow at listing files. It takes several hours if something needs to be restored.
Getting the initial list takes about 40 minutes, and browsing into the next folder takes just as long …

Is it possible to apply a retention policy to this backup in order to reduce the number of versions, and then run compact to make it faster to work with?

This is a known problem. Fixing it will require a major change to the internal database structure and a rewrite of many parts of the source code.
Unfortunately there is no fix or workaround for this issue at the moment. Fixing this problem is on the To-Do list, but a release date is not known.
See this discussion for more info:

Hello, a workaround that worked for me is using the Duplicati.CommandLine list and restore commands.
You can find help here: duplicati/help.txt at master · duplicati/duplicati · GitHub
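For example, something roughly like the following (a sketch only; option names and exact syntax may vary by version, so check help.txt for the authoritative details):

```
Duplicati.CommandLine.exe list <target-url> "*" --passphrase=<passphrase>
Duplicati.CommandLine.exe restore <target-url> "<path-to-file>" --restore-path=<destination-folder> --passphrase=<passphrase>
```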