Why did smart retention delete this backup set?

Version: 2.0.6.3_beta_2021-06-17
OS: Windows 10
Backup Size: 26GB
Backend: Backblaze B2
Retention Policy: 1W:0s , 4W:1W , 12M:1M , 24M:3M , 36M:6M , U:1Y
Backup Policy: Once per day at 9PM (but this can be up to +12 hours if the computer is sleeping)

I installed Duplicati and did my first backup on August 27th and it’s been working fine. I have restored a few small files since then. From time to time I check the Restore window to see how the pruning process works and see which backup sets are available for restore.

Yesterday, a Compact happened, which completed fine and I checked the Restore window to see if it had any effect on available backups. To my surprise, the September 27th backup set had disappeared, which I would have expected to be retained based on the 12M:1M rule.

When I researched further, it turned out that the Sept 27th backup set wasn’t deleted by the Compact, but rather 2 days before during the Oct 25th backup.

I searched all the forum threads and GitHub open issues I could find, but none of them seemed to explain why the September 27th backup set was deleted. I typed the exact dates and times into Excel and calculated the timespans and none exceeds 31 days (which is Duplicati’s definition of a month):

[10/25/2021 9:00:38PM] - [9/27/2021 1:13:49 AM] = 28.82417824 days
[9/27/2021 1:13:49 AM] - [8/27/2021 8:39:12 AM] = 30.69070602 days
[10/25/2021 9:00:38PM] - [8/27/2021 8:39:12 AM] = 59.51488426 days
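
For anyone who wants to check these timespans without Excel, here is a minimal Python sketch. The helper name days_between and the format string are my own choices for illustration; the three timestamps are the ones quoted above.

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # e.g. "9/27/2021 1:13:49 AM"

def days_between(later, earlier):
    """Return the elapsed time between two timestamps as fractional days."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds() / 86400

oct25 = "10/25/2021 9:00:38 PM"
sep27 = "9/27/2021 1:13:49 AM"
aug27 = "8/27/2021 8:39:12 AM"

print(round(days_between(oct25, sep27), 8))  # 28.82417824
print(round(days_between(sep27, aug27), 8))  # 30.69070602
print(round(days_between(oct25, aug27), 8))  # 59.51488426
```

The middle figure is the one that matters later in the thread: it is just under the 31 days that Duplicati logs for 1M.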

Can anyone please explain why the September 27th backup set was deleted (see logs below) and how I can prevent this in the future? Was it because the 4W:1W rule (aka 28D:7D) clashes with the 12M:1M (aka 365D:31D) rule because the weekly rule is 28 days and the monthly rule is 31 days? Should I change the 4W:1W rule to 1M:1W instead? If so, I would highly recommend that the default Smart Retention string be changed to 1M:1W, because I based my custom retention policy on the 4W:1W from the default Smart Retention string (I assumed that it had been tested and debugged).

Also, 4 * 1W (28D) does not equal 1M (31D), 3 * 1M (93D) does not equal 3M (90D), 2 * 3M (180D) does not equal 6M (181D), 2 * 6M (362D) does not equal 1Y (365D). It seems like this will cause a ton of problems for backup set retention due to the different intervals. If Duplicati is going to stick with an interval-based instead of a calendar-based system, then it appears that everything should be in days and each interval should be a whole multiple of the one before it (i.e. 1W = 7D, 1M = 28D, 3M = 84D, 6M = 168D, 1Y = 336D, etc. or alternatively, 1W = 8D, 1M = 32D, 3M = 96D, 6M = 192D, 1Y = 384D, etc.)

Also, how do I adjust my custom retention policy to allow for the up to 12 hours lateness between the daily 9PM start time (because the computer might be sleeping at 9PM)? Should I add an extra day onto each interval like this: 1W:0s , 1M:8D , 12M:32D , 24M:94D , 36M:187D , U:366D. Or is it impossible under the current system? If that is the case, then I would vote for adding an Advanced Option that allows one to specify a custom Time Tolerance.

Please add my vote for calendar-based Smart Retention instead of the current interval-based system. I realize that interval-based is easier to implement, but it has some nasty unintuitive surprises for the unwary. Fortunately I have not lost any data, but my intention to keep monthly backups has not been honored by the software.

I did not have a log file at the time of the deletion (I have since turned on Information), but here are the relevant log entries from the Oct 25th backup where the Sept 27th backup set was deleted:

2021-10-25 21:00:38 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-FramesAndIntervals]: Time frames and intervals pairs: 7.00:00:00 / Keep all, 28.00:00:00 / 7.00:00:00, 365.00:00:00 / 31.00:00:00, 730.00:00:00 / 90.00:00:00, 1095.00:00:00 / 181.00:00:00, Unlimited / 365.00:00:00

2021-10-25 21:00:38 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupList]: Backups to consider: 10/24/2021 9:00:00 PM, 10/23/2021 9:00:02 PM, 10/22/2021 9:12:15 PM, 10/21/2021 9:00:00 PM, 10/20/2021 9:00:00 PM, 10/19/2021 9:00:00 PM, 10/18/2021 9:00:00 PM, 10/12/2021 9:00:00 PM, 10/5/2021 4:56:39 AM, 9/27/2021 1:13:49 AM, 8/27/2021 8:39:12 AM

2021-10-25 21:00:38 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsToDelete]: Backups outside of all time frames and thus getting deleted:

2021-10-25 21:00:38 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-AllBackupsToDelete]: All backups to delete: 10/18/2021 9:00:00 PM, 9/27/2021 1:13:49 AM

I believe I have figured it out. On October 25th, the September 27th backup aged out of the 4W:1W (28 day) bracket. Duplicati then processed the nearest 12M:1M (1 month / 31 day) bracket, which would have been 9/24/2021-10/25/2021 [Edit: This is wrong. See post below]. In that bucket, only 1 backup need be present, and there were a ton (10/5, 10/12, 10/18, etc.). So the September 27th backup was nuked because there were already other backups in the 9/24/2021-10/25/2021 bracket. In the 1 month (31 day) bucket before that (8/24/2021 - 9/24/2021), there was the original 8/27/2021 backup.

It is confusing, because I thought the pruning algorithm kept the oldest backup in a bucket. So I am still not entirely clear why Sept 27th got deleted. But the fact that it was 28.82417824 days old on 10/25/2021 is pretty strong evidence that aging out of the 4W:1W (28 day) rule killed it.

Looking forward into the future, backup sets will get deleted until there is only one backup set in the 9/27/2021-10/28/2021 bucket, presumably 10/26/2021. That means there will be no backup sets from 8/27/2021 until 10/26/2021 (60 days), despite the 12M:1M directive stating that I want to keep 1 backup per month for a year.

This pruning algorithm is… dumb. I have read the forum threads about its genesis (users were clamoring for generational pruning, someone hacked this up, and the thought was “better something than nothing”). It is a pity because with a bit more thought and effort, it could have been a really great algorithm.

Starting from the current backup and working backward in time is what condemned it to mediocrity. It should have started from the immutable first backup and worked forward in time to see which backups needed to be retained. As a huge bonus, it would have completely eliminated the “Backup Start Time Jitter” problem. An example:

Retention Policy: 1Y:1M (one backup / month per year) , 1M:1W (one backup / week per month)

Initial Backup: 1/1/2021 (Friday) = 44197 in Microsoft OLE Automation datetime notation [fractional days since 12/30/1899]

Weeks are always 7 days in every corner of the world. Therefore the second retained backup would be the very first one after:
44197 + 7 = 44204 (Friday 1/8/2021) - Very quick to do with ([Current OLE Date] - [Original OLE Backup Date]) Modulo 7 == 0

Notice how this algorithm doesn’t care about backup start time jitter (it just chooses the very first backup after the threshold fractional date / time). The next few retained backups are the very first ones after:

44197 + 14 = 44211 (Friday 1/15/2021)
44197 + 21 = 44218 (Friday 1/22/2021)
44197 + 28 = 44225 (Friday 1/29/2021)
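
The weekly thresholds above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Duplicati does: the function names, the choice of "earliest backup at or after the threshold", and the jittered backup times are all assumptions made up for the example.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

OLE_EPOCH = datetime(1899, 12, 30)  # day zero of the OLE Automation date scale

def to_ole(moment):
    """Convert a datetime to fractional days since the OLE epoch."""
    return (moment - OLE_EPOCH) / timedelta(days=1)

def weekly_picks(backups, first, weeks):
    """For each weekly threshold after `first`, pick the earliest backup at or after it."""
    ordered = sorted(backups)
    picks = []
    for k in range(1, weeks + 1):
        threshold = first + timedelta(weeks=k)
        i = bisect_left(ordered, threshold)
        if i < len(ordered):
            picks.append((threshold, ordered[i]))
    return picks

first = datetime(2021, 1, 1)
# Hypothetical daily backups, each starting at 21:00 plus a few minutes of jitter.
backups = [first + timedelta(days=d, hours=21, minutes=(7 * d) % 13) for d in range(30)]

picks = weekly_picks(backups, first, 4)
for threshold, chosen in picks:
    print(int(to_ole(threshold)), threshold.strftime("%m/%d/%Y"), "->", chosen)
```

Because each pick is simply the first backup past a fixed threshold, a few minutes of lateness changes which timestamp is chosen but never causes a week to be skipped.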

Now we get to the one month per year backup. The simplest and most intuitive way (for the user) to interpret 1M is to retain backups on the same day of month that the original backup started on (i.e. since our original hypothetical backup started on 1/1/2021, then 2/1/2021, 3/1/2021, 4/1/2021, etc.). Of course this presents a small problem when the initial backup is on the 29th, 30th or 31st day of the month, but those corner cases are relatively easily handled with program logic (presumably, do the same thing that banks do and move those dates to the nearest legal date [i.e. 28th, 29th, 30th or 1st] when necessary). Therefore the next retained backup would be:

2/1/2021

then

2/8/2021
2/15/2021
2/22/2021

3/1/2021 (2/29/2021 is illegal, so pushed to 3/1/2021. Alternatively, could be pushed back to 2/28/2021)
3/8/2021
3/15/2021
3/22/2021
3/29/2021

4/1/2021

As time goes on, the 1/8/2021, 1/15/2021, 1/22/2021, 1/29/2021, etc. backups are pruned, leaving only the 2/1/2021, 3/1/2021, etc. backups.

Similarly, for yearly retained backups, we take the same month/day as the initial backup (i.e. 1/1/2022, 1/1/2023, etc.) In this case, we don’t have to worry about the 28th/29th/30th problem, only the single corner case of a backup started on February 29th in a leap year (again handled similarly to the 28th/29th/30th problem).
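
A minimal sketch of the "same day of month" stepping, assuming the corner cases are handled by pulling the date back to the last valid day of the target month (pushing forward to the 1st would be the other option mentioned above). The helper name add_months is hypothetical.

```python
import calendar
from datetime import date

def add_months(anchor, months):
    """Return the date `months` after `anchor`, clamped to the last valid day of that month."""
    index = anchor.year * 12 + (anchor.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(anchor.day, last_day))

first_backup = date(2021, 1, 31)
# Always step from the original anchor so a clamped date does not drift
# (1/31 -> 2/28 -> 3/31, rather than 2/28 -> 3/28).
for k in range(1, 4):
    print(add_months(first_backup, k))

# A yearly step is 12 months, which also covers the February 29th case.
print(add_months(date(2020, 2, 29), 12))
```

Computing every threshold from the first backup, rather than from the previous threshold, is what keeps a 31st-of-the-month anchor from sliding to the 28th permanently.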

Unfortunately, I have no free time for the foreseeable future, otherwise I’d code up a Winforms backup pruning simulator to see if this proposed solution works. Maybe this post will give someone else ideas to go on.

For myself, I’ve changed my retention policy to:

1W:0s , 4W:1W , 48W:4W , 96W:12W , 144W:24W , U:48W

as a slight improvement to avoid backups “falling through the cracks” between 4W / 1M / 3M / 6M / 1Y.
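
To show what "falling through the cracks" means in numbers, here is a small Python check of whether each interval is a whole multiple of the one before it. The first list uses the day counts Duplicati logged for 1W, 1M, 3M, 6M and 1Y; the second is the week-based policy above. The function name is made up, and even nesting is only one ingredient: it does not by itself remove the start time jitter problem.

```python
def nests_evenly(interval_days):
    """Return True when every interval is a whole multiple of the one before it."""
    return all(b % a == 0 for a, b in zip(interval_days, interval_days[1:]))

original = [7, 31, 90, 181, 365]  # 1W, 1M, 3M, 6M, 1Y as logged by Duplicati
revised = [7, 28, 84, 168, 336]   # 1W, 4W, 12W, 24W, 48W

print(nests_evenly(original))  # False
print(nests_evenly(revised))   # True
```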

Nice to see someone making a complete analysis of this. I thought about the clashing options when I configured my retention periods, and for exactly the reasons you’ve mentioned I decided to do everything in days, with regular intervals specified as multiples of days, because I expected to see exactly the kind of problems you’ve described.

  • Thank you for your analysis and posts, it just confirmed that my expectations were correct.

Welcome to the forum @dt2021

It’s harder to say with only partial logs and no Excel to help with date math. Better logs would look like this:

2021-10-29 07:55:36 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-StartCheck]: Start checking if backups can be removed
2021-10-29 07:55:36 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-FramesAndIntervals]: Time frames and intervals pairs: 1.00:00:00 / Keep all, 7.00:00:00 / 1.00:00:00, 28.00:00:00 / 7.00:00:00, 365.00:00:00 / 31.00:00:00
2021-10-29 07:55:36 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupList]: Backups to consider: 10/29/2021 7:19:47 AM, 10/28/2021 9:50:00 PM, 10/28/2021 8:50:00 PM, 10/28/2021 7:50:01 PM, 10/28/2021 6:50:13 PM, 10/28/2021 5:50:02 PM, 10/28/2021 4:50:33 PM, 10/28/2021 3:50:28 PM, 10/28/2021 2:50:04 PM, 10/28/2021 1:50:04 PM, 10/28/2021 12:50:06 PM, 10/28/2021 11:50:02 AM, 10/28/2021 10:50:01 AM, 10/28/2021 9:50:04 AM, 10/28/2021 8:50:03 AM, 10/28/2021 7:50:05 AM, 10/27/2021 11:51:23 AM, 10/26/2021 10:50:11 AM, 10/25/2021 9:51:01 AM, 10/24/2021 9:50:00 AM, 10/23/2021 8:51:46 AM, 10/22/2021 8:50:09 AM, 10/21/2021 8:50:01 AM, 10/13/2021 8:50:35 PM, 10/6/2021 5:50:03 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-NextTimeAndFrame]: Next time frame and interval pair: 1.00:00:00 / Keep all
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsInFrame]: Backups in this time frame: 10/28/2021 8:50:03 AM, 10/28/2021 9:50:04 AM, 10/28/2021 10:50:01 AM, 10/28/2021 11:50:02 AM, 10/28/2021 12:50:06 PM, 10/28/2021 1:50:04 PM, 10/28/2021 2:50:04 PM, 10/28/2021 3:50:28 PM, 10/28/2021 4:50:33 PM, 10/28/2021 5:50:02 PM, 10/28/2021 6:50:13 PM, 10/28/2021 7:50:01 PM, 10/28/2021 8:50:00 PM, 10/28/2021 9:50:00 PM, 10/29/2021 7:19:47 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 8:50:03 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 9:50:04 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 10:50:01 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 11:50:02 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 12:50:06 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 1:50:04 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 2:50:04 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 3:50:28 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 4:50:33 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 5:50:02 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 6:50:13 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 7:50:01 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 8:50:00 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/28/2021 9:50:00 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/29/2021 7:19:47 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-NextTimeAndFrame]: Next time frame and interval pair: 7.00:00:00 / 1.00:00:00
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsInFrame]: Backups in this time frame: 10/22/2021 8:50:09 AM, 10/23/2021 8:51:46 AM, 10/24/2021 9:50:00 AM, 10/25/2021 9:51:01 AM, 10/26/2021 10:50:11 AM, 10/27/2021 11:51:23 AM, 10/28/2021 7:50:05 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/22/2021 8:50:09 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/23/2021 8:51:46 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/24/2021 9:50:00 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/25/2021 9:51:01 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/26/2021 10:50:11 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/27/2021 11:51:23 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-DeletingBackups]: Deleting backup: 10/28/2021 7:50:05 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-NextTimeAndFrame]: Next time frame and interval pair: 28.00:00:00 / 7.00:00:00
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsInFrame]: Backups in this time frame: 10/6/2021 5:50:03 PM, 10/13/2021 8:50:35 PM, 10/21/2021 8:50:01 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/6/2021 5:50:03 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/13/2021 8:50:35 PM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping  backup: 10/21/2021 8:50:01 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-NextTimeAndFrame]: Next time frame and interval pair: 365.00:00:00 / 31.00:00:00
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsInFrame]: Backups in this time frame: 
2021-10-29 07:55:36 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsToDelete]: Backups outside of all time frames and thus getting deleted: 
2021-10-29 07:55:36 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-AllBackupsToDelete]: All backups to delete: 10/28/2021 7:50:05 AM
2021-10-29 07:55:36 -04 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteRemoteFileset]: Deleting 1 remote fileset(s) ...

That’s from a huge Profiling log. If you want to look at retention in great detail, use log-file-log-filter; however, I’m not sure how that interacts with log-file-log-level if you still want a general log. Let us know if you try it.

My impression as well. Because I think both prior posters here read code to some extent, here’s the code.
Assuming it checks in backup order (oldest to newest) within a time frame, the 365.00:00:00 / 31.00:00:00 setting should delete a newer backup that’s less than 31 days from the backup that was taken before that.

9/27/2021 1:13:49 AM, 8/27/2021 8:39:12 AM crosses the end of 31-day month August, so 9/27 is roughly 31 days away, but backup was earlier in the day so it’s actually 30 days and some fractional day, meaning it’s too close to the backup before it, so gets deleted when it leaves the safety of 28.00:00:00 / 7.00:00:00.

In case it’s not clear from the log time display form and the code, the comparison doesn’t go by days, and sometimes it’s helpful to get to at least hours (lowercase h), so that one avoids situations where a daily is slightly late, and chops off the next daily. For people who dislike that, they can set minimum interval to 23h.

The backup start time jitter problem is aggravated by typical backup intervals lining up with typical retention intervals. Get slightly off, and a delete happens. It can happen on hourlies too, but lowercase m is minutes.
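
A small Python illustration of the jitter effect, using two consecutive daily timestamps from the first post's log. The survives helper is hypothetical and only encodes the comparison described above (gap to the previously kept backup versus the interval). In the original poster's policy these two backups sat in the keep-all week, so this is what would happen under a daily interval, not what did happen.

```python
from datetime import datetime, timedelta

previous = datetime(2021, 10, 22, 21, 12, 15)  # ran about 12 minutes late
current = datetime(2021, 10, 23, 21, 0, 2)     # ran on time
gap = current - previous

def survives(gap, interval):
    """A backup is kept when it is at least `interval` after the previously kept one."""
    return gap >= interval

print(gap)                                   # 23:47:47
print(survives(gap, timedelta(days=1)))      # False: a 1D interval would delete it
print(survives(gap, timedelta(hours=23)))    # True: a 23h interval keeps it
```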

Thanks for the detail, but it was a little TL;DR, especially since there’s a lingering question of date ordering.
As evidence that it does times older to newer, look at above log and note which backup it decided to delete:

2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-KeepBackups]: Keeping backup: 10/27/2021 11:51:23 AM
2021-10-29 07:55:36 -04 - [Profiling-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-DeletingBackups]: Deleting backup: 10/28/2021 7:50:05 AM

This is everyone’s problem. There are vastly more things people want than people willing and able to help.

“GUI smart and custom backup retention aren’t covered” (#83) is a documentation issue to at least explain it. There are also pointers to some of the original work you might have missed if you didn’t look over GitHub. Perhaps either a pull request or a collaboration between a code expert and the manual author would work. This wouldn’t likely be me, because I have more than a full load trying to keep up with the forum. Any helpers?

Duplicati is a community effort, and all areas (docs, forum, test, code, etc.) would really benefit from help.

I have completely figured it out (Thank you for the link to the relevant code @ts678).

This is exactly what happened. My mistake was thinking that the pruning “intervals” were also calculated backwards in time from the current backup time (like the pruning “buckets” are). They are not. The pruning “intervals” are calculated forwards in time from the oldest backup in the pruning “bucket”. To clarify my terminology: For 12M:1M, the 12M is the pruning “bucket” and the 1M is the pruning “interval”.

Putting the backup dates into buckets:

— Current (Backup) Time — 10/25/2021 9:00 PM

10/24/2021 9:00:00 PM
10/23/2021 9:00:02 PM
10/22/2021 9:12:15 PM
10/21/2021 9:00:00 PM
10/20/2021 9:00:00 PM
10/19/2021 9:00:00 PM

— End of 1W (1 week) bucket - Keep all — 10/18/2021 9:00 PM

10/18/2021 9:00:00 PM - Deleted
10/12/2021 9:00:00 PM
10/5/2021 4:56:39 AM

— End of 4W (4 week) bucket - Keep weekly — 9/27/2021 9:00 PM

9/27/2021 1:13:49 AM - Deleted
8/27/2021 8:39:12 AM

— End of 12M (12 month) bucket - Keep monthly — 10/25/2020 9:00 PM

The algorithm pruned the 12M bucket starting at the oldest backup in that bucket (8/27/2021) and added 31 days to it to create the 1M interval for purging. Sadly, the 9/27/2021 backup was slightly within that interval (30.69 days) and thus was preferentially deleted over the older 8/27/2021 backup.
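
For anyone who wants to replay this, here is a rough Python model of the behaviour as I understand it from this thread: time frames are measured back from the current time, and within each frame backups are walked oldest to newest, keeping one only if it is at least one interval after the last kept one. It is a sketch of the description above, not Duplicati's actual code, and the names thin and KEEP_ALL are made up. The policy values and timestamps are taken from the Oct 25th log.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)
KEEP_ALL = timedelta(0)

def thin(backups, policy, now):
    """Return the backups an interval-based policy would delete, oldest first.

    policy is a list of (time_frame, interval) pairs ordered from the shortest
    time frame to the longest; a time frame of None means unlimited.
    """
    remaining = sorted(backups)
    deleted = []
    for frame, interval in policy:
        lower_bound = datetime.min if frame is None else now - frame
        in_frame = [b for b in remaining if b >= lower_bound]
        remaining = [b for b in remaining if b < lower_bound]
        last_kept = None
        for backup in in_frame:  # oldest first
            if last_kept is None or interval == KEEP_ALL or backup - last_kept >= interval:
                last_kept = backup
            else:
                deleted.append(backup)
    return sorted(deleted + remaining)  # anything outside every frame goes too

policy = [(7 * DAY, KEEP_ALL), (28 * DAY, 7 * DAY), (365 * DAY, 31 * DAY),
          (730 * DAY, 90 * DAY), (1095 * DAY, 181 * DAY), (None, 365 * DAY)]

backups = [
    datetime(2021, 10, 24, 21, 0, 0), datetime(2021, 10, 23, 21, 0, 2),
    datetime(2021, 10, 22, 21, 12, 15), datetime(2021, 10, 21, 21, 0, 0),
    datetime(2021, 10, 20, 21, 0, 0), datetime(2021, 10, 19, 21, 0, 0),
    datetime(2021, 10, 18, 21, 0, 0), datetime(2021, 10, 12, 21, 0, 0),
    datetime(2021, 10, 5, 4, 56, 39), datetime(2021, 9, 27, 1, 13, 49),
    datetime(2021, 8, 27, 8, 39, 12),
]

to_delete = thin(backups, policy, now=datetime(2021, 10, 25, 21, 0, 38))
print(to_delete)
```

With these inputs the model marks 9/27/2021 1:13:49 AM and 10/18/2021 9:00:00 PM for deletion, which matches the "All backups to delete" line in the log.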

As I suspected, there will basically be 60 days between the first and second backups retained under a 12M:1M system with the current algorithm.

I turned on Profiling, but it was too much - 134MB of logs created for a 26GB backup. Now that I understand the algorithm, I don’t need that anymore, so I changed it to Information. If Information is still too big, I will use the log-file-log-filter - Good tip, thanks again @ts678.

The pruning algorithm is very small (less than 100 lines of code, including comments), so maybe I will take a stab at a better replacement sometime. The first thing I would create is a GUI Winforms backup pruning simulator to allow beta testers to create simulated backup runs with an adjustable retention policy, adjustable backup start time jitter, adjustable buckets between backups, and an adjustable endpoint, and then it shows the final result in terms of retained backups at that endpoint. If after testing, the pruning algorithm seems solid, then a pull request would be created.

Much appreciated, and FYI

Backup not showing in Restore has my thoughts on what to do. One part was the manual, which asks for PRs, and it might progress faster if you feel comfortable trying to explain the scheme to non-expert users.

My posts there also dip into why any sort of true-calendar-based scheme might be lots of work to achieve.

Definitely avoids questions like “what’s a month” or “if I use up my slot for this week, when is my next one?”

I’m not particularly interested in updating the manual because I think the pruning algorithm is very bad and should be replaced ASAP. I have edited my above post for more clarity, so maybe that will help people searching the forum for information about retention policy. You correctly identified the problem before me, but I believe my post is a bit clearer, so I marked it as the solution. No slight intended.

As one example of how dangerous it is, imagine going on vacation for 2 weeks with no backups during that time because the computer was off. You come home, start the next backup and it immediately deletes a bunch of backup sets (new and old) because it calculates pruning buckets from the current date/time. Then you start doing regular backups again, and your future vacations delete a bunch more in coming months/years leaving huge holes in your generational backups because your 2 week vacations nuked a bunch of backups that would have “graduated” into future buckets if they hadn’t been prematurely killed.

There are serious and fundamental problems with any algorithm that uses intervals and the current backup date/time - Unnecessarily deleting backups because of a few seconds jitter, unnecessarily deleting backups because backup days and retention periods are similar, unnecessarily deleting backups due to non-equivalence of weeks, months and years, and finally, it is unintuitive and hard to explain to end-users.

I don’t believe a calendar system is as complicated as you assume it would be. At the very worst, I think it would only require restricting the retention policy to “whole” intervals - In other words, you can’t do 4Y3M2D11m32s : 21D3m23s - It would have to be like 1W:0s , 1M:1W , 3M:1M , 6M:2M , 1Y:3M , 2Y:6M , U:1Y. But I don’t think this would be a huge price to pay for greatly increased intuitiveness and lack of minefields. And I’m not even certain that would be necessary. It may be able to work with “weird” intervals - Unknown until someone tries to code it.

Like I said, I believe the key is to orientate all dates relative to the very first backup, and to forget about “calendar” weeks, months, years, etc. Instead use days, weeks, months and years as intervals of time (somewhat similar to the current system, but without the pitfall of basing things off the current backup date/time). This would produce what I think it is that most people want, which are thinned, regularly-spaced backups (i.e. after 10 years, I have retained backups on 3/27/2021, 3/27/2022, 3/27/2023, etc.)

Thanks for your help.

Vacation is a hole, but there’s no getting around that if the computer was off. The hole created before the vacation seems like it would be the same hole created without a vacation (could use a simulator now…). However, the impact of the hole is higher because fine-grained (e.g. daily) backups are thinned out quickly, so a strategy of keeping dailies for the past workweek is defeated by the deletions that the first post-vacation backup triggers.

If the lack of workday review time is the concern, yes, it is short-term worse due to placement of the hole.

Imagine having a 1W:1D,1Y:1W retention policy. As soon as a backup ages past 1W it’s subject to delete based on the backups that came before it, and this affects all pre-vacation backups and would also affect vacation backups if the computer was left on. Do you see any case where long-term result differs hugely?

Examples with oldest backups on the left, using above retention policy. B=Backup D=Deleted V=Vacation

Without a vacation:

B D D D D D D B D D D D D D B D D D D D D B D D D D D D B D D B B B B B B B

Vacation and return. To the left of the Vs you might like a few more Bs and a few less Ds I guess:

B D D D D D D B D D D D D D B D D D D D D B D V V V V V V V V V V V V V V B

I don’t follow that phrasing. Backups age from shorter time frame buckets into longer time frame buckets where the interval is usually longer (more spacing), meaning the kill rate to thin out the backups is higher.

This is what you hit. Thinning happened when a backup aged into greater-time-frame-and-interval bucket.

I’ll agree with the next paragraph that jitter, similarity between backup intervals and retention intervals, and difficulty of explaining this are problems. I’m not clear on proposal details, but look forward to how it works.

Thanks for working on a better solution.

Simple example: 1W:0s , 1M:1W , 1Y:1M

User starts backup on January 1st.
Does 30 days of backups.
Accidentally disables Duplicati service for 32 days.
Realizes it and re-enables Duplicati service.
What happens next?

All backups except the original January 1st backup killed.
No backups except the original and the one done after the service re-enable available for graduation to the 1Y bucket (62 days apart). Rather than keeping the first backup after service re-enable and the last backup before service disable which are 32 days apart, Duplicati instead only keeps backups 62 days apart.
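
The scenario can be replayed with a small Python simulation. It is a simplified model of the thinning described earlier in the thread (frames measured back from the current time, oldest to newest within a frame, keep when at least one interval after the last kept backup), not Duplicati's code. I assume every backup starts at exactly 21:00, that 1M is 31 days and 1Y is 365 days, and that "disabled for 32 days" means 32 missed nightly runs; the exact day counts shift by one depending on how that is counted.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)
# (time frame, interval) pairs for 1W:0s, 1M:1W, 1Y:1M, shortest frame first.
POLICY = [(7 * DAY, 0 * DAY), (31 * DAY, 7 * DAY), (365 * DAY, 31 * DAY)]

def survivors(backups, now):
    """Thin the existing backups once, as of `now`, and return the ones left."""
    pool, kept = sorted(backups), []
    for frame, interval in POLICY:
        in_frame = [b for b in pool if b >= now - frame]
        pool = [b for b in pool if b < now - frame]
        last = None
        for b in in_frame:                      # oldest first
            if last is None or b - last >= interval:
                kept.append(b)
                last = b
    return sorted(kept)                         # whatever is left in pool is dropped

start = datetime(2021, 1, 1, 21, 0)
history = []
for i in range(30):                             # nightly backups, Jan 1 to Jan 30
    now = start + i * DAY
    history = survivors(history, now) + [now]   # the new backup is always retained

before_gap = list(history)
resume = history[-1] + 33 * DAY                 # 32 missed nights, next run on Mar 4
history = survivors(history, resume) + [resume]

print([b.strftime("%m/%d") for b in before_gap])
print([b.strftime("%m/%d") for b in history])
print((history[1] - history[0]).days, "days apart")
```

In this model, the weeklies and dailies that existed on January 30th all land in the 1Y:1M frame at once when the service comes back, and every one of them is within 31 days of January 1st, so only January 1st and the new backup remain.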

I assume you can see how this could be quite shocking to the end-user expecting to have a month’s worth of backups available based on their retention policy.

I don’t know whether you would call that difference ‘huge’. But it’s the mass unexpected deletion of backups that would be much more unnerving, because you would lose all history going back to the original backup.

My proposed solution would not seem to do any better, at least in its default form, although I can think of some ways to make it better (use the temporal distance from the center of the interval to decide which backups to retain, for example). Alternatively, it could interpret “1M:1W” to mean “I want to keep weekly backups for a month from the date of the last backup”. This would prevent mass deletion of history upon returning from vacation, although maybe only for the one backup session after returning.

Or perhaps it could interpret “1M:1W” as meaning “Four backup sets are reserved for this interval. Try to space them one week apart, but if there are big gaps that are more than 1 week long, then just consider the ‘greater than 1 week gap’ to just be 1 week.” This would spare the weekly backups from immediate deletion upon reactivation. I’d have to think more about corner cases.

For someone in the future who is trying to understand the Retention Policy and why certain backups were deleted, I find the following settings to be good:

--log-file=[Path to log file]
--log-file-log-filter=+*RetentionPolicy*
--log-file-log-level=Verbose
--log-retention=30D

Verbose logs are reasonable size (171KB in my case), and the log-file-log-filter +*RetentionPolicy* adds the Retention Policy decisions to the log without having to enable the gigantic Profiling log level.

I know this is already solved, but I want to add some detail about what happens, plus a suggestion. I think the problem is that the interval is measured from the prior kept backup in the same time frame. If the interval were measured from the first backup in the time frame instead, I think the problem would be solved.

The problem seems to come from this code: “(fileset.Time - lastKept.Time) >= singleRetentionPolicyOptionValue.Interval”. I suggest comparing against a rolling interval instead, probably something like “(fileset.Time - firstKept.Time) >= rollingInterval”, and only assigning lastKept if it is not already assigned. The block of code that calculates the last kept backup would then become:

if (lastKept == null || singleRetentionPolicyOptionValue.IsKeepAllVersions() || (fileset.Time - lastKept.Time) >= rollingInterval)
{
	Logging.Log.WriteProfilingMessage(LOGTAG_RETENTION, "KeepBackups", $"Keeping {(isFullBackup ? "" : "partial")} backup: {fileset.Time}", Logging.LogMessageType.Profiling);
	if (lastKept == null)
	{
		lastKept = fileset;
	}
	// calculate rollingInterval: basically, add singleRetentionPolicyOptionValue.Interval repeatedly until it is greater than (fileset.Time - lastKept.Time)
}

Unfortunately, I can’t work out what the code to calculate the rolling interval should look like.
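
One possible reading of the rolling interval idea, sketched in Python rather than C# and compared against the last-kept rule quoted above. Both function names are made up, the backup dates are hypothetical, and this is only my interpretation of the suggestion, not tested against Duplicati itself.

```python
from datetime import datetime, timedelta

def keep_with_last_kept(times, interval):
    """Quoted rule: keep a backup when it is at least `interval` after the last kept one."""
    kept = []
    for t in times:
        if not kept or t - kept[-1] >= interval:
            kept.append(t)
    return kept

def keep_with_rolling_interval(times, interval):
    """Suggested rule: measure from the first kept backup against a growing threshold."""
    kept, rolling = [], interval
    for t in times:
        if not kept:
            kept.append(t)
        elif t - kept[0] >= rolling:
            kept.append(t)
            while rolling <= t - kept[0]:   # advance to the next multiple past this backup
                rolling += interval
    return kept

base = datetime(2021, 1, 1)
times = [base + timedelta(days=d) for d in (0, 35, 65, 95)]  # hypothetical monthly-ish backups
month = timedelta(days=31)

print([(t - base).days for t in keep_with_last_kept(times, month)])         # [0, 35, 95]
print([(t - base).days for t in keep_with_rolling_interval(times, month)])  # [0, 35, 65, 95]
```

With the last-kept rule, a late backup on day 35 pushes every later threshold back and day 65 is lost; with the rolling threshold, the 31-day grid stays anchored to the first backup. Note that a backup arriving less than one full interval after the first, such as one 30.69 days later, would still be dropped under either rule as sketched here.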