New retention policy deletes old backups in a smart way

Well, you can want retention thinning and unlimited age :slight_smile:

Although at 99 years I think it’s close to pointless.

By the way, does anyone know if 99 is max?

Exactly. I want to be able to go back to previous versions, but after a few weeks I don’t need EVERY version.

On the flip side, I have family members who delete stuff they only use every few years (think 5-year reunions), so they don’t realize it’s missing for quite some time.

The retention policy flexibility lets me handle both scenarios with a single backup config. Yay!

Would it make sense to introduce a specific keyword like 1W:1D,ELSE:1W ?

Can anyone confirm whether the new retention policy functionality now handles the case of accidentally-deleted single files? I believe the first version would not retain the last backup of an individual file post deletion if it wasn’t in the “right” backup set. Is it any different now?

How about 1W:1D,U:1W? With U being unlimited?

Or just 1W:1D,200Y:1W…

I know it’s practically the same, but I think it’s still a valid point to have a semantically logical expression.

If you just want to keep them forever it’s weird that you have to make up some arbitrary number. I mean, I literally don’t care and the expression still made me pick an arbitrary number, that number being 10 years. And more paranoid people, like @JonMikelV, will tell you they picked 100 - but secretly they picked 9999 just to be safe :wink:

That’s not how it works, or at least I hope so.
Backup retention is not about file versions, it thins out backup versions (or restore points, whatever you want to name it).

Every backup version is a representation of everything included in your source selection. If you delete a file and start a backup job, that file will not be part of the most recent backup, but it is still part of all other backup versions (provided that the oldest backup is more recent than the file creation date).

When a file exists on your system, it will be part of the most recent backup. You can restore it from every backup version of the last year. Only backup versions that are older than a year are deleted. So if you modified a file more than a year ago and haven’t modified it since then, only the current version is in your backup.

If you accidentally deleted or corrupted that file, you can successfully restore it from any of the backup versions.
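
To illustrate the difference between backup versions and file versions, here is a tiny Python sketch. The dates, file names, and the `restorable_from` helper are made up for the example; it only models each backup version as the set of paths that existed when the job ran.

```python
# Each backup version is modelled as the set of paths present at backup time.
# Dates and file names are invented for illustration.
versions = {
    "2018-01-01": {"report.doc", "photo.jpg"},
    "2018-01-08": {"report.doc", "photo.jpg"},
    "2018-01-15": {"report.doc"},  # photo.jpg was deleted before this run
}

def restorable_from(path, versions):
    """Return the backup versions (by date) that still contain `path`."""
    return sorted(date for date, files in versions.items() if path in files)

print(restorable_from("photo.jpg", versions))
```

The deleted file is missing from the newest version only. It stays restorable until every version that contains it has been thinned out or has aged out of the policy.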

That would be great!
The U value could also be used to replace --keep-versions.

  • --keep-versions=30 could be specified as --retention-policy=U:30.
    --retention-policy=1Y:1W;U:1M:30 would mean: keep 1 weekly backup for the last year, from then on 1 monthly backup, but never delete backups if there are fewer than 30.

Discussed at GitHub:

Somebody has their backup job set to run every day and has set --retention-policy=1M:1D. Usually that would mean 30 backups are kept, if the PC is turned on all the time and creates a new backup daily.
Now that person goes on holiday for a month. Upon return Duplicati will then delete all backups except the newest one, since they are all older than a month.

Not seen as a problem in that discussion, but I guess for most people it’s unintended behaviour in this scenario. Adding a minimum number of backup versions would resolve that.
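
To make the holiday scenario concrete, here is a minimal Python sketch of retention thinning. It is a simplified model for illustration, not Duplicati's actual code: the function name `thin`, the 30-day stand-in for 1M, and the rule of keeping the oldest backup in each interval are all assumptions. The only behaviour taken from the discussion is that backups outside every time frame are deleted and the newest backup always survives.

```python
from datetime import datetime, timedelta

def thin(backups, now, frames):
    """Simplified retention model (an assumption, not Duplicati's code).

    backups: list of datetimes.
    frames:  list of (max_age, interval) timedeltas, sorted by max_age.
    Returns the backups that survive, oldest first.
    """
    backups = sorted(backups)
    keep = {backups[-1]}          # the newest backup is never deleted
    lower = timedelta(0)
    for max_age, interval in frames:
        in_frame = [b for b in backups if lower <= now - b < max_age]
        last_kept = None
        for b in in_frame:        # walk from oldest to newest
            if last_kept is None or b - last_kept >= interval:
                keep.add(b)
                last_kept = b
        lower = max_age
    return sorted(keep)

start = datetime(2018, 1, 1)
daily = [start + timedelta(days=i) for i in range(30)]
policy = [(timedelta(days=30), timedelta(days=1))]  # stand-in for 1M:1D

# PC always on: evaluated right after the 30th daily backup.
normal = thin(daily, daily[-1], policy)

# Holiday: no backups for 31 days, then one new backup on return.
back_home = daily[-1] + timedelta(days=31)
after_holiday = thin(daily + [back_home], back_home, policy)

print(len(normal), len(after_holiday))  # prints: 30 1
```

In this model all 30 old backups fall outside the single time frame after the break, so only the backup made on return is left.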

I hadn’t thought about U:number, only U:date. I guess it reads as “unlimited keep 30”, but I initially misread it as “keep one backup for every 30 days forever” because I missed that there was no letter.
I’m not sure if it’s more user friendly or if it would lead to confusion. I think in that GitHub thread we ended up deciding that --keep-versions should not become legacy exactly because it’s so much easier to understand than custom expressions.

Also would 1M:30,1Y:1M be valid if 1M:30 is valid? It then reads “Keep no more than 30 versions (deleting the oldest) for 1 month, then keep 1 per month for 1 year”.
If one is valid and the other isn’t, then I think it’s confusing because it’s inconsistent. At the same time it seems weird if you make 30 backups in a row one day, and then there are no backups left from the previous 29 days of the month that can be turned into 1Y:1M.

In the case of U, for example, I’d say 1M:U,1Y:1M is valid because that then replaces 1M:0s,1Y:1M. By the way, I always thought it was poor semantic form to say “unlimited” by saying 0s, and I think it even works with 0D, 0M, or 0Y, which all look pretty confusing to me.
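
A rough Python sketch of how such expressions could be read may help here. It is not Duplicati's parser: the names `parse_policy`, `parse_frame` and `parse_interval` are invented, the month and year lengths are fixed approximations, and U is handled the way it is proposed in this thread (unlimited age as a time frame, "keep everything" as an interval).

```python
import re
from datetime import timedelta

# Fixed unit lengths; 30-day months and 365-day years are simplifying assumptions.
UNIT = {
    "s": timedelta(seconds=1), "m": timedelta(minutes=1), "h": timedelta(hours=1),
    "D": timedelta(days=1), "W": timedelta(weeks=1),
    "M": timedelta(days=30), "Y": timedelta(days=365),
}
TOKEN = re.compile(r"(\d+)([smhDWMY])")

def parse_span(token):
    """Turn something like '1W' or '0s' into a timedelta."""
    match = TOKEN.fullmatch(token)
    if not match:
        raise ValueError(f"cannot parse {token!r}")
    return int(match.group(1)) * UNIT[match.group(2)]

def parse_frame(token):
    # Proposed keyword: U as a time frame means "no age limit".
    return timedelta.max if token == "U" else parse_span(token)

def parse_interval(token):
    # Proposed keyword: U as an interval means "keep every backup", like 0s.
    return timedelta(0) if token == "U" else parse_span(token)

def parse_policy(policy):
    """Split 'frame:interval,frame:interval' into (frame, interval) pairs."""
    pairs = []
    for part in policy.split(","):
        frame, interval = part.strip().split(":")
        pairs.append((parse_frame(frame), parse_interval(interval)))
    return pairs

print(parse_policy("1M:U,1Y:1M") == parse_policy("1M:0s,1Y:1M"))
print(parse_policy("1W:1D,U:1W"))
```

Under these assumptions 1M:U and 1M:0s parse to the same pair, which is the point of the proposal: same behaviour, clearer wording.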

I think this might be a complicated way of writing `1Y:1W,30M:1M`. Although I guess technically, if you missed a full month of backups, then you’d only have 29 backups after two and a half years of backups, instead of 30 (then going back 31 months). But how good a use case is that? Would it make sense?

My main thoughts/concerns are:

  • Specifying a number should always mean “I want at least this number of versions” and not “I want no more than this number of versions”. So 1M:30,1Y:1M should not mean “Delete all versions older than the first 30 versions”, but “Stop deleting when there are 30 remaining versions”. This will prevent deleting more versions than intended.
  • I was thinking about some way to define the total number of versions, not the versions in a specific time frame. I haven’t thought of the exact syntax, but somehow a trailing number could be supplied, indicating the minimum number of versions to keep. You could add an optional K: to supply the minimum total number of versions.
    Alternatively, you could add an optional third value for each time frame, so 1M:1W would mean “one backup for every week last month”, and 1M:1W:3 would be interpreted as “one backup for every week last month, but stop deleting backups this month if there are 3 remaining backups in this time frame”.

My main concern is a common use case like 1M:1D. If I choose this policy, my intention is that I have (around) 30 versions to choose from when doing a restore operation.
If my computer is switched off for 1 month (holiday), all backup versions are deleted except 1, destroying all older file versions even though I did nothing. I see this as a potential problem.
Combining with --keep-versions is not supported, so there is no way to keep my retention policy (if I back up twice a day, 50% can be deleted) and keep at least 20 versions. Something like 1M:1D;K:20 would resolve that.
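
As a sketch of what a K:20 floor could do, the Python snippet below caps the deletion list so that at least `min_keep` backups remain. The K value is only a proposal in this thread, and the helper name `limit_deletions` and the choice to delete the oldest candidates first are assumptions made for the example.

```python
from datetime import datetime, timedelta

def limit_deletions(all_backups, candidates, min_keep):
    """Return the candidates that may be deleted while leaving at least
    `min_keep` backups in total. Oldest candidates are deleted first
    (an assumption; the thread does not settle this)."""
    allowed = max(0, len(all_backups) - min_keep)
    return sorted(candidates)[:allowed]

# Holiday scenario: 30 daily backups, a 31-day gap, then one new backup.
start = datetime(2018, 1, 1)
old = [start + timedelta(days=i) for i in range(30)]
newest = old[-1] + timedelta(days=31)
everything = old + [newest]

# With 1M:1D alone, all 30 old backups would be deletion candidates.
deletable = limit_deletions(everything, old, min_keep=20)
remaining = [b for b in everything if b not in deletable]

print(len(deletable), len(remaining))  # prints: 11 20
```

With the floor in place, only the 11 oldest backups are removed after the holiday instead of all 30.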

Aaaaah, now that makes sense! I didn’t get that part from the example, but looking back I see what you meant now :slight_smile:

Then something like 1M:30 makes sense and it also answers my concerns about it working in any order. Although it cannot be 30:1M, that makes no sense, right?

I think a letter definition will be good. 1M:K30 cannot possibly be a typo where 1M:30D was accidentally turned into 1M:30 by deleting the letter.

It all makes a lot of sense to me now, though.

Unlike U, which is purely semantic, K actually supports purposes that were possible by combining retention with keep policy. Except this is more flexible than retention+keep policy.

That’s exactly what I tried to point out. Sorry if I was unclear about that, but this is quite complex stuff. A small change can have all kinds of unwanted side effects.
I didn’t think too much about how to resolve it in detail, but I’m a bit concerned about the number of versions that could be deleted unintentionally.
I guess the syntax can be improved. For example: when choosing a letter for versions to keep, you could choose K (keep), V (versions), N (number of versions) and so on. It’s just a thought.

It is. It’s the classical problem when writing software for arbitrary inputs. There are millions of potential inputs and they all have to work consistently, but they should also cover every type of use case. It’s good to be able to discuss it :slight_smile:

I think that’s a good point.

I think it’s best to settle on one. The syntax is already a bit overwhelming and I’ll admit I spent a good 5-10 minutes designing my first retention policy before entirely understanding what it would mean for my backup.

Running 2.0.2.19_canary_2018-02-12

In order to get the retention policy runs listed in the internal log and in the e-mail reports, I must enable this in the settings:

--log-level=Information

Is this the way it is supposed to be?

What do you mean by “retention policy runs”? Is this any run using the --retention-policy parameter, or only runs that do cleanup (because no file changes need backing up)?

By “retention policy runs” I mean when the configured retention policy is executed, after each backup.

Example excerpt from the log:

Messages: [
[Retention Policy]: Start checking if backups can be removed,
[Retention Policy]: Time frames and intervals pairs: ..

[Retention Policy]: Backups to consider: 2018-02-12..

But if --log-level=Information is not configured, there will be no report in the log: no information at all about the retention policy result.

Great clarification, thanks!

I suspect this is by design due to the potential for a LOT of messages to come out of retention processing, but I should probably step back and let the actual developer of this functionality answer for sure. :slight_smile:

I find that the “result” log for the individual backup job itself does reflect retention policy settings - can you check that also?

  1. [Click Backup Job on Home screen]
  2. “Reporting”
  3. “Show Log”
  4. Select any entry ending with “Result”
  5. “Messages” section near the bottom of the log output

This is exactly what I’m talking about. Here is a longer excerpt from the “result” log:
…
…
Messages: [
[Retention Policy]: Start checking if backups can be removed,
[Retention Policy]: Time frames and intervals pairs: 7.00:00:00 / 00:00:00, 31.00:00:00 / 1.00:00:00, 181.00:00:00 / 7.00:00:00, 3652.00:00:00 / 31.00:00:00,
[Retention Policy]: Backups to consider: 2018-02-12 00:24:23, 2018-02-11 00:06:49, 2018-02-06 23:20:03, 2018-02-01 22:49:28, 2018-01-29 23:11:58, 2018-01-26 23:39:25, 2018-01-24 23:41:15, 2018-01-23 22:08:36, 2018-01-21 23:14:47, 2017-12-27 02:08:44, 2017-12-13 00:54:57, 2017-12-03 23:25:30, 2017-11-23 01:12:00, 2017-11-15 21:59:49, 2017-11-06 00:35:11, 2017-10-29 23:48:54, 2017-10-17 23:44:29, 2017-10-09 00:40:41, 2017-09-10 23:37:07, 2017-07-25 00:14:59, 2017-06-13 23:44:50, 2017-05-11 00:32:47, 2017-03-29 22:49:06,
[Retention Policy]: Backups outside of all time frames and thus getting deleted: ,
[Retention Policy]: All backups to delete: ,
…
]
Warnings: []
Errors: []


Again, no information about the retention policy check and its result is included if --log-level=Information is not configured.
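
For anyone who wants to read the "Time frames and intervals pairs" line programmatically, here is a small Python sketch. The values look like .NET TimeSpan strings (`d.hh:mm:ss`, with the day part optional), and the parser assumes exactly that layout; the helper names `parse_span` and `parse_frames` are made up for the example.

```python
import re
from datetime import timedelta

# Optional "days." prefix followed by hh:mm:ss, as seen in the log excerpt.
SPAN = re.compile(r"(?:(\d+)\.)?(\d{2}):(\d{2}):(\d{2})")

def parse_span(text):
    """Convert '31.00:00:00' or '00:00:00' into a timedelta."""
    match = SPAN.fullmatch(text.strip())
    if not match:
        raise ValueError(f"unexpected time span: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

def parse_frames(line):
    """Extract (time frame, interval) pairs from the log line."""
    payload = line.split("pairs:", 1)[1]
    frames = []
    for chunk in payload.split(","):
        if "/" not in chunk:      # skips the empty piece after the last comma
            continue
        frame, interval = chunk.split("/")
        frames.append((parse_span(frame), parse_span(interval)))
    return frames

line = ("[Retention Policy]: Time frames and intervals pairs: "
        "7.00:00:00 / 00:00:00, 31.00:00:00 / 1.00:00:00, "
        "181.00:00:00 / 7.00:00:00, 3652.00:00:00 / 31.00:00:00,")

for frame, interval in parse_frames(line):
    print(frame.days, "days ->", interval.days, "days between kept backups")
```

Read this way, the excerpt describes four time frames: 7 days with an interval of 0 (every backup kept), 31 days with daily thinning, 181 days with weekly thinning, and 3652 days with one backup per 31 days.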