New retention policy deletes old backups in a smart way

OK, it seems from what you say that if the destination of my backup is not damaged, even if the local disk is broken, I will have no problem recovering all my work.
For safety, however, I continue to use Cobian and keep a duplicate backup in parallel.
I also looked at the Veeam system for Microsoft Windows; nice, but it does not work over FTP.
However, I'm keeping an eye on Duplicati.

You “should” have no problems recovering. I can’t guarantee it, and sometimes people do have problems. No solution is perfect, and Duplicati (itself not perfect) does rely on things beyond its control, e.g. the destination.

Is this still true or did something change in the retention policy since this post?

I’m asking because I primarily use Duplicati to archive video; I will often delete the video from my main hard drive but still want one version of each file and its modification somewhere in the backup in case at some point in the future I want to restore it.

My current setup is to have Duplicati back up twice a day with an unlimited retention policy. While this works fine as far as archiving goes (no files are ever removed from Duplicati unless I manually purge them), it generates an insane number of backup versions, making it very difficult to browse through.

I was hoping that Smart Retention might take care of this by automatically pruning all those backups to something manageable to browse through while still making sure to retain a single version of each file plus any modifications to that file unless I manually purge.

So am I to understand that this is not currently possible, i.e. that if I switched to Smart Retention, it would still, at some point, end up deleting actual files?

If so, is there a manual way to thin out all those backup versions, like a command to reduce them down to the bare minimum number of backups necessary in order to retain all file versions?

BTW, I understand that Duplicati doesn’t store “files” per se but I think you know what I mean. :slight_smile: Thanks.

That’s just true for the retention policy I set personally. It could be configured for anything.

This is tough to solve, both because it makes the retention policy nearly impossible to use and because, when you have 365 restore points, you’ll likely spend a long time looking for the file in each restore point.

The retention policy only cares about restore points, not about the actual files, so it won’t fit this use case. It will delete your files if they aren’t in one of the “kept” restore points.

You can manually delete snapshots, but the overhead in keeping track of which ones to keep does not seem ideal.

I would strongly advise you to put these files in some kind of archive storage instead of storing them in Duplicati. It’ll use the same disk space anyway, so the only difference is the added overhead of managing the backup so carefully.

Thanks for the thorough response. In all honesty, I tried numerous other backup and archiving systems before settling on Duplicati because, even in my use case, it’s still far superior to most other software I looked at - it dedupes (key in my use case because sometimes my media files are duplicated over a number of drives); it’s web-based, and the machine I’m using it on is headless, so this was critical; it can delete individual files from the backup (weirdly, this is a rare feature in most backup software); it doesn’t require much hard drive space to run (unlike, say, Veeam Agent, an excellent piece of software but one which requires double the size of the backup for its compacting process, i.e. if you have a 1.5 TB backup on a 2 TB drive, the software tanks out when compacting). It’s a really excellent piece of software, and if its one downside is that searching for a particular version may take time, then so be it. :slight_smile:

Different question that would help in my use case if possible: is there a way to label a particular backup version? I know each backup automatically gets the time/date stamp, but is there a way to add a user comment to a particular backup, a la “This backup contains the best version of blah blah blah file” or something, some kind of notation system? Thanks.

Although you probably didn’t mean it this way, that’s exactly what happens with the backup. A new version is not made (even if the backup runs) unless there is some change (such as a new file version) calling for that.

It appears that the original motivation for the author of this feature was genuinely to trim file versions to save space at the destination but to give a more configurable way than previously available, e.g. to thin over time.

Duplicati is oriented towards tracking views of every file in particular dated backup versions. A request heard (especially from CrashPlan users) is to have a different retention plan (such as forever) for deleted files and also to have a way to add them to the UI for restore. That would probably take a lot more code than this plan which does the usual delete, just with a more powerful selection ability, so I doubt it will happen anytime soon.

Although it’s not as pretty, finding briefly existing files can be done with the FIND command and --all-versions. Having said that, Duplicati is meant to be a backup program, not an archiver, and it also hasn’t left beta… Personally, I see it as great to save you from short-term issues, but keeping your only copy in it forever is riskier.
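
For example (the storage URL and file name here are just placeholders), a search across every backup version looks something like this:

Duplicati.CommandLine.exe find <storage-URL> "myvideo.mp4" --all-versions=true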

There isn’t currently a way, but it’s a possible feature suggestion.

Just in case it helps the next person: the s/m/h units are case-sensitive (lowercase m is minutes, while uppercase M is months). Took me an embarrassing amount of time to figure that out.

I’m not sure if I understand this right…
If I run a backup every hour and want to keep the last 24 hours’ worth of backups, 1 backup for every day, 1 backup for every week, 1 backup for every month, and 1 for a year, what would my custom retention policy look like?
U:24h,1D:1D,1W:1D,1Y:1Y?

What I actually want is to keep 24 versions of a file in the last 24 hours, 1 version from the past day, 1 version from the past week, 4 versions from the past month, 1 version from the past 4 months, and not to delete the yearly backup. Can someone help with the syntax please?

How you word this is a bit confusing to me. Keep 24 versions in the last 24 hours, but then you say 1 version for the last day. Which is it, 1 version or 24 versions in the last day?

Let me show you something else that may be close to what you mean:

1D:U,1W:1D,1M:1W,4M:1M,99Y:1Y

This means:
All backups within the last 1 day (24 hours) will be retained.
Beyond that, only 1 backup per day for the past 1 week will be retained.
Beyond that, only 1 backup per week for the past 1 month will be retained.
Beyond that, only 1 backup per month for the past 4 months will be retained.
Beyond that, only 1 backup per year for the past 99 years will be retained.
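
If it helps to see the thinning spelled out, here is a rough Python sketch of how I understand those rules to be applied. It is only an illustration of the idea (months are approximated as 30 days, and the exact tie-breaking Duplicati uses may differ), not Duplicati’s actual code:

```python
from datetime import datetime, timedelta

# Each rule is "<time frame>:<interval>". Within its slice of the timeline,
# only one backup per interval survives; "U" (unlimited) keeps everything.
# These rules mirror the example string 1D:U,1W:1D,1M:1W,4M:1M,99Y:1Y.
RULES = [
    (timedelta(days=1), None),                        # 1D:U
    (timedelta(weeks=1), timedelta(days=1)),          # 1W:1D
    (timedelta(days=30), timedelta(weeks=1)),         # 1M:1W
    (timedelta(days=120), timedelta(days=30)),        # 4M:1M
    (timedelta(days=365 * 99), timedelta(days=365)),  # 99Y:1Y
]

def kept_versions(backup_times, now):
    """Return the backup timestamps that would survive thinning."""
    kept = []
    ordered = sorted(backup_times, reverse=True)  # newest first
    frame_start = now
    for frame, interval in RULES:
        frame_end = now - frame
        in_frame = [b for b in ordered if frame_end < b <= frame_start]
        if interval is None:
            kept.extend(in_frame)                 # "U": keep everything here
        else:
            last_kept = None
            for b in in_frame:                    # keep one backup per interval
                if last_kept is None or last_kept - b >= interval:
                    kept.append(b)
                    last_kept = b
        frame_start = frame_end                   # older backups fall to the next rule
    return sorted(kept)

# Example: hourly backups over the last 10 days shrink from 240 versions to about 31.
now = datetime(2024, 1, 10, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(10 * 24)]
print(len(kept_versions(hourly, now)))
```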

What do you think? Does that help?

Wow. That’s exactly what I want, and now I understand better how that works! Thank you.

You bet - glad it helped!

Apologies - newbie alert. :slight_smile:

Is there a way to configure Duplicati (with or without Smart Retention) in such a way that a file that has been deleted from the source will always be available in the backup? (For instance, even after 13 months, when using Smart Retention.)

Thanks!

If you do “keep all versions” that will of course keep even deleted files.

Beyond that, sort of.

If using a retention policy, you should use Custom and be sure to include a final rule like @drwtsn32’s 99Y:1Y, which will keep a copy (even of deleted files) for up to 99 years, BUT… only if the file is in the ONE backup chosen to be kept for the year.

It’s not perfect - in fact it’s easy to “lose” files in the time gaps but it’s the best we’ve got for now.

Personally, I’d love a secondary retention policy for deleted files - including never remove.

Thank you!

I wasn’t sure about “keep all versions”, in the scenario where the local file got deleted. Good to know how that works. This really helps with designing my backup strategy.

FWIW, I like the way Bvckup handles deleted files. There’s an option to ‘archive’ deleted files - when Bvckup notices a file got deleted, it moves that file (on the backup side) to an ‘Archive’ directory.

I was looking for a solution and found this thread discussing the same problem. But I still haven’t found a proper solution. What I want is a retention policy that keeps all versions for a short while, say, a month, and then keeps only one version for up to three years. Based on the discussion here, I tried “--retention-policy=1M:U,3Y:3Y”, but that gave me an error message. I figured it was the “3Y:3Y” part that was not accepted.

So I have to ask what is the recommended way to do it?

For now, I have “--retention-policy=1M:U,2M:1M,3Y:2Y”, which in my understanding should do what I’m trying to achieve, but surely this cannot be the recommended way?

When you say “one version” for up to three years, do you mean one file version? Because that’s not how Duplicati backup retention works. Retention is by entire backup set versions - it does not support retention at the file version level.

So your 1M:U,3Y:3Y (if it were accepted) would cause all but one backup version to be deleted after one month. And after 3 years you’d have zero backup versions retained.

That I had not comprehended yet. Maybe I have too long a history with version-control systems that build up the history of a configuration from the history of files (unlike Git, which is revolutionary in that regard).

I will have to rethink the retention policy for all of my backups now.

I may be missing the reference, but if you consider a configuration to be conceptually like a backup version, Duplicati builds it as a set of files where the history of any given file is represented by the blocks it contains, whether those blocks are new or old. That’s one way block-based deduplication happens, so the space used by a slightly changed version of a backup is a small upload of the changes plus a lot of references to existing blocks.

Block-based storage engine

How the backup process works
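
To make that concrete, here is a toy sketch of the block idea. It is my own illustration, not Duplicati’s actual storage format; the block size and hash choice are just assumptions standing in for the real options:

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # an assumed fixed block size

block_store = {}    # block hash -> block bytes (stands in for the remote volumes)
file_versions = {}  # (file name, backup version) -> ordered list of block hashes

def backup_file(name, version, data):
    """Split a file into blocks; only blocks not already stored get 'uploaded'."""
    hashes = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        block_store.setdefault(digest, block)   # store new blocks only
        hashes.append(digest)
    file_versions[(name, version)] = hashes

def restore_file(name, version):
    """Reassemble any stored version from its block references."""
    return b"".join(block_store[h] for h in file_versions[(name, version)])

# A slightly changed file mostly re-uses blocks that are already stored:
backup_file("video.mp4", 1, b"A" * 300_000)
backup_file("video.mp4", 2, b"A" * 300_000 + b"B" * 50_000)  # data appended in version 2
print(len(block_store))  # 4 unique blocks stored, though the two versions reference 7
```

That small upload plus many references is why keeping lots of versions is relatively cheap in space, as long as the underlying blocks stay similar.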

Someone with many files having frequent changes will face space issues, and deleting versions will help; however, any file that arrived and disappeared between the surviving versions will not be in any version at all.

I’m not clear on the reason for “only one version for up to three years”, but I wanted to mention the design.

A backup system is not a VCS, and I am not trying to equate them in any way. I am just explaining where I am coming from; I have a long history with coding and VCS systems.

I hope this helps.

The difference between Git and all its predecessors is that the traditional VCS tools hold file versions as the primary objects and derive the history of a configuration from file versions. This derivation order has been traditionally motivated by performance aspects, but is often incomplete or otherwise flaky. This is where Linus Torvalds turned the tables quite completely. In Git, the version of a configuration is the primary object, and if one needs at all to see the version history of a file (as a tree or some such presentation), it can be derived from the history of the configuration that contains the file, but the derivation process is not always unambiguous.

So which is more important, the history of the whole configuration or the history of its individual files? In a VCS, the configuration is always more important.

When it comes to backup sets, the individual files usually are more important than the state of the whole set, but not always. So now I will have to figure out what I actually need.