New retention policy deletes old backups in a smart way

Cobian Backup appears to do full, differential, and incremental backups, and possibly that’s creating the impression that Duplicati needs a similarly complex schedule. Duplicati moved away from that older model in Duplicati 2. Now only new or changed file information is ever uploaded, and every backup version has a full list of files. Because there’s only one backup type, you just decide how quickly to thin out the backup versions as they grow old; you don’t get to set the day, only the interval. Note that files that existed only between backups will not be available for restore, just as they wouldn’t be if you took a traditional monthly full and the files came and went between runs.

If, for example, you want daily backups kept until 1 week old, then weekly backups kept until 1 month old, then monthly backups kept until 3 years old, and then deleted, use 1W:1D,1M:1W,3Y:1M. Case matters: lowercase m is minutes, uppercase M is months.
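For anyone setting this from the command line rather than the GUI’s Custom backup retention field, here is a minimal sketch of passing that policy as an advanced option. The destination URL and source folder are hypothetical placeholders, and --no-encryption is only there to keep the example short.

```
:: A minimal sketch; destination URL and source folder are hypothetical
:: placeholders, and --no-encryption just keeps the example short.
Duplicati.CommandLine.exe backup "file://D:/Backups/Docs" "C:/Users/me/Documents" ^
  --retention-policy="1W:1D,1M:1W,3Y:1M" ^
  --no-encryption=true
```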

OK, as you say, but my problem is that if the first, complete backup is damaged, I’m in deep trouble. That’s why I wanted to have separate complete backups: to get around that problem, I wanted to create complete backups so that a complete recovery is always possible.

Even a complete backup gets damaged if the destination gets damaged. Backing up to different destinations using different software would be safest. Ideally you would periodically do heavy verification or a test restore.

The first backup is the most different in that it’s typically larger, so it produces more destination files that could be damaged. Heavy verification can confirm it went in right, and some verification of these and other destination files happens automatically by default on every backup. You can raise the amount by setting --backup-test-samples, or you can force a (likely time-consuming) test of as many files as you care to, using The TEST command.
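As a hedged sketch of such a heavier check from the command line (the destination URL is a hypothetical placeholder, and an encrypted destination would also need its usual passphrase option):

```
:: A hedged sketch; the destination URL is a hypothetical placeholder, and an
:: encrypted destination would also need the usual passphrase option.
:: "all" tests every remote volume, which can take a long time.
Duplicati.CommandLine.exe test "file://D:/Backups/Docs" all --full-remote-verification=true
```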

Making a full from-scratch backup, copying it elsewhere, and repeating the process periodically (e.g. monthly) could probably be cobbled together with enough scripting, plus some tool to copy off the backup files. The DELETE command with --allow-full-removal can probably empty out the destination files, and ordinary file deletion can remove the local per-job database. The job schedule is kept in the server database, and might put up with this.
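A hedged sketch of that emptying step, if you really wanted it, might look like the following; the destination URL is a hypothetical placeholder, and the 0-999 range is just an assumption meant to cover every version you actually have:

```
:: A hedged sketch; the destination URL is a hypothetical placeholder, and the
:: 0-999 version range is an assumption meant to cover every version present.
:: --allow-full-removal turns off the safety that normally keeps the last backup.
Duplicati.CommandLine.exe delete "file://D:/Backups/Docs" --version=0-999 --allow-full-removal=true
```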

Unless you back up the local job database, recovery from parked backups will have to do extra work building a partial temporary database. Losing a database is painful, but your plan seems to demand intentional deletion.

Duplicati works best when it backs up to a continuing database and destination, and your wish is more or less the opposite… The retention policy discussed in this topic and elsewhere is about managing the backup versions of a continuing job.

I’m not very convinced of the logic that adopts duplicates. Then, if I’m not mistaken, if I lose the local databases (for example the local disk is broken and I no longer have access to the local database), are you telling me that I can no longer recover my backups?

I didn’t understand “adopts duplicates”, but loss of the disk does not mean inability to restore. It would just take longer, because there’s at least a partial database recreate that would not be needed if the database hadn’t been lost or deleted. The database maintains information on what’s at the destination so that, for example, a backup can see which data is already there and include it by reference (rather than re-uploading it) when recording a new or changed file.
How the backup process works and How the restore process works give details that might aid understanding. Restoring files if your Duplicati installation is lost and the video Duplicati Tutorial 11 Disaster Recovery explain this too.
Should there be further questions, perhaps this deserves its own topic rather than staying in “retention policy”.

So if I make a backup of the local database folder, can I be more quiet (more at ease) about recovery?
Is there a way to export the complete configuration of both the backup job and the Duplicati settings?

Starting with the easier question: backing up the configuration starts with Configuration --> Export on the job menu. This merges in the default options from the global settings, but omits non-job items such as the UI password. Store the exported configuration somewhere other than Duplicati, so it can be used to get Duplicati back if a disk is lost.

Database backup is not as easy. Backup duplicati database: is it better or doesn’t matter? gives another view of when it helps, and I still think intentionally deleting it runs contrary to the expected usage. The reasons for a database backup probably don’t include “quieter”, unless something goes wrong on the recreate and it gets noisy. If no issues occur, it will probably be “quicker”; however, that matters more for bigger backups.

Pulling back closer to on-topic: you could use one of --retention-policy, --keep-time, or --keep-versions to trim your backup, but you might need to run The COMPACT command to reclaim the destination space. These retention operations run after the backup, so they can help shrink destination usage (which helps your copy-off of the destination); however, my belief is that, due to the timing, they won’t delete the last backup before a new backup runs (which would be dangerous anyway), even if you turn off the safety by saying --allow-full-removal.
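If auto-compact doesn’t kick in on its own, a hedged sketch of running it by hand (the destination URL again a hypothetical placeholder) would be:

```
:: A hedged sketch; the destination URL is a hypothetical placeholder.
:: --threshold is the percentage of wasted space a remote volume may hold
:: before it gets rewritten (25 is the usual default).
Duplicati.CommandLine.exe compact "file://D:/Backups/Docs" --threshold=25
```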

Typically, people who do database backups seem to use something like a separate job that runs after the real backup. You can find the path to your job’s local database by looking at Advanced --> Database for a backup, and this is also what you would delete if you insist on resetting to force a complete upload. A backup of the whole database folder needs to carefully avoid backing up the database that’s actively in use by the backup doing the copying…
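Purely as a hedged sketch of such a second job (every path, the URL, and the exclude pattern below are hypothetical placeholders; check your own locations under Advanced --> Database):

```
:: A hedged sketch of a second job that copies the Duplicati database folder.
:: All paths, the URL, and the exclude pattern are hypothetical placeholders.
:: The exclude should match whatever database file this copy job itself uses,
:: since that file is busy while the job runs.
Duplicati.CommandLine.exe backup "file://E:/DuplicatiDbCopy" "C:/Users/me/AppData/Local/Duplicati/" ^
  --exclude="*DBCOPYJOB*.sqlite" ^
  --no-encryption=true
```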

While a job database is much smaller than the actual backup (it’s basically cached information trying to stay in sync with the destination data), some people find that Duplicati’s deduplication doesn’t help much with it because so much of it changes, so the upload winds up being pretty much the whole database. Possibly that won’t matter to you, because the wish seems to be a large self-contained backup that then gets parked.

To get back to an earlier suggestion, maybe you could use a different backup program to make self-contained backups for loss-of-disk cases, while letting Duplicati run a continuing backup maintained by retention policy. rclone may be useful for remote copying. Local copying has many options, but a big local disaster will leave nothing.
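For the remote-copy part, a hedged rclone sketch (the local folder and the “offsite” remote name are hypothetical placeholders set up beforehand with rclone config):

```
:: A hedged sketch; the local folder and the "offsite" remote are hypothetical
:: placeholders configured beforehand with "rclone config".
rclone copy "D:/Backups/Docs" offsite:duplicati-docs --progress
```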

If you like the complete file backups, possibly you could also consider even-more-complete image backups. Free versions of commercial products exist, likely limited in fancy options you might not want. One example.

If you decide to design a complex strategy that you count on totally, please be sure that you test it very well.

Essentially Duplicati does an incremental backup and that’s it. If I split the zip files into smaller parts, do I have more security if the full backup is damaged?
If the first backup, which is always the complete one, is eventually damaged, do I also lose the files backed up afterwards, or just the whole backup?

To provide some comment relevant to this retention policy thread: as retention policy (of any sort) does its job, what was originally the initial backup may eventually hold data that no longer exists in any file, and is therefore wasted space. The --small-file-size, --small-file-max-count, --threshold, and --no-auto-compact settings configure compact behavior. Compact may blend data from different backup times; the intent is to reduce space usage, not to preserve the initial backup. However, if you go down the backup-once-then-park-and-start-again path, compact will never feel a need to run. This makes me think the second question is about a continuing backup, asking whether damage can lose the original backup or the updated data, and the answer is that it depends on which file was lost and what was in it. The AFFECTED command can show the impact of a given file loss. You can also read Disaster Recovery. Losing a few files has a limited impact, unless it somehow breaks the database recreation.
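A hedged sketch of that check (the destination URL and the dblock file name are hypothetical placeholders; use the actual name of the damaged or missing remote file):

```
:: A hedged sketch; the URL and the dblock name are hypothetical placeholders.
:: Pass the actual name of the damaged or missing remote volume to see which
:: source files and backup versions it affects.
Duplicati.CommandLine.exe affected "file://D:/Backups/Docs" duplicati-bEXAMPLE.dblock.zip
```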

Choosing sizes in Duplicati might be worth reading, but it doesn’t get into damage questions, maybe because there’s not enough information on actual losses to put together anything close to statistically accurate claims.

I can drag in statistics (actually probability) another way, to say that trying to improve reliability by some percentage fails dramatically compared to adding a second independent backup. To invent numbers: if a given method fails 1 time in 100, the chance of two independent methods both failing (assuming no common causes, which is a stretch) is only 1 in 10,000. That’s far better than guessing at the effect of zip file sizes and getting 1 in 200, or 1 in 50 with a wrong guess.

OK, it seems from what you say that if the destination of my backup is not damaged, then even if the local disk is broken I have no problem recovering all my work.
For safety, however, I will continue to use Cobian and Duplicati in parallel.
I also looked at the Veeam system for Microsoft Windows; nice, but it does not work with FTP.
Anyway, I will keep an eye on Duplicati.

You “should” have no problems recovering. I can’t guarantee it, and sometimes people do have problems. No solution is perfect, and Duplicati (itself not perfect) does rely on things beyond its control, e.g. the destination.

Is this still true or did something change in the retention policy since this post?

I’m asking because I primarily use Duplicati to archive video; I will often delete the video from my main hard drive but still want one version of each file, and its modifications, kept somewhere in the backup in case I want to restore it at some point in the future.

My current setup is to have Duplicati back up twice a day with an unlimited retention policy. While this works fine as far as archiving goes (no files are ever removed from Duplicati unless I manually purge them), it generates an insane number of backup versions, making it very difficult to browse through.

I was hoping that Smart Retention might take care of this by automatically pruning all those backups to something manageable to browse through while still making sure to retain a single version of each file plus any modifications to that file unless I manually purge.

So am I to understand that this is not currently possible, i.e. that if I switched to Smart Retention it would still, at some point, end up deleting actual files?

If so, is there a manual way to thin out all those backup versions, like a command to reduce them down to the bare minimum number of backups necessary in order to retain all file versions?

BTW, I understand that Duplicati doesn’t store “files” per se but I think you know what I mean. :slight_smile: Thanks.

That’s just true for the retention policy I set personally. It could be configured for anything.

This is tough to solve, both because this use case makes retention policy nearly impossible to use, and because when you have 365 restore points you’ll likely spend a long time looking for the file in each restore point.

The retention policy only cares about restore points; it doesn’t care about the actual files, so it won’t fit this use case. It will delete your files if they aren’t in one of the “kept” restore points.

You can manually delete snapshots, but the overhead in keeping track of which ones to keep does not seem ideal.

I would strongly advise you to put these files in some kind of archive storage instead of storing them in Duplicati. It’ll use the same disk space anyway, so the only difference is added overhead in managing the backup so carefully.

Thanks for the thorough response. In all honesty, I tried numerous other backup and archiving systems before settling on Duplicati because, even in my use case, it’s still far superior to most other software I looked at: it dedupes (key in my use case because sometimes my media files are duplicated over a number of drives); it’s web-based, and the machine I’m using it on is headless, so this was critical; it can delete individual files from the backup (weirdly, this is a rare feature in most backup software); and it doesn’t require much hard drive space to run (unlike, say, Veeam Agent, an excellent piece of software but one which requires double the size of the backup for its compacting process, i.e. if you have a 1.5 TB backup on a 2 TB drive, the software tanks out when compacting). It’s a really excellent piece of software, and if its one downside is that searching for a particular version may take time, then so be it. :slight_smile:

A different question that would help in my use case, if possible: is there a way to label a particular backup version? I know each backup automatically gets a time/date stamp, but is there a way to add a user comment to a particular backup, a la “This backup contains the best version of blah blah blah file”, some kind of notation system? Thanks.

Although you probably didn’t mean it this way, that’s exactly what happens with the backup. A new version is not made (even if the backup runs) unless there is some change (such as a new file version) calling for that.

It appears that the original motivation of this feature’s author was genuinely to trim old versions to save destination space, while giving a more configurable way to do so than was previously available, e.g. thinning over time.

Duplicati is oriented towards tracking views of every file within particular dated backup versions. A request heard (especially from CrashPlan users) is to have a different retention plan (such as forever) for deleted files, and also to have a way to surface them in the UI for restore. That would probably take a lot more code than this plan, which does the usual delete, just with a more powerful selection ability, so I doubt it will happen anytime soon.

Although it’s not as pretty, finding briefly existing files can be done with The FIND command and --all-versions. Having said that, Duplicati is meant to be a backup program, not an archiver, and it also hasn’t left beta… Personally, I see it as great for saving you from short-term issues, but keeping your only copy of something in it forever is riskier.
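A hedged sketch of that lookup (the destination URL and the file path are hypothetical placeholders):

```
:: A hedged sketch; the URL and the file path are hypothetical placeholders.
:: Lists every backup version that contains a version of the named file.
Duplicati.CommandLine.exe find "file://D:/Backups/Docs" "C:/Users/me/Videos/clip.mp4" --all-versions=true
```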

There isn’t currently a way, but it’s a possible feature suggestion.

Just in case it helps the next person, s/m/h are case-sensitive. Took me an embarrassing amount of time to figure that out.

I’m not sure if I understand right…
If I run a backup every hour and want to keep the last 24 hours’ worth of backups, 1 backup for every day, 1 backup for every week, 1 backup per month, and 1 for a year, what would my custom retention policy look like then?
U:24h,1D:1D,1W:1D,1Y:1Y?

What I actually want is to keep 24 versions of a file in the last 24 hours, 1 version from the past day, 1 version from the past week, 4 versions from the past month, 1 version from the past 4 months, and not to delete the yearly backups. Can someone help with the syntax, please?

How you word this is a bit confusing to me. Keep 24 versions in the last 24 hours, but then you say 1 version for the last day. Which is it, 1 version or 24 versions in the last day?

Let me show you something else that may be close to what you mean:

1D:U,1W:1D,1M:1W,4M:1M,99Y:1Y

This means:
All backups within the last 1 day (24 hours) will be retained.
Beyond that, only 1 backup per day for the past 1 week will be retained.
Beyond that, only 1 backup per week for the past 1 month will be retained.
Beyond that, only 1 backup per month for the past 4 months will be retained.
Beyond that, only 1 backup per year for the past 99 years will be retained.
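To make that concrete, here is a minimal sketch of applying the same string as the job’s retention setting; the destination URL and source folder are hypothetical placeholders, and in the GUI this is the same value that goes in the Custom backup retention field.

```
:: A minimal sketch; destination URL and source folder are hypothetical
:: placeholders. In the GUI, the same string goes in "Custom backup retention".
Duplicati.CommandLine.exe backup "file://D:/Backups/Hourly" "C:/Data" ^
  --retention-policy="1D:U,1W:1D,1M:1W,4M:1M,99Y:1Y" ^
  --no-encryption=true
```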

What do you think? Does that help?


Wow. That’s exactly what I want, and now I understand better how that works! Thank you.