Retention Policy and Deleted File Behavior

handyguy · October 2, 2022, 9:12pm

For a while now I’ve been using the default “Smart Retention Policy” for my backup sets. Some recent near-misses have forced me to re-evaluate my retention policies. After hours reading through the Retention Policy discussions I think I understand it (maybe).

I have 2 backup sets:

General Purpose (daily) files. Add/change/delete often. Backup runs daily. Retention policy will be: 1W:1D, 4W:1W, 24M:1M
Media files (photos, home movies, music, etc.): Once saved, they rarely change. Backup runs weekly. Retention policy will be: 6M:1W, 24M:1M

My understanding from my reading is the following:

Active files: As long as a file still exists on the drive it will continue to be included in the backups, even if it hasn’t been changed/updated for a long time. E.g., an unchanged file on the drive is 3 years old and the retention policy ends at 24M, it will still be included in new backups because it is still an ‘active’ file.
Deleted files: Once a file is deleted from the drive it will not be included in any new backups, but is still included in the existing backups for as long as the retention policy specifies. When you need to restore that file, you may have to go back through several backups to find it.
Old Deleted Files: Once a deleted file is older than the retention time (24M in my examples) it will be deleted from the backup database and will no longer be available anywhere to restore.

My Questions:

Is my understanding of the behavior around deleted files correct? Did I miss any important points I should consider before proceeding?
Do my proposed retention policies seem sensible for the types of files in each backup set?

Thanks in advance.

ts678 · October 3, 2022, 9:59pm

Quite close, however I’d prefer not talking about deleted files being deleted from the backup, as there’s no individual file delete handling. It’s the backup versions that get deleted. Earlier versions had the file, but later ones did not.

There are some minor issues about how the delete time lines up with the surviving backup times, and file availability might end maybe a month early. Here’s an exaggerated example where it can end a year early:

Jan 1 2022 backup runs and gets file
Dec 31 2022 file deleted from source
Jan 1 2023 backup runs but lacks file
Jan 1 2024 backup deletes 2022 one

Your backups are more frequent and you thin to no less than 1 month apart, so maybe you get 23 months.
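
To put numbers on that timeline, here is a rough Python sketch. It is only a toy model, not how Duplicati is implemented: the `versions` list, the `last_restorable` helper, the file name, and the 730-day stand-in for 24M are all made up for illustration, and it treats a version as expiring exactly at its backup time plus the retention period, whereas in practice the cleanup happens when a later backup runs.

```python
from datetime import datetime, timedelta

# Assumption: 730 days stands in for a "24M" retention period.
RETENTION = timedelta(days=730)

# Each backup version is a snapshot: (backup time, files seen at that time).
versions = [
    (datetime(2022, 1, 1), {"report.txt"}),  # backup runs and gets the file
    (datetime(2023, 1, 1), set()),           # backup runs but lacks the file
]
deleted_at = datetime(2022, 12, 31)          # file deleted from source


def last_restorable(versions, name, retention):
    """Return when the newest version containing `name` ages out, or None."""
    holders = [taken for taken, files in versions if name in files]
    if not holders:
        return None
    return max(holders) + retention


actual_end = last_restorable(versions, "report.txt", RETENTION)
naive_end = deleted_at + RETENTION  # what "24M after I deleted it" suggests
shortfall = naive_end - actual_end

print("restorable until:", actual_end.date())
print("naive expectation:", naive_end.date())
print("days lost:", shortfall.days)
```

The shortfall is simply the gap between the last backup that saw the file and the moment the file was deleted, which is why more frequent surviving versions shrink it.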

Another way to lose files early is to have them come and go too soon, e.g. if you thin to a 1 month interval, any file that came and went between the surviving versions is gone because it was not seen by either one.
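
A small Python illustration of that, again only a toy model with invented dates and file names: each version is a set of files, and a file is restorable only if some surviving version contains it.

```python
from datetime import date

# Hypothetical versions: backup date -> files seen in that snapshot.
versions = {
    date(2022, 9, 1): {"notes.txt"},
    date(2022, 9, 10): {"notes.txt", "draft.docx"},
    date(2022, 9, 20): {"notes.txt", "draft.docx"},
    date(2022, 10, 1): {"notes.txt"},
}

# Assumption: thinning to a 1 month interval left only these two versions.
surviving = [date(2022, 9, 1), date(2022, 10, 1)]


def restorable(versions, kept_dates):
    """Union of the files contained in the versions that still exist."""
    files = set()
    for d in kept_dates:
        files |= versions[d]
    return files


before = restorable(versions, versions.keys())
after = restorable(versions, surviving)
lost = before - after

print("lost after thinning:", sorted(lost))
```

Here `draft.docx` was created after Sep 1 and deleted before Oct 1, so neither surviving version ever saw it.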

Setting the interval at exactly the backup frequency (which smart retention unfortunately does) can cause surprising results if the backup times vary even by seconds, as the actual interval and deletion interval are very close. This bothers some people. The way to avoid this is to add some margin manually, e.g. for 1D, use 23h instead, etc. Some people also expect this is calendar-based. It’s not. It’s times - to the second…
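
The effect of the margin can be seen in a simplified thinning model in Python. This is a sketch under assumptions, not Duplicati's actual code: the `thin` function, the start time, and the few seconds of jitter are invented, and the rule used is "keep the oldest version, then keep each later version only if it is at least one interval after the last kept one".

```python
from datetime import datetime, timedelta


def thin(times, interval):
    """Keep the oldest backup, then each one >= interval after the last kept."""
    kept = []
    for t in sorted(times):
        if not kept or t - kept[-1] >= interval:
            kept.append(t)
    return kept


start = datetime(2022, 10, 1, 2, 0, 0)
# Nominally daily backups whose start times drift by a few seconds.
jitter_seconds = [0, -5, 3, -2, 4]
times = [
    start + timedelta(days=day, seconds=offset)
    for day, offset in enumerate(jitter_seconds)
]

kept_1d = thin(times, timedelta(days=1))     # interval equals backup frequency
kept_23h = thin(times, timedelta(hours=23))  # interval with an hour of margin

print("kept with 1D: ", len(kept_1d), "of", len(times))
print("kept with 23h:", len(kept_23h), "of", len(times))
```

In this model a backup that starts 5 seconds short of 24 hours after the previous kept one fails the 1D test and is thinned away, while the 23h interval keeps every daily run.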

handyguy · October 4, 2022, 12:42am

Thanks @ts678, great explanation. Just to clarify one point (as this is crucial for my backup strategy), as long as a file is “active” on my hard drive, even if it has not changed at all during the retention schedule’s lifetime, it will always be included in at least one backup version (the most recent)?

Great point about adding a “margin” to the retention schedule. I’ll have to consider that as I revise my strategy.

ts678 · October 4, 2022, 12:53am

I prefer to use the word “present” (or just delete “active”). Whatever is found is in the backup version.
This is of course influenced by what you asked to backup, and any filters or exclusions that are used.

If a file has been sitting there unchanged for three years, it will be in every version configured to grab it.
Something’s still not clicking. A backup version is a point-in-time snapshot of the files seen at the time.
Browse the Restore GUI for different backup dates to look at some actual old unchanged file if you like.

handyguy · October 4, 2022, 1:11am

If a file has been sitting there unchanged for three years, it will be in every version configured to grab it. … A backup version is a point-in-time snapshot of the files seen at the time.

That last point is what wasn’t clicking. Consider it now clicked.

Thanks for the explanations.

aestetix · October 4, 2022, 3:39pm

This is the clearest explanation I’ve seen of this feature. The docs would do well to phrase it this way.

ts678 · October 5, 2022, 2:04pm

“GUI smart and custom backup retention aren’t covered” (#83) asks for parts of it. You can suggest more.
I’m not sure exactly which parts you liked. There are many that confuse people. Is that you on GitHub?
If you can do pull requests and have some writing skill, volunteers are always encouraged – anywhere.
For the Duplicati manual, the officially preferred way to propose changes is a pull request rather than an issue.
I can help with how-it-works part if someone else can figure out how to present and do the pull request.

aestetix · October 5, 2022, 2:41pm

I’ll have a look at doing a pull request. That’s the wrong aestetix though.

aestetix · October 6, 2022, 9:27am