Unexpected results with deleted files when using Custom and Smart Retention settings

In this post I’ll discuss the effect of Custom and Smart Retention settings on deleted files.

In short: deleted files may be gone long before you expect them to be.

Please note the following:

  • Duplicati works as designed
  • Judging from a long read of forum posts, many users will not expect this (but some already know)
  • I could not find this in the manual pages
  • Maybe I do not correctly understand the Duplicati backup model; it would be nice if an expert would comment to confirm or deny the facts stated in this post

In order to understand the behavior, first we must know:
How does Duplicati make backups?

  1. Your source is one or more directory trees, for which you want to maintain backups
  2. With each full and uninterrupted backup, Duplicati creates a snapshot of your complete source. This snapshot is also called a backup, backup-set, or version
  3. The snapshot essentially is a complete listing of your source, including (a reference to) its contents
  4. The content of the files (split in dblocks) is maintained separately by Duplicati in volume files
  5. The snapshots refer to this content (via hashes to dblocks), and unchanged content is saved only once

Please note that a snapshot is essentially a point-in-time listing/backup of your source. A snapshot only has knowledge about existing objects in the source.
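
The content-addressed model in points 3–5 can be sketched in Python (a toy illustration of the concept, not Duplicati’s actual code; `make_snapshot` and `block_store` are invented names):

```python
import hashlib

block_store = {}  # hash -> content; dblocks are shared across snapshots

def make_snapshot(source):
    """Record a point-in-time listing: path -> content hash.
    Content is stored only once, keyed by its hash."""
    listing = {}
    for path, content in source.items():
        h = hashlib.sha256(content).hexdigest()
        block_store.setdefault(h, content)  # unchanged content saved once
        listing[path] = h
    return listing

snap1 = make_snapshot({"a.txt": b"hello"})
snap2 = make_snapshot({"a.txt": b"hello", "b.txt": b"new"})
# "hello" is stored only once, even though two snapshots reference it
print(len(block_store))  # 2
```

This also shows why a snapshot has no knowledge of deletions: it is only a listing of what exists at that moment.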

Ok, what does all this mean?

  • For the first backup, Duplicati creates a snapshot (listing of all source objects) and saves all contents
  • For an unchanged source, each Duplicati backup creates a snapshot (listing), but does not save any content, because this is not needed as it is unchanged
  • For a source with changed content, only the changes are propagated. The old content is kept as long as other, older, snapshots refer to it
  • For a source with deleted files, nothing special is done. The deletions are not present in the snapshot. The contents of deleted files are kept as long as other, older, snapshots refer to them

Suppose you make a backup every day for an entire year. You will have 365 snapshots, or backup-sets, or versions. That can be a lot of data to keep.
Suppose you are happy with backup thinning and you only need one backup a day for the first 7 days, one per week for the next 4 weeks and one per month for a year after that.

This can be done with a Custom Retention setting: 7D:1D,4W:1W,12M:1M (a Smart Retention is just a preset Retention Setting of this kind).
The effect is that your amount of snapshots to keep will be reduced to 7 + 4 + 12 = 23, and the oldest one will be about 12 months plus 4 weeks plus 7 days old.
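
For illustration, the bucket arithmetic can be sketched like this (a toy parser of the retention string, using my own approximation of W = 7 and M = 30 days; not Duplicati’s parsing code, and the `U` unlimited placeholder is not handled):

```python
def parse_retention(policy):
    """Parse a Duplicati-style retention string like '7D:1D,4W:1W,12M:1M'
    into (timeframe_days, interval_days) bucket pairs."""
    unit = {"D": 1, "W": 7, "M": 30}
    buckets = []
    for part in policy.split(","):
        frame, interval = part.split(":")
        f = int(frame[:-1]) * unit[frame[-1]]
        i = int(interval[:-1]) * unit[interval[-1]]
        buckets.append((f, i))
    return buckets

buckets = parse_retention("7D:1D,4W:1W,12M:1M")
# roughly one snapshot survives per interval in each timeframe
kept = sum(frame // interval for frame, interval in buckets)
print(buckets)  # [(7, 1), (28, 7), (360, 30)]
print(kept)     # 7 + 4 + 12 = 23
```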

For source objects that stay present in your source, all works as expected; nothing to tell here. For source objects that are deleted, things do not work as I expected (and I fear many others expect the same). Why is that?

Somehow, I expect that deleted files follow the Custom Retention rule as set up and that the deleted file will be retrievable for the total time. A bit over a year in the example. But this is not how it works. Depending on how long a file existed before it was deleted it can be retrievable as expected, or be gone after a week!

Worst case:

  1. One backup each day and a Custom Retention rule of 7D:1D,4W:1W,12M:1M
  2. Create a file and run a daily backup. The file now exists in one snapshot/backup-set/version (version 0f, where f marks the version that contains the file)
  3. Next day, delete the file and back up. All backups from this day on do not contain the deleted file. We now have two snapshots (0,1f)
  4. After 6 more days, we have 8 snapshots and backup thinning kicks in. But there is only one snapshot in the week bin, so I think all 8 are still kept (0,1,2,3,4,5,6,7f)
  5. After another day, we have 9 snapshots: 7 for the first 7 days and 2 for the week after that (0,1,2,3,4,5,6,7,8f). I think the very oldest snapshot is now deleted.

Conclusion: after nine days your deleted file is lost forever.
Please note that I do not fully understand snapshot selection during thinning. It could be that in this example the oldest snapshot is kept. If so, add an extra day and backup before step 2.
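
The worst case above can be checked with a small simulation (my own approximation of the thinning logic, keeping the newest snapshot in each filled interval slot; Duplicati’s actual selection may differ, as noted):

```python
def thin(snapshot_ages, buckets):
    """Keep at most one snapshot per interval slot within each timeframe;
    within a slot the newest snapshot wins (an assumption). Ages are in
    days, 0 = newest."""
    kept, filled = [], set()
    for age in sorted(snapshot_ages):
        for frame, interval in buckets:
            if age < frame:
                slot = (frame, age // interval)
                if slot not in filled:
                    filled.add(slot)
                    kept.append(age)
                break  # the snapshot belongs to this bucket, kept or not
    return kept

buckets = [(7, 1), (28, 7), (360, 30)]  # 7D:1D,4W:1W,12M:1M (M ~ 30 days)
# The file exists only in the very first snapshot; after `days` daily
# backups, that snapshot is `days - 1` days old.
for days in (8, 9):
    survivors = thin(range(days), buckets)
    print(days, (days - 1) in survivors)  # 8 True, then 9 False
```

On day 9 the weekly slot is already filled by a newer snapshot, so the only snapshot containing the file is dropped, matching the conclusion above.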

The issue here is that short-lived files can be lost before you know. Short-lived means shorter than the largest thinning bucket, in this example one month. The reason is that thinning only looks at snapshot dates, nothing else. Again: Duplicati works as designed.

For me, retrieval of short-lived files must be possible in my total backup retrieval period, say one year. Deleted files must be retrievable like all other files. For this use case, using a Smart or Custom Retention rule is not possible. The solution simply is to use another rule (Delete backups that are older than).

I wonder if this aspect was on the radar when custom retention rules were designed. I feel they are only usable for very advanced users.

So please be aware of the effect of Smart or Custom Retention rules if you care about deleted files!

So what you’re saying is that if you use e.g. Smart retention and delete a file locally, it then gets fully deleted from Duplicati backups within some days?

That definitely doesn’t protect from accidental deletions that show up months later. That should not be an advanced user thing for sure. I would think it’s just an oversight.

According to the following, this would indeed be an incorrect situation or a bug, as Smart retention says it keeps backups for 12 months (“There will remain one backup for… each of the last 12 months:”)

any file deleted from the source will subsequently be deleted from the backup when it reaches the retention policy limit.

Depending on the date the now deleted file was created and depending on the Custom/Smart Retention rule: yes, this could happen.

I cannot immediately find the source of your citation, so I cannot comment on it due to missing context. But it seems incorrect indeed.

The issue here is that during snapshot/backup thinning (that is what Custom/Smart Retention does), Duplicati cannot manage anything regarding deleted files; it has no knowledge of deletions. It is not part of the conceptual backup model. The result is that snapshots may be removed that contain the only copy/copies of the deleted files. This can be anywhere in the backup time frame, i.e. before the retention policy limit, depending on how long the file has lived.

The source is linked in my post. Not sure why you’re not seeing it.

Ah, so you need to have snapshots enabled for this problem then? It’s disabled by default. Might be a byproduct of that use. I don’t use snapshots and haven’t looked into that, so I don’t know how it stores data there.

From what I can see, it should just use the snapshot to gain the data and add it to the backup. Maybe there’s some other complex situation though. I’d have to fully re-read everything. Maybe your use is too complex here. Someone else might see what’s going on here.

Missed it. Indeed the statement in JonMikeIV’s post is generally incorrect. It is only true when the now deleted file has existed longer than the retention limit. Only then does it behave as I (and I think many others) expected.

No, my use of snapshot is a descriptive term to explain the way Duplicati works. The --snapshot-policy is something different and has nothing to do with my post. Sorry for the confusion.

If I understand correctly, it could be that as it deletes backups, files in those are lost, and it incorrectly loses files. One would have to look at the code and see how it’s dealing with backup deletions for the smart feature. It should not be deleting when a file’s timestamp is not beyond the limit and the file is not found elsewhere, or something like that.

At least that’s my quick guess here as to why it could be failing to be correct.

We don’t have to look at the code. Due to the Duplicati conceptual model, i.e. the way it works, it has no knowledge of deleted files. It cannot manage deleted files. Normally not a problem. It’s only with Smart or Custom retention rules that things can happen with (short-lived) deleted files. They may disappear. Maybe it’s an oversight, and I think it cannot and should not be repaired. It’s just how it works. (In order to get Duplicati to manage deleted files, it has to gain knowledge about this, i.e. do comparisons between snapshots/backup-sets/versions. I guess this is a bridge way too far.)
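
The comparison the model would need (but doesn’t do) is simple in principle: diffing two snapshot listings (a toy sketch, not Duplicati code; `deleted_between` is an invented name):

```python
def deleted_between(older_listing, newer_listing):
    """Files present in the older snapshot but absent from the newer one.
    Duplicati does not perform this comparison during retention thinning."""
    return sorted(set(older_listing) - set(newer_listing))

snap0 = {"a.txt": "hash1", "b.txt": "hash2"}  # file still present
snap1 = {"a.txt": "hash1"}                    # file deleted from source
print(deleted_between(snap0, snap1))  # ['b.txt']
```

The information exists in the listings; the model just never consults it when picking which snapshots to drop.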

Just use another rule (Delete backups that are older than) and you are safe.

Oh, my reference to deleted wasn’t local. It was referring to the backup data.

Either way, sure, if it doesn’t find a local file in the DB, it’s been deleted or moved (moved is half the same as deleted), so it can know. That’s generally how it works in programming, unless code can check the trash.

Welcome to the forum @Jochem

It probably depends on what parts you read. Some seems fine. Comparison to CrashPlan seems off…
I’m not going to run a word count to see if “generally incorrect” fits, but I’ll do a deeper dive on it below.

The first bullet covers files that still exist, so doesn’t apply. The second bullet covers deleted file losses from their backup aging past the longest time frame, as was discussed in the posts just above that one.

It doesn’t, however, cover the point that deleting a backup makes things as if the backup didn’t happen. Some people might think that’s obvious, however others would miss it, so some support volunteers may specially point out the risk you cite. It would be nice if the manual could point it out. Care to try to write?

GUI smart and custom backup retention aren’t covered #83 (seeking help on explaining a messy topic)

Part below is your worry, right?

People who actually think this through thoroughly may realize (or may be told in the forum) that a file which exists (between creation and deletion) for a time less than current minimum interval (which can increase with time) may disappear completely. This may matter to some people, but there’s a limited amount that can be said. Would it be better to have an “advanced” section for those wanting it?

CrashPlan has special features for deleted files. Are you familiar with those, or other backup programs?
Aside from improving documentation (help wanted, as with everything), what change might you advise?

Migrating from Crashplan - Retention & other general questions covers the difference and your warning.

looks to me to give a very concise summary, and “will remain one backup” is a clue. Certainly, it could expand a bit (if someone volunteers), but I wouldn’t want it to be a book. That’s what the manual is for.

One also has to understand what you describe, which is that a backup is a point in time of the system.
Bringing up CrashPlan again, it blurs the line between system and files. A volunteer plans work below:

Implementing the feature to restore any version of a single file (features can happen given volunteers)

Great advice but I wonder if publicity of this whole area can improve? Documentation can possibly find volunteers, but not often. Actual code (not just message changes) is even more seriously understaffed.

Duplicati relies on volunteers from the community, and that’s what limits the progress that can be made.

Thank you ts678, for your detailed response. I’d like to respond in a few separate replies.

First, please, let us discuss a fairly standard use case including a reasonable expectation from a user.

The use case is this:

  • a user creates a daily backup
  • the user understands that a file created and deleted before any backup has run, is lost
  • the user uses a Smart or Custom retention rule, let’s say 7D:1D,4W:1W,12M:1M
  • the user creates a file and accidentally deletes this file a few days later

The reasonable expectation of the user could be that the deleted file will be retrievable up to the end of the retention period, i.e. a little over a year in this example.

Do we agree on all of this? For the moment I assume so.

Now, we know that the expectation will not be fulfilled. The deleted file (in most cases) will be lost forever if the user tries to retrieve it, say, a month later. The user will be very disappointed.

From the manual this is not clear at all. Even an informed user with generally excellent information incorrectly writes:

The issue with deleted files is pointed out in the forum at only a very few places. Your description in Github is indeed the most complete:

Yes, this is my worry. It makes clear that people need to think [about Smart or Custom Retention policy, Jochem] thoroughly, to only then realize that deleted files can and will be lost within (the Smart or Custom) retention policy period.

Please note that ts678 describes the same issue as I did, and in depth too, but maybe a bit abstractly. The intervals refer to the interval times from the retention policies, while I explained the issue via the “thinning process”. It’s all the same.

So my question is:
Should Duplicati expect users to think so hard to get to the conclusion about deleted files?
Or - maybe - should Duplicati protect users against certain predictable and unfortunate use cases?

Yep, and my all-time favorite was the conceptual model of ADSM/TSM/Spectrum Protect.

I guess I’ll make a suggestion for the documentation later on; I have to figure out how to do that first.

I strongly disagree on this. There is no clue whatsoever regarding irretrievable deleted files due to snapshot thinning. I would suggest to start the Smart and the Custom description with something like: “For advanced users only. Not all deleted files will be available for retrieval.” and then continue with the current description. (And of course there must follow a good explanation in the manual.)

Note: the deleted file issue can only occur with Smart or Custom retention policies, not with the other options:
[image: the other retention settings in the GUI]

As explained, the manual does not cover these options, so of course the missing section is not clear.

The retention can be set in 3 ways

is where it ends, but I posted a link to the issue asking that the manual cover this – and your concern.

What you quote seems correct, is written for context I’ll quote below, but isn’t addressing your context:

Notice how that flows to:

any file deleted from the source will subsequently be deleted from the backup when it reaches the retention policy limit.

(and now that I read it again, this is translating backup versions into file terms – maybe a tip for us…)

What I see as the step-too-far, possibly inspired by the context it’s in though:

It’s talking of fell-off-the-end, not deleted-from-middle, so I agree with your point more than your quote.

I think pointing it out would be better. Manual needs writing, and GUI text could expand. Any thoughts?
Either way, this needs someone who knows the mechanics of how to do it (and volunteers to do so…).

Is that Yep to CrashPlan? Regardless, how do any products that you know deal with version thinning?
Can anybody thin versions in a way that doesn’t lose deleted files into holes created by the deletions?

Care to explain, especially if on-topic and maybe even if it’s not? Duplicati can’t bend its model totally.
I thought about maybe (if developers help) moving last version of a deleted file into a nearby survivor.
That’s kind of ugly though, to have things show up in a version that weren’t originally backed up there.

I think the motivation for thinning was to get rid of intermediate file versions that are no longer useful.

OK, so you think “one backup” per year is no clue whatsoever that there’s a gap? I guess we disagree.
Regardless, “concise” meant it was brief, and brief leaves the impact for user to figure out (clue or not).
It’s definitely possible to be too brief, and this one could use expansion, though expansion has limits…

You’re listing the three the manual covers now, so there’s definitely a blank canvas to try to explain this.

I guess I didn’t have this expectation. I was a CrashPlan Home user back in the day, and I did appreciate its special handling of deleted files. It seemed like a pretty unique feature for the backup programs I was familiar with.

When I first started experimenting with Duplicati in 2017, I didn’t see any such special option. With a bit of research I understood that it pruned backups at an entire snapshot level, never going deeper to make individual decisions on particular files. I adapted a custom retention option to balance that risk with how many versions I wanted to retain.

Your concern and input are very valuable - they show that the documentation is inadequate! If you could volunteer to adjust the documentation, it would be appreciated. This project could use additional volunteers.

Hello, I stumbled upon this topic after having an in-depth discussion with an LLM about retention policies.

The truth about short-lived deleted files and custom retention policies was unexpected. I tried to find a more recent post, but as far as I know the issue has not changed since 2022 and is still valid today in 2026-02.

Like @Jochem, I did not and do not read anything in the GUI (1.x or 2.x) that would warn me about this behaviour.

From the GUI:

Enter a retention strategy manually. Placeholders are D/W/Y for days/weeks/years and U for unlimited. The syntax is: 7D:1D,4W:1W,36M:1M. This example keeps one backup for each of the next 7 days, one for each of the next 4 weeks, and one for each of the next 36 months. This can also be written as 1W:1D,1M:1W,3Y:1M.

The topic already requires so much cognitive load that one is already happy to understand the first layer. Other information that is absent - beyond the caveats - is what happens to snapshots older than 3 years, and that each subsequent bucket starts right after the prior bucket ends. (So to a snapshot that is 8 days old, the second rule applies, keeping one per week.)
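
My reading of the bucket hand-off can be sketched like this (an assumption based on the text above, with timeframes measured back from the present and M approximated as 30 days; not Duplicati’s code):

```python
def active_interval(age_days, buckets):
    """Return the keep-one-per interval (in days) that applies to a
    snapshot of the given age; None means it is older than all buckets
    and would be deleted outright."""
    for frame_days, interval_days in buckets:
        if age_days < frame_days:
            return interval_days
    return None

buckets = [(7, 1), (28, 7), (1080, 30)]  # the GUI example 7D:1D,4W:1W,36M:1M
print(active_interval(8, buckets))     # 7 -> the weekly rule applies
print(active_interval(2000, buckets))  # None -> older than all buckets
```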

For the record: I would expect software to warn me about caveats.


I agree with Jochem that it is a reasonable expectation. In the technical realm one may take a different position, but if one objectively evaluates the user experience and creates a proper interaction design for setting a retention policy, one cannot deny that short-lived deleted files are effectively treated differently than normal files and have a different recovery period. It does not matter that there is a technical reason for this. In the functional realm, you may not assume complete technical understanding, and you should assume incorrect mental models of users, which your software should protect them from. Therefore the user must be informed.

How can you do that? By warning users or guiding the configuration process.

The least complex option is to inform the users about the caveats. For example, you could add the following information to the interface on the backup’s options page:

  • The pruning of snapshots is determined by the retention policy.
  • Caveats for Custom backup retention:
    • Short-lived files that are deleted do not have the same recovery period as regular files.
    • A specific file that was added and then deleted (days) later may be pruned from the snapshot collection when the pruning phase ‘selects’ another snapshot over the one that could have recovered the short-lived deleted file.
    • Since Duplicati has no mechanism to determine the importance of one snapshot over another (beyond timestamp) and file deletions or modifications are NOT consolidated when pruning snapshots, please update your mental model and expectations accordingly.
    • If you want all files (including short-lived deleted files) to have the same minimum recovery period after (accidental) deletion, you must not use a custom retention policy with buckets that is designed to thin/prune intermediate snapshots.

Feel free to iterate on this example. The goal is to communicate. One needs to explain what is going on, why it is relevant to the reader, and how to get out of the situation if they need to.


To me, it was not clear out of the box that Duplicati’s snapshots are not consolidated when they are pruned or thinned. Duplicati does not keep track of deleted files and does not take that into consideration when pruning.

The normal recovery period for regular files that are deleted is the timeframe of the largest retention bucket. For short-lived deleted files it is not.

For me, I now need to balance two opposing ‘forces’ regarding backup retention.

The one above forces me NOT to use custom retention policy buckets, to avoid the (statistically?) unfortunate outcomes for users’ short-lived (accidentally) deleted files.

The lack of a minimum interval after backup completion (4851), caused by catch-up backups and subsequent too-soon regularly scheduled backups, plus independently limited storage space, forced me to consider a thinning strategy.

I have not found the ideal solution yet. To voice for completeness: I would prefer not to micro-manage my client’s computer’s backup to this degree.

Example: client A has 200GB of storage. They have 180GB of compressed Duplicati files stored. Backup triggers every 8 hours on a system that is not always on. The user needs to be able to restore any backed-up deleted files up to 3 years after deletion. (If it had been technically possible, the intermediate snapshots could have been optimized.)
Go Auto-Configure that for me while staying within that limit. If the user goes beyond 200GB of backup data (incl. versions and deleted data), the system may request intervention, otherwise make it happen.

Hi @JCd, welcome to the forum :waving_hand:

I agree. This is difficult to understand and communicate.

I think I understand the confusion now.

Duplicati makes backups that are logically a “snapshot” of the files at the given moment.
The retention options control which snapshots are removed.

I think you and @Jochem would want (or expect) the “smart backup” retention to be more of a hybrid approach where snapshots are merged somehow.

The problem with merging snapshots is that it is not a well defined process and not guaranteed to create a working result.

The issue is essentially that there is no universal way to pick which version to retain, as what is “important” is subjective. Similar to retaining a deleted file between snapshots: what if the file was overwritten with empty data in the meantime? In this case you would also lose the contents if the snapshots were merged.

It gets worse if you also account for files that depend on each other and want to merge snapshots, as you have to know the dependency to always choose those files from the same snapshot.

I would argue that is incorrect and even more confusing. It begs the question: “what is short-lived”?

If your mental model is snapshots, short-lived files are treated exactly the same as any other file.

A better approach would perhaps be to explain that Duplicati works in snapshots?

If you need to retain short-lived files, the only guaranteed solution is to ensure you keep all backups.

In your specific setup it might be sufficient to just ensure Duplicati keeps the last copy of all files for some period, but that is not general, as the last copy might not be the “right” copy.

I was under the impression that the main issue was clear; Jochem described it in the top post. Or maybe you think it’s too complicated a term to put in the GUI. In my words, a short-lived file is a file that is added and captured by a snapshot, and its deletion - or rather its absence - is then captured in a subsequent, nearby snapshot. Then, due to the application of the retention policy, you effectively get a kind of lottery, and the snapshot containing the last version of the short-lived file can get pruned, so the short-lived file cannot be restored.

You said: “If your mental model is snapshots, short-lived files are treated exactly the same as any other file.”
I said: “Short-lived files that are deleted, do not have the same recovery period as regular files.”

From a literal technical standpoint, I think you are correct. A file is a file. From a functional standpoint this statement is not correct, because the (zoomed-out) net result for short-lived files is different compared to files that have been captured by the very first snapshot, or are referenced in so many snapshots that intermediate pruning would never delete the file before the oldest snapshot is deleted. So the time a user has the possibility to restore the file (= the “recovery period”) is lower for short-lived files. That is the functional difference for the user.
‘Short-lived file’ is just a term to talk about those files that can fall through the cracks. Feel free to define a better term.

Furthermore, I don’t think the term ‘snapshot’ is so absolute that it would inherently convey the notion of not being able to be merged or consolidated. Different contexts use the same term, and sometimes it is possible.

That has now become clear. Creating a policy-engine that would somehow be able to determine what is right seems hard.

True, but 0-kb-ing a file seems an edge-case in this discussion.

On its own, I don’t think that is sufficient. I guess this thread also has been a request to warn users about it in the GUI, because ‘snapshot’ might be understood as a collection of files at a moment in time, but that doesn’t mean it is logical for the Custom backup retention policy ‘applier’ to lack ‘deleted-file awareness and handling’. I think people have the reasonable expectation that backup software must protect against data loss, and the current (custom) retention pruning mechanism doesn’t do that, in that context.

If we are allowed to functionally and technically refine this desired ‘deleted file awareness and handling’ without merging or consolidating (because I understand from you the challenges of that), the next step could be asking these questions:

  1. Currently, a backed-up file is deleted from Duplicati’s backup when no snapshot (dlist) references to this file exist and the compact stage runs, in which unreferenced blocks are dropped. Is this correct?
  2. So one method for Duplicati to detect a file deletion is to compare dlist files, correct?
  3. Another: in Duplicati’s local DB, one can find in which snapshots a particular file is referenced, correct?
  4. So the database also has the information when the last reference to a particular file is removed, correct?
  5. If the database has that information, then the software could be made aware of when it deletes the last reference in the database, correct?
  6. If true, could the software add a virtual reference or retention tag to this particular file? So instead of snapshot(s) exclusively referencing a file, “keeping it alive”, this retention tag takes over and keeps the file in the backup for a minimum period, preventing it from being pruned too soon.

For ‘7D:1D,4W:1W,12M:1M’, the imagined default retention-tag would set 12 months starting from the application date of the tag, preventing the file from being pruned too soon.
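
A rough sketch of what such a retention tag could look like, keyed off the moment the last snapshot reference disappears (a hypothetical mechanism that does not exist in Duplicati; all names here are invented):

```python
from datetime import date, timedelta

TAG_PERIOD = timedelta(days=365)  # derived from the longest bucket (12M)

file_refs = {"report.txt": {8}}  # file -> ids of snapshots referencing it
retention_tags = {}              # file -> expiry date of the imagined tag

def prune_snapshot(snap_id, today):
    """Remove a snapshot. When the last reference to a file disappears,
    attach a retention tag instead of letting compact drop its blocks."""
    for name, snaps in file_refs.items():
        snaps.discard(snap_id)
        if not snaps and name not in retention_tags:
            retention_tags[name] = today + TAG_PERIOD

def is_restorable(name, today):
    return bool(file_refs.get(name)) or retention_tags.get(name, date.min) >= today

day9 = date(2026, 2, 20)
prune_snapshot(8, day9)  # snapshot 8f is thinned away on day 9
print(is_restorable("report.txt", day9))                        # True
print(is_restorable("report.txt", day9 + timedelta(days=400)))  # False
```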

So in @Jochem’s example:

He wrote about snapshot 8f being deleted after ~9 days instead of 12 months. In our imagined retention-tag system, the snapshot is still deleted, but the tag is assigned to the file so that the file is not deleted until the tag expires, which in this example is 12 months / 1 year after it was assigned.

In recovery, beyond snapshots, retention-tag-group could be pickable in order to navigate the tagged deleted files in a normal but limited tree view.


Conceptually, do you think this direction has merit?



To wrap up this other end, also for anyone looking for it:

Understood. As of 2026-02-11, if you need to retain short-lived files in a Duplicati backup, do not apply a custom backup retention policy (with time buckets); keep all backups, or only use the non-thinning policies like “keep last number of backups” or “keep last n years of backups”.