Apologies if this is a duplicate question, I thought I’d seen it before but couldn’t find it.
I have a backup for a Windows 10 system that was intended to more or less replace CrashPlan. I set it to back up every few hours. At some point I realized that there were some files that get touched frequently (at least once a day) and that were often the only change in a new backup version. Most of them were unimportant, so I removed those from the backup.
I recently also purged those files from all the older backups. This has left me with many backup versions that have nothing in them. The actual backup target storage was cleaned up as intended. However, I kind of expected (hoped) Duplicati would also automatically remove such “empty” versions, at least as part of a compact, but it did not. OK, that’s a design decision and I have no quarrel with it. Now I’m looking for a way to get Duplicati to find and remove those “empty” backup versions for me.
I know I can use “delete” but we’re talking hundreds of versions out of ~2000 total backup versions. Manually finding them and then manually deleting each one is a task I’d prefer not to do!
I could probably script doing “compares” between each pair of versions and then deleting the “empty” ones, but Duplicati is less likely to make a mistake than I am.
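If the scripting route wins, a rough sketch might look like the one below: shell out to Duplicati’s compare command for each adjacent pair of versions and collect the versions whose comparison reports no changes. The executable name and especially the compare-output text the parser looks for are assumptions here, so verify the pattern against real output (and do a dry run) before letting anything delete.

```python
# Sketch: find backup versions identical to their next-older version by
# shelling out to "Duplicati.CommandLine.exe compare". The output pattern
# below ("0 added entries" etc.) is an ASSUMPTION -- check your own compare
# output before trusting the parser.
import re
import subprocess

ZERO_CHANGES = re.compile(
    r"\b0 added entries\b.*\b0 deleted entries\b.*\b0 modified entries\b",
    re.S)

def is_unchanged(compare_output: str) -> bool:
    """True if the compare output reports zero added/deleted/modified entries."""
    return ZERO_CHANGES.search(compare_output) is not None

def find_duplicate_versions(storage_url: str, n_versions: int,
                            run=subprocess.run):
    """Return version numbers whose content matches the next-older version.
    'run' is injectable so the logic can be tested without a real backend."""
    duplicates = []
    for newer in range(n_versions - 1):          # version 0 is the newest
        older = newer + 1
        result = run(
            ["Duplicati.CommandLine.exe", "compare", storage_url,
             str(older), str(newer)],
            capture_output=True, text=True)
        if is_unchanged(result.stdout):
            duplicates.append(newer)             # newer copy adds nothing
    return duplicates
```

The returned version numbers could then be fed, one at a time, to the delete command’s --version option.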
Changed files mean only their data is uploaded. Unchanged files refer back to previously uploaded data.
What happened to the unchanged files? If you purged only the rapid changers, the others remain.
Did you mean you think you have many backup versions where the remaining files have no changes? upload-unchanged-backups is how one can actually request that; you get a dlist and DB clutter.
That is already some clutter, but sometimes it’s what people want. If not, adjust the Retention settings.
For example, if thinning out backups as they age is of interest, the no-changes-here ones age out because everything ages out. CrashPlan has that, but handles deleted files specially (unlike here).
Smart backup retention and Custom backup retention are the options for thinning with age. Looking forward, the backup runs which find no changes will just not make versions, unless asked. Attempting to get to that point retroactively may be hard, and it’s questionable what it will gain you.
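For reference, if I remember the syntax right, Custom retention (the retention-policy option) takes a comma-separated list of timeframe:interval pairs. A made-up illustration, not a recommendation:

```
1W:1D,4W:1W,12M:1M,U:6M
```

which reads roughly as: for the last week keep one backup per day, for the last 4 weeks one per week, for the last 12 months one per month, and beyond that (U = unlimited timeframe) one per 6 months. Smart retention is basically a preset of this shape.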
Compact would probably be the wrong place, as its job is repackaging dblock files that grow empty. The COMPACT command gets into this a bit, showing how one can tune the levels for when to run.
For an approach that’s probably worse than The COMPARE command, there’s the database and expert SQL: in the FilesetEntry table, a FilesetID that has the same set of FileID rows as another is a duplicate.
I think. This is probably the sort of comparison that the compare command uses to compare versions.
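To make that concrete, here’s a self-contained sketch of the same-set-of-FileID-rows idea against a mock FilesetEntry table with just the two columns mentioned above. The real Duplicati database has more columns and tables, so check the actual schema before running anything against your own sqlite file.

```python
# Sketch: group FilesetIDs by their set of FileIDs; any group with more than
# one fileset holds content-identical backup versions. Uses a MOCK two-column
# FilesetEntry table -- the real schema is richer than this.
import sqlite3
from collections import defaultdict

def duplicate_filesets(conn):
    """Return lists of FilesetIDs that share an identical set of FileIDs."""
    by_fileset = defaultdict(set)
    for fileset_id, file_id in conn.execute(
            "SELECT FilesetID, FileID FROM FilesetEntry"):
        by_fileset[fileset_id].add(file_id)
    groups = defaultdict(list)
    for fileset_id, files in by_fileset.items():
        groups[frozenset(files)].append(fileset_id)   # fingerprint = file set
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Doing the set comparison in Python rather than SQL sidesteps the need for an order-guaranteed GROUP_CONCAT fingerprint, at the cost of pulling all the rows into memory.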
I struggled a bit with the right terminology, and I’m probably making an assumption I shouldn’t somewhere.
I realized as I was typing this that I can test it myself, but to continue explaining myself… The assumption I’m making is that if I take a backup, don’t change anything, and then take another backup, Duplicati would not bother taking the second backup because, other than the timestamp at which it was taken, the content is exactly the same. I’m guessing this assumption is false.
But based on that assumption, I also assumed there would be a way to have Duplicati automatically delete any backup version that had no differences from the prior backup. I referred to this as “empty version” which was probably a bad choice. I get that logically it’s a full copy of all my data. However, logistically, there’s no difference between it and the prior backup so it’s kind of a waste for it to exist.
Having said all that, I can completely see that design decision too: make a new version when asked, even if that version is identical to the prior version. NOT making the new version would probably confuse many people.
So I guess I’m left with either the “compare” scripting or based on the database. Thanks!
That’s what I have always seen. Could it be that in your case something is changing in the metadata of some files? Can you see if there is a new dblock file generated alongside the dlist file? From memory, there could be an option to force generation of a new version even if there is nothing new, but the Duplicati servers seem to be down at the moment, so I can’t look at the online doc. Well, there is still GitHub. It’s not down.
This is the option:
--upload-unchanged-backups = false
Did you have it set to true?
This was mentioned earlier, but I’m not sure it was understood in the correct way.
Additional backups were probably made because rapidly changing files changed.
If no changes then no new backup version without upload-unchanged-backups.
This does NOT mean that if one manually works two backup versions into being identical,
Duplicati will then notice by comparison that such work was done, and delete one.
That was the seemingly bad assumption, both from your test and my look at code.
That’s the default and it does confuse people sometimes, but it keeps clutter down.
So you’re sure you don’t want to reduce the number of versions by thinning them?
That would probably make a much bigger reduction than only deleting duplicates.
Yes, this is the case. There were real changes so Duplicati did, correctly, make all of the backup versions.
Yup, this does seem to be the crux of it. I was hoping it would do the “notice by comparisons …” piece. Although now that I think about it, there’s no simple way for Duplicati to do that. It doesn’t see any of those backups as “empty”; they’re all full of pointers. It would really have to do the same work as a “compare” to find and delete those “no changes [now, due to a purge]” backups.
You’re probably right that Backup Retention is the best option. While it’s only an approximation of what I want to do, it’s much simpler.
Also, it’s interesting to get confirmation that an attempted backup with no changes won’t happen. Personally I think that’s the right behavior (not that anyone asked).
Two things to watch out for are that a file that exists briefly only in backups that get deleted is gone, and that files that are not in even the longest time frame are gone; note the 1 year maximum in Smart, where you only have that long to recover a deleted file. If you need longer (or forever), make a Custom.
You can set up Custom pretty much as you like. You can start conservatively if you prefer, as one can change to more aggressive thinning later, but what’s gone is gone. I backed off a bit, and am running:
I’m not sure of your objective. The manual deletion of identical versions would remove dlist files but rather little data in dblock files (which is where most of the space is). Thinning would reduce storage, because you’re giving up ability to get very fine-time-grained different views of often-changed old files.
I’m actually only interested in reducing the number of backup versions at this point. I already freed up a fair amount of space with the purge, and that wasn’t really my goal anyway. The number of backup versions impacts the “find” command and that’s where I really felt the pain that sent me down this road. It’s not a huge deal; it may just be some form of OCD at this point.
I’ve used the custom backup retention before, but for this use I wanted to go back as far as I could (again, like CrashPlan, before they reduced their retention!).
Thanks for the suggestions and help all! I’m all good (for now).