Why does "Deleting unwanted files ..." require decryption?


#1

I have my backups scheduled for 2AM, and they now fail because the final step is “Deleting unwanted files …”, which requires decryption. I have to insert my smartcard, but the prompt times out because I’m asleep. Are there any solutions?

I don’t think deleting backups should require decryption. Decryption should only be needed when I wish to restore from backup. What if my decryption method is kept off-site or requires n-of-m?


#2

The unwanted files might be in a volume (folder) containing other files that are still required.

So without decryption Duplicati would have 2 options:

  1. Find all the files the volume contains on disk, check their hashes against the hashes in the backup, then create a new volume and upload it before deleting the old volume.
  2. Don’t delete unwanted files, and let them pile up until later.

Option 1 would fail if you had changed any of the files on disk so that they no longer matched the volume contents. So even if option 1 were implemented, it might still fail every time.

I don’t remember if there is a setting for option 2, but maybe we should add one if it doesn’t exist: either have the user remove the files manually, or have a job clean them up periodically while skipping cleanup on regular runs.
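To make the decryption requirement concrete, here is a minimal Python sketch of what rewriting a mixed volume involves. The `Volume` class and `repack` function are hypothetical toy names and do not reflect Duplicati's actual archive format or code; the only point is that the blocks still in use must be read out of the old volume before a replacement can be written.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Toy stand-in for one encrypted remote archive (not Duplicati's real format)."""
    name: str
    blocks: dict  # block hash -> block bytes, readable only after decryption


def repack(volume, needed_hashes, new_name):
    """Build a replacement volume that keeps only the blocks still in use.

    Reading volume.blocks is the step that needs the decryption key: the
    surviving blocks have to be pulled out of the old archive before they
    can be written into a new one.
    """
    kept = {h: data for h, data in volume.blocks.items() if h in needed_hashes}
    return Volume(new_name, kept)


old = Volume("vol-1", {"aaa": b"still needed", "bbb": b"unwanted"})
new = repack(old, needed_hashes={"aaa"}, new_name="vol-2")
print(sorted(new.blocks))  # ['aaa']
```

Only after the new volume is uploaded can the old one be deleted, which is why the cleanup step cannot run without access to the key.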


#3

@Pectojin has it right - because Duplicati stores file blocks in an archive file it has to decrypt that file to be able to remove the no-longer-needed blocks.

To get around this problem you can turn off the automatic deletion of those blocks by using:

--no-auto-compact (default false)
If a large number of small files are detected during a backup, or wasted space is found after deleting backups, the remote data will be compacted. Use this option to disable such automatic compacting and only compact when running the compact command.

Using that means Duplicati will still flag un-needed blocks as ready-to-be-deleted, but it won’t actually DO the deletion step unless you manually run the compact command from the job “Command line” menu.
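For anyone scripting this outside the GUI, the split between an unattended backup and an attended compact might look roughly like the Python sketch below. The executable name `duplicati-cli`, the storage URL, and the source path are assumptions for illustration, and a real job would need its other options (passphrase handling and so on) added; only `--no-auto-compact` and the `compact` command come from the post above.

```python
import shlex

# Assumption: the CLI executable name; adjust for the actual install.
DUPLICATI_CLI = "duplicati-cli"


def backup_args(storage_url, source_path):
    """Arguments for a nightly backup that skips automatic compacting."""
    return [DUPLICATI_CLI, "backup", storage_url, source_path,
            "--no-auto-compact=true"]


def compact_args(storage_url):
    """Arguments for a manual compact, run while the smartcard is available."""
    return [DUPLICATI_CLI, "compact", storage_url]


url = "file:///mnt/backup-target"  # hypothetical destination
print(shlex.join(backup_args(url, "/home/user/documents")))
print(shlex.join(compact_args(url)))
```

The printed commands are meant to be reviewed and adapted before use, not run blindly.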


#4

Ah, there it was! I thought there might be an option, but I just couldn’t for the life of me remember what it was.


#5

Wouldn’t such an option simply have Duplicati only mark volumes for deletion when their “uselessness” reaches 100%?

Edit: it seems like --no-auto-compact (while fine) takes it a step too far for this use case. Per OP’s scenario, at least if it were me, I’d want something that DOES remove unneeded volumes, as long as they’re no longer needed at all.
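A rough Python sketch of that middle ground, assuming a local (unencrypted) index that maps each volume name to the block hashes it contains. The index layout and function names are hypothetical, not Duplicati's actual database schema.

```python
def wasted_fraction(volume_hashes, needed_hashes):
    """Share of a volume's blocks that no remaining backup version references."""
    if not volume_hashes:
        return 1.0
    unused = [h for h in volume_hashes if h not in needed_hashes]
    return len(unused) / len(volume_hashes)


def deletable_without_decryption(index, needed_hashes):
    """Names of volumes whose blocks are all unused, so the file can be dropped whole."""
    return [name for name, hashes in index.items()
            if wasted_fraction(hashes, needed_hashes) == 1.0]


# Hypothetical local index: volume name -> block hashes it contains.
index = {
    "vol-1": ["aaa", "bbb"],   # half wasted: would need a repack
    "vol-2": ["ccc", "ddd"],   # fully wasted: can be deleted as-is
    "vol-3": ["eee"],          # fully in use
}
needed = {"aaa", "eee"}
print(deletable_without_decryption(index, needed))  # ['vol-2']
```

Under this scheme "vol-1" keeps its wasted block until a real compact runs, which is the storage overhead being traded for key-free cleanup.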


#6

I guess that’s a middle ground.

I suppose I could go either way. If I wanted to avoid clean up (e.g. to reduce backup times) and then just clean up once a month, then --no-auto-compact would allow just that.


#7

I suppose OP’s use case is a pretty strong endorsement for this particular middle-ground implementation. It suits someone who doesn’t want to be forced to do manual cleanup on some periodic basis (or, more likely, forced to remember to do it), but who still wants cleanups that require no decryption and is perfectly willing to accept the necessary compromise of a slight overhead in storage space.


#8

I suppose it could be argued that compacting (taking multiple sparse archives and re-compressing them into a single non-sparse one) is different than deleting a 100% unused archive.

So it’s possible that a straight-up delete of an encrypted file might actually work as discussed - I think testing or a question to @kenkendk might be needed to answer that one.

The actual benefit of such functionality would vary greatly by user. For example, if you have a bunch of video files that never change and decide to delete one, then removal of all (or most) of the related archive files should happen pretty quickly.

However, if you’re working on writing a book and its contents change daily, then the likelihood of older version archives being 100% unused (thus deletable) is pretty low unless the original contents of your book have changed enough that EVERY original block is no longer used.

Basically, I’m worried that such a setting would cause archives to be deleted very infrequently (frequency going DOWN as size goes up), and the resulting “flagged for deletion” records sitting in the database long term could slow down run times.
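A back-of-the-envelope Python sketch of that size effect, under the loud assumption that each block in a volume becomes unused independently with the same probability. Real data is correlated (the video example above is the opposite extreme, where all of a file's blocks go unused together), so treat this only as an illustration of the trend, not a prediction.

```python
def chance_fully_unused(p_block_unused, blocks_per_volume):
    """Probability that every block in a volume is unused, assuming independence."""
    return p_block_unused ** blocks_per_volume


# Even when 90% of blocks are unused, larger volumes are rarely 100% unused.
for n in (1, 10, 100, 1000):
    print(n, chance_fully_unused(0.9, n))
```

The probability shrinks geometrically as the number of blocks per volume grows, which matches the worry that bigger archives would almost never qualify for a whole-file delete.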