This is a long one.
TL;DR Backups are failing due to compact operation failing, presumably due to extra encryption blocks, and all attempts to repair fail.
I have a large backup I run daily to capture all my active VMs just for belt-and-suspenders safety in case something goes really wrong, which is to say, if this backup got corrupted, it wouldn’t be the end of the world. But, I’ve found issues that raise concerns for the safety of other backups I run where corruption could potentially be a very big issue. I haven’t yet run the restore (and/or recovery) process against all my backups (in progress) but I’m seeing similar problems in terms of hash errors and also some decryption errors. I will be confirming that the compact operation will run successfully against each of my backups.
So, to the details.
The flag was raised when my VM backups started failing, not because of the backup itself but because the compact operation is encountering remote files with AES errors: many files with numerous, sometimes hundreds, of ‘Extra’ blocks. Being around 450GB, it’s been a while since the automatic compact process was run (it runs at 25% wasted space, right?) and that would cover at least a month or 2 of backups and multiple Duplicati releases. On the plus side, this has forced me to review Duplicati’s test, restore and recover processes and I’ve learned a lot, established a repeatable process and, despite the following, gained confidence in the safety of my data.
Running a ‘compact’ operation manually fails with “SharpCompress.Compressors.Deflate.ZlibException: Bad state (invalid bit length repeat)” errors. This does not occur within a Started/Completed log block which logs the file being operated on so it’s not clear which file is causing the issue. I guess I could run the operation with ‘Profiling’ output instead of my default ‘Verbose’.
Running the ‘test’ operation with ‘all’ and ‘–full-remote-verification’ parameters finds numerous files with ‘extra’ errors, eg
duplicati-if156b20ed6294ca489683ef9c797c9b7.dindex.zip.aes: 2 errors
that I assume to be the root cause of the ‘compact’ operation failing. I’ve read elsewhere that these can be ignored, despite being described as a ‘should never happen’ condition, the data is still present and not corrupted and the successful completion of the following restore would confirm this to be correct. Note that the 2 extra errors listed here is only for illustration; some files have hundreds of these errors. I’ve run the ‘list-broken-files’ operation which states “Found no broken filesets, but 1 missing remote files. Run purge-broken-files.”; the ‘purge-broken-files’ operation states “Marked 1 remote files for deletion” and “Found no broken filesets, but 1 missing remote files. Purging from database.” but that process doesn’t appear to actually do anything as repeating the list/purge cycle yields the same result. So, how do I clear these errors so the compact operation can complete?
I’ve tested the restore process and, although it took 18hrs to restore 455GB (moving the temp folder from an SSD slowed things down a bit too), it did apparently restore successfully. I think the key here is the “Found 1 missing volumes; attempting to replace blocks from existing volumes” action which uses blocks from other versions (I’m assuming that’s the case).
I also tested the full recover process (download/index/list/restore) on my VM backup via Duplicati.CommandLine.RecoveryTool but that didn’t go well. The download phase recorded multiple “Failed to decrypt data (invalid passphrase?): Message has been altered, do not trust content” errors and the restore phase recorded monstrous numbers of “Failed to read hash/Unable to locate block with hash” errors when rebuilding the files: many didn’t survive this process, being absent from the restore location. Other backups I ran the recover process on didn’t record errors per se but did have “Restored file hash mismatch - restoring file in damaged condition” messages and again leaving the file absent from the restore location.
The recovery tool also raises “Unable to locate block with hash: …” errors when the affected file is zero bytes, so that’s an easy ignore especially considering the hash is the same for each error, but still a pain to verify/validate when you’re looking at a wall of text.
Having read the online docs, I’m aware that the recovery tool has a recompress action with --reupload and --reencrypt options however I haven’t yet tried that because:
a) given the issues listed above, low confidence of it actually succeeding,
b) I really want to avoid deleting and re-uploading around 400GB to my rcloned offsite copy of duplicati data,
c) In the end, I’d be better off just blowing away the current backend data and running a new backup anyway
So, I’d like to aim to have say a monthly scheduled process that runs the ‘test all’ and/or ‘restore’ operations against all my backups to non-original location (and, for extra points, maybe even a less often ‘recovery’) but I’d really like to know how to work around or resolve these hash and encryption issues when they arise.
I run Duplicati Canary, but don’t necessarily update to a new release straight away, so from the timestamps on the %LOCALAPPDATA%\Duplicati\updates folders:
* v18.104.22.168 since 13/Feb
* v22.214.171.124 since Nov/20
* v126.96.36.199 since Jul/20
All my backups run via command line scripts (I just find the management easier) with ‘Verbose’ console logging output captured.
My local backup data lives on a FreeNAS box running mirrored ZFS pools, scrubbed weekly, with ECC RAM so bitrot is not a thing as far as storage goes. My main PC does not run ECC RAM so it’s not outside the realm of possibility that something local could be causing corrupted encrypted files. Running a full MemTest pass is on my TODO list but I do run parchive (par2) against large static files and archives and I’ve yet to see one require repair.
Thanks for reading