Checking only 1 sample per backup when there are thousands of files doesn’t make much of a dent, especially if only backing up once per day.
While backup-test-samples provides the option to test more, a user often will have no idea how many files they have in their backup.
From a user point of view, I would think the ability to specify what percentage of your backup you want checked after each backup would be a friendlier method than having to specify an absolute number of samples.
On a similar topic, should the default check be something like 1% of the backup set?
Assuming that duplicati keeps track of which files have already been checked (I believe I’ve read that it does), a rolling 1% would - assuming there isn’t too much change each day - check 100% of the backup in a reasonable amount of time.
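To put rough numbers on the rolling-percentage idea, here is a minimal Python sketch. It is only arithmetic, not anything Duplicati does today: the function names and the 3000-file total are made up for illustration, and it assumes no file is re-tested and nothing new is uploaded while the sweep is in progress.

```python
import math

def samples_for_percentage(total_remote_files: int, percent: float) -> int:
    """Convert a percentage into an absolute sample count (at least 1)."""
    return max(1, math.ceil(total_remote_files * percent / 100))

def runs_to_full_coverage(total_remote_files: int, samples_per_run: int) -> int:
    """Backup runs needed to test every file once.

    Assumes no file is tested twice and no new files appear meanwhile.
    """
    return math.ceil(total_remote_files / samples_per_run)

total = 3000  # hypothetical number of remote files
per_run = samples_for_percentage(total, 1.0)

print(per_run, runs_to_full_coverage(total, per_run))  # 30 100
print(runs_to_full_coverage(total, 1))                 # 3000
```

With one backup per day, 1% would sweep this hypothetical backup in about 100 days, while a fixed sample count of 1 would take roughly 3000 days.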
@sylerner, I’m curious what you think about setting test count based on actual backup upload count. For example, if I have a job run that uploads 5 filesets, should that run test 5 (or even better 6) filesets?
The problem here is that, unless done from the very beginning, there would be little or no progress on verifying the backlog of unchecked files.
If done from the beginning, what you have in essence is a full verification of each backup immediately after it is made. While this is good in and of itself, it doesn’t make progress in the search for files that have become corrupted.
As an example of finding a file that’s become corrupted: while still using the latest Beta, I got the following error after a backup that was set to backup-test-samples 500, full-block-verification true and full-remote-verification true.
Operation Get with file duplicati-b71b4cb04d4854190aaa4a00353fa67c3.dblock.zip.aes attempt 5 of 5 failed with message: Failed to decrypt data (invalid passphrase?): Message has been altered, do not trust content
System.Security.Cryptography.CryptographicException: Failed to decrypt data (invalid passphrase?): Message has been altered, do not trust content ---> SharpAESCrypt.SharpAESCrypt+HashMismatchException: Message has been altered, do not trust content
at SharpAESCrypt.SharpAESCrypt.Read (System.Byte buffer, System.Int32 offset, System.Int32 count) [0x0046f] in <5e494c161b2e4b968957d9a34bd81877>:0
at Duplicati.Library.Utility.Utility.CopyStream (System.IO.Stream source, System.IO.Stream target, System.Boolean tryRewindSource, System.Byte buf) [0x00035] in <0828ce86ffa94a4bbbb2da4331bcc67b>:0
at Duplicati.Library.Encryption.EncryptionBase.Decrypt (System.IO.Stream input, System.IO.Stream output) [0x00008] in <404991515997453aa94eeb36e72aeeb7>:0
--- End of inner exception stack trace ---
at Duplicati.Library.Main.AsyncDownloader+AsyncDownloaderEnumerator+AsyncDownloadedFile.get_TempFile () [0x00008] in :0
at Duplicati.Library.Main.Operation.TestHandler.DoRun (System.Int64 samples, Duplicati.Library.Main.Database.LocalTestDatabase db, Duplicati.Library.Main.BackendManager backend) [0x000b0] in :0
Not sure at this point what the significance of this error is, but it isn’t confidence building.
True, if using something like “current upload + 1” the progress would potentially be small - but still not likely negative which is what the current default of 1 gives us. But you’re right - if done from the start it would indeed simply verify what was just uploaded.
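To illustrate the difference between the two policies, here is a small Python sketch of how the untested backlog would move. It is a toy model, not Duplicati's behaviour: the function name and all numbers are hypothetical, and it assumes every test sample is spent on a file that has never been tested.

```python
def untested_backlog(initial_backlog, uploads_per_run, runs, policy):
    """Return the count of never-tested remote files after each run.

    Assumes every sample goes to a file that has not been tested yet.
    """
    backlog = initial_backlog
    history = []
    for _ in range(runs):
        backlog += uploads_per_run                       # new, untested uploads
        backlog -= min(backlog, policy(uploads_per_run)) # samples tested this run
        history.append(backlog)
    return history

def fixed_one(uploaded):
    """Current default: one sample per run."""
    return 1

def uploaded_plus_one(uploaded):
    """Proposed: test as many files as were uploaded, plus one."""
    return uploaded + 1

print(untested_backlog(1000, 5, 3, fixed_one))          # [1004, 1008, 1012]
print(untested_backlog(1000, 5, 3, uploaded_plus_one))  # [999, 998, 997]
```

Under these assumptions the default of 1 lets the backlog grow with every run, while "uploaded + 1" shrinks it by one file per run: slow, but not negative.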
That’s a tough one. I suspect some backup software falsely inspires confidence by doing no (or minimal) testing - thus never finding errors, leaving the user potentially stuck when things can’t be restored.
For me, I feel more confident because of the fact that Duplicati actually found something. Sure, something went wrong and I have to deal with it - but I get to do that on my time instead of after an emergency when I may realize I can’t restore things I assumed I could.
But I agree with the unclear significance of the error. Even with kenkendk’s description it’s still not really clear what a user should do about it.
Duplicati generally stores 3 types of files at the destination:
dlist (list of what files were included in a specific backup, usually well under 1 MB in size)
dblocks (actual backed up data blocks, “Upload volume size” in size)
dindex (index of block hashes to dblock files, usually well under 1 MB in size)
The local database Duplicati uses can be rebuilt almost entirely from the contents of dlist and dindex files (both usually well under 1MB). Similarly, a broken / missing dlist or dindex file on the destination can be recreated from the local database or by downloading and re-processing the raw dblock files.
So if you have an error with a dindex or dlist file, running a “database repair” really just fixes the file from the local database. But if you have an issue with a dblock file, it needs to be addressed more aggressively, as now we’re talking about actual backed up data at risk.
So if you look at the two files (out of thousands) that had errors, did they end with dindex, dlist, or dblock followed by .zip or .zip.aes?
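For anyone who wants to sort a list of failed file names quickly, here is a rough Python sketch of that check. The function name is made up, and it assumes default settings (ZIP compression, optional AES encryption); it only looks at the file name, not at the file contents.

```python
import re

# Matches names such as duplicati-<id>.dblock.zip.aes (default settings assumed).
_REMOTE_NAME = re.compile(r"\.(dlist|dindex|dblock)\.zip(\.aes)?$")

def classify_remote_file(name: str):
    """Return (kind, rebuildable) for a remote file name, or None if unknown.

    "rebuildable" is True for dlist and dindex files, which can be recreated
    from the local database; dblock files hold the actual backed up data.
    """
    match = _REMOTE_NAME.search(name)
    if match is None:
        return None
    kind = match.group(1)
    return kind, kind in ("dlist", "dindex")

print(classify_remote_file(
    "duplicati-b71b4cb04d4854190aaa4a00353fa67c3.dblock.zip.aes"
))  # ('dblock', False)
```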
The two files that failed both end in .dblock.zip.aes - meaning that I’ve “lost” data.
Since the “Failed to decrypt data (invalid passphrase?): Message has been altered, do not trust content” error showed up both as part of a post-backup verification as well as during a command-line test of the entire archive (consistently the same files being corrupted in multiple checks), I don’t think it’s an I/O error.
The backup is via SFTP to a NAS, so file rot in the backup archive is highly unlikely.
While it’s unlikely to be the cause, particularly since thousands of files checked out OK (3111 files total, 1553 dblock files, 5 dlist files and 1553 dindex files), I am using SHA512 instead of the default SHA256.
I don’t know where to go from here. Is it possible to find out what files are stored in the bad dblock files? If so, in order to compare with the original files, is it possible to force a restore despite the error?
Since the death of Crashplan Home, duplicati seems to be the only viable alternative out there for Linux. But given the critical nature of my backups (whose aren’t?), I can’t deploy until I’ve confirmed reliable operation. (And I’m not a freeloader. I’ve already contributed and will contribute more if/when I deploy this to more machines.)
Any suggestions of what to do next will be greatly appreciated.
This #howto guide should help with listing (and purging) the files that are “broken” by the apparently corrupt dblock files:
The caveat is that this process detects what files are broken and lists the contents of those files. I don’t know that there is a way to list affected files for specific dblocks (@kenkendk or @Pectojin might be able to answer that).
If you try list-broken-files please let me know if it works with the “failed to decrypt” related dblocks.
No broken filesets found in database, checking for missing remote files
Listing remote folder …
Skipping operation because no files were found to be missing, and no filesets were recorded as broken.
Return code: 0
I had to remove all of my exclude arguments, but I left in full-block-verification and full-remote-verification both being set to true.
So I still can’t find out which files have corrupted backups - is there another way to figure out the link between a bad dblock file and the original files backed up within it?
While the corruption of these files is unnerving, I’d be less nervous if I could - assuming I still had them - force the source files affected to back up again. And obviously, the first step for this is knowing which files are involved.
So it sounds like the list-broken-files command assumes the only thing that can break a file is for the dblock to be missing. @kenkendk - are “failed to decrypt” messages stored anywhere such that list-broken-files could know about them?
I know this isn’t a great user experience, but unless @Pectojin has a better idea I’d suggest you MOVE the two bad dblock files to another folder, then try list-broken-files again. This basically forces the files to be “missing”, which is apparently what that command needs.
I’ll let you know if I think of anything better, but at the moment that’s all I’ve got.
Thanks! I keep forgetting that affected lets you specify a particular destination file to see what source backups would be affected if that file were gone.
So in @sylerner’s case affected could be used with the reported failed file(s) to get around the issue that list-broken-files doesn’t count non-decryptable files as broken. Once the list is “acknowledged” the dblock file itself could be moved (or deleted) which should allow purge-broken-files to work as expected (though --no-backend-verification=true may be needed to avoid a complaint about missing files).
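Spelled out, the sequence described above might look something like the Python sketch below. It only builds and prints the command lines; nothing is executed. The executable name and storage URL are placeholders, the helper name is made up, and a real invocation would likely need further options (passphrase, local database path) that are left out here.

```python
def recovery_commands(cli, storage_url, bad_dblocks):
    """Return the argv list for each step; nothing is executed here."""
    return [
        # 1. Show which source files depend on the undecryptable volumes.
        [cli, "affected", storage_url, *bad_dblocks],
        # 2. (Manual step) move the bad dblock files out of the destination.
        # 3. With the volumes now "missing", list what is considered broken.
        [cli, "list-broken-files", storage_url],
        # 4. Purge the broken entries; the extra option may not be required.
        [cli, "purge-broken-files", storage_url,
         "--no-backend-verification=true"],
    ]

steps = recovery_commands(
    "duplicati-cli",              # assumed executable name, adjust as needed
    "ssh://nas.example/backups",  # placeholder storage URL
    ["duplicati-b71b4cb04d4854190aaa4a00353fa67c3.dblock.zip.aes"],
)
for argv in steps:
    print(" ".join(argv))
```

Keeping the moved dblock files in a separate folder rather than deleting them leaves the option of putting them back if the purge turns out not to be what you want.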
This is useful in order to understand what original files are affected.
The bad news is that I am having progressive “rot” of my backup. Every day or two I’m picking up an additional dblock file or two with the same “Failed to decrypt data” error. Since I had initially done a few full validations of my backup with no problems, this is not a problem with the initial backup but something that is developing over time.
I will attempt to gather info on what dblock files failed when and what files are affected and open a new thread to discuss that problem and to try and figure out what is going wrong.
This discovery adds to the argument that test, verify and backup-test-samples should accept a percentage, with a default large enough that a full test will be completed after a not-too-large number of backups.
I look forward to your results - it’s good that you started with a successful full test as “proof” that the files started out in good shape.
If you find it’s older (already tested) files that are “rotting”, that would suggest rot at the destination or during transfer to Duplicati (where the actual test is performed). Depending on your destination, there are some Python (I think) scripts that can do testing AT the destination, which could take “transfer rot” out of the equation.
Oh, and I should mention that Duplicati tries VERY hard to not re-use file names for backup (dblock, dlist, and dindex) files - so it’s extremely unlikely that a file that used to be good but is now bad somehow got corrupted by Duplicati itself.
Just in case anybody cares, here’s how Duplicati handles various file “updates” - most caused by compacting (usually due to retention policy version thinning):
dlist files (those duplicati-YYYYMMDDTHHMMSSZ.dlist.zip.* files) may need to be updated. When this happens, the actual process ends up creating a new file with a similar name where usually the SS (seconds) part of the name is incremented. Once the new file is verified to have been uploaded, the old one will be flagged for deletion.
dindex file (those small duplicati-*.dindex.zip.* files) updates are handled differently in that a new file name with essentially no similarity to the current one is created. Again, once the new upload is confirmed, the old file is flagged for deletion.
dblock file (the large duplicati-*.dblock.zip.* files) updates are handled the same as dindex files where an entirely new file name is generated (OK - it still starts with duplicati- and ends with .dblock.zip or .dblock.zip.aes)
Note that all the above assumes default settings. Obviously if you’re not using ZIP compression or AES encryption the files won’t end with .zip or .aes. Similarly, if you are using a backup prefix they won’t start with “duplicati-”.
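As a small illustration of the dlist renaming described above, here is a Python sketch that bumps the seconds in a dlist file name. It mimics the naming pattern only and is not Duplicati's actual code; the function name and the example file name are made up.

```python
import re
from datetime import datetime, timedelta

_STAMP_FORMAT = "%Y%m%dT%H%M%SZ"
# <prefix>-YYYYMMDDTHHMMSSZ.dlist.<compression>[.<encryption>]
_DLIST = re.compile(
    r"^(?P<prefix>.+)-(?P<stamp>\d{8}T\d{6}Z)(?P<suffix>\.dlist\..+)$"
)

def bumped_dlist_name(name: str, seconds: int = 1) -> str:
    """Return a dlist file name whose timestamp is moved forward by `seconds`."""
    match = _DLIST.match(name)
    if match is None:
        raise ValueError(f"not a dlist file name: {name!r}")
    stamp = datetime.strptime(match["stamp"], _STAMP_FORMAT)
    stamp += timedelta(seconds=seconds)
    return f"{match['prefix']}-{stamp.strftime(_STAMP_FORMAT)}{match['suffix']}"

print(bumped_dlist_name("duplicati-20180115T103059Z.dlist.zip.aes"))
# duplicati-20180115T103100Z.dlist.zip.aes
```

Because the prefix and suffix are captured rather than hard-coded, the same sketch would also handle a custom backup prefix or a different compression or encryption extension.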