Should test / verify commands & --backup-test-samples support percentage parameter?

JonMikelV · June 2, 2018, 5:36pm

I’ve been playing with the test / verify commands and it occurred to me that if you don’t want to test ALL files, you’re basically choosing some arbitrary number that may or may not have anything to do with the actual number of dblock files in your destination.

Since not everybody would want to look at their destination file count and count only dblock files (or divide total files by 2 for a close estimate, since each dblock normally has a matching dindex), would it make sense for the commandline parameter to support a percentage in addition to ALL or a number?

If I passed in 25% Duplicati would convert that 25% to whatever that is when applied to the actual (or at least database recorded) dblock count.

This would allow something like running 20% tests each weeknight, thus ensuring every remote file has been tested at least once each week without the need to constantly increase a hard-coded number as backup size increases.
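
A minimal sketch of how such a parameter could be interpreted, assuming the percentage is applied to the dblock count recorded in the local database. The function name `resolve_sample_count` and the choice to round up are illustrative assumptions, not Duplicati's actual option parsing.

```python
import math


def resolve_sample_count(spec: str, dblock_count: int) -> int:
    """Turn 'all', a plain number, or a percentage like '25%' into a sample count."""
    spec = spec.strip().lower()
    if spec == "all":
        return dblock_count
    if spec.endswith("%"):
        percent = float(spec[:-1])
        if not 0 <= percent <= 100:
            raise ValueError(f"percentage out of range: {spec}")
        # Round up so that, for example, five runs at 20% cover every file.
        return math.ceil(dblock_count * percent / 100)
    count = int(spec)
    if count < 0:
        raise ValueError(f"sample count must not be negative: {spec}")
    # A fixed number larger than the backup is capped at the dblock count.
    return min(count, dblock_count)
```

With 1,000 dblock files, `resolve_sample_count("25%", 1000)` gives 250, and that number grows on its own as the backup grows, which is the point of the request.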

Just for fun I’m going to turn this into a poll - it likely won’t have any bearing on whether or not it gets implemented, but I’m kind of curious.

Do you think the `test` / `verify` commands should support a percentage based parameter?


sylerner · June 22, 2018, 10:53pm

Checking only 1 sample per backup when there are thousands of files doesn’t make much of a dent, especially if only backing up once per day.

While backup-test-samples provides the option to test more, a user often will have no idea how many files they have in their backup.

From a user point of view, I would think the ability to specify what percentage of your backup you want checked after each backup would be a friendlier method than having to specify an absolute number of samples.

On a similar topic, should the default check be something like 1% of the backup set?

Assuming that duplicati keeps track of who’s already been checked (I believe I’ve read that it does), a rolling 1% would - assuming there isn’t too much change each day - in a reasonable amount of time have checked 100% of the backup.
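
A rough simulation of that rolling check, under two assumptions that are not confirmed here: each run picks only files that have not been tested yet, and the percentage is applied to the current dblock total. The function name and the `new_per_run` growth parameter are hypothetical.

```python
import math


def runs_to_full_coverage(dblocks, percent, new_per_run=0, max_runs=10_000):
    """Count backup runs until no untested dblock remains, or None if never."""
    untested = dblocks
    total = dblocks
    for run in range(1, max_runs + 1):
        # Each backup may upload new, not yet tested dblock files.
        total += new_per_run
        untested += new_per_run
        # The sample size follows the current size of the backup.
        samples = math.ceil(total * percent / 100)
        untested = max(0, untested - samples)
        if untested == 0:
            return run
    return None
```

For a static backup of 1,000 dblocks, 1% per run works out to 100 runs. Any daily growth pushes that number up, and a percentage too small to outpace the new uploads never catches up at all.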

JonMikelV · June 23, 2018, 2:53am

I’ve moved your extremely insightful post into an existing #features topic as I feel they relate well to each other.

Another consideration is a “bandwidth” option so users with usage limits don’t have to worry about overage charges as backups age & get bigger.

Ideally, a calculation somewhere of approx how many backup runs it would take to test everything might be an eye-opener.
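
As a sketch of that calculation: a per-run download budget can be turned into a sample count and then into the number of runs needed to test everything. The 50 MB volume size is an assumption about the job's “Upload volume size” setting, the function names are made up for illustration, and the much smaller dlist and dindex downloads are ignored.

```python
import math


def samples_within_budget(budget_mb, volume_mb=50):
    """How many dblock downloads fit into a per-run bandwidth budget."""
    if volume_mb <= 0:
        raise ValueError("volume size must be positive")
    return int(budget_mb // volume_mb)


def runs_to_test_everything(dblocks, samples_per_run):
    """Backup runs needed to test every dblock once, ignoring growth."""
    if samples_per_run <= 0:
        raise ValueError("at least one sample per run is required")
    return math.ceil(dblocks / samples_per_run)
```

A 500 MB budget allows 10 samples per run, so a backup of 1,000 dblock files would take 100 runs to test once.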

sylerner · June 26, 2018, 2:43am

Created issue on GitHub and posted $25 bounty. (How do I label an issue on GitHub?)

JonMikelV · June 26, 2018, 4:11am

Awesome, thanks!

For me there is a Labels item to the right of the post, but that may not be available to all users.

I’ve gone ahead and tagged it with Bounty as well as added a link to the bounty itself in case others want to donate to make it happen.

JonMikelV · July 12, 2018, 7:35pm

@sylerner, I’m curious what you think about setting test count based on actual backup upload count. For example, if I have a job run that uploads 5 filesets, should that run test 5 (or even better 6) filesets?

sylerner · July 13, 2018, 4:42pm

The problem here is that, unless done from the very beginning, there would be little or no progress on verifying the backlog of unchecked files.

If done from the beginning, what you have in essence is a full verification of each backup immediately after it is made. While this is good in and of itself, it doesn’t make progress in the search for files that have become corrupted.

An example of finding a file that’s become corrupted - while still using the latest Beta - I got the following error after a backup that was set to backup-test-samples 500, full-block-verification true and full-remote-verification true.

Operation Get with file duplicati-b71b4cb04d4854190aaa4a00353fa67c3.dblock.zip.aes attempt 5 of 5 failed with message: Failed to decrypt data (invalid passphrase?): Message has been altered, do not trust content
System.Security.Cryptography.CryptographicException: Failed to decrypt data (invalid passphrase?): Message has been altered, do not trust content —> SharpAESCrypt.SharpAESCrypt+HashMismatchException: Message has been altered, do not trust content
at SharpAESCrypt.SharpAESCrypt.Read (System.Byte buffer, System.Int32 offset, System.Int32 count) [0x0046f] in <5e494c161b2e4b968957d9a34bd81877>:0
at Duplicati.Library.Utility.Utility.CopyStream (System.IO.Stream source, System.IO.Stream target, System.Boolean tryRewindSource, System.Byte buf) [0x00035] in <0828ce86ffa94a4bbbb2da4331bcc67b>:0
at Duplicati.Library.Encryption.EncryptionBase.Decrypt (System.IO.Stream input, System.IO.Stream output) [0x00008] in <404991515997453aa94eeb36e72aeeb7>:0
— End of inner exception stack trace —
at Duplicati.Library.Main.AsyncDownloader+AsyncDownloaderEnumerator+AsyncDownloadedFile.get_TempFile () [0x00008] in :0
at Duplicati.Library.Main.Operation.TestHandler.DoRun (System.Int64 samples, Duplicati.Library.Main.Database.LocalTestDatabase db, Duplicati.Library.Main.BackendManager backend) [0x000b0] in :0

Not sure at this point what the significance of this error is, but it isn’t confidence building.

JonMikelV · July 13, 2018, 5:42pm

True, if using something like “current upload + 1” the progress would potentially be small - but still not likely negative which is what the current default of 1 gives us. But you’re right - if done from the start it would indeed simply verify what was just uploaded.
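
To make the comparison concrete, here is a small sketch of how the untested backlog moves under each policy. It assumes every test sample hits a previously untested file, and the policy names are just labels for this example.

```python
def backlog_after(runs, start_backlog, uploaded_per_run, policy):
    """Untested file count after a number of backup runs."""
    backlog = start_backlog
    for _ in range(runs):
        backlog += uploaded_per_run  # new uploads start out untested
        backlog = max(0, backlog - policy(uploaded_per_run))
    return backlog


def fixed_one(uploaded):
    """Current default: test one sample per run."""
    return 1


def upload_plus_one(uploaded):
    """Proposed: test as many as were uploaded, plus one."""
    return uploaded + 1
```

Starting from 100 untested files with 5 uploads per run, 30 runs of `fixed_one` leave 220 untested, while `upload_plus_one` leaves 70: slow progress, but in the right direction.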

According to @kenkendk from another post…

That’s a tough one. I suspect some backup software falsely inspires confidence by doing no (or minimal) testing - thus never finding errors, leaving the user potentially stuck when things can’t be restored.

For me, I feel more confident because of the fact that Duplicati actually found something. Sure, something went wrong and I have to deal with it - but I get to do that on my time instead of after an emergency when I may realize I can’t restore things I assumed I could.

But I agree with the unclear significance of the error. Even with kenkendk’s description it’s still not really clear what a user should do about it.

sylerner · July 16, 2018, 4:01am

When I did a full verify, I had two out of several thousand files throw this error.

FWIW: I am using SHA512 for this backup set.

I’m afraid I’m not yet that familiar with duplicati’s file naming, etc. What files and where should I be looking at to verify?

Thanks!

JonMikelV · July 16, 2018, 4:31am

Duplicati generally stores 3 types of files at the destination:

dlist (list of what files were included in a specific backup, usually well under 1 MB in size)
dblocks (actual backed up data blocks, “Upload volume size” in size)
dindex (index of block hashes to dblock files, usually well under 1 MB in size)

The local database Duplicati uses can be rebuilt almost entirely from the contents of dlist and dindex files (both usually well under 1MB). Similarly, a broken / missing dlist or dindex file on the destination can be recreated from the local database OR downloading and re-processing the raw dblock files.

So if you have an error with a dindex or dlist file, running a “database repair” really just fixes the file from the local database. But if you have an issue with a dblock file, it needs to be addressed more aggressively, as now we’re talking about actual backed-up data at risk.

So if you look at the two (out of thousands) of files that had errors, did they end with dindex, dlist, or dblock followed by .zip or .zip.aes?
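
If it helps, the check can be sketched as a small script. The pattern below only assumes the naming shown in this thread (a type marker followed by one or two extensions such as .zip or .zip.aes), and the advice strings just paraphrase the explanation above.

```python
import re

# Type marker followed by a compression extension and an optional encryption one.
_TYPE_RE = re.compile(r"\.(dblock|dindex|dlist)\.[^.]+(?:\.[^.]+)?$")

ADVICE = {
    "dlist": "can be recreated from the local database",
    "dindex": "can be recreated from the local database",
    "dblock": "holds backed-up data, so treat it as data at risk",
}


def classify(filename):
    """Return 'dblock', 'dindex' or 'dlist', or None for anything else."""
    match = _TYPE_RE.search(filename)
    return match.group(1) if match else None
```

For example, `classify("duplicati-b71b4cb04d4854190aaa4a00353fa67c3.dblock.zip.aes")` returns `"dblock"`, and `ADVICE["dblock"]` gives the matching note.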

sylerner · July 19, 2018, 2:32am

The two files that failed both end in .dblock.zip.aes - meaning that I’ve “lost” data.

Since the “Failed to decrypt data (invalid passphrase?): Message has been altered, do not trust content” error showed up both as part of a post-backup verification as well as during a command-line test of the entire archive (consistently the same files being corrupted in multiple checks), I don’t think it’s an I/O error.

The backup is via SFTP to a NAS, so file rot in the backup archive is highly unlikely.

While it’s unlikely to be the cause, particularly since thousands of files checked out OK (3111 files total, 1553 dblock files, 5 dlist files and 1553 dindex files), I am using SHA512 instead of the default SHA256.
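
Those counts line up with one dindex file per dblock file, so the dblock count can be estimated from a directory listing without looking at file names. A tiny sketch, with a hypothetical function name:

```python
def estimate_dblock_count(total_files, dlist_files):
    """Estimate dblocks assuming each one has exactly one matching dindex."""
    return (total_files - dlist_files) // 2


# The figures quoted above: 1553 dblock + 5 dlist + 1553 dindex = 3111 files.
assert 1553 + 5 + 1553 == 3111
```

Here `estimate_dblock_count(3111, 5)` gives 1553, matching the actual dblock count.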

I don’t know where to go from here. Is it possible to find out what files are stored in the bad dblock files? If so, in order to compare with the original files, is it possible to force a restore despite the error?

Since the death of Crashplan Home, duplicati seems to be the only viable alternative out there for Linux. But given the critical nature of my backups (whose aren’t?), I can’t deploy until I’ve confirmed reliable operation. (And I’m not a freeloader. I’ve already contributed and will contribute more if/when I deploy this to more machines.)

Any suggestions of what to do next will be greatly appreciated.

JonMikelV · July 21, 2018, 6:09am

Yes, with a caveat.

This How-To guide should help with listing (and purging) the files that are “broken” by the apparently corrupt dblock files:

The caveat is that this process detects what files are broken and lists the contents of those files. I don’t know that there is a way to list affected files for specific dblocks (@kenkendk or @Pectojin might be able to answer that).

If you try list-broken-files please let me know if it works with the “failed to decrypt” related dblocks.

sylerner · July 25, 2018, 9:12pm

I just ran it with the following result:

No broken filesets found in database, checking for missing remote files
Listing remote folder …
Skipping operation because no files were found to be missing, and no filesets were recorded as broken.
Return code: 0

I had to remove all of my exclude arguments, but I left in full-block-verification and full-remote-verification both being set to true.

So I still can’t find out which files have corrupted backups - is there another way to figure out the link between a bad dblock file and the original files backed up within it?

While the corruption of these files is unnerving, I’d be less nervous if I could - assuming I still had them - force the source files affected to back up again. And obviously, the first step for this is knowing which files are involved.

As always, your help is greatly appreciated.

JonMikelV · July 25, 2018, 9:16pm

So it sounds like the list-broken-files command assumes the only thing that can break a file is for the dblock to be missing. @kenkendk - are “failed to decrypt” messages stored anywhere such that list-broken-files could know about them?

I know this isn’t a great user experience, but unless @Pectojin has a better idea I’d suggest you MOVE the two bad dblock files to another folder, then try list-broken-files again. This basically forces the files to be “missing”, which is apparently what that command needs.

I’ll let you know if I think of anything better, but at the moment that’s all I’ve got.

Pectojin · July 25, 2018, 10:08pm

This sounds right from a performance perspective… Else you’d have to download your entire backup to decrypt, validate, and identify broken files.

However, if “failed to decrypt” messages are not stored anywhere then it kinda sounds like there isn’t a correct way of dealing with corrupted archives.

I’m afraid moving them is all I got too.

ts678 · July 29, 2018, 2:39pm

The AFFECTED command might help figure out what the bad dblock files mean in terms of the source files.

Docs » Manual » Disaster Recovery gives an example of its use, and maybe some other useful information.

JonMikelV · July 30, 2018, 3:08am

Thanks! I keep forgetting that affected lets you specify a particular destination file to see what source backups would be affected if that file were gone.

So in @sylerner’s case affected could be used with the reported failed file(s) to get around the issue that list-broken-files doesn’t count non-decryptable files as broken. Once the list is “acknowledged” the dblock file itself could be moved (or deleted) which should allow purge-broken-files to work as expected (though --no-backend-verification=true may be needed to avoid a complaint about missing files).
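
Put together, the sequence might look like the sketch below, which only builds and prints the command lines rather than running them. The executable name `duplicati-cli`, the storage URL and the helper name are placeholders, and a real job would also need its usual options (passphrase, database path and so on).

```python
import shlex


def recovery_steps(cli, storage_url, bad_files, skip_backend_verification=False):
    """Return (note, argv) pairs in the order described above; nothing is executed."""
    purge = [cli, "purge-broken-files", storage_url]
    if skip_backend_verification:
        # Only if purge complains about the files that were moved away.
        purge.append("--no-backend-verification=true")
    return [
        ("see which source files depend on the bad volumes",
         [cli, "affected", storage_url, *bad_files]),
        ("after moving the bad volumes aside, list what is now broken",
         [cli, "list-broken-files", storage_url]),
        ("remove the broken entries from the backup", purge),
    ]


if __name__ == "__main__":
    steps = recovery_steps(
        "duplicati-cli",  # placeholder executable name
        "sftp://nas.example/backups",  # placeholder storage URL
        ["duplicati-b71b4cb04d4854190aaa4a00353fa67c3.dblock.zip.aes"],
        skip_backend_verification=True,
    )
    for note, argv in steps:
        print(f"# {note}")
        print(shlex.join(argv))
```

Moving the bad dblock files between the first and second step stays a manual action, since it depends on how the destination is reached.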

sylerner · August 1, 2018, 4:43pm

This is useful in order to understand what original files are affected.

The bad news is that I am having progressive “rot” of my backup. Every day or two I’m picking up an additional dblock file or two with the same “Failed to decrypt data” error. Since I had initially done a few full validations of my backup with no problems, this is not a problem with the initial backup but something that is developing over time.

I will attempt to gather info on what dblock files failed when and what files are affected and open a new thread to discuss that problem and to try and figure out what is going wrong.

This discovery adds to the argument that test, verify and backup-test-samples should be percentage with a default large enough that a full test will be completed after a not-too-large number of backups.

JonMikelV · August 1, 2018, 5:10pm

I look forward to your results - it’s good that you started with a successful full test as “proof” that the files started out in good shape.

If you find it’s older (already tested) files that are “rotting”, that would suggest rot at the destination or during transfer to Duplicati (where the actual test is performed). Depending on your destination, there are some Python (I think) scripts that can do testing AT the destination, which could take “transfer rot” out of the equation.

Oh, and I should mention that Duplicati tries VERY hard to not re-use file names for backup (dblock, dlist, and dindex) files - so it’s extremely unlikely that a file that used to be good but is now bad somehow got corrupted by Duplicati itself.

Just in case anybody cares, here’s how Duplicati handles various file “updates” - most caused by compacting (usually due to retention policy version thinning):

dlist files (those duplicati-YYYYMMDDTHHMMSSZ.dlist.zip.* files) may need to be updated. When this happens, the actual process ends up creating a new file with a similar name where usually the SS (seconds) part of the name is incremented. Once the new file is verified to have been uploaded, the old one will be flagged for deletion.
dindex file (those small duplicati-*.dindex.zip.* files) updates are handled differently in that a new file name with essentially no similarity to the current one is created. Again, once the new upload is confirmed the old file is flagged for deletion.
dblock file (the large duplicati-*.dblock.zip.* files) updates are handled the same as dindex files where an entirely new file name is generated (OK - it still starts with duplicati- and ends with .dblock.zip or .dblock.zip.aes)

Note that all the above assumes default settings. Obviously if you’re not using ZIP compression or AES encryption the files won’t end with .zip or .aes. Similarly, if you are using a backup prefix they won’t start with “duplicati-”.
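
To illustrate the dlist naming only (this is not Duplicati’s own code), here is a sketch that parses the timestamp out of a dlist file name and produces the name with the seconds bumped by one. The helper name is made up, and it assumes the duplicati-YYYYMMDDTHHMMSSZ.dlist.* pattern described above.

```python
import re
from datetime import datetime, timedelta

_DLIST_RE = re.compile(r"^(?P<prefix>.+)-(?P<stamp>\d{8}T\d{6}Z)\.dlist\.(?P<ext>.+)$")
_STAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def next_dlist_name(filename):
    """Return the same dlist name with its timestamp moved forward one second."""
    match = _DLIST_RE.match(filename)
    if match is None:
        raise ValueError(f"not a dlist file name: {filename}")
    stamp = datetime.strptime(match["stamp"], _STAMP_FORMAT) + timedelta(seconds=1)
    return f"{match['prefix']}-{stamp.strftime(_STAMP_FORMAT)}.dlist.{match['ext']}"
```

Going through a real timestamp rather than editing the digits means a name ending in 59 seconds rolls over into the next minute correctly.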

sylerner · August 1, 2018, 6:58pm

FWIW - File rot at the destination is unlikely - I’m using sftp to a RAID 6 NAS.
