Occasional "HashMismatchException" errors

Lately I get occasional errors with one of my backup jobs. It just backs up my local D: drive to the NAS next to me, every 30 minutes, so most of the time it finishes within seconds. But about once a day I get a pop-up: "[Error-Duplicati.Library.Main.Operation.TestHandler-FailedToProcessFile]: Failed to process file duplicati-id558a946141545b6861b981e1c49d559.dindex.zip.aes"
I looked into the log files I save and found a possible reason in:

[Retry-Duplicati.Library.Main.BackendManager-RetryGet]: Operation Get with file duplicati-ia087833b572f4071beb1921a7e67c95d.dindex.zip.aes attempt 5 of 5 failed with message: Incorrect hash value for file "C:\Users\<myuser>\AppData\Local\Temp\dup-0ac9daf2-c11e-4de9-902e-f6975c99ff9d", stored hash: M2AB7Gxx+7i0ZAHkvy9NCMM6OIoYqwsh4Hp+bodx17s=, actual hash 9Job3cosun/UY4pZjlx6HOrWS/8BnqO+ckV2T6OuTs0=
Duplicati.Library.Main.BackendManager+HashMismatchException: Incorrect hash value for file "C:\Users\<myuser>\AppData\Local\Temp\dup-0ac9daf2-c11e-4de9-902e-f6975c99ff9d", stored hash: M2AB7Gxx+7i0ZAHkvy9NCMM6OIoYqwsh4Hp+bodx17s=, actual hash 9Job3cosun/UY4pZjlx6HOrWS/8BnqO+ckV2T6OuTs0=

So it complains about a mismatch between a stored and an actual hash value.

The odd thing is that there are no such files in C:\Users\<myuser>\AppData\Local\Temp - at least not after the job has completed.
If I manually trigger the job again, it completes with no errors.

Should I worry?

Hello

by default, after each backup Duplicati verifies some randomly picked files. That's why you see downloads during a backup. Index files are part of a backup and are zipped/encrypted like the other files. When Duplicati reads these files, it puts them in a temp directory, decrypts and unzips them, and verifies the hash (signature) stored in its local database against the file downloaded from the backend.
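
Conceptually, the check amounts to something like this (a rough Python sketch for illustration, not Duplicati's actual code; the stored hash is the Base64 of the file's SHA-256):

```python
import base64
import hashlib

def file_hash_b64(path):
    """Base64 of the SHA-256 digest of the file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")

# values as they appear in the error message above
stored = "M2AB7Gxx+7i0ZAHkvy9NCMM6OIoYqwsh4Hp+bodx17s="  # from the local database
actual = file_hash_b64(r"C:\Users\<myuser>\AppData\Local\Temp\dup-0ac9daf2-c11e-4de9-902e-f6975c99ff9d")
if actual != stored:
    print("hash mismatch:", stored, "vs", actual)
```
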
What happens in your case is that Duplicati has a problem with this check: there is a file that it can't verify. When you run a backup again, this file is not necessarily tested again, since data is sampled randomly after each backup.
The reason is probably not on the backend, because if the backend had corrupted the data, Duplicati could not have decrypted it.

If the problem concerns only this particular file, the reason is probably a hardware error that occurred while creating it. If that is the case, you can try to delete it and let Duplicati recreate it (possibly using repair).

If the names of the problem files change from run to run, you may have a defect in your computer (probably memory). No backup can fully protect against defective hardware, unfortunately, especially when it corrupts data.

In any case, you may consider adding an additional backup in the cloud; that's why Duplicati exists. Two backups on two different media are twice as good as a single backup.

The hash is on the encrypted file. My guess is that its decryption and hashing are tested in parallel.
This is kind of worrisome because I “think” there’s a length check on files. Did file contents go bad?

One way to test all files is with The TEST command in GUI Commandline, asking it to test all files.
Verifying backend files after backup is a small sample, typically 1 set of 3 files, so it can miss things.
backup-test-samples and backup-test-percentage can test more, but reliable storage is important…

Is this SMB? That sometimes causes trouble for unknown reasons. One issue is it’s pretty complex.
If it’s now just file-like access from Windows, the good news is there’s a file-oriented verification tool
utility-scripts\DuplicatiVerify.ps1 you can use, after a backup with upload-verification-file.

Thanks for the reference to the command-line options. I ran a "test all", of which I only post the last lines:

duplicati-i79f96c38158d4f2794b774e8b12a3011.dindex.zip.aes: 935 errors
Extra: +185iT3IzImG+f99k0BSBZU5HhEwlj48joIvwlNr7Wc=
Extra: +2RnZG3E+Xz6AVpXKEUTPwg5F5iniR+JU9dMRlMXipQ=
Extra: +3c/YI9uZLC1H6qIPDD7iHJr96rFnRjUAgdSceOHksY=
Extra: +E8/xP1n55qrSTlgy2s2GXppixatCaLxwVVrNs4pXmM=
Extra: +ES7p/j30TS0cecZPOuZf4Czz7TKXjF0EZVaVbB0uQY=
Extra: +IAA/Dl0F80Aujb1rLxDBFPTAEVnb/G9xJb9HRmW+ps=
Extra: +ZLFvguzB5nCW2xkhVQDrq/jIZJdYK2f1IL1xb1c6/4=
Extra: +fC8K6rmyk8qXCmgXx+AAz6V+UMUA48upVGfDDOUjwM=
Extra: +fwTcdfwJMC3kTpWmjy9UrrxnAprdlcl3iEXCRQJ9p4=
Extra: +u9I/BRfZ+zlRzd0FPe1Utd52av8YV4DdIYl4dzrFr8=
… and 925 more
duplicati-i3b4aedc471504ec797bbd8c9591e7d9e.dindex.zip.aes: 382 errors
Extra: +AQ7LHPXCJwJpTP7pqDMjiIVoQ7eHtts8HByJkN+rwo=
Extra: +CqUAnmDHfgM9ES+mfrsdbb8sJ4KaLbrnlWYdVkXYSA=
Extra: +LUFmmsUgoEFkxGCgJBAz+4eei/uqSk2r3T3+ukJLiA=
Extra: +OsAFn6tiTQeCFt9CbJuqfCj4FRop/GZZuBMw7TkURM=
Extra: +QJN3mW2L4DHQ9kQldzMjX20h4bwiQGq6CXGsqM4chM=
Extra: +lBDnKctn6l6/W+WAgsmOtpFTh+jCE6T8v2Kyv9v7vQ=
Extra: +pMioF2tJ1PKRPvYtXZbLTklsXWlPlVbXdInbIzxpyc=
Extra: +pgjcu7hkhXJXeIQH4Pbs/xPzV82HMOOGPGlygZSfrU=
Extra: +uCKShlJU9oy7WCh1qbUACkI1/Oe8HQLTR0yLQuI+cw=
Extra: +w3h8bB3y6lB782NXFjupzXo23WUxVPvYAK+SdXyR4U=
… and 372 more
duplicati-ie562e530fd0547d99d4524602b1d8628.dindex.zip.aes: 590 errors
Extra: +/r9RbIhLvK7yd5HJmTxaCCcAv8YOR/ScE3totlmnmY=
Extra: +9qiCn2B3OSOMoCH4CHNpNp1hm/umDpxpdx+XnJOiYQ=
Extra: +H+BDaZm0pPneVYh9/C0KPYhebifshbRHbvzna4zKAM=
Extra: +JZiJcnT9pKdHsPOySsMUgIiiNaP1Jj3CX7x3XXzH8E=
Extra: +OETNK9M8a37ByDsskEkHOSSSYiDZDbG2tHQhnpP6eA=
Extra: +S0xM0qaB089J72YsubZsL9gkP+3WLvdTQE2D2agkqk=
Extra: +gJvAO8Lkj55uRfAGMwcsJniNO4n0V+f/oiVqIee5r8=
Extra: +p1R1A/NispgBF5tmsBO5AM0UITfZ2+w9YH7Ts7Hgx4=
Extra: +pAGqTKX1gPeJY+xEbhnhuGM/NWg2UAo3LP7S7+uqPM=
Extra: /Bcx1LW1BAeGK2+NN/My+B6A4/dJU6bX+p8r3oPCh84=
… and 580 more
Return code: 3

So in total it reported errors in 14 index files.
After running a backup with --upload-verification-file, I guess I need hand-holding on which parameters I should provide to the DuplicatiVerify script. The first thing it asks me for is "FileOrDir:" – maybe more?

Connection to my Synology NAS is via SMB, I think.
The odd thing is that I now see the very same kind of error when running a similarly configured backup to the same share (different folder) from a laptop (also Win10). That one also used to run without errors in the past.

Seemingly full-remote-verification was used too, which wasn't really necessary just to do the file test.
If you really want to see all the output (e.g. to make sure all is “Extra”), I think full-result does that.

Is original post’s duplicati-id558a946141545b6861b981e1c49d559.dindex.zip.aes readable or not?
Actually, is anything unreadable like that (ignoring Extra, which proves that the file could be read)?

Assuming you’ve told Duplicati that destination is a Local folder or drive, tell the script the same path.
There should be a duplicati-verification.json there now because you turned on upload-verification-file.

As of today, all runs of this job have ended with errors, most on different index files. The last one reported:

“2023-03-23 15:38:05 +01 - [Error-Duplicati.Library.Main.Operation.TestHandler-FailedToProcessFile]: Failed to process file duplicati-ied188a07f2e7453c842ee2ff96713fcb.dindex.zip.aes”

I looked up this file in the target: it exists, has a size of 42.6 kB, and looks "reasonably scrambled" in a text editor.

The DuplicatiVerify script only provided errors.

FWIW I still have the option "upload-verification-file" enabled, so every run should upload a (new?) hash file. I also see such a file with a time stamp of the latest run in my target folder.

BTW all my other jobs that backup into a cloud drive are still running without errors.

If these are all TestHandler errors, you should have a job log which would explain the exact failure better.

I’m not sure what that’s saying. It’s encrypted, so should have very little readable beyond initial bytes.
Is your texteditor smart enough to look for balanced square braces? Your duplicati-verification.json is supposed to start with one, e.g. [{"ID":, and the matching right square brace should be at file’s end.

If you’re using Local folder or drive destination type and doing multiple backups, the verification
file can have leftover data at the end if it shrinks. If I dummy up such data, I get result similar to yours.

Actually, it doesn’t need a fancy text editor as there’s only one level, so see if you can see data past ], comparing it to the text of your error message. If it’s extra, you can edit it off, or delete file and backup.

2.0.6.100 Canary and beyond fix this, but it’s not in any Beta release yet. Are you running latest Beta?

File backend now overwrites files, thanks @warwickmm

This is supposed to work, but the bug fixed in Canary means overwrite with shorter file has leftovers.

Ah - now there is light: I assumed that each run with "upload-verification-file" would completely overwrite an existing one, and missed that the hash file only makes sense together with the matching block/index/list files. Now that I deleted the previous hash file and created a new one, DuplicatiVerify.ps1 did its job on the matching remote files. As a result it reported (again) 14 errors like

*** Hash check failed for file: duplicati-i<14 different names>dindex.zip.aes

What can I do to fix these broken index files? Delete them and somehow repair?
What might be the reason for such mismatches to occur out of the blue?
Is there a way to find out which files of the backup set may actually be affected, so I can check if they are corrupt?

BTW

yes I think so: 2.0.6.3_beta_2021-06-17

was meant tongue in cheek; I didn't expect to be able to read it, just to check if it looked "encrypted". After a closer look it actually starts with "AES … !CREATED_BY … SharpAESCrypt v1.3.3.0", so it definitely is…

If you’ve never done a database recreate, and it still has needed info, repair will recreate the files.
Because you might not be sure, you could probably move them out of the way instead of deleting.
Before that, you could try confirming that they're not decryptable by attempting decryption, e.g. with the
AES Crypt GUI or the CLI SharpAESCrypt command in your Duplicati folder.

SMB as used by Duplicati is sometimes not reliable. Something's lost between Windows and the NAS. Looking at the file closely can sometimes at least recognize symptoms. For example, corruption is commonly a truncation. Did the tool mention sizes? If not, you can read them in the verification file.
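
For example, a small Python sketch can compare every Size in the verification file against what's actually on the share (the destination path is a hypothetical placeholder; Name and Size are the fields I see in my own verification file):

```python
import json
import os

dest = r"\\NAS\share\backupfolder"  # hypothetical destination path
with open(os.path.join(dest, "duplicati-verification.json"), encoding="utf-8-sig") as f:
    entries = json.load(f)

for e in entries:
    path = os.path.join(dest, e["Name"])
    if not os.path.exists(path):
        print("missing:", e["Name"])
    elif os.path.getsize(path) != e["Size"]:
        print("size mismatch:", e["Name"], "expected", e["Size"], "actual", os.path.getsize(path))
```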

For whatever reason (possibly a feature omission), The AFFECTED command hasn’t (IIRC) worked successfully for me given a dindex file, but you can try it in GUI Commandline. I think a dblock works.

A more direct way is to look in a copy of the database with DB Browser for SQLite or similar, where a table called IndexBlockLink shows what dblock goes with a dindex. Then you can ask with affected.
The Remotevolume table knows all the destination files, and its ID is what IndexBlockLink table uses.

So you’ve done one part now of the index file check – you’ve found the header, so its damage is later. Sometimes truncation is to an even binary size if you want to post sizes or type them into a calculator.

Losing an index file doesn’t directly lose data, but not having them is not good and can slow recreates. Actual data is in the dblock file, and each one normally has a dindex to make it easier to know content.

I tried three examples:

  1. The outer index file is decryptable. Inside the zip file I have a *dblock.aes file in the vol folder. This one fails to decrypt with "invalid signature".
  2. I also tried another index file which is NOT listed as hash mismatch, and trying to decrypt the inner dblock.aes file also failed.
  3. Randomly picking a pretty old index file I noticed that there was no “list” folder included - and the inner dblock file also failed to decrypt.
    Not sure what to expect…

The duplicati-verification.json file lists name/hash/size of dblock and index files, but after a few attempts I gave up on finding a name match to the dblocks I extracted from the index files. (I browsed the json file in MS Edge. Some test searches seemed to work, but not the ones I wanted.) BTW two block and two index files were listed with "null" as hash.

I tried the AFFECTED command from GUI commandline with --dblock and --full-result but it gave me

No files are affected

followed by : AFFECTED log.zip (1.7 KB)

I opened my local DB in the DB Browser and located the IndexBlockLink table. But I don’t know how to relate the table content to the identifier of the index file (the filename?). So I still don’t know which files are affected so I could check if they are corrupted.


Taking one step back: I actually wonder about the benefit of drilling down to the bottom of this (except for the analytic exercise, for which I appreciate your help, because I happen to have the time for this :+1:). Should I expect that this inspection gives me a clue about the root cause of all this?
Without a reason which I can fix (SMB hiccup or instability?), I guess all I can do is move on, try to repair what can be repaired, and hope for the best in the future.

This was much more than requested, but since you’ve begun exploring, I’ll comment.

That’s surprising. Usually bad hash means damaged file, which means it won’t decrypt.

The vol file is the volume index of the dblock file that this dindex indexes. It’s JSON text.

They’ll all fail to decrypt because they’re not encrypted. Only the associated dblock file is.

This folder contains convenience copies of blocklist files which are in some dblock file too.
Database recreate can avoid reading any big dblock files this way – just dlist and dindex.
Processing a large file describes how a multiblock file is represented via a list of its blocks.
A block is known by its SHA-256 hash. The blocklist file strings 32 byte hashes in a series.
The file name is the Base64 of the SHA-256 hash of file bytes. The technique is used a lot.
Database rebuild gets into the external (destination) and internal (database) formats more.
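
In Python terms, that naming and packing scheme looks roughly like this (a sketch of the scheme, not Duplicati code):

```python
import base64
import hashlib

def block_id(data: bytes) -> str:
    # a block is known by the Base64 of the SHA-256 hash of its bytes
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

def read_blocklist(blob: bytes):
    # a blocklist file is just 32-byte SHA-256 hashes strung together
    assert len(blob) % 32 == 0
    return [base64.b64encode(blob[i:i + 32]).decode("ascii")
            for i in range(0, len(blob), 32)]
```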

Names should all match, but you might be confusing the dblock file with its index in dindex.
I open up a dindex.zip. In vol is duplicati-bef7f8a54492b45ef9959fe75cb4b2b59.dblock.zip
which is the index for that file which is also in the destination. Its name is in verification file:

"Name":"duplicati-bef7f8a54492b45ef9959fe75cb4b2b59.dblock.zip","Hash":"c/AC4voky1XOqGWgTMBh/deB2pqEH+BxAYe0tcbd/CE=","Size":752

Check the State value. The usual value in my short test is 3. If it's something else, it might be a deleted file.
These are held in the database awhile, but shouldn’t actually be present at the destination.

I don’t think there’s any --dblock option. You put the file names (without any path portion)
into the Commandline arguments box (on separate lines if you’re doing multiple at a time).

For any dindex file that failed the hash test yet somehow decrypted, its dblock name is in vol folder.
Using the database, you find the dindex in Remotevolume table, look up its ID in IndexBlockLink as IndexVolumeID, and take the BlockVolumeID for that row back to Remotevolume to get a file Name.
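
In SQL terms, that two-step lookup is roughly the following (a sketch; run it against a copy of the job database, with the table and column names as described above):

```python
import sqlite3

con = sqlite3.connect(r"C:\path\to\copy-of-job-database.sqlite")  # work on a copy
row = con.execute(
    """SELECT blk.Name
       FROM Remotevolume idx
       JOIN IndexBlockLink link ON link.IndexVolumeID = idx.ID
       JOIN Remotevolume blk ON blk.ID = link.BlockVolumeID
       WHERE idx.Name = ?""",
    ("duplicati-ied188a07f2e7453c842ee2ff96713fcb.dindex.zip.aes",),
).fetchone()
print(row[0] if row else "no matching dblock found")
```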

Did you try running the verification script? When I intentionally shortened a dindex file, it told me that

C:\>powershell -ExecutionPolicy Bypass -file "C:\Program Files\Duplicati 2\utility-scripts\DuplicatiVerify.ps1"

cmdlet DuplicatiVerify.ps1 at command pipeline position 1
Supply values for the following parameters:
FileOrDir: C:\ProgramData\Duplicati\duplicati-2.0.6.104_canary_2022-06-15\RUN\test 1
Verifying file: duplicati-verification.json
Verifying file duplicati-20230318T211945Z.dlist.zip
Verifying file duplicati-20230318T230243Z.dlist.zip
Verifying file duplicati-20230324T142351Z.dlist.zip
Verifying file duplicati-b0be1098d590f49ec8abf5ed48ff8c6a2.dblock.zip
Verifying file duplicati-b68cae76f5fb14adfa22ece3a0b21efa4.dblock.zip
Verifying file duplicati-bef7f8a54492b45ef9959fe75cb4b2b59.dblock.zip
Verifying file duplicati-i01af6a3a2aca4942b651e8052e733c6f.dindex.zip
Verifying file duplicati-i6917b5de06b3479abfa484d201785878.dindex.zip
Verifying file duplicati-i97990165a1864f1c9ff863261ba20ba2.dindex.zip
*** Hash check failed for file: duplicati-i97990165a1864f1c9ff863261ba20ba2.dindex.zip

Errors were found

so its answer is rather non-specific. Even though it can get the length, it only complains about hash.

Being able to report errors suggests that it still knows about the files in Remotevolume table, which is probably more or less dumped into duplicati-verification.json. Does the verification tool give 14 errors?

If so, you could possibly just move on to hiding 14 dindex files (e.g. move them to a folder) and Repair.
A worrisome thing is that you were able to decrypt a dindex that gave a hash failure. That’s a surprise.
You didn’t say length value for any of the bad dindex files, but duplicati-verification.json would know all.
The reason this matters is that if the problem is always truncation, there’s now a backup-time test for it.

Added check to detect partially written files on network drives, thanks @dkrahmer

but it’s only in the Canary releases so far. If network drive got unreliable, maybe use a different access?
Truncation is commonly to a binary-even value. I think filesystems and SMB like binary. The calculator Windows supplies can do conversions, or if you want to tell me 14 actual file sizes, I can convert a few.

As a side note on verifying (assuming you can get affected to tell you the files after you tell it the dblocks): no-local-blocks is needed to force restore to go to the destination, otherwise it will use source blocks, if found.

Duplicati will attempt to use data from source files to minimize the amount of downloaded data. Use this option to skip this optimization and only use remote data.

Do you ever simulate disaster recovery with Direct restore from backup files? That creates a database; however, a somewhat more thorough test (because it does all versions) is to move yours aside and do Repair. Assuming it works fairly quickly (it would have trouble if a dindex is actually corrupted, forcing it to read dblocks), you get your choice of putting your old database back (it has logs and such) or staying on the recreated DB.

You could look up that file in the verification file to see what it was expected to be.
One worrisome explanation for your ability to decrypt files with bad hashes is that
perhaps the file sometimes reads back OK, and sometimes delivers bad contents.
This would possibly be visible in, say, the verification tool errors versus test ones.

Now I see - since the name of the inner dblock ended in *.aes, I assumed this was also encrypted (why not give it a *.json file name? confusing …). BTW: I can only make sense of your comment if I read it as "only the associated dindex file is" - didn't you refer to the outer dindex file?

I think I understood this correctly: duplicati-i5ef82f1d81454e49b849cf606c695752.dindex.zip contains duplicati-b345fd83a9a60407a8e7ff0ee7666fe27.dblock.zip.aes (the json file) in the vol folder. I tried to look up duplicati-b345fd83a9a60407a8e7ff0ee7666fe27.dblock.zip in the duplicati-verification.json file - but no luck.

So how should I read the description of the AFFECTED command, which says: "The advanced option dbpath is required to specify the location of the local database."
Ahh - I see: in the GUI Commandline the dbpath option is set by default. So this was probably redundant.

I followed your guidance and can confirm that the dblock file named in the vol folder of the dindex file is indeed the correct one :slight_smile: Actually I was hoping to find a way to tell which source file was affected, so that I could check through a restore operation whether it was corrupted.
I know we're talking about block-type backup and deduplication, but somewhere in the DB there must be the information on how to put the pieces together to make a source file, right?
I spotted the "File" table, and I guess the BlocksetID is a mapping to the list of blocks that make up that file (?). But in the Blockset table I only find hashes…

Yes, see my second-to-last post. This is where I got the list of 14 different dindex files failing the hash check.
The dindex files I examined were all out of this list.
---- need to stop here - will continue tomorrow ------

Because its name is exactly the name of its dblock file, which in some cases might end in .zip; however, that would also be misleading to a tool or a human, and I don't know why one couldn't just strip off .json. Maybe it wasn't something the original author thought about. They certainly knew exactly what the file meant.

The encryption is always outer, and file extensions are read left to right, so .zip.aes is zip then encrypt which is what you have for both your dindex and dblock files, and for your dlist file as well for that matter.
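
If you want to peek by hand, the third-party pyAesCrypt package reads the same AES Crypt format, so a sketch like this (passphrase and file name are placeholders) decrypts the outer layer and then lists the zip contents:

```python
import zipfile
import pyAesCrypt  # third-party: pip install pyAesCrypt

passphrase = "your-backup-passphrase"  # placeholder
pyAesCrypt.decryptFile("duplicati-iexample.dindex.zip.aes", "dindex.zip", passphrase, 64 * 1024)
with zipfile.ZipFile("dindex.zip") as z:
    print(z.namelist())  # expect manifest, vol/duplicati-b...dblock.zip.aes, maybe list/...
```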

The dblock file would probably have the exact file name, ending in .aes (just as in the Remotevolume table). Depending on how you searched, it might genuinely not be in the file, which might mean an extra dindex. You can also look on the NAS. These names are supposed to track actual files on the Destination.

??? dblock and dbpath are not the same, but both start with db. Maybe you meant to speak of --dbpath?

That’s exactly what happens when you give the dblock file names to the affected command. Please try.
Inventory of files that are going to be corrupted has some sample output. Versions are involved as well.

It’s in the same manual article that I deep-linked to earlier when talking about the blocklist. You can read:
How the backup process works
How the restore process works
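
For your earlier question about mapping a dblock back to source files, the database route is roughly this query (a sketch; File is the view joining paths to blocksets in my copy of the database, and the affected command additionally handles wrinkles like metadata and blocklist usage):

```python
import sqlite3

con = sqlite3.connect(r"C:\path\to\copy-of-job-database.sqlite")  # use a copy
rows = con.execute(
    """SELECT DISTINCT f.Path
       FROM Remotevolume rv
       JOIN Block b ON b.VolumeID = rv.ID
       JOIN BlocksetEntry be ON be.BlockID = b.ID
       JOIN "File" f ON f.BlocksetID = be.BlocksetID
       WHERE rv.Name = ?""",
    ("duplicati-b345fd83a9a60407a8e7ff0ee7666fe27.dblock.zip.aes",),
).fetchall()
for (path,) in rows:
    print(path)
```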

EDIT:

You'll see a reference to filelist.json in those articles, so that was used at least once, although there are more cases where it wasn't. The manifest and fileset files are JSON but don't have the suffix. Having no suffix is less misleading than having a suffix whose final part misleads about the content.

Ah yes, of course - my bad. Wrote --dblock and meant --dbpath :upside_down_face:

I followed your suggestion and initially got 17 errors on index files after a repair (“… registering a missing remote file”), but subsequent backups then ran without errors.
Today I recreated a verification.json file and the verification script reported no errors.

Tried again with two dblock files I randomly picked from the verification.json file, and it indeed provided a nice list of affected backup sets and file names. [On a side note: Since I did specify the --full-result option the final line in the output was confusing: “Found 2 related log messages (use --full-result to see the data)”. It seems the option is set by default, since it didn’t make a difference when I omitted it.]

When I tried with the dblock file contained in one of the index files that were reported in my original errors, AFFECTED returned "no files affected". I checked the index filename and it indeed was listed in the 17 errors (see above), so I guess this would explain why no files are affected, because the index (and therefore the block) is registered as a "missing remote file", right?
BTW list-broken-files returned no files, so I assume that these missing files are irrelevant.

So in summary things look fairly consistent now, and I learnt a lot about Duplicati’s internals - more than I ever fancied I’d need …
Fortunately this specific backup job is not a very important one (because I also run a mirroring program in parallel), but I think next time I may just duplicate the job, target another remote folder and hope for a clean new start. I may keep the old job + target folder for a while, just in case.
After all, my primary interest is that the backups “just run” and leave me alone - just like my other jobs that backup into the cloud.
So I’m fine to close this thread here.
Thank you @ts678 for your patience and detailed advice and references - I will happily get back to this thread when I need to remember!

If your message was the above, you ran a Recreate instead of a Repair, which would have uploaded.
I'm not sure what you have now, but this mess is getting messier. Another sign that you ran Recreate would be if all of your old job logs are gone (which isn't a big deal, but it can confirm the path taken).

If you did that without copying the old database, we also lost some valuable information on expected files, basically wiping out the original records that were making the complaints. Is that good or bad, though?
Maybe all is well unless Recreate was struggling, in which case it may struggle in disaster recovery…

You can decide if you want to worry about that. I've been noticing that sometimes recreate cleanups don't clean up the destination. Basically, it lives with the problem, so things look OK until the next database loss.