I am playing around with the latest Canary 220.127.116.11 with a test copy of one of my production backup sets.
I have done various purge operations, backups, database recreations, restores, etc.
But now I’m testing the deletion of dindex files to see whether they are recreated properly. I deleted 1550 dindex files and then ran another backup. Duplicati detected the missing files and prompted me to Repair. I ran the Repair, and 1548 of the 1550 files were recreated without issue. The remaining two produced this type of error:
2019-10-21 14:37:32 -07 - [Error-Duplicati.Library.Main.Operation.RepairHandler-CleanupMissingFileError]: Failed to perform cleanup for missing file: duplicati-i40c3ea119a344bfa8d5e8e7faed60c24.dindex.zip.aes, message: Internal consistency check failed, generated index block has wrong hash, JK08SX0qCZzPHTVQ2tSqICHbTx0ddPiRp4YcXwHjkB4= vs yuJVEmzJrjO2iD9aWhWzi7XKPJ+SKReFmnrEIFYOLOs=
2019-10-21 14:37:32 -07 - [Error-Duplicati.Library.Main.Operation.RepairHandler-CleanupMissingFileError]: Failed to perform cleanup for missing file: duplicati-i94d4b285035b4fc59d77625e182ae5ce.dindex.zip.aes, message: Internal consistency check failed, generated index block has wrong hash, i+LMfmCATFXwu7A3YGHVGkU0iqD4NH1rqzZfW1Q3xO0= vs tpI5x47zReN2kh4Cf1Uu8gPLqsXhgYLBXf3Vs0k/Y2o=
Anyone know what the root cause of this is, and if there is any way to recover?
I’m not familiar with this code, but the GitHub issue can probably supplement my (or your) attempt to read through it.
Was the production backup made before 18.104.22.168 even though you’re playing with a copy on 22.214.171.124?
Is the problem described reproducible on 126.96.36.199? If so, can you get a better log and stack trace of it?
If you have a debugger or a debug build, inspecting some of the variables in the code might add context to the message. Alternatively, with a DB browser you can probably use the “should-have” hash to get some clues.
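If you want to check a hash by hand, here is a minimal sketch. It assumes the hashes in the error message are Base64-encoded SHA-256 digests, which matches their 44-character `…=` format; the function name is mine, not Duplicati’s:

```python
import base64
import hashlib

def block_hash_b64(data: bytes) -> str:
    """Return the Base64-encoded SHA-256 digest of a raw block,
    the format the consistency-check error appears to print."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

# Example: the hash of an empty block
print(block_hash_b64(b""))  # 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
```

Feeding it the raw bytes of a suspect block and comparing against the “should-have” value from the database might show which side is wrong.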
A DB bug report might also help, although some recent attempts have found that they’re slow to create.
Does it happen on 188.8.131.52? I’d prefer to know that it’s an old but rare bug, not a regression in 184.108.40.206.
Possibly in some cases, but it may be rare, and I’m not an expert. Nevertheless, let me try some explanations.
The BlocklistHash Index is likely for blocklists that exceed one block per file, just as the BlocksetEntry Index is for file data that exceeds one block. Big files are not unusual, and you can look up the BlocksetID in the File table to see whether the path makes sense. The SHA-256 hash for a block is stored directly in the blocklist without Base64 encoding, so at 32 bytes each you can fit about 3200 of them in a block, each representing a 100KB source-file block; the file here is therefore probably over 300 MB if I did the math right. You might have two files that share a common section appearing midway. Is the Hash for earlier and later Index values different between the two files? Two files that are totally identical usually just point to the same blockset, I believe.
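The arithmetic above can be checked in a few lines (a sketch assuming the default 100KB block size and raw 32-byte SHA-256 entries in the blocklist):

```python
BLOCK_SIZE = 100 * 1024  # Duplicati's default blocksize, 100 KiB
HASH_SIZE = 32           # raw SHA-256 digest length in bytes

# How many block hashes fit in one blocklist block
hashes_per_blocklist_block = BLOCK_SIZE // HASH_SIZE       # 3200

# Source data one full blocklist block can address
data_covered = hashes_per_blocklist_block * BLOCK_SIZE     # bytes

print(hashes_per_blocklist_block)     # 3200
print(data_covered / (1024 * 1024))   # 312.5 (MiB)
```

So a file only needs a second blocklist block (Index above 0) once it passes roughly 312 MiB, not 1 GB.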
Duplicati’s deduplication mechanism probably focuses on the large savings from duplicated source-file data blocks. Deduplicating the small blocklist blocks as well might not be worth another dedup stage.
Thanks for your thoughts. I am very unfamiliar with Duplicati at this level. I’ll try to do more debugging when I have time, and also see if the problem is reproducible on a previous version of Duplicati - like 220.127.116.11 as you suggested. I won’t be able to run the purge tests but I can test the dindex recreation to see if it fails.
Depending on your install situation, I was thinking you might be able to just drop 18.104.22.168 in where 22.214.171.124 was having trouble with Repair, to see if 126.96.36.199 has trouble too. Basically, rather than redoing everything, start from the failing state. Whatever you like, though.
I have confirmed it’s not a problem with 188.8.131.52 specifically. I downgraded to 184.108.40.206 and it has the same error. I then reset my test by deleting the test backup data and recopying from a production backup. I then just deleted the dindex files and ran a repair. This time 3 files were flagged with consistency-check failures: the same 2 files as before, plus a third file that was created in my production backup after I did my earlier test.
The machine running the production backup data I’m testing is using version 220.127.116.11. Wonder if there’s some problem with that version that is leading to incorrect dindex files.
Can someone else running 18.104.22.168 or newer do a test where you delete your dindex files and run a repair?
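For anyone willing to try, here is a minimal sketch of the test. The paths are examples, and it stages fake files so it is self-contained and reversible; moving the dindex files aside rather than deleting them lets you put them back if Repair misbehaves. The repair command line at the end is also just an example; adjust it for your install:

```shell
#!/bin/sh
set -eu
# Example destination folder; substitute your real backup destination.
DEST="${DEST:-/tmp/dindex-test-dest}"
mkdir -p "$DEST"
# Stage two fake dindex files so this sketch runs standalone.
touch "$DEST/duplicati-iaaaa.dindex.zip.aes" "$DEST/duplicati-ibbbb.dindex.zip.aes"
# Move every dindex file into a side folder instead of deleting it.
mkdir -p "$DEST/dindex-aside"
mv "$DEST"/duplicati-i*.dindex.zip.aes "$DEST/dindex-aside/"
echo "moved $(ls "$DEST/dindex-aside" | wc -l | tr -d ' ') dindex files"
# Then run Repair from the UI, or something along the lines of:
#   duplicati-cli repair <storage-url> --dbpath=/path/to/local.sqlite
```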
Fixed a retry error where uploaded dindex files would reference non-existing dblock files, thanks @warwickmm
although I don’t know (and haven’t looked at) the code enough to say whether that might be related. If your default backup logs show RetryAttempts at anything other than 0, that bug becomes a possibility.
Possibly that third file would be the easiest one to track down: look at its timestamp on the destination, work out which backup run made it, and look for oddities such as a non-zero RetryAttempts. Ideally you kept additional logs.
Just to follow up on this. I repeated my tests using 22.214.171.124 and didn’t have any issues at all. I was able to run multiple purge operations, database recreations, dindex file recreations, etc., and everything worked well.
I don’t think any changes were made between 126.96.36.199 and 188.8.131.52 to account for this, so maybe my earlier tests were faulty somehow.