Please verify the sha256 hash error - Is it my block sizes?

handyguy · December 14, 2020, 10:43pm

I have been having a recurring problem with backups on my media server. I keep getting error messages like:

remote file duplicati-b9726ce8a1b9a43d7b027f1b28049d50e.dblock.zip.aes is listed as Uploaded with size 621805568 but should be 786287405, please verify the sha256 hash "zsWsgxkwU7xcE51TLVhd/g9eNzr5J/ITbrIGA2kxtOs="

Sometimes it’s just one bad file, other times it is multiple bad files. I’ve been able to repeatedly fix the problem by:

1. Deleting the offending .dblock.zip.aes file(s)
2. Running a ‘purge-broken-files’ command line
3. Repairing the database
4. Re-running the backup

Sometimes this fixes the problem right away, other times it takes 2-3 rounds to get it fixed. But of 13 backups on multiple systems daily, the media server backups (2 of them) are the only ones where this keeps happening. I’ve eliminated most causes of potential error (bad disk, bad NAS server, bad network connections, etc.) and even reinstalled the OS and Duplicati to get clean builds of both and none of that has solved the problem. Thus, I’m wondering if the block sizes I am using could be contributing to the high error rate on this system.

Here are the specs:

Media Server Hardware: VMWare VM, 3 CPUs, 32GB RAM
Media Server OS: Debian Linux 10 (Buster)
Backup connection: CIFS to NAS drive. (Before commenting that CIFS can be unstable, I use CIFS from all my Linux systems to the NAS and none of the others have this high an error rate. I’m not saying the NAS or the CIFS can’t be the issue, but I’m doubtful.)
Files: ~700 files, total of 1.5TB, each ranging from 4GB to 30GB
Block Sizes: Remote Volume Size: 2GB, --blocksize=500KB, --dblock-size=750MB

Keep in mind that because this is a media server (movies, TV, music), files are being added and deleted all the time, so I’m sure Duplicati has to do a lot of work in calculating and recalculating backup blocks.

I know block size determination requires knowledge of the Dark Arts, so I’m hoping one of the Wizards here on the Forum can help me figure this out. Given the makeup of the source files I am trying to back up, are the block sizes I’m using causing enough churn in the system to cause the high rate of “please verify the hash” errors? Any suggestions for changes that would reduce (or eliminate) the errors I am getting?

Thanks in advance.

ts678 · December 17, 2020, 2:50pm

I doubt it, but to my knowledge nobody understands the origin of the SMB corruptions.
This is a very different question than what block size to use for performance reasons.

If you would like to try a non-SMB plan or work on debugging the SMB issue, please say so.

Debugging can benefit from some tools on NAS side. Is NAS very limited, or quite full?

Very basic debugging could just record the file sizes from client and server constantly.
EDIT: Show log → Remote → (click on list) has lists, but old ones are difficult to find.
Creating a bug report has historical lengths too, if you’re willing to make/post a report.
You can also look in a copy of your own database if you prefer, in the Remoteoperation table.
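
As a rough sketch of the "record the file sizes constantly" idea, something like the following could be run on both the client mount and the NAS itself and the results compared. The function names, the suffix filter, and the idea of diffing two snapshots are my own assumptions for illustration, not anything Duplicati provides.

```python
import os


def snapshot_sizes(directory, suffix=".dblock.zip.aes"):
    """Return {filename: size_in_bytes} for files in directory ending with suffix."""
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith(suffix):
            sizes[entry.name] = entry.stat().st_size
    return sizes


def changed_sizes(before, after):
    """Return {name: (old_size, new_size)} for files in both snapshots whose size differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Hypothetical usage: take a snapshot after a good backup, another one later
# (for example after deleting something in Plex), and see what moved.
#   first = snapshot_sizes("/mnt/nas/backup")   # path is an assumption
#   ...
#   second = snapshot_sizes("/mnt/nas/backup")
#   print(changed_sizes(first, second))
```

A dblock file that was uploaded successfully should never change size afterwards, so any entry in the result is worth a closer look.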

Your error shows signs of a filesystem or SMB truncation because the reported size is a very round binary number:

621805568 but should be 786287405

621805568 is 0x25100000 after converting in Windows calculator Programmer mode.
Binary sizes are very typical of filesystems, and I believe are common in SMB as well.
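
If you don't have a programmer calculator handy, the same check can be sketched in a few lines of Python. The helper name `alignment` is just something I made up for this example; the two sizes are the ones from your error message.

```python
def alignment(n):
    """Largest power of two that divides n (for n > 0)."""
    return n & -n


reported = 621805568   # size the remote listing returned
expected = 786287405   # size recorded when the file was uploaded

MIB = 1 << 20  # 1048576 bytes

print(hex(reported))                                  # 0x25100000
print(alignment(reported) == MIB, reported // MIB)    # True 593
print(alignment(expected))                            # 1
```

So the truncated file is exactly 593 MiB, while the expected size is an odd number of bytes, which is what an ordinary compressed and encrypted volume tends to look like.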

You can look at size before delete. You could also try decrypting to verify it’s corrupted.
Running ls -l on both client and server would be nice to see if they agree on the size.
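
For checking the hash by hand, here is a minimal sketch. It assumes the value in the error message is the base64 encoding of the SHA-256 digest of the file's bytes as stored on the remote (the 44-character string is consistent with that); the function names are hypothetical.

```python
import base64
import hashlib
import os


def base64_sha256(path, chunk_size=1 << 20):
    """Return the base64-encoded SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so multi-hundred-MB volumes are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def check_volume(path, expected_size, expected_hash):
    """Compare a file's size and hash against the values from the error message."""
    actual_size = os.path.getsize(path)
    return {
        "actual_size": actual_size,
        "size_ok": actual_size == expected_size,
        "hash_ok": base64_sha256(path) == expected_hash,
    }


# Hypothetical usage with the values from the error message (path is an assumption):
#   check_volume("/mnt/nas/backup/duplicati-b9726ce8a1b9a43d7b027f1b28049d50e.dblock.zip.aes",
#                786287405, "zsWsgxkwU7xcE51TLVhd/g9eNzr5J/ITbrIGA2kxtOs=")
```

Running it against the same file through the client mount and directly on the NAS would show whether the two sides even agree with each other.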

Do these correlate with events such as system reboot, or connection gap in between?
SMB uses caching by default, though I think it can be configured to run safe-but-slow…

handyguy · December 17, 2020, 3:10pm

Doing some additional testing now, but if that doesn’t work I may try an NFS connection instead of CIFS.

The corruption correlates directly to file deletion on the server. For example, I can have a clean run for several days. Then, when we watch a show episode or a movie, we delete it from the server. The next backup will show the “hash error” message. That’s what led me to believe it was a block issue; terabytes of creates and adds don’t have an effect. As soon as I delete, it throws the error.

ts678 · December 17, 2020, 4:12pm

What backup retention are you using? If you keep only 1 version, source deletes may trigger compact.
If you keep multiple versions and your source file got into an older one, a delete should have no instant effect.
You can certainly turn on the no-auto-compact option to see if it helps and see if compact button hurts.

There’s something missing from this:

1. backups for days, successfully
2. delete a file from media server
3. ?
4. next backup fails on dblock size

So this doesn’t even need Duplicati to do anything at step 3? Source delete changes dblock size view?
If so, you definitely need to keep an eye on your files to see if file listings or content is changing around.

Here’s a remote log for a tiny backup. It did a file list at start, and at end. When does your file go bad?

Dec 17, 2020 11:09 AM: get duplicati-b401a59f3c6b64ac6beee1f74ca91a31d.dblock.zip.aes
Dec 17, 2020 11:09 AM: get duplicati-ie813968cad3846998f9d7a066b3be661.dindex.zip.aes
Dec 17, 2020 11:09 AM: get duplicati-20201217T160929Z.dlist.zip.aes
Dec 17, 2020 11:09 AM: list
Dec 17, 2020 11:09 AM: put duplicati-20201217T160929Z.dlist.zip.aes
Dec 17, 2020 11:09 AM: put duplicati-ie813968cad3846998f9d7a066b3be661.dindex.zip.aes
Dec 17, 2020 11:09 AM: put duplicati-b401a59f3c6b64ac6beee1f74ca91a31d.dblock.zip.aes
Dec 17, 2020 11:09 AM: list

handyguy · December 17, 2020, 4:19pm

That’s the odd thing, I’m not doing anything else. However, the system is a Plex server, so I assume ‘deleting’ a file within Plex involves more than just removing the media from the FS, but it shouldn’t be that much more.

I’ll keep doing more testing and will report back if I come up with anything solid.

Thanks.

ts678 · December 17, 2020, 5:38pm

I have no idea, but if you like, Duplicati compare (e.g. from Commandline) can show its change.
Backup, delete from Plex, backup, compare without even naming versions will show its change.

handyguy · December 23, 2020, 4:17pm

Famous last words. After many, many (many, many) years in tech, I should know better than to assume anything until it’s proven.

I changed the connection between the media server and the NAS from CIFS to NFS, and the backups seem to be working just fine again. I’ll keep testing a bit, but it looks like, for very large files, an NFS connection between systems is more stable than CIFS/SMB.

Thanks everyone for all the commentary.