Restore failed 2 files

Ah, edit: you mean SMB/CIFS as the source… yes, those are working. Backups where the dblock files are saved to an SMB/CIFS destination share failed too, after a number of days. This is the issue that @amanz and I were having here:

Sorry I don’t have any suggestions yet - I’m still trying to isolate the issue…

  • When restoring to an SMB/CIFS location, SOME files come back corrupt, reported as “expected hash” errors
  • When backing up to an SMB/CIFS location, SOME files come back corrupt, reported as “please verify the SHA256 hash” errors

In both cases, it appears that writing TO the SMB/CIFS location is the source of the issue since, at least in the first instance, restoring via something other than an SMB/CIFS share seems to work correctly.

Bingo.

And actually re-reading the sha256 hash thread, I saw this comment by @kenkendk:

His assumption that I should be able to replicate the problem with a manual copy to a CIFS destination has not held true for me so far. I’ve used the Linux cp command to copy the restored files to an SMB/CIFS mount several times and there have been no errors. I’m also currently testing with this command:

dd if=/dev/urandom of=/mnt/helios-restore/large.txt count=50 bs=1048576

where /mnt/helios-restore is a mount of a share on my Windows machine. It keeps writing files that are exactly 52,428,800 bytes; I’ll go bigger and try the same on the NAS as well.

New test script:

#!/bin/bash
# Write 40 files of exactly 50 MiB each from /dev/urandom to the
# mounted share, named file001.bin through file040.bin.
for n in {1..40}; do
    dd if=/dev/urandom of=/mnt/helios-restore/file$( printf %03d "$n" ).bin bs=1048576 count=50
done

This created 40 files on my local Windows hard drive (from my Ubuntu server), each exactly 52,428,800 bytes. Then I mounted a share on the NAS using CIFS and ran the same script (after updating the mount point), and all 40 generated files are again exactly the same size.

I wonder if this test is sufficient to show that my hardware is reliable, or does Duplicati move data in a different way (maybe multiple files at a time, so should I run multiple scripts concurrently)?
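In case anyone wants to try reproducing that, a rough concurrent variant of my script might look like this (the writer count and mount point are just placeholders, not tested values):

#!/bin/bash
# Sketch: approximate parallel transfers by running several dd
# writers at once, then check that every file is exactly 50 MiB.
for n in {1..4}; do
    dd if=/dev/urandom of=/mnt/helios-restore/concurrent$( printf %03d "$n" ).bin bs=1048576 count=50 &
done
wait    # block until all background writers finish
ls -l /mnt/helios-restore/concurrent*.bin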

Personally, I’m guessing there may be an issue with the SMB/CIFS implementation in specific versions of mono (I’m running 5.10.0.160), associated with particular kernel versions (I’m on 4.14.13), samba (I’ve got 4.6.12), etc… Unfortunately, I’m not really sure how to go about proving any of those.

My test restore came up fine: 2.7G (970 files) restored by Duplicati running on Linux to an SMB-mounted Windows share, finishing with WARNINGS about applying metadata (I’m guessing timestamps) but no actual errors. A binary compare between the restore and the source tells me they’re the same.
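For the binary compare, something like a recursive diff does the job (paths here are placeholders):

diff -rq /path/to/source /path/to/restore    # prints a line for each file whose bytes differ; no output means identical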

So I haven’t been able to replicate your failure scenario, but then maybe I had too few files… :frowning:


That’s an interesting theory. I wonder if I can use mono to perform a write test like my Linux dd test? I know absolutely nothing about mono, but I’ll try to learn more to see if there’s a way to test further.

Hmm:

Important edit: Since I’ve marked this as the solution, I’ve discovered that CIFS with cache=none is extremely slow. I noticed my backups were taking longer than normal. I ran a dd test with one of my shares mounted via NFS and then the same test with CIFS (with cache=none):

NFS: (14.3MB/s - not great, but OK):

root@home:/media/duplicati_a# dd if=/mnt/nas/Backup/home-backup.tar.gz of=/media/duplicati_a/testfile status=progress
816660992 bytes (817 MB, 779 MiB) copied, 57.0038 s, 14.3 MB/s

CIFS: (84.4kB/s) :open_mouth: yes, that’s a “k”; I didn’t bother letting it finish.

root@home:/media/duplicati_a# dd if=/mnt/nas/Backup/home-backup.tar.gz of=/media/duplicati_a/testfile status=progress
22611968 bytes (23 MB, 22 MiB) copied, 268 s, 84.4 kB/s

Final answer: use NFS.

I do believe this is the solution - mono has problems with CIFS caching. Thank you @JonMikelV for your help and for suggesting that mono might be having trouble with CIFS shares.


I mounted the same NAS share on two different mount points, one using cache=none and the other using the default cache=strict, as verified with the mount command.
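For anyone wanting to double-check their own mounts, the cache mode shows up in the option list:

mount | grep cifs    # each line should show cache=none or cache=strict among the options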

Then I restored the same ~4GB dataset to two different directories in this mount, using one mount point for each restore:

Restore to r-cifs/nocache, total size:

3.90 GB (4,195,885,480 bytes)

Restore to r-cifs/strictcache, total size:

3.83 GB (4,114,604,523 bytes)

Perhaps more interesting is that Duplicati has no idea anything went wrong. No errors in the second restore report, but WinMerge reports 7 truncated files in the /strictcache version.

The solution here is to turn off caching for CIFS mounts. My corrected /etc/fstab line contains:

rsize=32768,wsize=32768,cache=none
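For reference, the full line looks something like this (server, share, and credentials path are stand-ins, not my actual setup):

//nas/backup  /mnt/nas  cifs  credentials=/root/.smbcredentials,rsize=32768,wsize=32768,cache=none  0  0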

I’ll run more tests in the coming days but I’m fairly confident in this result based on what I’ve seen elsewhere about mono and CIFS caching.


Thanks for the awesome research and testing supporting the theory that the cached-CIFS-mount issue is due to the underlying mono framework!

Honestly, I’m not sure whether my mount is cached or not, but I did notice I’m running mono 5.10.0.160 while the link you found is for 5.4.0.201. Do you know what versions of mono (and samba) you’re running? Hopefully this issue has already been resolved somewhere between 5.4 and 5.10…
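For reference, something like the following should pull the versions (assuming the samba package is installed):

mono --version | head -1    # mono runtime version
smbd --version              # samba version
uname -r                    # kernel version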

In the meantime, @kenkendk, is there somewhere we should consider mentioning this POTENTIAL issue with writing to CACHED CIFS mount points?

My mono version is 5.8.0.108; samba 4.3.11-Ubuntu.

I’m continuing to test different scenarios: I’m duplicating last night’s test right now, and next I’m going to try setting rsize and wsize back to the 1MB mount defaults while leaving cache=none.

Good idea, thanks!

My unRAID appears to be based on Slackware but I think I have an Ubuntu VM floating around somewhere that I might be able to test something with…eventually.

The rerun of last night’s test yielded some interesting (yet similar) results.

Restoring to “nocache2” resulted in a perfect set of data and no errors from Duplicati.

Restoring to “strictcache2” resulted in identical restored file sizes, but the size on disk is smaller than in the correct version, and according to WinMerge two of the large binary files do not match even though their file sizes are the same. Analysis shows the “size on disk” to be about 10% smaller on one of them. I have never seen a situation where the size on disk is lower than the file size, since I thought size on disk accounted for block size differences on the disk itself (i.e., a 5-byte file takes up 1024 bytes on disk with a 1K block size). Whatever is going on internally, the restore is borked, so it’s the same result as before. This time Duplicati did find an error on one of the two corrupted binary files.
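If anyone wants to quantify this from the Linux side, GNU stat can show apparent size vs. allocated blocks (the path is a placeholder, and I’m not sure how faithfully a CIFS mount reports allocation):

for f in /mnt/helios-restore/strictcache2/*; do
    stat -c '%n: apparent=%s bytes, allocated=%b blocks of %B bytes' "$f"
done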

The last test was to leave rsize and wsize at whatever default value mount wants by removing the 32K values from the fstab line; the default is then 1MB (1,048,576 bytes) according to mount. The resulting restore is again a perfect dataset: no errors in Duplicati, and WinMerge reports an exact match on all files.
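For anyone repeating this, the values actually in effect can be checked without re-reading fstab:

grep cifs /proc/mounts    # the option list includes the negotiated rsize= and wsize=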

I will see if I can upgrade mono and test again but I’m running out of time…been devoting a lot of time to this issue lately. :slight_smile:

Mono version 5.10.0.160 (upgraded using apt-get update and apt-get upgrade), restore to a CIFS share with default (strict) caching: failed. No errors reported in Duplicati, but WinMerge clearly shows 4 truncated files.

Now I just need to determine whether I should go back to CIFS with cache=none or remain with NFS. I would prefer CIFS because it offers a semblance of permissions (login credentials), while on my NAS NFS can only be limited by host IP. Not that I’m too concerned about something on my home LAN spoofing my server’s statically leased IP, but I would prefer the route that uses actual credentials, since I have a “home-backup” user defined with read-only access to certain shares (to help guard against cryptoviruses).
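If I do end up back on CIFS, at least the password can stay out of fstab with a credentials file; a sketch (user, share, and paths are examples only, not my real setup):

# contents of /root/.smbcred (chmod 600):
#   username=home-backup
#   password=secret

mount -t cifs //nas/backup /mnt/nas -o credentials=/root/.smbcred,cache=none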

Good to know it’s apparently not resolved in the latest mono. If you’re at the CLI again, I’d be curious to hear what version comes out of smbstatus.

Though if I recall correctly you only have problems when using CIFS and mono…