Restore failed 2 files

Yep - I think I’m confused. :slight_smile:

I have Duplicati running on Windows backing up to a local SFTP destination running on Linux based unRAID. Aside from an issue not related to this, I haven’t had any issues.

I apologize in advance if I’ve totally got this wrong, but are you running Duplicati on the ReadyNAS and backing up your Windows contents via a share mounted on the ReadyNAS rather than running Duplicati on the Windows box itself?

Close. I am running Duplicati on an Ubuntu server.

Duplicati is backing up 3 Windows machines, 6 shares on a NAS, and (so far) one remote Linux system. The Windows machines are mounted via SMB, the NAS is (now) mounted via NFS, and the remote Linux system is mounted with SSHFS.

Each backup goes to four destinations: Dropbox, B2 Cloud, and two removable USB drives.

Got it - so you’re running Duplicati on one “central” box and backing up everything else via “local” (to the Duplicati installed box) mounts. And I assume you’ve got (at least) 4 jobs - one for each of your destinations (Dropbox, B2, USB 1, USB 2).

So in your tests Duplicati was writing from the Linux box over a CIFS/SMB mount point to a Windows box when it had the failures.

If I do my testing, I would be writing from a Windows box to an SMB share - so not quite the same thing, but maybe worth a test anyway.

Exactly. Your ability to distill my situation down to a couple of clear sentences is heartening. :slight_smile:

And yes, I have 3 to 4 jobs for each backup (B2 is new to me, it’s very affordable). I keep track of them all in an Excel spreadsheet so I can sort by start time, source, or destination.

The most interesting thing I learned yesterday was that it wasn’t just writing to the SMB shares on the NAS that was a problem; I also had data corruption writing to a Windows share that I created just for the test. And when I copied directly from the Ubuntu server to an SMB share there was no corruption, which may rule out general network stack/routing/switching, and points me in the direction of my Duplicati installation or Duplicati itself being the source of the problem. But assuming SMB shares are included in “File storage” as a backend in the stats, some of the 18737 reported backups yesterday must be to SMB shares so I’m back to only myself and one other user (from my other thread) seeing issues with SMB as a destination.

My testing method was to restore a ~4GB directory, containing various subdirectories and about 2,000 files ranging from under 1 KB to 500 MB, to a blank folder in a new destination, then run WinMerge to compare my original/live files against the restored set. I used WinMerge because Duplicati didn’t always throw an error for every corrupted file in the recovery folder, so I wanted to capture all the differences. Most of them were corrupted JPG and PSD files that were only partially written, so I could see the top part of the JPG files, for example.
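For anyone wanting to do a similar comparison from the Linux side instead of WinMerge, here’s a minimal sketch using SHA-256 hashes. The directories here are throwaway temp dirs just so the script runs standalone; in practice SRC would be the live data and DST the Duplicati restore folder:

```shell
#!/bin/bash
# Sketch: verify a restored tree against the originals via SHA-256 hashes.
# The demo uses throwaway temp dirs; in practice SRC would be the live data
# and DST the Duplicati restore folder.
set -e
SRC=$(mktemp -d)
DST=$(mktemp -d)

# Build a small sample "original" tree and a byte-identical "restore"
mkdir -p "$SRC/sub"
head -c 1048576 /dev/urandom > "$SRC/big.bin"
echo "hello" > "$SRC/sub/small.txt"
cp -a "$SRC/." "$DST/"

# Hash every file in the source tree (relative paths keep the list portable)
sums=$(mktemp)
( cd "$SRC" && find . -type f -print0 | sort -z | xargs -0 sha256sum ) > "$sums"

# Re-check those hashes against the restored tree; only mismatches print
if ( cd "$DST" && sha256sum --check --quiet "$sums" ); then
    echo "all files match"
else
    echo "corruption detected"
fi
```

Unlike a plain size check, this catches files that are the right length but have corrupted contents.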

Sorry if you already said, but about what percentage of files in your ~4GB test came up bad (according to WinMerge)?

Anywhere from three to ten files in each failed test (it varied). I just checked, it’s about 1600 files total. While that may be a small percentage, in general I’d consider any failure highly significant.

I agree - I’m just not sure I’ve got 4G of test data floating around so wanted to make sure I picked a source size that should reflect at least a few issues. :slight_smile:

Edit:
I’ve got a 2.5G backup running on my Linux box using an SMB mounted Windows share as the source. Once that’s done I’ll try restoring back over the SMB mounted Windows share (to a different folder) and binary compare the two locations.

In case I haven’t already said it…just to be clear, it sounds like in all your testing it seems the backups (even with an SMB/CIFS mounted source) are working just fine - it’s the restore over the SMB/CIFS mount that has issues. Restoring from the same backup over something else (such as NFS) works just fine.

Ah, edit: you mean SMB/CIFS as the source…yes, those are working. Backups where the dblock files are saved to an SMB/CIFS destination share failed too, after a number of days. This is the issue that I and @amanz were having here:

Sorry I don’t have any suggestions yet - I’m still trying to isolate the issue…

  • When restoring to an SMB/CIFS location, SOME files come out corrupt, reported as “expected hash” errors
  • When backing up to an SMB/CIFS location, SOME files come out corrupt, reported as “please verify the SHA256 hash” errors

In both cases, it appears the writing TO the SMB/CIFS location is the source of the issue since, at least in the first instance, restoring via something other than an SMB/CIFS share seems to work correctly.

Bingo.

And actually re-reading the sha256 hash thread, I saw this comment by @kenkendk:

His assumption that I should be able to replicate the problem with a manual copy to a CIFS destination has not been true for me so far. I’ve used the Linux cp command to copy the restored files several times to an SMB/CIFS mount and there have been no errors. I’m also currently testing with this command:

dd if=/dev/urandom of=/mnt/helios-restore/large.txt count=50 bs=1048576

where /mnt/helios-restore is a mount on my Windows machine. It keeps writing files that are exactly 52,428,800 bytes (50 × 1,048,576); I’ll go bigger and try the same on the NAS as well.

New test script:

#!/bin/bash
for n in {1..40}; do
    dd if=/dev/urandom of=/mnt/helios-restore/file$( printf %03d "$n" ).bin bs=1048576 count=50
done

This created 40 files on my local Windows hard drive (from my Ubuntu server), each exactly 52428800 bytes. Then I mounted a share on the NAS using CIFS and ran the same script (after updating the mount point) and all 40 generated files are exactly the same size.

I wonder if this test is sufficient to show that my hardware is reliable, or does Duplicati move data in a different way (maybe multiple files at a time, so I should run multiple scripts concurrently?).
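To mimic multiple simultaneous writers, one option is a variation of the script above that runs several dd loops in parallel and then checks every file for truncation. This is only a sketch: DEST is a local temp dir here so it runs anywhere, and the sizes are scaled down; for the real test DEST would point at the CIFS mount (e.g. /mnt/helios-restore) and SIZE_MB back up to 50:

```shell
#!/bin/bash
# Sketch: four concurrent dd writers, then a size check for truncation.
# DEST is a local temp dir here; to reproduce the real test it would point
# at the CIFS mount, with SIZE_MB raised back to 50.
set -e
DEST=$(mktemp -d)
SIZE_MB=5

for w in 1 2 3 4; do
    (
        for n in {1..5}; do
            dd if=/dev/urandom of="$DEST/w${w}_$(printf %03d "$n").bin" \
               bs=1048576 count="$SIZE_MB" 2>/dev/null
        done
    ) &
done
wait

# Any file not exactly SIZE_MB * 1048576 bytes was truncated
find "$DEST" -name '*.bin' ! -size "$((SIZE_MB * 1048576))c"
echo "wrote $(ls "$DEST" | wc -l) files"
```

If the find at the end prints nothing, every file landed at full size.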

Personally, I’m guessing there may be an issue with the SMB/CIFS implementation in specific versions of mono (I’m running 5.10.0.160), associated with particular kernel versions (I’m on 4.14.13), samba (I’ve got 4.6.12), etc… Unfortunately, I’m not really sure how to go about proving any of those.

My test restore of 2.7G (970 files), from Duplicati running on Linux to an SMB-mounted Windows share, finished with WARNINGS about applying metadata (I’m guessing timestamps) but no actual errors. The binary compare between the restore and the source tells me they’re the same.

So I haven’t been able to replicate your failure scenario, but then maybe I had too few files… :frowning:


That’s an interesting theory. I wonder if I can use mono to perform a write test like I did with my Linux dd test? I know absolutely nothing about mono but I will try to learn more to see if there’s a way to test further.

Hmm:

Important edit: Since I’ve marked this as the solution, I’ve discovered that CIFS with cache=none is extremely slow. I noticed my backups were taking longer than normal. I ran a dd test with one of my shares mounted via NFS and then the same test with CIFS (with cache=none):

NFS (14.3 MB/s; not great, but OK):

root@home:/media/duplicati_a# dd if=/mnt/nas/Backup/home-backup.tar.gz of=/media/duplicati_a/testfile status=progress
816660992 bytes (817 MB, 779 MiB) copied, 57.0038 s, 14.3 MB/s

CIFS (84.4 kB/s) :open_mouth: yes, that’s a “k”; I didn’t bother letting it finish.

root@home:/media/duplicati_a# dd if=/mnt/nas/Backup/home-backup.tar.gz of=/media/duplicati_a/testfile status=progress
22611968 bytes (23 MB, 22 MiB) copied, 268 s, 84.4 kB/s

Final answer: use NFS.

I do believe this is the solution - mono has problems with CIFS caching. Thank you @JonMikelV for your help and for suggesting that mono might be having trouble with CIFS shares.


I mounted the same share on my NAS on two different mount points, with one mount point using cache=none, and the other mount point using the default cache=strict as verified with the mount command.
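For reference, the double mount can be sketched like this (server name, share, and mount points are placeholders; only the cache option differs between the two):

```
# default (strict) caching, to one mount point
mount -t cifs //nas/backup /mnt/r-cifs-strict -o username=home-backup,cache=strict

# caching disabled, same share on a second mount point
mount -t cifs //nas/backup /mnt/r-cifs-nocache -o username=home-backup,cache=none

# confirm the options actually in effect
mount | grep r-cifs
```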

Then I restored the same ~4GB dataset to two different directories in this mount, using one mount point for each restore:

Restore to r-cifs/nocache, total size:

3.90 GB (4,195,885,480 bytes)

Restore to r-cifs/strictcache, total size:

3.83 GB (4,114,604,523 bytes)

Perhaps more interesting is that Duplicati has no idea anything went wrong: no errors in the second restore report, yet WinMerge reports 7 files are truncated in the /strictcache version.

The solution here is to turn off caching for CIFS mounts. My corrected /etc/fstab line contains:

rsize=32768,wsize=32768,cache=none
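Pieced together, a full entry along those lines might look like this (server, share, mount point, and user are placeholders; the options are the ones quoted above):

```
//nas/backup  /mnt/nas-backup  cifs  username=home-backup,rsize=32768,wsize=32768,cache=none  0  0
```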

I’ll run more tests in the coming days but I’m fairly confident in this result based on what I’ve seen elsewhere about mono and CIFS caching.


Thanks for the awesome research and testing supporting the theory that the cached CIFS mounts issue is due to the underlying mono framework!

Honestly, I’m not sure if my mount is cached or not but I did notice I’m running mono 5.10.0.160 while the link you found is for 5.4.0.201. Do you know what version mono (and samba) you’re running? Hopefully this issue has already been resolved somewhere between 5.4 and 5.10…

In the meantime, @kenkendk, is there somewhere we should consider mentioning this POTENTIAL issue with writing to CACHED CIFS mount points?

My mono version is 5.8.0.108; samba 4.3.11-Ubuntu.

I’m continuing to test different scenarios; I’m duplicating the test from last night right now, and next I’m going to try altering the rsize and wsize back to the 1MB defaults for the mount command but leaving the cache=none.

Good idea, thanks!

My unRAID appears to be based on Slackware but I think I have an Ubuntu VM floating around somewhere that I might be able to test something with…eventually.

The duplicate of last night’s test yielded some interesting (yet similar) results.

Restoring to “nocache2” resulted in a perfect set of data and no errors from Duplicati.

Restoring to “strictcache2” resulted in the restored file sizes being identical, but the size on disk is smaller than in the correct version, and according to WinMerge, two of the large binary files do not match even though they are the same file size. However, analysis shows the “size on disk” to be about 10% smaller on one of them. I have never seen a situation where the size on disk is lower than the file size, since I thought size on disk accounted for block-size differences on the disk itself (i.e., a 5-byte file takes up 1,024 bytes on disk with a 1 KB block size). Whatever is going on internally, the restore is borked, so it’s the same result as before. Duplicati did find an error on one of the two binary files that was corrupted.
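For what it’s worth, size on disk smaller than file size is usually a sign of a sparse file: regions that were never written are stored as holes and allocate no blocks, which would fit truncated/partial writes. A quick local demonstration of the general mechanism (nothing CIFS-specific; assumes a filesystem that supports sparse files, which most Linux filesystems do):

```shell
#!/bin/bash
# Sketch: a sparse file has a logical size but allocates no blocks for the
# regions that were never written, so "size on disk" < file size.
f=$(mktemp)
truncate -s 10M "$f"   # extend to 10 MiB without writing any data

stat -c 'logical: %s bytes, allocated: %b blocks of %B bytes' "$f"
```

The logical size reports 10 MiB while the allocated block count stays near zero.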

The last test was to leave the rsize and wsize at whatever default value mount wants by removing the 32K values from the line in fstab. The default is then 1MB (1048576 bytes) according to mount. The resulting restore is again a perfect dataset, no errors in Duplicati, and WinMerge reports exact match on all files.

I will see if I can upgrade mono and test again but I’m running out of time…been devoting a lot of time to this issue lately. :slight_smile:

Mono version 5.10.0.160 (upgraded using apt-get update and apt-get upgrade), restore to CIFS share with default (strict) caching - failed. No errors reported in Duplicati, but WinMerge clearly shows 4 truncated files.

Now I just need to determine whether I should go back to CIFS with cache=none or stay with NFS. I would prefer CIFS because it offers a semblance of permissions (login credentials), while on my NAS, at least, NFS can only be limited by host IP. Not that I’m too concerned about something on my home LAN spoofing the static leased IP of my server, but I would prefer the route that uses actual credentials, since I have a “home-backup” user defined which only has read access to certain shares (to help guard against cryptoviruses).
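On the credentials point, one common approach (sketched here with hypothetical paths and names) is a root-only credentials file referenced from fstab, which keeps the password out of /etc/fstab itself:

```
# /root/.smbcred  (chmod 600)
username=home-backup
password=XXXXXXXX

# /etc/fstab entry referencing it
//nas/backup  /mnt/nas-backup  cifs  credentials=/root/.smbcred,cache=none  0  0
```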

Good to know it’s apparently not resolved in the latest mono. If you’re at the CLI again I’d be curious to hear what version comes out of an smbstatus.

Though if I recall correctly you only have problems when using CIFS and mono…