[Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError]

Can you open an issue? It will help get developer attention.

How about narrowing down the scope of this issue, e.g. by seeing if a single large file backup can get it? Probably that’s primarily data reads as opposed to other things that may go bad such as attribute reads.

Even more likely to be a relatively basic reading test would be Duplicati.CommandLine.BackendTool.exe, simply doing a put of an SMB source file somewhere, even to a local destination folder to be even easier. Target URL (especially for file) is pretty easy, but if you want you can copy from Export as Command-line.

It would also be useful to try to correlate (or not) time of Duplicati access errors with kernel syslog errors. Yours seem to be uptime-seconds. I don’t know if syslog is configurable, but dmesg -T is human-friendly.

Deeper still is strace, where one can see that mono runs a lot of signals, interrupting system calls which possibly cause later issues. Typically slow ones such as file reads should expect to get an EINTR errno (whether or not they do in mono is a question), whereas “fast” system calls should not return an EINTR, which by the way is a 4, which might be what the -4 in the message is telling us. I think it’s using errno…

Did some testing the last couple of days with some other backup programs. Duplicacy works without problems. But interestingly restic has a similar problem. I found this post about the problem: Restic 0.10.0 causes read errors on mounted samba share · Issue #2968 · restic/restic · GitHub. Restic is written in Go and using the described workaround (export GODEBUG=asyncpreemptoff=1) the problem was solved for me. Does anyone know if there is a similar setting for mono?

That one was closed in favor of the issue cited earlier but only by link. Just for completeness, its location is
restic fails on repo mounted via CIFS/samba on Linux using go 1.14 build #2659. Their theory seems to be exactly what’s being pursued here. A signal causes EINTR result from a system call, which maybe doesn’t handle it well. The GODEBUG looks like it stops this signal. See https://golang.org/src/runtime/preempt.go.

Package runtime

asyncpreemptoff: asyncpreemptoff=1 disables signal-based
asynchronous goroutine preemption. This makes some loops
non-preemptible for long periods, which may delay GC and
goroutine scheduling.

Go 1.14 Release Notes – Runtime

A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases. This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors.

Embedding Mono - Signal Handling talks about signals in mono. I don’t think there’s a way to stop all.
Although I’m not familiar with Go, it’s a compiled language, so preemption becomes harder. mono is interpreting Common Intermediate Language, and interpreters would preempt in a very different way.

It’s sort of apples and oranges. A “similar setting” for preempt doesn’t fit, and signals (the real issue) probably are something that we’re stuck with. If anyone knows mono signal uses deeply, please help.

I don’t know what parts of this lie where, but I still think it needs narrowing down some, as suggested.

In all the logs from the last week it only happens with small files. Don’t see any problems with large files.

In the syslog I can see these errors: [Thu Jan 14 04:00:57 2021] CIFS VFS: Send error in read = -4

They only happen during the backup. Last night for example the amount of warnings in Duplicati is 136. Amount of errors in syslog is 159. The timing is not exactly the same. My guess is that there is around a second delay between the two. But they absolutely relate.

There seems to be an environment which looks similar to the one for Go. It’s called MONO_ENABLE_AIO (replacement for MONO_DISABLE_AIO). Question is how can you set this for Duplicati?

I also found an advanced option “use-background-io-priority”. Going to try this as well. Don’t mind if the backup takes a little longer as long it’s finishes without errors or warning.

EDIT Parameter “use-background-io-priority” has no effect. Same errors as before.

You can set additional environment variables in /etc/default/duplicati. You’ll need to restart Duplicati after editing that file for it to go into effect.

I tested this with it set to 0 and then 1 but it made no difference for me.

MONO_ENABLE_AIO
    If set, tells mono to attempt using native asynchronous I/O services.

sounds completely unrelated to goroutine preemption control, but it did have momentary appeal because

and Windows System Error Codes (500-999)

ERROR_IO_PENDING
    997 (0x3E5)
    Overlapped I/O operation is in progress.

which is at least in the same I/O error neighborhood. It’s too bad that testing so far shows it doesn’t help.

What are “the two” being compared? Are you logging activity on the source filesystem. strace can do so.

Possibly this avoids the need to test with pure file transfer with Duplicati.CommandLine.BackendTool.exe, however that seems on the surface to conflict with “Send error in read”, and I don’t know CIFS internals…

Pushing to the other extreme might involve backing up lots of empty files on the CIFS source, with source tree walking, timestamp and attribute checking, but few to no read of empty files that are actually found…

EDIT:

I guess the same logs question applies here too. What logs, and are large files actually being backed up? Files are not read through without reason, such as their being new or having new timestamp or metadata.

EDIT:

Duplicati logging at verbose level explains why a file becomes of interest, and then has to be read through:

2020-01-18 14:54:21 -05 - [Verbose-Duplicati.Library.Main.Operation.Backup.FilePreFilterProcess.FileEntry-CheckFileForChanges]: Checking file for changes C:\PortableApps\Notepad++Portable\App\Notepad++\backup\webpages.txt@2020-01-17_121320, new: False, timestamp changed: True, size changed: True, metadatachanged: True, 1/18/2020 2:32:28 AM vs 1/17/2020 8:09:18 PM

I just set up an old Linux Mint VM with SMB3 from a Windows 10 system, and 2.0.5.111 is running fine.
First I tried a folder of 10,000 empty files (the test I was asking for), then added a Duplicati source tree.
I’m looking in /var/log/syslog, and seeing some mount errors from flubs, but not errors reported here…

So for now, those who can get the problem will have to do the experiments to characterize the issue.
Any thoughts on what one has to do to cause the problem (minimal test case) would be appreciated.

So maybe for some, it’s just erratic. I suppose I can poke at it further, but I’m not sure where to head.
Those who are seeing it now will probably have an easier time cutting it down to see if that helps any.

In case it matters, my mono --version is 6.12.0.90, and kernel uname -r is 4.4.0-197-generic.

Interesting, I wonder if Synology is a key factor here. I’ll do another test shortly…

Edit: When the CIFS mount is to Windows 10, I still see the CIFS VFS: Close unmatched open warnings but not the read error ones. Interesting.

Tried a couple more things. Still can’t get it. First was to run mono with --profile in the hope of generating more signals to drive the profiler. Didn’t verify signal level, but fresh backup went fine. Next, tried making my network less reliable by using ifconfig to take it down for 1 second, then up for 10 (1 proved too extreme).

For those who can get it, can I talk you into using strace? It will probably take a big log and some guessing.

Each CIFS VFS: Close umatched open warning correlates to a warning like this in Duplicati:

{
   "ClassName":"System.IO.IOException",
   "Message":"Interrupted system call",
   "Data":null,
   "InnerException":null,
   "HelpURL":null,
   "StackTraceString":"  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].FindNextEntry (System.Byte* entryBufferPtr, System.Int32 bufferLength) [0x0004c] in <d13c8b563008422a8c5aaec0a74089cc>:0 \n  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].MoveNext () [0x00054] in <d13c8b563008422a8c5aaec0a74089cc>:0 \n  at System.Collections.Generic.LargeArrayBuilder`1[T].AddRange (System.Collections.Generic.IEnumerable`1[T] items) [0x0003d] in <d13c8b563008422a8c5aaec0a74089cc>:0 \n  at System.Collections.Generic.EnumerableHelpers.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00033] in <d13c8b563008422a8c5aaec0a74089cc>:0 \n  at System.IO.MonoLinqHelper.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00000] in <d13c8b563008422a8c5aaec0a74089cc>:0 \n  at System.IO.Directory.GetDirectories (System.String path, System.String searchPattern, System.IO.EnumerationOptions enumerationOptions) [0x00009] in <d13c8b563008422a8c5aaec0a74089cc>:0 \n  at System.IO.Directory.GetDirectories (System.String path) [0x0000b] in <d13c8b563008422a8c5aaec0a74089cc>:0 \n  at Duplicati.Library.Snapshots.SnapshotBase.ListFolders (System.String localFolderPath) [0x00000] in <de184850f9534f55ad06ab6e0b8bc838>:0 \n  at Duplicati.Library.Utility.Utility+<EnumerateFileSystemEntries>d__21.MoveNext () [0x00113] in <d08ee1948ceb4d19838c99ad6efa8ab7>:0 ",
   "RemoteStackTraceString":null,
   "RemoteStackIndex":0,
   "ExceptionMethod":null,
   "HResult":4,
   "Source":"mscorlib"
}

The CIFS VFS: Send error in read = -4 happen much less frequently for me, but they do happen.

I will try to do an strace.

Got an strace but this is difficult to sift through as the file is huge. I’m trying to spot errors in strace log that correlate to the CIFS kernel warnings. (I have timestamps on both sides but it’s still challenging.)

There are tens of thousands of SIGRT_3 and SIGRT_4 events logged too, not sure if relevant to our issue.

If that’s because the formats are different, maybe dmesg -T could get something more human-friendly.
Seems like there’s possibly a tool somewhere (or one that could be made) to convert the time formats.
I’m kind of wishing now for sub-second time accuracy in Duplicati logs, but usually that just adds noise.

My first look with strace was just to look at signals. There’s a chance that it would cut down calls noise.

strace -o /tmp/strace.log -f -t -T -p <duplicati-child-pid> -e trace=signal

In other it-still-works news, I tried SMB2, then installed Samba and shared a folder from the VM to itself.
The main weirdness I got was Duplicati constraint errors, but that’s seemingly because something was returning bad information on the filesystem listing. I get some files missing and others seen many times.

Running the find command for a given name sees this. Running it to a file doesn’t, but ls -R can see it…
Setup is 10,000 files in a folder. Main oddity is it’s a shared VirtualBox folder from Windows that Samba shared again. It looks fine before Samba gets ahold of it. Possibly my slightly old Linux has some bugs.

Yeah I am using dmesg -T to get the time of the kernel messages.

Tried tracing signals, the resulting file is still quite large. There are still 1700 events that happened during the 1 second when one of the kernel warnings appeared.

Surprised you are not seeing the issue. What kernel version are you using? I’m testing on 5.4.0.

Using Linux Mint 18.1, an almost-EOL LTS release based on Ubuntu 16.04 LTS which is also on 4.4.0.
Looking at https://repology.org/project/linux/versions, I wonder if you’re on Ubuntu 20.04 which is 5.4.0.

In other it-still-works news, I copied the VirtualBox shared folder to a local folder in the VM, exported it, tested it with ls -lR --full-time again, got sane results, backed up, got reasonable results from it.

Agreed. I’m not sure what causes differences. Have you tried changes to what you’re backing up yet?

On the timestamp conversion, I thought of trying to use /proc/uptime’s first value, to adjust, then went looking for high resolution (like dmesg gives without the -T), but got discouraged by some discussion:

proc: make high precision system boot time available
kernel printk timestamps not aligned to syslog time

but possibly even with some time uncertainties, it might be possible to cut that down by a factor of 100.
How do I convert dmesg timestamp to custom date format? has some scripts, but most are to second.

EDIT:

A test suggests time issues are real. I’d have expected output from the second line to have higher time:

# cat /proc/uptime; echo mark > /dev/kmsg; dmesg | tail -1
21170.94 20369.10
[21170.711579] mark
# 

Yes, my test VM is running 20.04… was trying to recreate OP’s situation, but checking back I see he’s actually on 20.10.

I am wondering if older kernels don’t exhibit the issue.

TL;DR seems to be, which (if true) is worrisome because issues may increase as they become common.

Although I still can’t get the CIFS VFS: Send error in read = -4, I tried upgrading my Linux Mint 18.1 to see when any issues arose. Originally it was 4.4.0-197-generic. 18.3 stayed there, but gave an offer for 4.4.0-200-generic which I took. Still fine. 19 stayed there and stayed fine, then upgrade to 19.3 broke the Cinnamon desktop, which caused lots of headaches and basically wound up leaving VM nearly unuseable. Limping along, Update Manager gave an offer for kernel 5.4.0-62-generic. I took it, and the failures began…

$ mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup file:///home/me/DuplicatiBackups/cifs/ /mnt/emptyfiles/ --backup-name=cifs --dbpath=/home/me/.config/Duplicati/NKKCFCKPXB.sqlite --encryption-module= --compression-module=zip --dblock-size=50mb --no-encryption=true --disable-module=console-password-input
Backup started at 1/17/2021 9:15:28 PM
Checking remote backup ...
  Listing remote folder ...
Scanning local files ...
  0 files need to be examined (0 bytes)
  4195 files need to be examined (0 bytes) (still counting)
  7958 files need to be examined (0 bytes) (still counting)
Error reported while accessing file: /mnt/emptyfiles/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/Installer/OSX/app-scripts/ => Interrupted system call
Error reported while accessing file: /mnt/emptyfiles/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/Tools/Commandline/RestoreFromPython/ => Interrupted system call
Error reported while accessing file: /mnt/emptyfiles/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/Duplicati/GUI/Duplicati.GUI.TrayIcon/SVGIcons/dark/ => Interrupted system call
Error reported while accessing file: /mnt/emptyfiles/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/Duplicati/Library/Backend/ => Interrupted system call
Error reported while accessing file: /mnt/emptyfiles/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/Duplicati/Library/Backend/GoogleServices/Properties/ => Interrupted system call
Error reported while accessing file: /mnt/emptyfiles/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/Duplicati/Library/Backend/OAuthHelper/Properties/ => Interrupted system call
  9510 files need to be examined (53.24 MB)
  8262 files need to be examined (53.24 MB)
  6920 files need to be examined (53.24 MB)
  5608 files need to be examined (53.24 MB)
  4308 files need to be examined (53.24 MB)
  2967 files need to be examined (53.24 MB)
  1789 files need to be examined (51.95 MB)
Error reported while accessing file: /mnt/emptyfiles/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/duplicati-2.0.5.1-2.0.5.1_beta_2020-01-18/Installer/debian/debian/ => Interrupted system call
  1551 files need to be examined (40.60 MB)
  1478 files need to be examined (33.67 MB)
  1334 files need to be examined (18.66 MB)
  1024 files need to be examined (14.51 MB)
  583 files need to be examined (8.53 MB)
  294 files need to be examined (4.18 MB)
  Uploading file (21.26 MB) ...
  Uploading file (202.24 KB) ...
  Uploading file (214.11 KB) ...
Checking remote backup ...
  Listing remote folder ...
Verifying remote backup ...
Remote backup verification completed
  Downloading file (214.11 KB) ...
  Downloading file (202.24 KB) ...
  Downloading file (21.26 MB) ...
  0 files need to be examined (0 bytes)
  Duration of backup: 00:01:25
  Remote files: 3
  Remote size: 21.66 MB
  Total remote quota: 18.58 GB
  Available remote quota: 4.22 GB
  Files added: 11539
  Files deleted: 0
  Files changed: 0
  Data uploaded: 21.66 MB
  Data downloaded: 21.66 MB
Backup completed successfully!
$ dmesg -T | grep CIFS
[Sun Jan 17 21:14:42 2021] CIFS: Attempting to mount //127.0.0.1/emptyfiles
[Sun Jan 17 21:15:40 2021] CIFS VFS: Close unmatched open
[Sun Jan 17 21:15:41 2021] CIFS VFS: Close unmatched open
[Sun Jan 17 21:15:41 2021] CIFS VFS: Close unmatched open
[Sun Jan 17 21:15:41 2021] CIFS VFS: Close unmatched open
[Sun Jan 17 21:15:42 2021] CIFS VFS: Close unmatched open
[Sun Jan 17 21:15:42 2021] CIFS VFS: Close unmatched open
[Sun Jan 17 21:16:18 2021] CIFS VFS: Close unmatched open

Because it seemed like all the errors were on folders, I tested with just folders, then thinking that maybe folder errors prevented seeing the contents, added a file in each folder. It’s changed a bit, but current is:

perl -e ‘for ($n=0; $n < 1000; $n++) { $path = sprintf("%04d", $n); mkdir $path; open my $fh, “> $path/empty.txt” }’

(or it might be 100 or 10000 depending on difficulty of hitting it. This is still hard to hit with my test setup)

Testing is all on-VM with an emptyfiles folder shared by Samba and SMB3 mounted at /mnt/emptyfiles. What exactly does one have to do (without requiring special equipment) to hit more issues more easily?

Because the system was so broken, I thought I’d make a new VM, and tried some Ubuntu 20.10, but the error didn’t reproduce as well, so I made a fresh Linux Mint 19.3 VM and an error began happening again.

$ strace -f -tt -o /tmp/strace.log -C -y mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup file:///home/me/DuplicatiBackups/cifs/ /mnt/emptyfiles/ --backup-name=cifs --dbpath=/home/me/.config/Duplicati/NKKCFCKPXB.sqlite --encryption-module= --compression-module=zip --dblock-size=50mb --no-encryption=true --disable-module=console-password-input --log-file=/tmp/cifs.log --log-file-log-level=verbose

had four errors after I took a 10000 folder test area and perturbed it a little by resetting up 1000 of them.

Error reported while accessing file: /mnt/emptyfiles/1145/ => Interrupted system call
Error reported while accessing file: /mnt/emptyfiles/8178/ => Interrupted system call
Error reported while accessing file: /mnt/emptyfiles/9252/ => Interrupted system call
Failed to read timestamp on "/mnt/emptyfiles/8902/" => Interrupted system call

Next level down was the Duplicati verbose log:

2021-01-20 07:29:25 -05 - [Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError]: Error reported while accessing file: /mnt/emptyfiles/1145/
System.IO.IOException: Interrupted system call
  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].FindNextEntry (System.Byte* entryBufferPtr, System.Int32 bufferLength) [0x0004c] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].MoveNext () [0x00054] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.Collections.Generic.LargeArrayBuilder`1[T].AddRange (System.Collections.Generic.IEnumerable`1[T] items) [0x0003d] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.Collections.Generic.EnumerableHelpers.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00033] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.MonoLinqHelper.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00000] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Directory.GetDirectories (System.String path, System.String searchPattern, System.IO.EnumerationOptions enumerationOptions) [0x00009] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Directory.GetDirectories (System.String path) [0x0000b] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at Duplicati.Library.Snapshots.SnapshotBase.ListFolders (System.String localFolderPath) [0x00000] in <f08558429eba42d081f51cfe1f70cf5a>:0 
  at Duplicati.Library.Utility.Utility+<EnumerateFileSystemEntries>d__20.MoveNext () [0x00113] in <b0ec73cdc8b845289fe2e9bdf696ccd0>:0 
2021-01-20 07:30:27 -05 - [Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError]: Error reported while accessing file: /mnt/emptyfiles/8178/
System.IO.IOException: Interrupted system call
  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].FindNextEntry (System.Byte* entryBufferPtr, System.Int32 bufferLength) [0x0004c] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].MoveNext () [0x00054] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.Collections.Generic.LargeArrayBuilder`1[T].AddRange (System.Collections.Generic.IEnumerable`1[T] items) [0x0003d] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.Collections.Generic.EnumerableHelpers.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00033] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.MonoLinqHelper.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00000] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Directory.GetFiles (System.String path, System.String searchPattern, System.IO.EnumerationOptions enumerationOptions) [0x00009] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Directory.GetFiles (System.String path) [0x0000b] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at Duplicati.Library.Snapshots.SnapshotBase.ListFiles (System.String localFolderPath) [0x00000] in <f08558429eba42d081f51cfe1f70cf5a>:0 
  at Duplicati.Library.Utility.Utility+<EnumerateFileSystemEntries>d__20.MoveNext () [0x00213] in <b0ec73cdc8b845289fe2e9bdf696ccd0>:0 
2021-01-20 07:30:39 -05 - [Warning-Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-FileAccessError]: Error reported while accessing file: /mnt/emptyfiles/9252/
System.IO.IOException: Interrupted system call
  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].FindNextEntry (System.Byte* entryBufferPtr, System.Int32 bufferLength) [0x0004c] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Enumeration.FileSystemEnumerator`1[TResult].MoveNext () [0x00054] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.Collections.Generic.LargeArrayBuilder`1[T].AddRange (System.Collections.Generic.IEnumerable`1[T] items) [0x0003d] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.Collections.Generic.EnumerableHelpers.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00033] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.MonoLinqHelper.ToArray[T] (System.Collections.Generic.IEnumerable`1[T] source) [0x00000] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Directory.GetDirectories (System.String path, System.String searchPattern, System.IO.EnumerationOptions enumerationOptions) [0x00009] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.Directory.GetDirectories (System.String path) [0x0000b] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at Duplicati.Library.Snapshots.SnapshotBase.ListFolders (System.String localFolderPath) [0x00000] in <f08558429eba42d081f51cfe1f70cf5a>:0 
  at Duplicati.Library.Utility.Utility+<EnumerateFileSystemEntries>d__20.MoveNext () [0x00113] in <b0ec73cdc8b845289fe2e9bdf696ccd0>:0
2021-01-20 07:35:40 -05 - [Warning-Duplicati.Library.Main.Operation.Backup.MetadataPreProcess.FileEntry-TimestampReadFailed]: Failed to read timestamp on "/mnt/emptyfiles/8902/"
System.IO.IOException: Interrupted system call
  at System.IO.FileStatus.EnsureStatInitialized (System.ReadOnlySpan`1[T] path, System.Boolean continueOnError) [0x00039] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.FileStatus.GetLastWriteTime (System.ReadOnlySpan`1[T] path, System.Boolean continueOnError) [0x00000] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.FileSystemInfo.get_LastWriteTimeCore () [0x00011] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.FileSystemInfo.get_LastWriteTimeUtc () [0x00000] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.FileSystemInfo.get_LastWriteTime () [0x00000] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.FileSystem.GetLastWriteTime (System.String fullPath) [0x00009] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at System.IO.File.GetLastWriteTimeUtc (System.String path) [0x00006] in <d13c8b563008422a8c5aaec0a74089cc>:0 
  at Duplicati.Library.Snapshots.SnapshotBase.GetLastWriteTimeUtc (System.String localPath) [0x00000] in <f08558429eba42d081f51cfe1f70cf5a>:0 
  at Duplicati.Library.Main.Operation.Backup.MetadataPreProcess+<>c__DisplayClass2_0.<Run>b__0 (<>f__AnonymousType7`3[<Input>j__TPar,<StreamBlockChannel>j__TPar,<Output>j__TPar] self) [0x000f8] in <8f1de655bd1240739a78684d845cecc8>:0 

A heuristic to find the four errors is below. Three were getdents to read directory entries, and one was an lstat to get file status. Heuristic method handles system calls which split into “unfinished” and “resumed”, but assumes the thread (first column) doesn’t intermix with getdents from other threads (if that happens, then you have to go looking manually, e.g. with less). In all cases a tgkill (not shown) did the interruption.

You can see from the man pages that neither getdents nor lstat are supposed to give EINTR, yet they did.

$ fgrep 'getdents' /tmp/strace.log | egrep -B 1 'getdents.*EINTR' | fgrep 'getdents'
13739 07:29:25.548372 getdents(16</mnt/emptyfiles/7983>, /* 0 entries */, 1048576) = 0
13737 07:29:25.598998 getdents(12</mnt/emptyfiles/1145>, 0x7f61ec0261d0, 1048576) = -1 EINTR (Interrupted system call)
13741 07:30:27.171411 getdents(16</mnt/emptyfiles/8178> <unfinished ...>
13741 07:30:27.203564 <... getdents resumed> , 0x7f61f80307d0, 1048576) = -1 EINTR (Interrupted system call)
13744 07:30:39.190608 getdents(16</mnt/emptyfiles/9252> <unfinished ...>
13744 07:30:39.226441 <... getdents resumed> , 0x5602c7c3b0e0, 1048576) = -1 EINTR (Interrupted system call)
$ egrep 'lstat.*EINTR' /tmp/strace.log
13736 07:35:40.582932 lstat("/mnt/emptyfiles/8902", 0x7f61e54fd2c0) = -1 EINTR (Interrupted system call)
$ 
1 Like

Nice work troubleshooting!

Although I still can’t get the CIFS VFS: Send error in read = -4

For me this error is quite rare compared to the “Close unmatched open” one, which I agree always seems to be only on folders.

My test data set has about 25k files in it totally 9GB, so the files are pretty small on average. I think I’d get the “Send error in read” maybe half a dozen times during the first full backup. Pretty sure the “Send error in read” message was always tied to an actual file, not a folder.

What might be helpful is if we could come up with some very basic mono application that triggers this problem, then open an issue against the mono project. With their eyes on it maybe the devs could decide if it’s a mono bug or not.

Directory.GetFiles Method example is a bit less basic, but can trigger given the right signal environment.
The Mono log profiler (in mono-complete package and other spots) signals enough. I’m using its syntax:

$ mono --profile=log:help some.exe
...
sample[-real][=FREQ] enable/disable statistical sampling of threads
                     FREQ in Hz, 100 by default
                     the -real variant uses wall clock time instead of process time

mono --profile=log:sample=1000 Microsoft.GetFiles.exe /mnt/emptyfiles fails on Linux Mint 19.3 with kernel 5.4.0-62-generic, but works on Manjaro Linux with kernel 5.9.16-1-MANJARO, and works on openSUSE Tumbleweed with kernel 5.10.9-1-default. I’d guess the reason is they have a kernel patch:

[PATCH] cifs: allow syscalls to be restarted in __smb_send_rqst()

Let me the kernel decide to whether or not restart the syscalls when
there is a signal pending in __smb_send_rqst() by returing
-ERESTARTSYS. If it can’t, it will return -EINTR anyway.

I wasn’t (and still am not) sure how general a fix this is, but for whatever reason the getdents/getdents64 system calls underneath the readdir that I think mono is using are doing better. Some distro kernels may eventually catch up. The two distros I picked for their test are rolling releases known to stay quite current.

Linux 5.10-rc7 shows it going into Linus’ kernel candidate Dec 6, 2020.
Linux 5.9.14 shows it going into that kernel.org kernel on Dec 11, 2020.

For additional testing, I also used a C program to send signals, with directory readers written in C and go.
I’m not sure what go has internally, but in the C program I used SA_RESTART in sigaction, so in patch’s context, I tell the kernel to restart, and then the patch allows linux-cifs to ask the kernel, so it does restart.
Both of these programs failed on the older Linux Mint system, and worked on the two leading-edge ones.

Ubuntu 20.10 with kernel 5.8.0-41-generic fails. This might be a Jan 18, 2021 kernel, but I don’t know how hard Ubuntu works to keep their kernel up to date with the latest kernel patches. SUSE documents nicely.