Backups start failing after a week or so

Sorry, I was not fully clear - the files do not disappear and the NFS share does not disappear; everything is there, even before I ran a “mount -a” for good measure. For example, it performed its backup last Friday without issue, but come Monday morning when it goes to run the same job, it errors. No amount of repair, delete & repair, purge broken or anything else would fix the job. At one point it told me I had extra files, the ones from Friday as it turned out, so I removed them and tried again, and I was back to missing files, just different ones.

Since I wrote the original post I have uninstalled and re-installed Duplicati (including Mono), hoping it might remove the weird BouncyCastle messages; it did not. I also deleted the backup job, database and backup files, created a new job, and it ran successfully first time, though still with those messages. I will now see how it behaves in the coming days.

Ok, so what you’re saying is the files Duplicati claims are missing are actually present in the NFS share?

Are you backing up to Backblaze B2? It started failing last week because they updated the SSL certificate and Duplicati doesn’t accept the new certificate. See separate thread on that.

Yup, hopefully I’ll know more soon, as either way recreating the job will either fix the problem or do nothing for it. I’ll make sure to properly document what goes on.

Nope, NFS share as stated

Small related question while I continue to diagnose this: is it possible to connect to an NFS share with the backup job in the same way as I would do with an SMB share? At the moment I’m using an auto-mount in the OS itself to mount during boot, which has its own issues, so I was wondering if I could get Duplicati to do it instead, in the same way as I do for Windows machines.

No, doesn’t look like Duplicati supports NFS directly. It’s not one of the supported back ends.

A similar test might be to uninstall your current 2.0.6.100 and downgrade to the previous Canary, 2.0.5.114_canary_2021-03-10. Alternatively, do you recall if the messages began with 2.0.6.100?
A somewhat less disruptive test would be to unzip 2.0.5.114 and add one file to the Duplicati area:
/usr/lib/duplicati/BouncyCastle.Crypto.dll
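If you'd rather not unzip a full install, the single file can probably be pulled out of the old package without installing it; a sketch, where the RPM file name is my assumption for 2.0.5.114:

```shell
# Extract just the DLL from the old RPM into the current directory tree,
# then copy it into the live install (RPM file name is an assumption).
rpm2cpio duplicati-2.0.5.114-1.noarch.rpm | cpio -idv './usr/lib/duplicati/BouncyCastle.Crypto.dll'
sudo cp usr/lib/duplicati/BouncyCastle.Crypto.dll /usr/lib/duplicati/
```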

Theory behind this is that MailKit.dll 2.3.1.0 has a dependency on MimeKit >= 2.3.1 which has a
dependency on BouncyCastle >= 1.8.5 which Duplicati shipped until 2.0.6.100 removed its use,
possibly not realizing that there was an indirect use through MailKit which is used to send email.

Weighing against the theory is that 2.0.6.100 has been out four months, so why do only some users see this?

Regarding NFS, no ideas yet, but did anything known change around then that might be related?

Nor does it support SMB. Storage Type is Local folder or drive I guess. OS does the rest.

Do you mean the allegedly missing files are there, or just that something is there that looks like a backup?
An Information-level log is good to get. It shows the remote file actions and the missing file names.
Use About → Show log → Live → Information or log-file=<path> and log-file-log-level=information.
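For illustration, a command-line run with those options might look like this (the destination URL and source paths are placeholders for your setup; the same two options can also be pasted into a job's Advanced options box):

```shell
# Back up /home/pi/data to a locally mounted NFS path, writing an
# Information-level log to a file (all paths are placeholders).
duplicati-cli backup "file:///mnt/nfs-backup" /home/pi/data \
  --log-file=/var/log/duplicati/backup.log \
  --log-file-log-level=Information
```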

Tried copying BouncyCastle.Crypto.dll from the previous RPM file, and the message has now stopped appearing for all jobs, so it does seem something needs fixing around MimeKit, as you thought.

I do, however, still get this, which I think was being logged before:

```
{"ClassName":"System.Net.WebException","Message":"Error: TrustFailure (Authentication failed, see inner exception.)","Data":null,
"InnerException":{"ClassName":"System.Security.Authentication.AuthenticationException","Message":"Authentication failed, see inner exception.","Data":null,
"InnerException":{"Message":"Ssl error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED\n  at /build/mono-2liaiA/mono-5.18.0.240+dfsg/external/boringssl/ssl/handshake_client.c:1132","Data":{},"InnerException":null,
"StackTrace":"  at Mono.Btls.MonoBtlsContext.ProcessHandshake () [0x00054] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at Mono.Net.Security.MobileAuthenticatedStream.ProcessHandshake (Mono.Net.Security.AsyncOperationStatus status, System.Boolean renegotiate) [0x000c6] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at (wrapper remoting-invoke-with-check) Mono.Net.Security.MobileAuthenticatedStream.ProcessHandshake(Mono.Net.Security.AsyncOperationStatus,bool)\n  at Mono.Net.Security.AsyncHandshakeRequest.Run (Mono.Net.Security.AsyncOperationStatus status) [0x00006] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at Mono.Net.Security.AsyncProtocolRequest.ProcessOperation (System.Threading.CancellationToken cancellationToken) [0x0012a] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at Mono.Net.Security.AsyncProtocolRequest.StartOperation (System.Threading.CancellationToken cancellationToken) [0x000a4] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 ","HelpLink":null,"Source":"mscorlib","HResult":-2146233088},
"HelpURL":null,
"StackTraceString":"  at Mono.Net.Security.MobileAuthenticatedStream.ProcessAuthentication (System.Boolean runSynchronously, Mono.Net.Security.MonoSslAuthenticationOptions options, System.Threading.CancellationToken cancellationToken) [0x00336] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at Mono.Net.Security.MonoTlsStream.CreateStream (System.Net.WebConnectionTunnel tunnel, System.Threading.CancellationToken cancellationToken) [0x0018c] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at System.Net.WebConnection.CreateStream (System.Net.WebOperation operation, System.Boolean reused, System.Threading.CancellationToken cancellationToken) [0x001f5] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 ","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233087,"Source":"mscorlib"},
"HelpURL":null,
"StackTraceString":"  at System.Net.WebConnection.CreateStream (System.Net.WebOperation operation, System.Boolean reused, System.Threading.CancellationToken cancellationToken) [0x00275] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at System.Net.WebConnection.InitConnection (System.Net.WebOperation operation, System.Threading.CancellationToken cancellationToken) [0x0015b] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at System.Net.WebOperation.Run () [0x000b7] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at System.Net.WebCompletionSource`1[T].WaitForCompletion () [0x000b1] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at System.Net.HttpWebRequest.RunWithTimeoutWorker[T] (System.Threading.Tasks.Task`1[TResult] workerTask, System.Int32 timeout, System.Action abort, System.Func`1[TResult] aborted, System.Threading.CancellationTokenSource cts) [0x00118] in <a9a08e39ba304bd0a84c49bd158dfc02>:0 \n  at Duplicati.Library.Utility.AsyncHttpRequest+AsyncWrapper.GetResponseOrStream () [0x0004d] in <d644d641ef094b6cbc223bc3a32d53af>:0 \n  at Duplicati.Library.Utility.AsyncHttpRequest.GetRequestStream (System.Int64 contentlength) [0x00068] in <d644d641ef094b6cbc223bc3a32d53af>:0 \n  at Duplicati.Library.UsageReporter.ReportSetUploader+<>c.<Run>b__3_0 (CoCoL.IReadChannel`1[T] chan) [0x000c4] in <5d01de97f88c47219817ca3929a597f5>:0 ","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233079,"Source":"Duplicati.Library.Utility"}
```

With regards to NFS, I assumed that using an SMB share like this meant that it was supported, which is why I asked whether I could do the same with an NFS share.

BTW, I think I might have found the NFS file issue: it could be a permissions problem with the files themselves, so I have reset the owner/rights on the whole share and will see how it goes. How they got messed up is beyond me.

SMB is not directly supported by Duplicati, but as you noted, on a Windows system you can use a UNC path as the destination because the underlying OS handles the work. UNC paths do not work on non-Windows systems, which is why we say Duplicati doesn’t directly support SMB.

What you do on Windows isn’t so clear, but because you’re talking about mounting, maybe you want this:

How To Automatically Mount NFS File System with AutoFS

Basically mount on access rather than mount at boot? If so, there’s lots more information on the Internet.
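For illustration, an AutoFS map for an NFS share might look something like this (server name, paths, and timeout are placeholders):

```shell
# /etc/auto.master -- one line telling AutoFS to manage /mnt via a map
# file, unmounting after 300 idle seconds (placeholder values throughout)
/mnt  /etc/auto.nfs  --timeout=300

# /etc/auto.nfs -- mounts nfs-server:/backups at /mnt/backups on first access
backups  -fstype=nfs4,rw  nfs-server:/backups
```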

I don’t personally use this and I don’t know how common it is, but I don’t recall hearing it hurts Duplicati…
Another thing that people can do is to use Duplicati scripting options to mount, do backup, then unmount.
I suppose one consideration in all this is which solution gives the desired ability to deal with mount errors.
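As a sketch of that scripting route: a shell helper like this could be called from a --run-script-before script, so a failed mount aborts the job instead of backing up against an empty directory (the mountpoint path and the existence of a matching /etc/fstab entry are assumptions):

```shell
# ensure_mounted MOUNTPOINT: mount the share if it isn't already mounted.
# Assumes /etc/fstab has an entry for the mountpoint so mount(8) knows
# the NFS source and options.
ensure_mounted() {
    if ! mountpoint -q "$1"; then
        # A non-zero exit from a --run-script-before makes Duplicati
        # abort the backup rather than run against an empty directory.
        mount "$1" || return 1
    fi
    return 0
}
```

A matching --run-script-after could then unmount again once the job finishes.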

As a test, you can try changing Settings for Usage statistics:

Setting this to None will disable reporting usage statistics at all.

which looks like it will disable upload. Does it always fail (maybe once per backup) or only sometimes?
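If the GUI setting is awkward to reach, there is, if I recall the option name right, an equivalent advanced option that can be set per job or server-wide; a sketch (destination URL and paths are placeholders):

```shell
# --usage-reporter-level=none should turn off usage reporting entirely
# (option name as I recall it; verify against your version's option list).
duplicati-cli backup "file:///mnt/nfs-backup" /home/pi/data \
  --usage-reporter-level=none
```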
What OS is this? This error looks like one that’s begun due to mono, OS, and Let’s Encrypt certificates:

Http send report errors / duplicati-monitoring has one of the better workaround lists, but there are others.

So far so good with my NFS share. I think the issue was some weird permissions on some of the files, though that doesn’t explain how it would sometimes work.

I disabled the reporter and the message has gone - it was actually the only one of my machines with it still enabled.

Thanks for the tip on that other thread, as the SSL message does appear to be due to the old Let’s Encrypt CA certificate still being present. The OS on the problem machines is Raspberry Pi OS (Buster), but its ca-certificates package is still dated 2020 even though there is definitely a newer one for Debian now. After I removed the certificate and ran update-ca-certificates, the message went away, along with the block on using an SSL certificate (not an LE one) for the web interface, which was nice.
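For anyone else on a Debian-family OS hitting the same thing, a sketch of that cleanup, assuming the culprit is the expired DST Root CA X3 entry:

```shell
# Deselect the expired root by prefixing its line with '!' in the
# ca-certificates config, then rebuild the system trust store.
sudo sed -i 's|^mozilla/DST_Root_CA_X3.crt|!mozilla/DST_Root_CA_X3.crt|' /etc/ca-certificates.conf
sudo update-ca-certificates
```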


After a backup pause of 14 days I had the error “Failed: Found x files that are missing from the remote storage, please run repair” as well.

I’m using 2.0.6.1_beta_2021-05-03 under Windows 10. My backup destination is a USB3 hard disk.

I tried to repair and executed purge-broken-files, but to no avail.
As a solution I started over with a new backup job.

For years I never had problems with Duplicati; they started with 2.0.6.1.
Would it be an idea to downgrade to an earlier version?

I don’t think there is any known bug with the current versions that would cause files to be “missing” on the back end, so I don’t think I’d recommend downgrading.

Maybe there was corruption on that external USB drive that caused files to be lost?

Ok.

Maybe there was corruption on that external USB drive that caused files to be lost?
Of course I can never be 100% sure with a USB device.
Then I’ll assume that possibility.

Does this mean the pause button was used sometime in the middle of a backup, or no backups were run?

There is one odd issue where Duplicati deletes dindex files during a compact, gets some error, then loses track of its deletions. This bug is present in 2.0.5.1 though, so going back to that won’t help avoid the issue.

Unfortunately the default logging doesn’t give the lost file names; About → Show log → Live → Warning can show them. The problem mentioned above would be deleted-then-forgotten dindex files, but there isn’t enough info to say what happened here, e.g. did Duplicati think it wrote the file (per its log) while the file never actually made it to the drive?
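If a log file is being kept, the “missing” names can also be fished out with a quick grep (the log path and the exact message wording are assumptions from what I’ve seen; adjust to your version):

```shell
# Hypothetical log location; point LOGFILE at your own --log-file path.
LOGFILE="${LOGFILE:-/var/log/duplicati/backup.log}"
# "Missing file" is the phrasing I've seen in these errors; it may vary.
grep 'Missing file' "$LOGFILE" 2>/dev/null || true
```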

At start of backup, or at end? If at start, look carefully at the previous backup to see if anything went wrong.
Ideally you set log-file=<path> with log-file-log-level=Information so that you have evidence of all file activities.

If problem is reproducible occasionally, having records like these will help to understand what happened…

No, the backup was uninterrupted.

Few seconds after start.

I set log-file-log-level=verbose. Perhaps “Information” is enough to get a better overview of what happened during the last backup.

But indeed it’s not improbable that the USB drive had filesystem damage. I’ll run a chkdsk before executing the backup, to get a better feeling for when a repair takes place…

So I’m guessing that the 14 days was just the gap between backups. Does pre-gap backup log look good?

That’s when it looks at the backup destination to see what’s there, which should (absent oddities) match what the prior backup left.

Information should show all the file transfers. If it weren’t a local drive, Retry would be better; networks error more often.

I think the issue was with some weird permissions on some of the files, though doesn’t explain how it would sometimes work.

You’re right, that doesn’t explain it, so the permissions are probably not the cause. Duplicati has some code issues, at the very least, that can work for some time and then fail. Some machines never hit a problem; some will.

In programming terms, what happens can be explained as a possible race condition, or as errors thrown at points where Duplicati doesn’t handle them, so things may or may not happen.

What happened is likely a Duplicati problem that you just happened to work around, temporarily or, hopefully, for good.

I did extensive debugging of an issue with Duplicati, and it has some problems. I’d expect this possibility to persist for years, depending on whether and when they can even find all the issues here. The Duplicati code is really complex, so it’s not going to be easy for them.

Can you describe the permission situation some? I doubt Duplicati goes out of its way to change these, but because we’re rather low on evidence, might as well look at all that’s known before rejecting it as irrelevant.

I guess I might as well point to it, although what I really want is for anybody who thinks they might be seeing this somewhat often to maintain a log-file=<path> at log-file-log-level=Information, or the slightly more verbose Retry.

Error during compact forgot a dindex file deletion, getting Missing file error next run. #4129 . Alternatively, but not as good, use About → Show log → Live → Warning after a failure to at least see which files are “missing”.

The issue report would also benefit from knowing the type of files, e.g. dblock or dindex, and where they were in the uploads. After the backup itself, version deletion can cause a compact to run, which does downloads and uploads while working. It can at least lose track of deletions, per the issue above; perhaps it can lose track of uploads too.

Keeping a separate log file not only shows whether compact ran, but shows odd endings and all file names.

Posting a link to a database bug report of the database with the problem can possibly provide evidence too, but it’s subject to the database forgetting some things on its way out (transaction rollback) so a log is better.

Finding reported issues is a lot easier when details are available, thus I ask to see if this is the known issue. The permission problem (whether or not it’s a cause) will definitely be hard to find without a better description.