Connection dropout appears to corrupt database

First off, thank you kindly for making this excellent project available, and thank you to everyone taking the time to provide support. I’ve been lurking on the forums for a long time and have learned a lot about how Duplicati works.

The problem:

  1. Sometimes while running a large backup to WebDAV, my network connection appears to drop for 10-30 seconds.
  2. This appears to cause a missing file error in AppData.
  3. This, in turn, corrupts the database.
  4. None of the common recovery options seem to work; only deleting and recreating the database does (a ~30-hour operation for me).
  5. The backup will then run fine for days, before succumbing to the same issue again.

Any tips/pointers on things I should try in order to (1) avoid the issue with missing files or (2) recover without recreating the database?

Unsuccessful attempts to fix:
repair database (Repair not possible, missing 513 blocks.)
purge-broken-files (Unable to start the purge process as there are 6 orphan file(s))
rebuild-missing-dblock-files (Found 16 files that are missing from the remote storage, please run repair)
deleting the faulty backup (denied due to orphan files)
Excluded temporary folder from Windows’ antivirus.
Moved temporary folder to a different location.
Setting http-operation-timeout = 1 day

My setup:
Version: / also attempted
Source: Local files
Destination: Nextcloud via WebDAV (SSL enabled)

The complete errors:

Running the backup:
[Error-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteFiles]: Found 16 files that are missing from the remote storage, please run repair

[Error-Duplicati.Library.Main.Operation.RepairHandler-CleanupMissingFileError]: Failed to perform cleanup for missing file:, message: Repair not possible, missing 515 blocks.
2021-03-24 21:04:00 +01 - [Error-Duplicati.Library.Main.Operation.RepairHandler-CleanupMissingFileError]: Failed to perform cleanup for missing file:, message: Repair not possible, missing 513 blocks.

The operation PurgeBrokenFiles has failed with error: Unable to start the purge process as there are 6 orphan file(s)

2021-03-24 20:42:52 +01 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteFiles]: Found 16 files that are missing from the remote storage, please run repair

What I believe may be causing the issue:
System.IO.FileNotFoundException: Could not find file ‘C:\Users\<username>\AppData\Local\Temp\dup-5b1345dc-9fb2-473a-b11b-f4f616fa6849’.

Thank you kindly for your time and any pointers.


From what I recall, there are other posts on this affecting large backups. I don’t know why that would be.

number-of-retries (default 5) and retry-delay (default 10 seconds) should handle that. Do you have logs?
log-file=<path> with log-file-log-level=retry could be left to run for the next time that connection drops out.
RetryAttempts in the Complete log for the job should also give you a count for a backup that’s able to finish.
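The behavior those two options control is essentially a bounded retry loop with a fixed delay between attempts. A minimal Python sketch of the idea (not Duplicati’s actual code; the function and the flaky operation are made up for illustration):

```python
import time

def with_retries(operation, retries=5, delay=10):
    """Retry a flaky operation up to `retries` times, sleeping `delay`
    seconds between attempts (mirroring number-of-retries / retry-delay)."""
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except OSError as exc:  # e.g. a dropped connection
            last_exc = exc
            print(f"attempt {attempt} of {retries} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)
    raise last_exc

# Example: an upload that fails twice, then succeeds.
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("connection dropped")
    return "ok"

result = with_retries(flaky_upload, retries=5, delay=0)
```

With a 10–30 second dropout, the default of 5 attempts 10 seconds apart may already be enough; raising either value stretches the window further.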

Bad for you, but good for debugging. Nobody’s been able to get a good look at this. If you attempt to make the actual backup more reliable by changing the retry parameters, can you run another copy to explore the failures?

I think nobody’s been willing to post a sanitized database from Creating a bug report, so little can be said.
You could break this unfortunate trend. In order to study and fix the problem, the problem must be seen up close.
Ideally someone finds steps to reproduce a problem, and then it can be looked at closer in lab conditions.

If you’ve noticed any places where good information was provided on this issue, feel free to post the links.

Yes, this error has some reports, but what caused them? I’m not sure all ended as yours did. We need some data.
Duplicati uses lots of temporary files, and the names are all very similar. The location in the backup run is something of a clue. One guess is that this happens at the end of the backup, but this is where log info helps.

Do you have any feel for when yours went bad, e.g. do you know start and error time and typical run time?


Thank you kindly for the response and the specific tips, @ts678. Much, much appreciated!

I am going to “fork” my current setup:
One with the settings you suggested.
A second with unchanged settings, other than an increased log-level.

I will report back with the details once the issue re-appears.

It definitely looks to trigger around the end of the backup as you suggest.

I wonder if copying the local database before every backup is a strategy. It would be attractive to be able to simply roll it back, rather than do the 30+ hour recreate, but I may be complicating things further.

I think you can sometimes restore from a previous database (if you kept a copy), but it will see unknown remote files, and a Repair will fix that by removing them (which might mean you just lose latest change).

If you delete versions, the old DB still expects them, and Compacting files at the backend may add new dblock files and delete old ones. If you have space, you might avoid that by letting old versions build up…

[Idea] Backup Duplicati database to avoid recreate has some of my thinking on this – and also my script.
Possibly it won’t be needed if cranking retries way up can ride through whatever outages you’re having…
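For anyone who doesn’t want to dig through the linked script, the core idea is just a dated copy of the job database before each run. A minimal Python sketch with hypothetical paths (the block creates a dummy file in a temp directory so it runs standalone):

```python
import datetime
import pathlib
import shutil
import tempfile

# Hypothetical location -- point this at your real job database,
# e.g. something like ~/.config/Duplicati/<jobid>.sqlite
workdir = pathlib.Path(tempfile.mkdtemp())
db = workdir / "job.sqlite"
db.write_bytes(b"dummy database contents")  # stand-in for the real DB

# Keep a dated copy next to the database before the backup runs.
stamp = datetime.date.today().strftime("%Y%m%d")
backup_copy = db.with_name(f"{db.name}.bak-{stamp}")
shutil.copy2(db, backup_copy)
```

Run something like this as a run-script-before step; just remember the caveats above about the copy drifting out of sync with the remote files.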

This would be wonderful. If it’s really a connection thing, you can also try reducing your retry-count for this.
That’s how I run some of the time, just to invite errors, but if the error needs a large backup, mine may not be big enough.


Thank you kindly for the suggestions once again. I have implemented your script to copy the database before running the backup. Per your suggestion, I also set the log-level to ‘retry’, which may have revealed something new. After running the backup again, it appears to be stalling at “Verifying backend data…”. The log shows:

2021-03-28 14:16:44 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (38.48 MB)
2021-03-28 14:16:44 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
2021-03-28 14:17:28 +02 - [Retry-Duplicati.Library.Main.BackendManager-RetryList]: Operation List with file attempt 1 of 288 failed with message: Root element is missing.
System.Xml.XmlException: Root element is missing.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.Load(Stream inStream)
at Duplicati.Library.Backend.WEBDAV.ListWithouExceptionCatch()
at Duplicati.Library.Backend.WEBDAV.List()
at Duplicati.Library.Main.BackendManager.DoList(FileEntryItem item)
at Duplicati.Library.Main.BackendManager.ThreadRun()

I set the retry timer to 5 minutes and maximum attempts at 288, so this would probably go on for 24 hours. Hitting “Stop” didn’t break the loop and I eventually killed the process in Task Manager. Re-running the backup takes me back to the “Root element is missing” error, so for now I’m stuck here. Judging by the number of search result hits (1), this is a pretty rare error and maybe not related to the other issue.

Duplicati reported what it saw as invalid XML. Question is – what did it see? Wikipedia on Root element.

Is this remote, with SSL/TLS for security? Seeing received data would be far simpler without encryption.


--debug-propfind-file possibly could help, but I haven’t used it, and I also don’t have any WebDAV to try…
If testing it, you might want to start with a small backup, and check whether the file captures the file listing data.

Ah, I see, after several attempts at working out what the code is doing.

So it’s hitting this exception handler:

catch (Exception ex)
{
    lastException = ex;
    Logging.Log.WriteRetryMessage(LOGTAG, $"Retry{item.Operation}", ex, "Operation {0} with file {1} attempt {2} of {3} failed with message: {4}", item.Operation, item.RemoteFilename, retries, m_numberofretries, ex.Message);
}

The item.RemoteFilename there appears to be meaningless, as this is running on DoList() > m_backend.List().ToList(), which does not operate on item.RemoteFilename.

So the root element is missing earlier, at Duplicati.Library.Main.BackendManager.DoList(FileEntryItem item) > Duplicati.Library.Backend.WEBDAV.List() > Duplicati.Library.Backend.WEBDAV.ListWithouExceptionCatch()

Looks like a WEBDAV-related issue.

You can work around this by using something other than WEBDAV for now. I don’t know what else you can do except wait and see if it gets fixed in Duplicati.

Suppose it could be a WEBDAV or setup issue; really, I don’t think that should cause the XML returned to Duplicati to have a missing root element, unless the XML is not reaching Duplicati at all and the problem is elsewhere.

Also to add, their ListWithouExceptionCatch() is fairly complex, but it goes into doc.Load(), which looks like about where the problem is happening. That takes in the remote response for sure.

So I’m going with PROPFIND aka WEBDAV properties xml retrieval.

ListWithouExceptionCatch() > req.Method = "PROPFIND"; > doc.Load()

Looks like the file could come out of m_debugPropfindFile > “debug-propfind-file” > whose contents I don’t know, but they should be the WebDAV properties. Or it might be that an “allprop” request should return all properties.

So it looks to me like a WebDAV properties issue on the NAS side. It doesn’t really look like Duplicati’s reading of it would be failing.

If there’s some specific file or something, I can’t easily tell what it’s doing exactly without a debugger or more logs. It’s not readily apparent to me.

Supposing we go with the original intent, a connection dropout can cause it, depending on what the dropout is. If it was definitely working fine before the dropout, then that could very well be it.

See above debug-propfind-file note too, but to add to the debug options you can test a list using Duplicati.CommandLine.BackendTool.exe. You can give it Target URL from Export As Command-Line.
If need be, it’s also possible to do network tracing after SSL/TLS decryption, but it gets a bit involved…

Thank you kindly for the responses and the very informative details. I think you are correct that something is faulty with the WebDAV instance.

I wish I could disable SSL for testing purposes, but the instance is hosted outside my control.

Investigating as per your advice, I now found that even “Test connection” throws the “Root element is missing” error.

Things may go awry before “debug-propfind-file” has a chance to trigger. At least I didn’t see any output to this file while testing.

Testing the list command with Duplicati.CommandLine.BackendTool.exe triggers the “Root element is missing” message before listing any files.

Finally, I mapped the WebDAV path directly in Windows - which absolutely should work - and this too timed out before listing any files…

I think I owe Duplicati an apology for this one.

To sum up:
I have several Duplicati backups that connect to the same Nextcloud instance via the same settings using WebDAV. They all work, except the one above, which suddenly started throwing the “Root element is missing” error.
The only obvious difference between them is that the faulty backup is close to 4 TB of data and the others are <10 GB each.
Perhaps the Nextcloud instance is choking on listing the 80,000 or so files (volume size of 50 MByte).

I think my next attempt will be to recreate the backup using a much larger volume size.
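The file-count estimate above follows directly from the backup size divided by the remote volume size, and shows why a larger volume size shrinks the listing (decimal units assumed, dindex files not counted):

```python
backup_bytes = 4 * 1000**4    # ~4 TB of backed-up data
small_volume = 50 * 1000**2   # 50 MB default remote volume size
large_volume = 1000**3        # 1 GB remote volume size

dblocks_small = backup_bytes // small_volume  # dblock files at 50 MB
dblocks_large = backup_bytes // large_volume  # dblock files at 1 GB
```

That is 80,000 dblock files at the default size versus about 4,000 at 1 GB, a much smaller PROPFIND response for the server to build.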

Is there any information I can provide that would be useful for Duplicati development? Realising that this really does look like an external issue.

Once again, thank you for all your help.

It could be the amount of files. Maybe it’s interacting badly with the way Duplicati does it.

On the first run, the PROPFIND_BODY is empty. It writes an empty body to WebDAV…

private static readonly byte[] PROPFIND_BODY = new byte[0];

using (System.IO.Stream s = areq.GetRequestStream())
    s.Write(PROPFIND_BODY, 0, PROPFIND_BODY.Length);

var doc = new System.Xml.XmlDocument();
using (var resp = (System.Net.HttpWebResponse)areq.GetResponse())

An empty PROPFIND request body MUST be treated as a request for the names and values of all properties.

This should return the names of all files and folders… If there are a lot, who knows how that will go with 4 TB.

But they do have a depth of 1:

req.Headers.Add("Depth", "1");

If you want to list the contents of a directory (collection) you must use the PROPFIND method instead, which should return an XML document with all the information.
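Putting those pieces together, the request is a PROPFIND with an empty body and a Depth: 1 header. A rough Python reconstruction of the raw request bytes (the host and path are made up; this only builds the request, it doesn’t send it):

```python
def build_propfind(path, host, depth=1):
    # RFC 4918: an empty PROPFIND body must be treated as a request for
    # all property names and values ("allprop"); Depth: 1 limits the
    # listing to the collection itself plus its direct children.
    return (
        f"PROPFIND {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Depth: {depth}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )

req = build_propfind("/remote.php/dav/files/user/backup/", "cloud.example.com")
```

The server is then expected to answer with a 207 Multi-Status XML document listing every child of the folder, which is where a huge folder can hurt.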

The m_debugPropfindFile path only runs instead if it is not null or empty:

if (!string.IsNullOrEmpty(m_debugPropfindFile))

It’s also possible WebDAV has some other limitation that only shows up at 4 TB but that you don’t hit with the smaller backups; who knows.

Rather a complicated thing. A debugger would show what’s happening quite fast.

There are other WebDAV clients you could try, to see how they handle the situation (or report the error).

Although it’s not usually a problem except on OneDrive, Duplicati has a couple of variables you can test:

http-operation-timeout is the one that OneDrive can trip over because OneDrive’s default timeout is low.
http-readwrite-timeout doesn’t usually cause problems, but might as well play with it while you’re testing.
I don’t think most of the small command line tools take these options. GUI and CommandLine client can.

One might think a timeout should be reported differently, but who knows. I think there have been cases in some protocols which were supposed to give a nice JSON response, but instead the server sent an HTTP error.

Improved error messages for non-json errors #4195 was one such bug. @Xavron is looking in the code already, and I wonder if there’s some other way that the error message might be a misreport of an issue?

That’s the problem with problems seen in non-lab conditions – how to examine it with one’s favorite tools.

Maybe it’s time to do the network tracing I mentioned, but don’t post casually because it contains secrets.
Doing this on a CLI tool will be best. If I recall correctly, you already have a .config file for each program.
Editing that to insert the information Microsoft shows will let you see what comes back. Try that on some working situation first, see what sort of XML to expect, then try it on the failing one and see what it said…

Yeah, it keeps on bugging me. Very curious as to the real issue. But I will probably need to give it up after this post.

Was also thinking of that :slight_smile: Seems an empty response could cause it. Not sure if you already covered that.

Also, a blank XML file will return the same “Root element is missing” exception.

Edit: So yes, it does. A little test shows



{System.Xml.XmlException: Root element is missing. at System.Xml.XmlTextReaderImpl.Throw (System.Exception e) [0x00027] in /builddir/build/BUILD/mono- at System.Xml.Xm…}

This is different from, e.g., no files in a folder. It does this on empty contents going into the XML document Load.
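The same failure is easy to reproduce outside .NET. For instance, Python’s XML parser also rejects empty input (it reports “no element found” rather than “Root element is missing”, but it is the same condition: a document with no root element):

```python
import xml.etree.ElementTree as ET

def parse_response(body):
    """Parse a (possibly empty) WebDAV response body."""
    try:
        return ET.fromstring(body)
    except ET.ParseError as exc:
        return f"parse failed: {exc}"

empty = parse_response("")  # empty response body -> parse error
ok = parse_response("<multistatus></multistatus>")  # minimal valid document
```

So any path that hands the parser zero bytes, whether from the network or from an empty debug file, would produce exactly this class of exception.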

An empty XML property value doesn’t appear to do it, e.g. <a:getcontenttype></a:getcontenttype> appears to be OK.

Various other things show position exceptions if a closing tag is removed or misspelled. “Unexpected end of file” is possible if the latter half is hacked off; “undeclared prefix” if some parts are hacked off the beginning and end.

Currently not seeing anything else. Even this passes fine for me: python - Detect empty XML root element - Stack Overflow

The next closest I can get is this with multiple root elements:


So something else entirely might be happening if nothing is being returned. The code might continue into this path simply because that’s the next thing it does regardless, even if something else happened; the content just happens to be empty.

Wonder if something like this could happen here, with the response empty because of e.g. high server load: Received an empty WebDAV response – BitTitan Help Center

It might also be possible if the file is not found, but there should be an error in the response for that.

Or, if it was going into the other block, it could be the FileStream giving m_debugPropfindFile empty contents before going into the XML document Load.

I will look into this further if I get WebDAV working on my NAS server, something that failed to work the last time I tried.

I intentionally disabled WebDAV in the middle of a backup upload, then killed Duplicati, and it continued fine afterward. It’s possible that other connection situations could have a negative effect. There are various possible errors and things that can potentially happen, and some can take a while to trigger, so I can’t rule that out yet.

Since I have it working now, I will continue to test it. I will probably go silent on this unless I find something important to solving this.

One last thing: in the advanced options in Duplicati there is a setting to enable better logging of PROPFIND. You could try Edit on the backup > Options > Advanced > dump the PROPFIND data to the log. I haven’t tried it yet; just noting it. Do not upload the result: it might be sensitive, and I don’t know what it will contain. It might also be too big to easily open.

On a different note, there is potentially a lot of work on both ends at 80,000 files with WebDAV. It’s definitely dealing with all of that, and may even need to convert from SQL to XML, depending. So possibly a lot of lengthy loops and CPU usage are happening, where something else might be better suited. Dropping the count down a lot by increasing the volume size, as you are trying, could be a wise choice either way. :+1:


The prior direction was to “fork” the setup, trying to fix one, and leaving the other for debug with database history and an increased log level. I don’t know what became of that, but the intent was to better understand the problem, which ideally should be handled better even if the cause was external. Did you, or can you, pursue it?

Although causes and failure paths might be different, we have two of these today that we’re wrestling with without enough information. The effort you had going was one of the more ambitious troubleshooting attempts.

Thanks for any help you can give in seeing how it fails.

I apologize for not responding earlier. I had some trouble running both backups, since during one of the attempts I switched to one of the canary releases with a newer database version. I do have all the configs and logs, and should be able to recreate and trigger the error.

Right this moment, I am running the same backup, but with a remote volume size of 1 GByte (up from the default 50 MByte). As luck would have it, I got a similar error to the one that seemed to trigger the cascading effect mentioned in the original post above. The major difference being that this time, Duplicati looks to have successfully corrected the error.

The flow:

At some point during the upload, the Nextcloud server responded with an internal server error (500).
I happened to spot it “live” and saw that the Nextcloud provider showed a “Down for maintenance” page when I attempted to access the server.

Duplicati retries and the server eventually recovers, goes down and recovers again. This seems to trigger renaming/missing of files that I do not quite understand.
The backup continues for hours before succumbing to:

Found 41 files that are missing from the remote storage, please run repair

I select Repair

The backup storage destination is missing data files. You can either enable --rebuild-missing-dblock-files or run the purge command to remove these files. The following files are missing: […]

I re-run the backup with “--rebuild-missing-dblock-files” enabled.

Found 41 files that are missing from the remote storage, please run repair

From commandline I run: purge-broken-files

Listing remote folder …
Uploading file (4.70 MB) …
Deleting file …
Return code: 0

Unlike the “bad” backup, this went smoothly and the backup is restarted.

I have attached a log copy of the above: (72.9 KB)

Backup to WeDAV / Nextcloud never completes is getting these too. I don’t know whose Nextcloud it is.
Yours is interesting because of the “Down for maintenance” shown by the provider of your Nextcloud…

Does the provider provide any support? They possibly know oddities of their Nextcloud better than I do.
In addition to the (500) Internal Server Error, the server returns (503) Server Unavailable.

Both should be handled by Duplicati retries (until exhausted). What’s really terrible is that the server loses files which have been accepted without error. Duplicati can (and did) alert you, but it can’t Repair this well.

Fortunately there is the option (which you are using) to purge the backup of files damaged by loss and try the backup again, which it looks like you’re doing, and though it’s only a few hours’ run, it looks good so far…

The renaming is normal: if a file upload fails, a new name is chosen for the next try, instead of guessing at the result of the errored try. Did it all get up, or none, or some? It doesn’t matter.
The same content is uploaded into a file with a new name. The old file name that had the upload error is deleted later.
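As a sketch of that strategy (not Duplicati’s actual code; the backend here is a fake that fails the first attempt, and the names are made up):

```python
import itertools

_counter = itertools.count(1)

def upload_with_rename(backend, data, name):
    """On upload failure, retry under a fresh name rather than guess
    whether the failed upload left a full, partial, or absent file;
    collect the errored name for deletion later."""
    to_delete = []
    while True:
        try:
            backend.put(name, data)
            return name, to_delete
        except OSError:
            to_delete.append(name)          # clean up the errored name later
            name = f"dup-{next(_counter)}"  # same content, new name

class FlakyBackend:
    def __init__(self):
        self.files, self.failed = {}, False
    def put(self, name, data):
        if not self.failed:                 # first attempt errors out
            self.failed = True
            raise OSError("(500) Internal Server Error")
        self.files[name] = data

backend = FlakyBackend()
final_name, stale = upload_with_rename(backend, b"dblock", "dup-original")
```

That matches the Rename events in the log below: the retried content goes up under a new name, and the old name is deleted in a later cleanup pass.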

2021-04-07 15:58:42 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: (1,023.98 MB)
2021-04-07 16:35:32 +02 - [Retry-Duplicati.Library.Main.Operation.Backup.BackendUploader-RetryPut]: Operation Put with file attempt 1 of 288 failed with message: The remote server returned an error: (500) Internal Server Error.
2021-04-07 16:35:32 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Retrying: (1,023.98 MB)
2021-04-07 16:40:32 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Rename: (1,023.98 MB)
2021-04-07 16:40:32 +02 - [Information-Duplicati.Library.Main.Operation.Backup.BackendUploader-RenameRemoteTargetFile]: Renaming "" to ""
2021-04-07 16:40:32 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: (1,023.98 MB)
2021-04-07 17:10:50 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.98 MB)
(long time)
2021-04-10 15:00:08 +02 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoveUnwantedRemoteFile]: removing remote file listed as Deleting:
2021-04-10 15:00:08 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: ()
2021-04-10 15:00:10 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: ()

Your missing files are both long (dblock) and short (dindex). To pick on a short one, here’s the report on it:

2021-04-10 15:00:50 +02 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingFile]: Missing file:

Here is its upload, which was reported as a success, yet when the directory was listed later, the file is missing.

2021-04-07 20:57:35 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: (669.18 KB)
2021-04-07 20:57:36 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (669.18 KB)

Thank you, once again, for the very detailed answers.

That is quite worrisome. Strangely, when the Nextcloud client first reported that the instance was offline, the upload via WebDAV looked to continue (successfully) for quite some time. This is of course way outside of anything to do with Duplicati. From my vantage point, it looked like Duplicati did everything right.

On retries, it never logged more than “attempt 6 of 288” (it’s good you increased retries from default of 5).

This possibly could do better. I haven’t had luck. I don’t know the code, but here’s where it got deprecated:

Removed automatic attempts to rebuild dblock files as it is slow and rarely finds all the missing pieces (can be enabled with --rebuild-missing-dblock-files).

I checked the “successful” upload times of 20 missing files. There may have been a few hours of loss:

2021-04-07 20:55:30 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.90 MB)
2021-04-07 19:55:59 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,024.00 MB)
2021-04-07 20:57:34 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,024.00 MB)
2021-04-07 20:57:29 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.99 MB)
2021-04-07 20:46:13 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.95 MB)
2021-04-07 21:10:44 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.91 MB)
2021-04-07 21:23:01 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.95 MB)
2021-04-07 21:23:47 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.93 MB)
2021-04-07 21:25:36 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.97 MB)
2021-04-07 20:27:26 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (669.39 KB)
2021-04-07 20:26:58 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (669.43 KB)
2021-04-07 21:41:24 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.94 MB)
2021-04-07 21:55:22 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.93 MB)
2021-04-07 20:46:14 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (722.47 KB)
2021-04-07 21:47:57 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.91 MB)
2021-04-07 20:55:31 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (681.68 KB)
2021-04-07 21:53:46 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.99 MB)
2021-04-07 20:57:31 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (669.15 KB)
2021-04-07 22:07:21 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (1,023.92 MB)
2021-04-07 20:57:36 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: (669.18 KB)

I less carefully sampled a few “successful” uploads in the hours of loss, and found those files missing.
Feel free to do your own log study, or ask your provider what might have happened in those few hours.
