Operation has timed out


#1

I’m using Duplicati 2.0.3.3_beta_2018-04-02 to backup to a webdav instance on nextcloud. I have Linux and Windows machines both backing up to the same server. The server is a tinkerboard. I keep getting messages “Failed to process file … => The operation has timed out.”. However Duplicati is telling me that the backup was successful. So I have 2 questions:

  1. Was the backup really successful?
  2. How do I get rid of these timeout errors?

#2

It depends on when / where the messages occurred. My GUESS is that they’re happening at the end of your job run - most likely during the verification stage which happens AFTER the backup has completed.

Are the errors or warnings? Errors usually mean “something catastrophic happened and the backup failed” while warnings are more along the lines of “something unexpected happened but the backup continued anyway”.

If it’s really a timeout then there’s most likely a performance issue with either your destination (NextCloud, in your case) or your network connection.

This is a test (not a fix) but if you try running the job with --backup-test-samples=0 do the messages stop? Note that this actually DISABLES a testing step in Duplicati the verifies previously created backup files, so it’s generally a better idea to figure out the underlying issue than to simply turn off this feature.

--backup-test-samples
After a backup is completed, some files are selected for verification on the remote backend. Use this option to change how many. If this value is set to 0 or the option --no-backend-verification is set, no remote files are verified
Default value: “1”


#3

Here’s the output of one of the jobs. Looking at the output I can’t tell if this is during verification or not. I’ll add that option to one of the client and see how that goes.

DeletedFiles: 218
DeletedFolders: 2
ModifiedFiles: 189
ExaminedFiles: 169856
OpenedFiles: 381
AddedFiles: 192
SizeOfModifiedFiles: 1935136809
SizeOfAddedFiles: 895738452
SizeOfExaminedFiles: 249808160490
SizeOfOpenedFiles: 3863815236
NotProcessedFiles: 0
AddedFolders: 2
TooLargeFiles: 0
FilesWithError: 0
ModifiedFolders: 0
ModifiedSymlinks: 0
AddedSymlinks: 0
DeletedSymlinks: 0
PartialBackup: False
Dryrun: False
MainOperation: Backup
ParsedResult: Error
VerboseOutput: False
VerboseErrors: False
EndTime: 7/25/2018 4:03:48 AM (1532509428)
BeginTime: 7/25/2018 2:36:30 AM (1532504190)
Duration: 01:27:18.7620730
Messages: [
    Deleting 2 remote fileset(s) ...,
    Deleted 2 remote fileset(s),
    Compacting because there are 7 fully deletable volume(s),
    Deleted 14 files, which reduced storage by 350.07 MB,
    removing file listed as Temporary: duplicati-b2457d8e2e95b4ec49e2e4c2b03fb1e6f.dblock.zip.aes,
    removing file listed as Temporary: duplicati-iee847c34fcee490b92e7e6c99101bec8.dindex.zip.aes
]
Warnings: []
Errors: [
    Failed to process file duplicati-b1c3cb1009a2a40bb8c01b8ae32b82df8.dblock.zip.aes => The operation has timed out.
]

#4

I’ve run one system with verifications disabled and I’m not seeing the timeout errors. That suggests that the timeout is part of the verification. So how do I fix this?


#5

That depends on where the timeout is happening. If it’s timing out while attempting to download the file, then it’s likely one or more of these:

  • download was interrupted w/out Dulicati noticing and aborting (there is a known issue with interrupted downloads, but it usually results in Duplicati getting ‘stuck’ on a step rather than erroring out)
  • general connectivity issue (unlikely since you’d probably see issues with other web applications)
  • slow internet (unlikely since sounds like this might be on your own LAN?)
  • ran out of bandwidth at destination (not likely, since this sounds like your destination)

Just to confirm the error “comes back” with the test, do you mind doing a manual test (the same thing Duplicati does automatically after each job) with the following:

  • use the “Commandline…” link on the job menu
  • select a Command of “Verify” (or “Test”)
  • REPLACE the “Commandline arguments” with then number 1 (this tells Duplicati to only do a single test)
  • scroll to the bottom of the page and click the ‘Run “verify” command now’ (or ‘Run “test” command now’)

You should see something like the following:

Running commandline entry
Finished!
            
  Listing remote folder ...
  Downloading file (310.97 KB) ...
  Downloading file (22.40 KB) ...
  Downloading file (50.05 MB) ...
Examined 3 files and found no errors
Return code: 0

#6

Across my LAN everything is fine. This is my off-site backup. So it’s across the Internet to a friend’s house. He has plenty of bandwidth (business class cable). Last night I tried changing my http timeout to 20 minutes on one system and that one completed without any warnings.

So far backups from my friend’s house to my house are fine and backups from other people’s houses to my house are fine. It’s only backups from anywhere to my friend’s house that I run into timeout issues.

I tried your test and during the test I watched the load at my friend’s house and the computer isn’t loaded at all. The network load is very low as well. The output says downloading file 27.04MB, but it’s taking forever to run. 27MB should take seconds with the bandwidth here. Watching the server log I didn’t see a request for the file until 5 or more minutes after I started the command.

Listing remote folder ...

Downloading file (27.04 MB) ...

Operation Get with file duplicati-20180729T002811Z.dlist.zip.aes attempt 1 of 5 failed with message: The operation has timed out. => The operation has timed out.

Downloading file (27.04 MB) ...

Downloading file (36.08 KB) ...

Downloading file (49.99 MB) ...

Connection error, retry in 2 sec ...

Connection error, retry in 2 sec ...

Connection error, retry in 2 sec ...

Connection error, retry in 2 sec ...

Connection error, retry in 2 sec ...

Connection error, retry in 2 sec ...

Operation Get with file duplicati-b5694d4d1cba646d3acec1bf789df708e.dblock.zip.aes attempt 1 of 5 failed with message: The operation has timed out. => The operation has timed out.

Downloading file (49.99 MB) ...

Connection error, retry in 2 sec ...

Connection error, retry in 2 sec ...

Connection error, retry in 2 sec ...

Operation Get with file duplicati-b5694d4d1cba646d3acec1bf789df708e.dblock.zip.aes attempt 2 of 5 failed with message: The operation has timed out. => The operation has timed out.

Downloading file (49.99 MB) ...

Operation Get with file duplicati-b5694d4d1cba646d3acec1bf789df708e.dblock.zip.aes attempt 3 of 5 failed with message: The operation has timed out. => The operation has timed out.

Downloading file (49.99 MB) ...

Operation Get with file duplicati-b5694d4d1cba646d3acec1bf789df708e.dblock.zip.aes attempt 4 of 5 failed with message: The operation has timed out. => The operation has timed out.

Downloading file (49.99 MB) ...

Examined 3 files and found no errors

Return code: 0

#7

It definitely sounds like there’s an issue with either your house getting out to the internet or getting into your friends house.

If you have a cloud drive (Google Drive, Dropbox, etc.) that you could try running a test backup to that might isolate whether or not there is an issue at your end.

My GUESS is that your friend has a low upload speed on their internet connection. Since your Duplicati is pulling files from his machine to be tested it’s limited to his upload speed (which is usually slower than most download speeds).

If you try a test again but with the following in the “Commandline arguments” field, it should include lines that say something like Downloaded and decrypted 49.96 MB in 00:00:02.6709740, 18.70 MB/s which might help isolate speed vs. stability:

--full-remote-verification=true
--console-log-level=profiling
--console-log-filter=-*.Database.*;-*.SendMail*;-*RemoteOperationGet*;-*.BasicResults*

#8

It seems the options aren’t quite right:
The supplied option --console-log-level is not supported and will be ignored
The supplied option --console-log-filter is not supported and will be ignored


#9

I switched to the 20 minute timeout and things are better, but still getting some problems. Here’s a stack dump.
Failed to process file duplicati-b1748435d55d84148afbd14825942e4d3.dblock.zip.aes

System.Net.WebException: The operation has timed out.
 at System.Net.HttpWebRequest+<RunWithTimeoutWorker>d__244`1[T].MoveNext () [0x000ba] in <fc308f916aec4e4283e0c1d4b761760a>:0 
 --- End of stack trace from previous location where exception was thrown ---
 at Duplicati.Library.Main.BackendManager.GetForTesting (System.String remotename, System.Int64 size, System.String hash) [0x00065] in <ae134c5a9abb455eb7f06c134d211773>:0
 at Duplicati.Library.Main.Operation.TestHandler.DoRun (System.Int64 samples, Duplicati.Library.Main.Database.LocalTestDatabase db, Duplicati.Library.Main.BackendManager backend) [0x003f7] in <ae134c5a9abb455eb7f06c134d211773>:0

#10

The 5 minutes of delay at the remote server log might be worth looking underneath with the netstat command to try to see how far things got at a lower level. You can do these tests at either end, but starting at remote might make sense. Read the man page, but probably “netstat -s --tcp” “passive connection openings” would say whether inbound request connected. Windows operation is different, but it can give a similar report. You can also do some form of “netstat -an” to look for something like SYN_SENT which means that client is trying to start a TCP connection. If it gets it, it goes ESTABLISHED. Beware of old connections. Sometimes those are reused for efficiency purposes. You might also filter for server address, to see what’s laying around and what’s changing. You could also run ping to the delaying server to test for packet loss.

If you’re willing to install Process Explorer on Windows, you can find the Duplicati server to watch its network action. The view is pleasantly auto-refreshed (with color coding) and is limited to what you care about – the Duplicati server.

27MB might not take seconds. Cable modem upload speeds can be very slow. Do you have Internet speed results from remote? Still, if uploads there are slow, backups to your house should be slow, as should any restores from your friend’s house, and that’s what worries me. If verifications aren’t reliable, are you also seeing failures doing restores from friend’s?

Possibly the profiling log will be more clear about what’s going on. I can’t tell from the logs so far whether the problem is a failure to connect, or a connection failure partway through, or slowness causing Duplicati to give up and call a halt to the connection. Studying code might help, but I’d prefer to try some easier things first.

Doing an analysis at a packet capture level would be very revealing, if you have networking experts nearby. Possibly you already have tcpdump on Linux. Explaining how to drive it and interpret results would be hard.


#11

The packet capture shows duplicate ACKs showing up. I’m not sure where those are coming from.

I turned on profile logging last night and got this exception.

Operation Get with file duplicati-b74195b886819479bbcf08ac071787791.dblock.zip.aes attempt 1 of 5 failed with message: The operation has timed out.

System.Net.WebException: The operation has timed out.
 at System.Net.HttpWebRequest+<RunWithTimeoutWorker>d__244`1[T].MoveNext () [0x000ba] in <fc308f916aec4e4283e0c1d4b761760a>:0 — End of stack trace from previous location where exception was thrown —
 at System.Net.WebConnectionStream.Read (System.Byte buffer, System.Int32 offset, System.Int32 count) [0x00077] in <fc308f916aec4e4283e0c1d4b761760a>:0
 at Duplicati.Library.Utility.Utility.CopyStream (System.IO.Stream source, System.IO.Stream target, System.Boolean tryRewindSource, System.Byte buf) [0x00035] in <0828ce86ffa94a4bbbb2da4331bcc67b>:0
 at Duplicati.Library.Backend.WEBDAV.Get (System.String remotename, System.IO.Stream stream) [0x00116] in <ec9f949b0df54518b0942b8af2b00a3f>:0
 at Duplicati.Library.Main.BackendManager.coreDoGetPiping (Duplicati.Library.Main.BackendManager+FileEntryItem item, Duplicati.Library.Interface.IEncryption useDecrypter, System.Int64& retDownloadSize, System.String& retHashcode) [0x00270] in <ae134c5a9abb455eb7f06c134d211773>:0
 at Duplicati.Library.Main.BackendManager.DoGet (Duplicati.Library.Main.BackendManager+FileEntryItem item) [0x002ed] in <ae134c5a9abb455eb7f06c134d211773>:0
 at Duplicati.Library.Main.BackendManager.ThreadRun () [0x000e5] in <ae134c5a9abb455eb7f06c134d211773>:0

#12

Thank you for pursuing this to deeper depths. Could you please add further context for your test, for example what operation is running, in what direction, how far it got (if you can tell), where the capture was taken, where the duplicate ACKs are sent from, whether they are at TCP start etc.? Meanwhile, I’ll take an educated guess.

There are many Internet resources around for debugging network issues, and context helps to know which ones apply. I’ll somewhat randomly provide one on TCP startup, and another on duplicate ACKs generally.

Is this network issue during the mysterious 5 minute delay before the server notices, during the verification that fails, or somewhere else? From some guessing, the stack doesn’t talk about a connection problem (but I’m not positive it would) so possibly you just got a super-slow transfer. Since you’re looking at profiling (you can also watch live in About -> Show log -> Live), you probably have file names, sizes, timings, and speeds.

The implication from seeing Duplicati doing a dblock Get is that you’re back to chasing verification problem after successful backup to friend’s house, and are maybe having trouble (generally, or just on this dblock?) getting it over because friend’s house sends some data, your house acknowledges, then that ACK gets lost. Friend’s house tries to recover by sending again at the same point in the TCP sequence. Repeat until dead.

The problem is that data and acknowledgments flow both ways (though the actual data amount will be higher in the direction the dblock is going), and one can have this sort of retry going on either way. Need context…


#13

From some more study and a closer reading of the second link I gave you, I might have stated the details of the retries incorrectly earlier. These might be from packet loss or reordering upstream from friend’s house to yours, with your house complaining about it to try to fix things. Looking at context and sequence numbers will help say.

This is going beyond Duplicati, and might wind up in some equipment or ISP. If you can figure out which way the transfer is slow, e.g. friend’s to yours, perhaps you can set up a simpler test case before you go seeking the fix. Telling an ISP that Duplicati has trouble will probably not mean much to them, so go for something very common. Be careful about what you set up, because the ISP might have prohibitions on use, e.g. on running web servers


#14

My apologies, I forgot you were using 2.0.3.3 which would use --log-level=profiling and doesn’t support filtering.

Note that I originally suggested disabling the verification tests (which stops requested dblock files for that purpose) but depending on your retention-policy settings dblock files may end up getting requested as part of compacting (as happened in your second post).

To avoid this you could either make sure your retention-policy is set to “Keep all backups” or you set --no-auto-compact=true which will turn off the automatic cleaning of old / expired versions.

Again, this isn’t a fix so much as a way to let you keep moving forward with backups while the issue is still researched.

I realize this may be a big ask, but if your and your friend’s computer are ever on the same network you should try a backup (or test / verify) over the local network to help isolate between the computers and the internet connection.

--no-auto-compact
If a large number of small files are detected during a backup, or wasted space is found after deleting backups, the remote data will be compacted. Use this option to disable such automatic compacting and only compact when running the compact command.
Default value: “false”


#15

Thank you for the suggestions. I haven’t had time to work with the packet captures, perhaps more today. As far as having the computer on the same network, I did test the computer at my house on the network before taking it over to my friend’s house and it worked fine at my house. It does seem like it’s something about my friend’s network.

Here’s a summary of the connections

  • My house to my house works - not surprising as it’s on the local LAN
  • My house to my friend’s sometimes works. We are both on Comcast, I am on residential and my friend is on business class. We are both in the same city, so fairly close in both physical and network terms
  • my parent’s house to my house works - this is across the Internet and between different ISPs (much slower connection as my parents are on effectively a DSL connection)
  • my parent’s house to my friend’s house gives timeouts
  • my server in another state to my house always works - this is across ISPs
  • my server in another state to my friend’s timeouts

#16

was what caught my eye before, however I wasn’t sure if anywhere was just you or if others were backing up (which would help remove questions about equipment NOT at friend’s house). How about doing a speed test from there? After I talked about other tests, I thought that upload speeds might be a familiar complaint to ISP. Slow speeds can be caused by packet loss, and most ping programs give stats (much easier than tcpdump).

Here is one study on speeds and packet losses you might expect. I’ve seen another recently, but can’t find it.


#17

I just ran a command line speed test from GitHub - sivel/speedtest-cli: Command line interface for testing internet bandwidth using speedtest.net
Here are the results:
Retrieving speedtest.net configuration…
Testing from Comcast Business (XXX.XXX.XXX.XXX)…
Retrieving speedtest.net server list…
Selecting best server based on ping…
Hosted by Midco (Cambridge, MN) [67.54 km]: 32.904 ms
Testing download speed…
Download: 29.15 Mbit/s
Testing upload speed…
Upload: 6.26 Mbit/s


#18

And here is the same test from my home connection.
Retrieving speedtest.net configuration…
Testing from Comcast Cable (XXX.XXX.XXX.XXX)…
Retrieving speedtest.net server list…
Selecting best server based on ping…
Hosted by Fibernet Monticello (Monticello, MN) [32.94 km]: 35.534 ms
Testing download speed…
Download: 80.68 Mbit/s
Testing upload speed…
Upload: 6.26 Mbit/s


#19

Not bad (but not sure what, if anything, is promised). Maybe a full 50 MB dblock would take a bit over a minute.

One possible difference when initiating the connection from inside (on LAN) is that NAT can automatically work.

Coming in from outside (Internet) might require some help from application via Universal Plug and Play, or you.

You now have a lot of suggestions to try to isolate what’s going on, but it’s not particularly clear to me so far…


#20

Last night I did a capture of a verify command. Both hosts have public IP addresses. I opened the 2 captures and, as expected, the first packet in each capture has the same port numbers and timestamp. However I find that the sequence numbers don’t line up.