Backup job doesn't detect connection loss, stalls

Using 2.0.2.1 on various Windows 7/10 systems.

I’ve noticed this a few times, and it’s a bit troublesome.
I'm migrating systems over to Duplicati, and not all sites have the best internet/wifi; I've also had to restart the FTP server on occasion.

However, more often than not, when the FTP/internet connection is lost or interrupted (from either end), Duplicati doesn't seem to detect this and just appears to sit in progress all night. Eventually I notice it and kill the process remotely; it restarts itself, resuming the upload as if nothing happened.

Any thoughts on what I could check for this?

Hmm, Duplicati should report warnings when it fails to connect and then retry a couple of times. The exact behavior probably depends on where it is in the process, but there should definitely be warning logs on the live data page.

It's typically in the middle of the FTP transfer. I've made a couple of adjustments and will see how it handles it on the next run.

As far as the live log goes, there's typically no updated info: it shows the usual progress entries, then nothing while it's stalled. No errors, retry attempts, etc.

The one I just ran that prompted this post had lost its wifi connection for a minute or two and then came back. I killed the FTP session from my server end, but Duplicati never saw this (presumably because it had lost/regained internet) and just sat there until I killed it in Task Manager and started it again.

Is this different from the issue you “resolved” with a 10MB volume size here?


When this happens do you see anything “telling” in the “Server state properties” of the About → “System info” tab?

I’m running Version 2.0.2.14 and it’s exhibiting the same issue.

In my case the internet connection at the remote (receiving) end sometimes drops in the middle of a backup, sometimes for several minutes. Every time this happens, the backup job on the client just hangs: it never continues when connectivity resumes, never times out, and just remains stuck at the interrupted point. We have to cancel the job manually, and Duplicati will then start the next one.

I hope there is a fix for this.

Hi @PhoenixAlpha, welcome to the forum!

What are the last log entries (both in the job and “main” log) before the stall? Are you also using an FTP destination?
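
If nothing useful shows up there, you can also capture a continuous log to a file by adding something like the following to the job (the path here is just an example; put it wherever suits you):

--log-file=C:\duplicati-log.txt
--log-level=Profiling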

Until we can narrow down a specific cause for these stalls, you may want to look at adjusting the retry count and delay parameters, such as the ones below, to better fit your environment (there's an example command after the option descriptions). Note that depending on the actual issue they may not help, but it's worth a try.

--retry-delay
After a failed transmission, Duplicati will wait a short period before attempting again. This is useful if the network drops out occasionally during transmissions.
Default value: “10s”

--number-of-retries
If an upload or download fails, Duplicati will retry a number of times before failing. Use this to handle unstable network connections better.
Default value: “5”
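
For example, on a command-line job, raising both might look like this (the destination URL, source path, and values below are just placeholders; adjust for your setup):

Duplicati.CommandLine.exe backup "ftp://server.example.com/backups?auth-username=user&auth-password=pass" "C:\Data" --number-of-retries=10 --retry-delay=30s

In the GUI the same two options can be added as advanced options on the “Options” page of the job editor.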

Hi,

The remote end is an SFTP server.

I will try your suggestions and if it happens again, will try and get you the log entries.

Thanks.

P.

I'm having a similar issue backing up ~350 GB to Amazon Drive on Windows 10, Duplicati 2.0.2.1_beta_2017-08-01. I've been running the “Profiling” live log in the GUI to see if I can catch an error when it stops uploading. There is no error; it just stops.

Here is the current server state for the Amazon Drive job, from the System Info page:

lastEventId : 966
lastDataUpdateId : 23
lastNotificationUpdateId : 8
estimatedPauseEnd : 0001-01-01T00:00:00
activeTask : {"Item1":10,"Item2":"8"}
programState : Running
lastErrorMessage :
connectionState : connected
xsfrerror : false
connectionAttemptTimer : 0
failedConnectionAttempts : 0
lastPgEvent : {"BackupID":"8","TaskID":10,"BackendAction":"Put","BackendPath":"duplicati-b9a0e362230f04120b16de8f8ad80fa47.dblock.zip.aes","BackendFileSize":52339133,"BackendFileProgress":35848192,"BackendSpeed":11312,"CurrentFilename":"D:\My Documents\Archive\From Old HDD\Liz\Desktop\CDs\worship songs\Christian - The Air I Breathe.mp3","CurrentFilesize":6676865,"CurrentFileoffset":819200,"Phase":"Backup_ProcessingFiles","OverallProgress":0,"ProcessedFileCount":38267,"ProcessedFileSize":83854368440,"TotalFileCount":53058,"TotalFileSize":428380769688,"StillCounting":false}
updaterState : Waiting
updatedVersion :
updateReady : false
updateDownloadProgress : 0
proposedSchedule :
schedulerQueueIds :
pauseTimeRemain : 0

I have amazon-consistency-delay at 60s, number-of-retries at 25 and retry-delay at 30 seconds.
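In the job's advanced options that looks like:

--amazon-consistency-delay=60s
--number-of-retries=25
--retry-delay=30s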
Is there a different log file I should be generating?

OK, found a new entry in the Profiling Live log:

Mar 17, 2018 1:08 PM: Cannot open WMI provider \localhost\root\virtualization\v2. Hyper-V is probably not installed.
Mar 17, 2018 1:08 PM: Using WMI provider \localhost\root\virtualization\v2
Mar 17, 2018 1:08 PM: Cannot find any MS SQL Server instance. MS SQL Server is probably not installed.
Mar 17, 2018 12:13 PM: CommitAddBlockToOutputFlush took 00:00:00.091

Is there something missing from my Windows install? I’ve seen this error before, but the backup continued without issue.

Unless you are specifically working with virtual machines, I don't think this is anything to worry about.

Just to verify: your issue is the same as the OP's, where a loss of connection to your destination causes a stall in the backup from which it never recovers?

No virtual machine here.

I don't know if I'm losing connection, but the symptoms are the same: a stall in the Backend Event:PUT function, with no warning or change in files remaining or size to upload.

I don’t suppose this happens at about the same place / file on the backup every time, does it?

:frowning:, no. Still trying to do my first upload to Amazon Drive, although it's been running fine for more than a day with few issues. I had to restart it once over the last two days due to a “Stream Request Timeout”, but I haven't seen the pause in about two days.

Yay? :wink:   

Hi,

I believe that I'm experiencing the same issue. I'm on Arch Linux, running 2.0.3.2_canary_2018-03-16. I'm in the middle of trying to do a large backup to S3 over a slow and not entirely reliable WiMAX connection (urgh). I have to restart the Duplicati service several times a day because it just gets “stuck” as described. From the front end, the symptoms are that the progress bar stops displaying an upload rate, and checking my wifi connection reveals that there's no longer any outbound traffic.

I've turned on verbose logging to a file and I see no error reported; the last entry is a “Checking file for changes” line, and then nothing. Server state properties look like this:

lastEventId : 77
lastDataUpdateId : 2
lastNotificationUpdateId : 0
estimatedPauseEnd : 0001-01-01T00:00:00
activeTask : {"Item1":2,"Item2":"2"}
programState : Running
lastErrorMessage :
connectionState : connected
xsfrerror : false
connectionAttemptTimer : 0
failedConnectionAttempts : 0
lastPgEvent : {"BackupID":"2","TaskID":2,"BackendAction":"Put","BackendPath":"duplicati-i312d7356901a42eab64815f9ea69828a.dindex.zip.aes","BackendFileSize":38909,"BackendFileProgress":0,"BackendSpeed":-1,"BackendIsBlocking":true,"CurrentFilename":"/home/lucian/Music/Cat Stevens/Remember_ The Ultimate Collection/08 Matthew & Son.m4a","CurrentFilesize":4871046,"CurrentFileoffset":2048000,"Phase":"Backup_ProcessingFiles","OverallProgress":0,"ProcessedFileCount":51515,"ProcessedFileSize":101522367438,"TotalFileCount":63792,"TotalFileSize":165070252186,"StillCounting":false}
updaterState : Waiting
updatedVersion :
updateReady : false
updateDownloadProgress : 0
proposedSchedule : [{"Item1":"2","Item2":"2018-03-24T12:00:00Z"}]
schedulerQueueIds : [{"Item1":3,"Item2":"2"}]
pauseTimeRemain : 0

My system logs for the period don't show the connection dropping, but it's possible there was some more substantial packet loss/latency; it would also be on the other side of the router, so it wouldn't necessarily cause the wifi connection to fail explicitly. I'm basically running with the default settings for everything. I can try dropping the upload volume size, but it's still concerning to me that Duplicati just hangs in this situation rather than either retrying its way out of it or failing explicitly.
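
If I do try a smaller volume size, I understand that's the --dblock-size advanced option (default 50MB), so something like:

--dblock-size=10MB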

Yes, I agree that’s not a good scenario.

I suspect Duplicati is making a call to a standard command to do something and then waiting for that command to end, and it's the command that never finishes or tells Duplicati something went wrong.

Your “last log file entry” should help narrow things down a bit. Hopefully we can get some confirmation from others of the same last entry.
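
If you want to exercise the destination outside of a backup run, there's also the bundled backend tester. Point it at an empty test folder (the URL below is just a placeholder; on Linux, run it through mono) and it will upload, download, and verify some generated test files:

Duplicati.CommandLine.BackendTester.exe "s3://bucket-name/empty-test-folder?auth-username=ID&auth-password=KEY"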

It seems this happens more when processing large files (e.g. disk ISOs, digital videos…). I was running a block of smaller files that went smoothly. Now it’s processing these larger files and hanging with no warning.

That's odd, as the file-processing side of things happens before transfers to the destination, so a loss of destination connection shouldn't care about file size.

Unless you’re trying to back up a network connected SOURCE - is that maybe how your job is set up?

Nope, all the files are local.
Here is the snapshot of the server state properties window:

lastEventId : 1321
lastDataUpdateId : 6
lastNotificationUpdateId : 0
estimatedPauseEnd : 0001-01-01T00:00:00
activeTask : {"Item1":2,"Item2":"8"}
programState : Running
lastErrorMessage :
connectionState : connected
xsfrerror : false
connectionAttemptTimer : 0
failedConnectionAttempts : 0
lastPgEvent : {"BackupID":"8","TaskID":2,"BackendAction":"Put","BackendPath":"duplicati-bc3a7e67783e94bd5abb5956f63bf0b1d.dblock.zip.aes","BackendFileSize":52395165,"BackendFileProgress":52395165,"BackendSpeed":1722,"BackendIsBlocking":true,"CurrentFilename":"D:\\My Videos\\My Videos\\Hecker Wedding Redone.iso","CurrentFilesize":4232478720,"CurrentFileoffset":2153881600,"Phase":"Backup_ProcessingFiles","OverallProgress":0,"ProcessedFileCount":51506,"ProcessedFileSize":265139777533,"TotalFileCount":53175,"TotalFileSize":430253443518,"StillCounting":false}
updaterState : Waiting
updatedVersion :
updateReady : false
updateDownloadProgress : 0
proposedSchedule : []
schedulerQueueIds : []
pauseTimeRemain : 0

The “BackendSpeed” is very slow.

I suppose a “slow” connection on its own might not cause issues, but combined with a slightly resource-constrained machine that gets overly busy for “long” periods when processing large files (likely doing block hash lookups), it could run into issues.

But I'm not really sure how to verify that guess. :frowning:

Now that there's a new beta version out, I'd suggest updating to that and seeing if any general performance improvements resolve the issue.