Solved: SFTP "finishes" but does not complete, use FTP instead?

Hi, I just migrated from CrashPlan “friend to friend” backup and am trying to use Duplicati for the same purpose. I have run into a problem with the off-site backup (a local backup to an external HD works fine).

The off-site backup does not finish even though all files are uploaded and the progress bar says “Finished”. The “Home” screen says “Last successful run: Never”. About two minutes after the progress bar reaches “Finished”, this error message appears:

“Socket read operation has timed out after -1 milliseconds”

If I check the log it looks like this after every backup run:

17 nov 2017 15:46: Failed while executing “Backup” with id: 4

Renci.SshNet.Common.SshOperationTimeoutException: Socket read operation has timed out after -1 milliseconds.
at Renci.SshNet.Session.WaitOnHandle(WaitHandle waitHandle, TimeSpan timeout)
at Renci.SshNet.Session.Renci.SshNet.ISession.WaitOnHandle(WaitHandle waitHandle)
at Renci.SshNet.Channels.Channel.Close(Boolean wait)
at Renci.SshNet.Channels.ChannelSession.Close(Boolean wait)
at Renci.SshNet.Channels.Channel.Close()
at Renci.SshNet.SubsystemSession.Disconnect()
at Renci.SshNet.SftpClient.OnDisconnecting()
at Renci.SshNet.BaseClient.Disconnect()
at Renci.SshNet.BaseClient.Dispose(Boolean disposing)
at Renci.SshNet.SftpClient.Dispose(Boolean disposing)
at Duplicati.Library.Backend.SSHv2.Dispose()
at Duplicati.Library.Main.BackendManager.Dispose()
at Duplicati.Library.Main.Operation.BackupHandler.Run(String[] sources, IFilter filter)
at Duplicati.Library.Main.Controller.<>c__DisplayClass16_0.b__0(BackupResults result)
at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
at Duplicati.Library.Main.Controller.Backup(String[] inputsources, IFilter filter)
at Duplicati.Server.Runner.Run(IRunnerData data, Boolean fromQueue)

Has anyone got SFTP to work, or is FTP a better choice with Duplicati? And what about security when using FTP with its plain-text login, should I be worried about hackers?

I run Windows 10 on both server and client and use Bitvise as the SFTP server, with the port open of course. All files are uploaded and even restores work, but every backup fails with “Socket read operation has timed out after -1 milliseconds”. Strange? I have also reinstalled Duplicati.

I also get the “Missing XSRF token” error every time I open the GUI; it disappears on refresh. I'm running Chrome, but I guess it has nothing to do with the SFTP/SSH problem.

Is there a solution?

Thank you for an innovative software!

Welcome to the forum, @mp80 - thanks for using Duplicati!


According to the usage reporter SFTP was the 3rd most popular destination (behind file storage and Google Drive) in October, so it’s definitely working.

That being said, I recently started getting that same error (and timeout) with some of my SFTP backups. I’m not sure what caused it, but since adding the --accept-any-ssl-certificate parameter to my job seemed to make it go away, I’m guessing my SSL certificate expired or something like that.

Using FTP will possibly make that error go away; however, you’ll be passing your FTP credentials over the internet in plain text. (The Duplicati content is already encrypted before it leaves your computer, so that part doesn’t matter so much.)


It’s possible you’re having a problem fetching back dblock (archive) files for the random verification that normally happens after each backup run. You can test if this is the case by either setting --backup-test-samples to 0 or --no-backend-verification to true (check the box).
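If it helps, here is a rough sketch of what those two options would look like on a command-line run; the ssh:// destination URL, credentials, and source path below are placeholders for your own settings, not anything copied from your job:

Duplicati.CommandLine.exe backup "ssh://your-server:22/backups/pc?auth-username=user&auth-password=secret" "C:\Data" --backup-test-samples=0

Duplicati.CommandLine.exe backup "ssh://your-server:22/backups/pc?auth-username=user&auth-password=secret" "C:\Data" --no-backend-verification=true

In the web GUI the same names appear in the job’s “Advanced options” list, where --no-backend-verification shows up as the checkbox I mentioned.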


Correct. The XSRF error has nothing to do with your destination and is instead related to the Duplicati web GUI not getting a correct response from the Duplicati server (either service or Tray Icon). You didn’t mention what version of Duplicati you are running, but an improvement in this area was added to the later canary versions.

Thank you for your response and the ideas!

I tried them one at a time but no luck; the error still appears within 2 minutes of “Finished” in the status bar. The Duplicati log is not updated until the error appears.

However, I set up a backup the other way (from the server to the client computer) and that worked out fine, so SSH with Bitvise and Duplicati does work, as you already pointed out. I’m using 100 MB blocks and about 250 GB backup size, running 2.0.2.1_beta_2017-08-01. Both PCs run Windows 10 and the same Duplicati and Bitvise versions.

I fetched the logs from Duplicati and Bitvise for both the working and the non-working setup. They look the same until the “dindex.zip.aes” file is downloaded successfully.

After “dindex.zip.aes”, the working one fetches the “dblock.zip.aes” right away. The other one gets a socket error and times out. After 67 seconds Duplicati tries again and succeeds in getting the file. My guess is that it “finishes” once “dindex.zip.aes” is downloaded, then 67 seconds later it fetches the “dblock.zip.aes” and shows/writes the error message (I said the error came within 2 minutes of “Finished”, so no coincidence).

Both setups have “dblock.zip.aes” files of 104 MB, but they differ by a few KB in size. Should they be exactly the same size regardless of backup size, or does the KB difference maybe depend on the cluster size of the storage…

Strange that it does not work when setting --backup-test-samples to 0; it should not download the “dblock.zip.aes” at all then, as I understand it?

Have to experiment some more…

EDIT: the logs showing the “dblock.zip.aes” download were captured before I tried your --backup-test-samples = 0 suggestion. I have not checked the logs after adding --backup-test-samples = 0, I just saw the same error.

Update: I tried the --no-backend-verification setting again, this time ticking the checkbox, and voilà, it completes and says Backup: 253.94 GB / 10 Versions.

I have turned the setting off again to see if it continues to work now. I guess it is good to have the verification on, or is that just extra safety?

A manual “Verify files” has always worked…

I also tried running the GUI in MS Edge instead of Chrome and no longer get the missing XSRF token, so I have created a shortcut that always starts Duplicati in Edge as a workaround.

I wouldn’t have thought the browser would make a difference, but good to know!

I’m glad to hear the backup is completing for you now, and I’m interested to hear how the test with verification turned back on works out. Personally, I’d keep the verification enabled if possible, but since it sounds like you test things manually that’s not such a big deal.

Hi again. Sorry to say, but when I removed the “no backend verification” setting the problem immediately came back, and the message “Detected block that are not reachable in the block table” also appeared when turning verification back on; a repair fixed it.

I tried a smaller backup with only about 200 pictures and got the same behavior: it never completes, but the files are uploaded and can be restored without any problems.

I have reinstalled Duplicati and the SSH server, but no difference. Strange that it works the other way, from the server to my PC, with the same setup. I’m beginning to think it’s network-setup related; I only have a 100/10 connection while the server has 100/100. The “connection test” in Duplicati works, I can access the SSH server from both sides without problems, and the servers run on different ports with public IPs. I don’t have time to fiddle with this right now, but I feel I have to fix it to be sure the backups are made…

Regarding the XSRF problem, the Edge browser didn’t fix it after all; it now appears randomly instead of every time as in Chrome. A refresh fixes it anyway.

The XSRF error historically happens when working on resource-constrained hardware. For example, I get it sometimes on my 6-year-old laptop, almost always on my crappy virtual machines, but never on my full desktop.

The only thing that’s coming to mind regarding the verification error is that we’re back to something not being set up correctly or taking too long on your SFTP server. I apologize if I missed this in your posts, but did you ever update from the beta to a canary version?

I ask because if the issue is simply timing related (your SFTP destination doesn’t respond to Duplicati fast enough) then you might find the --http-operation-timeout parameter in the newer canary versions worth testing:

Though I’ll be honest, I’m not sure that applies to a socket read timeout…
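For reference (and with that caveat that it may not apply here), on a canary build it would just be another advanced option on the job, something like the line below; the 5-minute value is purely an illustration, not a recommendation:

--http-operation-timeout=5m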

Thanks a lot for all the suggestions. I found some time to run some more tests and now it works!! I have tried a new backup and also run the old backup a couple of times, with success every time!

To pin down the problem I disabled the Windows 10 Pro (Fall Creators Update) firewall on the SSH server, not expecting any change, but the backup completed immediately (it went straight to 15 versions, so the backups had been working but were never marked as completed, just as I suspected). I tried the firewall on/off and got the same behavior every time (on: finishes the backup, verifies OK, and then presents the error message after 68 seconds instead of marking it as completed).

The culprit was the Windows firewall together with the automatic rules set by Bitvise SSH on that particular machine, even though the rules look OK (Bitvise adds a rule that opens one TCP port while Bitvise is running and allows it for the whole subnet).

The solution was to turn off Bitvise SSH’s automatic opening of the Windows firewall and add the rule manually instead (allowing both TCP and UDP, if that matters). I agree it sounds strange, but it is repeatable on that Bitvise SSH server. Backing up the other way, the automatic opening works fine; I even double-checked the Bitvise SSH settings on both machines and they were identical except for the port used.
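For reference, the manual rule is roughly what you would get from something like this in an elevated command prompt (port 22 and the rule names are only placeholders, use whatever port your Bitvise server actually listens on; I added mine through the Windows Firewall GUI instead):

netsh advfirewall firewall add rule name="Bitvise SSH (manual, TCP)" dir=in action=allow protocol=TCP localport=22
netsh advfirewall firewall add rule name="Bitvise SSH (manual, UDP)" dir=in action=allow protocol=UDP localport=22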

I can’t explain why the backup doesn’t get marked as finished when the automatic rules handle the firewall on only one of the servers… this is an example of a problem that is not easy to pin down. Both servers run the same software.

One more question: if a scheduled backup fails because the server is offline, when does it retry? Only at the specified time each day/night?

Thanks!


Wonderful detective work there! I hope you don’t mind but I flagged your post as the solution. Firewall rules can be goofy - I had to dump COMODO on one of my machines because it absolutely killed my unRAID interface for some odd reason!

By default, I believe the --retry-count is set to 5 and the --retry-delay is set to 10 seconds - so you get just under 1 minute’s worth of retries before the job errors out. Once that happens it gets put back in the queue for the next scheduled run time.

Note that if you only have a single job then using the above two parameters should work well for you. Just be aware that if the destination NEVER becomes available then the total duration of --retry-count * --retry-delay is treated as a single job run, so if you use --retry-count=24 and --retry-delay=60min then a potential 2nd job would never get a chance to run.
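To make the arithmetic explicit (ignoring the time each upload attempt itself takes):

defaults: 5 retries × 10 seconds ≈ 50 seconds of retrying before the job errors out
example: --retry-count=24 × --retry-delay=60min = 24 hours tied up in a single run, so a second daily job never gets a turn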

Am I correct in assuming you’re looking for something similar to what I proposed here?

If your machine is turned off during the scheduled time, it will retry as soon as Duplicati starts. If Duplicati was running but without a network connection, it will run again at the next scheduled time (same as if it had run successfully).