SFTP timeout -- never happens?

My home internet connection went down for a few minutes while an SFTP backup was in progress.

When I looked this morning, the last log entry said that Duplicati was beginning to read from the remote store. That was over five hours earlier. Duplicati was still waiting for the read to complete.

When I clicked Stop Now in the main interface, Duplicati said “Stopping after current file” (and I definitely clicked Now, not After Current File).

Using Quit from the system status area icon and then restarting Duplicati from the start menu just put me back to the same place… but I realized that probably only stopped the UI task, not the backup task.

I finally rebooted the machine, and that made it stop. When I restarted Duplicati, it appears to have restarted the interrupted task, finished it, and moved on to the next missed tasks.

Why didn’t Duplicati time out after hours of waiting for a response? Is there somewhere I can set a reasonable timeout for an SFTP server?

Try setting the ssh-operation-timeout option.
I’m not sure if it applies to all operations (good for your case) or only to the connection (not so good).

I haven’t seen SSH do this. A few things are possible, including the stuck UI issue, where a code problem keeps the code from ever exiting its loops, so the current backup never ends.

From what I’ve seen, SSH times out and exits. The stuck UI issue, by contrast, might never exit, which requires Duplicati to be force ended.

I’d say it’s quite possible that this error leads to the stuck UI issue and that the timeout isn’t the problem. The two look similar from the outside, but the underlying reasons are very different.

Are you talking about the classic SSH client, the BSD one? It’s a different code base altogether.

I mean Duplicati connecting to OpenSSH (on Windows), and now also on Android. I’ve only ever seen Duplicati work fine with SSH. If something else is at fault, it wouldn’t be Duplicati but the SSH server.

Still, I’m fairly confident that the stuck UI issue is the likely cause here.

Thank you for the probable explanation. Since it recovered after I rebooted (forcibly killing all Duplicati tasks), I’ll assume doing that is preferable to attempting to modify an advanced option which might not be the real problem.

There appear to be two Duplicati tasks – they both say they are executing “Duplicati.GUI.TrayIcon.exe” with no command line parameters, but I’m assuming one must be the task that runs the backups (the one with the icon that looks like the tray icon when a backup is running) and the other must be the web server (the one that looks like the tray icon when Duplicati is not paused and not running a backup). Since you’re calling this “Stuck UI,” is it the web server task that needs to be killed? If so, though, why isn’t the backup task recovering after Internet connectivity is re-established, and either resuming the interrupted backup task or at least performing the remaining scheduled tasks? Admittedly, I don’t know how this works internally, so what I just wrote might not make sense.

When the code hits an error, it is supposed to exit (or attempt some recovery and then exit if that doesn’t go well, which it also fails to do in some cases), and the backup should end. Since it doesn’t, it gets stuck. It can only do what the code allows it to do and no more. So the likely result is that reconnecting changes nothing, because no code path reacts to it.

I’m not exactly sure how to explain it better. Basically, they use a WhenAll, which never returns unless every task it waits on ends. When it never returns, a lot of things fail to function properly until Duplicati is force ended. A system reboot does that, and so does force killing the application by whatever means the OS provides and then starting it again (in Windows, use Task Manager and end all listed Duplicati processes).

Certain errors cause this. I found one that, when thrown in a certain place (throwing is how a program signals an error), means the WhenAll never exits. This results in a stuck UI where Duplicati cannot function correctly any further until force ended. The number of errors that can cause this is unknown and cannot really be known. It’s also unknown when or if they will fix it, as it’s not going to be a fun one.
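To illustrate the kind of hang described above, here is a minimal sketch in Python rather than Duplicati’s C#. It is only an analogy: `asyncio.gather(..., return_exceptions=True)` stands in for WhenAll, and the `uploader` and `consumer` names are hypothetical, not taken from Duplicati. One task fails early, the other waits for a signal that will now never arrive, and the wait-for-everything call would hang forever if it weren’t wrapped in an outer timeout.

```python
import asyncio


async def uploader(queue: asyncio.Queue) -> None:
    # Hypothetical worker: fails before it can hand anything to the consumer.
    raise ConnectionError("connection lost")


async def consumer(queue: asyncio.Queue) -> None:
    # Waits for an item that the failed uploader will never send.
    await queue.get()


async def run_all(timeout: float) -> str:
    queue: asyncio.Queue = asyncio.Queue()
    # With return_exceptions=True, gather waits for *every* task to finish,
    # so the uploader's error alone does not end the wait.
    wait_all = asyncio.gather(
        uploader(queue), consumer(queue), return_exceptions=True
    )
    try:
        # The outer timeout is the only thing that breaks the wait here.
        await asyncio.wait_for(wait_all, timeout=timeout)
        return "finished"
    except asyncio.TimeoutError:
        return "stuck"


result = asyncio.run(run_all(0.2))
print(result)
```

Without the `wait_for` wrapper, the only way out would be to kill the process, which matches the force-end workaround described above.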

If that is the cause, then the best you can do is to work around it by whatever means works best for you. For me it was fixed by not backing up some files, but that was a different cause than a connection issue. That doesn’t work here unless you back up so little that it never sees that kind of connection problem. You could, for instance, not back up online and instead back up locally, for example to a NAS device somewhere outside or in another building that’s close enough to use the local network. There are other options too.

The other problem with the WhenAll is that it cannot be force exited via a button or code in Duplicati. WhenAlls are kind of ugly that way. I’d bang my head against the wall for using it lol. Duplicati is too complex for its use. How they decide to tackle it, though, is up to them. They could try to find everything that keeps it from exiting. But, personally, I’d take a button that can exit the backup, cleanly or otherwise, as a stuck application isn’t a good thing :)

Edit - By the way, looking at ts678’s reply below, it reminded me that the errors related to the stuck UI issue can differ between two computers. For example, the problem I described above only triggered the issue on one computer. Another, very different computer was always fine. So you can’t say that if one computer has it then all should. Execution speed (computer performance) affects code too, among other things, so there are certain problems that code will only run into on certain computers. Whether you see it on one computer or several makes no difference.

Options on the Destination screen sometimes get lost (an open issue awaits some JavaScript expertise).
If that happens with this option, you can put it in Advanced options on screen 5 Options instead…

Command line help text says:

--ssh-operation-timeout (Timespan): Sets the operation timeout value
Use this option to manage the internal timeout for SSH operations. If this
options is set to zero, the operations will not time out
* default value: 0

(why the default value is 0 is an interesting question…)

However, the Test connection button times out in 20 seconds or so, suggesting that it’s not covered by
ssh-operation-timeout, whose default is to never time out. The code looks like it does a Connect before the list.

The first process just finds the latest update and starts it. The child process does the actual work, whatever it is. For Duplicati.GUI.TrayIcon.exe that’s the tray icon, the operation (e.g. backup), and generally the web server, although you can pass the --no-hosted-server option if you want only the icon.

Set this option to not spawn a local service, use if the TrayIcon should connect to a running service.

For manual process kills (unfortunately necessary sometimes), it’s cleanest to get all processes down.

As for the hang, I see a couple of open issues on Connect hangs, but in my testing it seems to time out OK.
I’m just using a dead IP address. I tried both a same-subnet LAN IP and a random on-the-Internet one.

Yes, it looks the same here. Sadly, the documentation for SSH.NET is nonexistent, so understanding what this option does means reading through huge swaths of code. The Connect timeout may be related to an OS default value, but it’s difficult to say. There is this SSH.NET issue that I can’t repro, with or without the timeout parameter (that is, I can’t get Duplicati to hang, but maybe it depends on the client OS? This was tested on Windows).

They have good documentation in a chm file. Probably not helpful if you’re not on Windows, though:

I have used this library in one of my own programs, and I use the ConnectionInfo.Timeout property successfully. I don’t do SFTP transfers but I see there is a SftpFileStream.Timeout property. I haven’t looked to see if Duplicati uses this or not.

Do you always have the log file running? If so, do you mean something like below, but just a Started?
I guess you’re not talking about an internal read, but if you are, that can probably happen on uploads.

2021-04-20 12:05:43 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b41180f0f26f24d9da90693218410ca36.dblock.zip (29.67 MB)
2021-04-20 12:06:13 -04 - [Profiling-Duplicati.Library.Main.BackendManager-DownloadSpeed]: Downloaded 29.67 MB in 00:00:29.2771604, 1.01 MB/s
2021-04-20 12:06:13 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b41180f0f26f24d9da90693218410ca36.dblock.zip (29.67 MB)

The timing is a bit unusual. Most of the backup is spent uploading. Any idea what the status bar said?
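If a log file is available, a stalled transfer shows up as a Started line with no matching Completed line. Here is a rough Python sketch for finding those, assuming log lines in the format quoted above. The `pending_operations` helper is hypothetical and not part of Duplicati, and it only recognizes the Started and Completed states that appear in the excerpt.

```python
import re

# Matches lines like the excerpt above:
# "<timestamp> ... Backend event: Get - Started: <file name> (<size>)"
EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"Backend event: (?P<op>\w+) - (?P<state>Started|Completed): (?P<name>\S+)"
)


def pending_operations(lines):
    """Return {(operation, file name): start timestamp} for unfinished operations."""
    pending = {}
    for line in lines:
        match = EVENT.match(line)
        if not match:
            continue  # not a backend event line (e.g. a Profiling entry)
        key = (match["op"], match["name"])
        if match["state"] == "Started":
            pending[key] = match["ts"]
        else:
            pending.pop(key, None)  # Completed: no longer pending
    return pending


# The two BackendEvent lines from the excerpt above.
started = (
    "2021-04-20 12:05:43 -04 - [Information-Duplicati.Library.Main."
    "BasicResults-BackendEvent]: Backend event: Get - Started: "
    "duplicati-b41180f0f26f24d9da90693218410ca36.dblock.zip (29.67 MB)"
)
completed = (
    "2021-04-20 12:06:13 -04 - [Information-Duplicati.Library.Main."
    "BasicResults-BackendEvent]: Backend event: Get - Completed: "
    "duplicati-b41180f0f26f24d9da90693218410ca36.dblock.zip (29.67 MB)"
)

print(pending_operations([started]))             # one Get still pending
print(pending_operations([started, completed]))  # nothing pending
```

An entry that stays pending long after its start timestamp would correspond to the “Get - Started” with nothing after it.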

I don’t have anything non-default set for logging. What I did was go to About, then Show log, then Live, then set log level Information. I saw a few messages; the latest was a Backend event: Get - Started with a time that was over five hours before the current time.

I think the status bar said that it was Verifying – I don’t remember for certain, though. When I told it to stop immediately, the status bar changed to stopping after current file. I presumed that meant it was going to continue to wait for the operation it started over five hours ago to complete – something that clearly was never going to happen.

I’m not sure jumping into the live log after-the-fact is always reliable (and live log has other oddities).

However this would probably be in line with the backup itself being done, and verifications beginning.

It’s one overly-specific message for either choice, and it only works at certain limited spots in backup.

Choose ‘Stop now’ but status then shows ‘Stopping after current file’

So we’re still not sure what was going on. The best plan is to see if you can repro it on a test backup with a log file.
Maybe we’ll dead-end at an SSH.NET limitation, or maybe there’s some Duplicati fix if a dev can help.

The Duplicati timeout goes into the OperationTimeout member of SftpClient. It seems to be a number of milliseconds to wait for an ‘operation’ (whatever an ‘operation’ is exactly).
There also seem to be a connection timeout, a waiting-for-input timeout, and the filestream timeout.
Hmm, I just stumbled on this very interesting issue. It can explain why not setting any timeout still seems to produce timeouts: the timeout comes from a default operating system value (at least on Windows). Setting a timeout smaller than the OS value takes effect, but setting a larger one does not.
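The interaction between the two timeouts can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how SSH.NET or Windows implements it. The `effective_timeout` function is hypothetical, and the 20-second OS value simply echoes the “20 seconds or so” seen with the Test connection button.

```python
from typing import Optional


def effective_timeout(app_timeout_s: float,
                      os_timeout_s: Optional[float]) -> Optional[float]:
    """Return the timeout that actually applies, or None if nothing times out.

    app_timeout_s: application setting, where 0 means "no timeout"
                   (as with the ssh-operation-timeout default).
    os_timeout_s:  assumed operating system limit, None if there is none.
    """
    candidates = []
    if app_timeout_s > 0:
        candidates.append(app_timeout_s)
    if os_timeout_s is not None:
        candidates.append(os_timeout_s)
    # Whichever limit expires first wins.
    return min(candidates) if candidates else None


OS_DEFAULT_S = 20.0  # assumed value, for illustration only

print(effective_timeout(0, OS_DEFAULT_S))   # no app timeout: OS value applies
print(effective_timeout(5, OS_DEFAULT_S))   # smaller app timeout takes effect
print(effective_timeout(60, OS_DEFAULT_S))  # larger app timeout is never reached
```

This only covers cases where the OS imposes a limit at all. An operation with no limit from either side gets None, meaning it can wait forever.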

If Duplicati calls into code from another source and that code has its own timeout, then the call will time out and return. Once control returns to Duplicati, Duplicati continues whether or not it has a timeout of its own. A timeout in Duplicati wouldn’t do anything except return sooner at best. Either way the call returns.

However, if the call results in an error that Duplicati cannot handle correctly, and the code gets stuck because of its WhenAll use or something similar, then it just stays that way. There may not be any additional log entries.

Programming is quite complex and there are a lot of extra situations that are mind bending :)

Timeouts aren’t everything.