Backups to SFTP Not Closing Connection

I recently upgraded from 2.0.4.5 to 2.0.4.22 to test some of the fixes for an issue I was seeing. Since then I’ve noticed that connections to my SFTP server are not closing after the backup completes. My assumption is that this has something to do with the changes that allow multiple concurrent connections to the backup target.

These connections stay open indefinitely (days) until I run out of allowed connections on my SFTP server and backups start failing. I’ve reduced --asynchronous-concurrent-upload-limit to 2, which seems to halve how many connections are left open after each backup.
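For what it’s worth, the same limit can be passed to a command-line run. A minimal sketch, where the storage URL and source path are placeholders rather than my real job settings:

  rem Illustrative only: substitute your own storage URL and source folder.
  Duplicati.CommandLine.exe backup "ssh://myserver/backup?auth-username=user" "C:\Data" --asynchronous-concurrent-upload-limit=2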

From what I’ve observed, the backup starts and two connections are opened. As the backup continues, two more are opened, so there are 4 concurrent connections (despite the limit being set to 2); ultimately two of the 4 are closed and the remaining two are left open indefinitely.

I can do some more troubleshooting and try to figure out at exactly which stage things are happening if that helps but thought I’d see if anyone else has noticed the issue.

I hadn’t noticed it (I don’t use SFTP generally), but I can reproduce it. Also confirming 2.0.4.5 doesn’t do it.

C:\>netstat -ano | findstr :22 | findstr ESTABLISHED
TCP [::1]:22 [::1]:65411 ESTABLISHED 3584
TCP [::1]:65411 [::1]:22 ESTABLISHED 18476

is a way to count connections (if going to localhost as above, expect each connection to show as a mirror-image pair). Limited tests did seem to show it as upload-related, e.g. a test that downloaded all files left nothing open.
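If a count is handier than a listing, cmd can tally the lines itself; a generic sketch (halve the result for localhost tests, since each connection shows twice):

  rem find /c /v "" counts every line piped into it.
  netstat -ano | findstr :22 | findstr ESTABLISHED | find /c /v ""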

This seems like a serious problem, correct? If (not confirmed) it arrived in v2.0.4.16-2.0.4.16_canary_2019-03-28, then I guess nobody picked up that canary and noticed it. There’s also a wish-it-was-seen-sooner backend-specific bug in “FTP (Alternative)” in Problems after upgrade to 2.0.4.21, which requires a UNIX-style FTP server…

Although I hope SFTP can be fixed, there’s some time pressure from Amazon Cloud Drive access ending. There are other SFTP servers, but free ones sometimes have connection limits like yours, or only allow personal use.

Characterizing the problem further will help figure out where it lies. Even better would be if you read code…

This might affect all backends. See the issue below:

I was worried about that possibility, but I’d tested “FTP” (not “FTP (Alternative)”) and everything closed OK. Monitoring was with Sysinternals Process Explorer, keeping watch on the TCP/IP status of FileZilla Server. Your very recent finding (thanks for finding it!) raises my worry level again, though. Some of the backends seem difficult to test (FTP was easy, so I tested it). Using rclone (which I don’t) might make any leak highly obvious…

Proposed fix here:


Thanks @ts678 and @warwickmm!

I’ll be sure to test the fix out as soon as it’s available. As of right now I can get around the issue by restarting the service. Appreciate the quick response and action.

The fix has been merged and should appear in the next canary release.


I confirmed that the new canary (2.0.4.26) with the fix resolves this issue.

Hello,

sorry for the necro bump, but on the current 2.0.5.1_beta_2020-01-18 I can still experience SSH connection exhaustion. Checking on the server, I can confirm the number of sshd processes launched (while Duplicati is the only client connected).

It seems that 2 connections are left open on error.

Here is a Duplicati error log:

{
  "DeletedFiles": 0,
  "DeletedFolders": 0,
  "ModifiedFiles": 14,
  "ExaminedFiles": 302,
  "OpenedFiles": 16,
  "AddedFiles": 2,
  "SizeOfModifiedFiles": 1040758,
  "SizeOfAddedFiles": 6127,
  "SizeOfExaminedFiles": 171700012,
  "SizeOfOpenedFiles": 1373472,
  "NotProcessedFiles": 0,
  "AddedFolders": 0,
  "TooLargeFiles": 0,
  "FilesWithError": 0,
  "ModifiedFolders": 0,
  "ModifiedSymlinks": 0,
  "AddedSymlinks": 0,
  "DeletedSymlinks": 0,
  "PartialBackup": false,
  "Dryrun": false,
  "MainOperation": "Backup",
  "CompactResults": {
    "DeletedFileCount": 0,
    "DownloadedFileCount": 0,
    "UploadedFileCount": 0,
    "DeletedFileSize": 0,
    "DownloadedFileSize": 0,
    "UploadedFileSize": 0,
    "Dryrun": false,
    "VacuumResults": null,
    "MainOperation": "Compact",
    "ParsedResult": "Success",
    "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
    "EndTime": "2020-05-07T01:13:54.6321494Z",
    "BeginTime": "2020-05-07T01:13:54.6251668Z",
    "Duration": "00:00:00.0069826",
    "MessagesActualLength": 0,
    "WarningsActualLength": 0,
    "ErrorsActualLength": 0,
    "Messages": null,
    "Warnings": null,
    "Errors": null,
    "BackendStatistics": {
      "RemoteCalls": 13,
      "BytesUploaded": 111948,
      "BytesDownloaded": 29659,
      "FilesUploaded": 3,
      "FilesDownloaded": 2,
      "FilesDeleted": 1,
      "FoldersCreated": 0,
      "RetryAttempts": 4,
      "UnknownFileSize": 0,
      "UnknownFileCount": 0,
      "KnownFileCount": 40,
      "KnownFileSize": 135179548,
      "LastBackupDate": "2020-05-07T03:13:50+02:00",
      "BackupListCount": 6,
      "TotalQuotaSpace": 0,
      "FreeQuotaSpace": 0,
      "AssignedQuotaSpace": -1,
      "ReportedQuotaError": false,
      "ReportedQuotaWarning": false,
      "MainOperation": "Backup",
      "ParsedResult": "Success",
      "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
      "EndTime": "0001-01-01T00:00:00",
      "BeginTime": "2020-05-07T01:13:50.2183684Z",
      "Duration": "00:00:00",
      "MessagesActualLength": 0,
      "WarningsActualLength": 0,
      "ErrorsActualLength": 0,
      "Messages": null,
      "Warnings": null,
      "Errors": null
    }
  },
  "VacuumResults": null,
  "DeleteResults": {
    "DeletedSetsActualLength": 1,
    "DeletedSets": [
      {
        "Item1": 1,
        "Item2": "2020-05-07T03:00:00+02:00"
      }
    ],
    "Dryrun": false,
    "MainOperation": "Delete",
    "CompactResults": {
      "DeletedFileCount": 0,
      "DownloadedFileCount": 0,
      "UploadedFileCount": 0,
      "DeletedFileSize": 0,
      "DownloadedFileSize": 0,
      "UploadedFileSize": 0,
      "Dryrun": false,
      "VacuumResults": null,
      "MainOperation": "Compact",
      "ParsedResult": "Success",
      "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
      "EndTime": "2020-05-07T01:13:54.6321494Z",
      "BeginTime": "2020-05-07T01:13:54.6251668Z",
      "Duration": "00:00:00.0069826",
      "MessagesActualLength": 0,
      "WarningsActualLength": 0,
      "ErrorsActualLength": 0,
      "Messages": null,
      "Warnings": null,
      "Errors": null,
      "BackendStatistics": {
        "RemoteCalls": 13,
        "BytesUploaded": 111948,
        "BytesDownloaded": 29659,
        "FilesUploaded": 3,
        "FilesDownloaded": 2,
        "FilesDeleted": 1,
        "FoldersCreated": 0,
        "RetryAttempts": 4,
        "UnknownFileSize": 0,
        "UnknownFileCount": 0,
        "KnownFileCount": 40,
        "KnownFileSize": 135179548,
        "LastBackupDate": "2020-05-07T03:13:50+02:00",
        "BackupListCount": 6,
        "TotalQuotaSpace": 0,
        "FreeQuotaSpace": 0,
        "AssignedQuotaSpace": -1,
        "ReportedQuotaError": false,
        "ReportedQuotaWarning": false,
        "MainOperation": "Backup",
        "ParsedResult": "Success",
        "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
        "EndTime": "0001-01-01T00:00:00",
        "BeginTime": "2020-05-07T01:13:50.2183684Z",
        "Duration": "00:00:00",
        "MessagesActualLength": 0,
        "WarningsActualLength": 0,
        "ErrorsActualLength": 0,
        "Messages": null,
        "Warnings": null,
        "Errors": null
      }
    },
    "ParsedResult": "Success",
    "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
    "EndTime": "2020-05-07T01:13:54.6321494Z",
    "BeginTime": "2020-05-07T01:13:53.9294832Z",
    "Duration": "00:00:00.7026662",
    "MessagesActualLength": 0,
    "WarningsActualLength": 0,
    "ErrorsActualLength": 0,
    "Messages": null,
    "Warnings": null,
    "Errors": null,
    "BackendStatistics": {
      "RemoteCalls": 13,
      "BytesUploaded": 111948,
      "BytesDownloaded": 29659,
      "FilesUploaded": 3,
      "FilesDownloaded": 2,
      "FilesDeleted": 1,
      "FoldersCreated": 0,
      "RetryAttempts": 4,
      "UnknownFileSize": 0,
      "UnknownFileCount": 0,
      "KnownFileCount": 40,
      "KnownFileSize": 135179548,
      "LastBackupDate": "2020-05-07T03:13:50+02:00",
      "BackupListCount": 6,
      "TotalQuotaSpace": 0,
      "FreeQuotaSpace": 0,
      "AssignedQuotaSpace": -1,
      "ReportedQuotaError": false,
      "ReportedQuotaWarning": false,
      "MainOperation": "Backup",
      "ParsedResult": "Success",
      "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
      "EndTime": "0001-01-01T00:00:00",
      "BeginTime": "2020-05-07T01:13:50.2183684Z",
      "Duration": "00:00:00",
      "MessagesActualLength": 0,
      "WarningsActualLength": 0,
      "ErrorsActualLength": 0,
      "Messages": null,
      "Warnings": null,
      "Errors": null
    }
  },
  "RepairResults": null,
  "TestResults": {
    "MainOperation": "Test",
    "VerificationsActualLength": 3,
    "Verifications": [
      {
        "Key": "duplicati-20200507T011350Z.dlist.zip",
        "Value": []
      },
      {
        "Key": "duplicati-i7e68b697808b430cad549125032df88c.dindex.zip",
        "Value": []
      },
      {
        "Key": "duplicati-ba0c0f67080b54939aa7b93bb101ad30a.dblock.zip",
        "Value": [
          {
            "Key": "Error",
            "Value": "Une tentative de connexion a échoué car le parti connecté n’a pas répondu convenablement au-delà d’une certaine durée ou une connexion établie a échoué car l’hôte de connexion n’a pas répondu"
          }
        ]
      }
    ],
    "ParsedResult": "Success",
    "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
    "EndTime": "2020-05-07T01:16:20.0620377Z",
    "BeginTime": "2020-05-07T01:13:56.086555Z",
    "Duration": "00:02:23.9754827",
    "MessagesActualLength": 0,
    "WarningsActualLength": 0,
    "ErrorsActualLength": 0,
    "Messages": null,
    "Warnings": null,
    "Errors": null,
    "BackendStatistics": {
      "RemoteCalls": 13,
      "BytesUploaded": 111948,
      "BytesDownloaded": 29659,
      "FilesUploaded": 3,
      "FilesDownloaded": 2,
      "FilesDeleted": 1,
      "FoldersCreated": 0,
      "RetryAttempts": 4,
      "UnknownFileSize": 0,
      "UnknownFileCount": 0,
      "KnownFileCount": 40,
      "KnownFileSize": 135179548,
      "LastBackupDate": "2020-05-07T03:13:50+02:00",
      "BackupListCount": 6,
      "TotalQuotaSpace": 0,
      "FreeQuotaSpace": 0,
      "AssignedQuotaSpace": -1,
      "ReportedQuotaError": false,
      "ReportedQuotaWarning": false,
      "MainOperation": "Backup",
      "ParsedResult": "Success",
      "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
      "EndTime": "0001-01-01T00:00:00",
      "BeginTime": "2020-05-07T01:13:50.2183684Z",
      "Duration": "00:00:00",
      "MessagesActualLength": 0,
      "WarningsActualLength": 0,
      "ErrorsActualLength": 0,
      "Messages": null,
      "Warnings": null,
      "Errors": null
    }
  },
  "ParsedResult": "Error",
  "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
  "EndTime": "2020-05-07T01:16:21.0836376Z",
  "BeginTime": "2020-05-07T01:13:50.2183684Z",
  "Duration": "00:02:30.8652692",
  "MessagesActualLength": 35,
  "WarningsActualLength": 0,
  "ErrorsActualLength": 1,
  "Messages": [
    "2020-05-07 03:13:50 +02 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: L’opération Backup a débuté",
    "2020-05-07 03:13:50 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()",
    "2020-05-07 03:13:50 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed: (38 octets)",
    "2020-05-07 03:13:52 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-bddf9bfed194047559f0521fee4d4bf09.dblock.zip (80,36 KB)",
    "2020-05-07 03:13:52 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-bddf9bfed194047559f0521fee4d4bf09.dblock.zip (80,36 KB)",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-20200507T011350Z.dlist.zip (26,87 KB)",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i7e68b697808b430cad549125032df88c.dindex.zip (2,10 KB)",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-i7e68b697808b430cad549125032df88c.dindex.zip (2,10 KB)",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Completed: duplicati-20200507T011350Z.dlist.zip (26,87 KB)",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-StartCheck]: Start checking if backups can be removed",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-FramesAndIntervals]: Time frames and intervals pairs: 7.00:00:00 / 1.00:00:00, 28.00:00:00 / 7.00:00:00, 365.00:00:00 / 31.00:00:00",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupList]: Backups to consider: 07/05/2020 03:00:00, 07/05/2020 02:00:00, 06/05/2020 02:00:00, 04/05/2020 23:00:06, 03/05/2020 21:00:00, 02/05/2020 20:48:44",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-BackupsToDelete]: Backups outside of all time frames and thus getting deleted: ",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.Operation.DeleteHandler:RetentionPolicy-AllBackupsToDelete]: All backups to delete: 07/05/2020 03:00:00",
    "2020-05-07 03:13:53 +02 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteRemoteFileset]: Deleting 1 remote fileset(s) …",
    "2020-05-07 03:13:54 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Started: duplicati-20200507T010000Z.dlist.zip (26,51 KB)",
    "2020-05-07 03:13:54 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Delete - Completed: duplicati-20200507T010000Z.dlist.zip (26,51 KB)",
    "2020-05-07 03:13:54 +02 - [Information-Duplicati.Library.Main.Operation.DeleteHandler-DeleteResults]: Deleted 1 remote fileset(s)",
    "2020-05-07 03:13:54 +02 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required",
    "2020-05-07 03:13:54 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()"
  ],
  "Warnings": [],
  "Errors": [
    "2020-05-07 03:16:20 +02 - [Error-Duplicati.Library.Main.Operation.TestHandler-FailedToProcessFile]: Failed to process file duplicati-ba0c0f67080b54939aa7b93bb101ad30a.dblock.zip"
  ],
  "BackendStatistics": {
    "RemoteCalls": 13,
    "BytesUploaded": 111948,
    "BytesDownloaded": 29659,
    "FilesUploaded": 3,
    "FilesDownloaded": 2,
    "FilesDeleted": 1,
    "FoldersCreated": 0,
    "RetryAttempts": 4,
    "UnknownFileSize": 0,
    "UnknownFileCount": 0,
    "KnownFileCount": 40,
    "KnownFileSize": 135179548,
    "LastBackupDate": "2020-05-07T03:13:50+02:00",
    "BackupListCount": 6,
    "TotalQuotaSpace": 0,
    "FreeQuotaSpace": 0,
    "AssignedQuotaSpace": -1,
    "ReportedQuotaError": false,
    "ReportedQuotaWarning": false,
    "MainOperation": "Backup",
    "ParsedResult": "Success",
    "Version": "2.0.5.1 (2.0.5.1_beta_2020-01-18)",
    "EndTime": "0001-01-01T00:00:00",
    "BeginTime": "2020-05-07T01:13:50.2183684Z",
    "Duration": "00:00:00",
    "MessagesActualLength": 0,
    "WarningsActualLength": 0,
    "ErrorsActualLength": 0,
    "Messages": null,
    "Warnings": null,
    "Errors": null
  }
}

For now I’ll mount a Samba network drive to perform the backup, but still, this handle leak should be addressed ;)

Thx for your work

Fred

Not reproducible here. Does it also happen when “RetryAttempts” is zero (and what are those retries about, anyway)?

What about a smaller backup? My test was a very small backup to a local SFTP server. It closed fine.

This was on Windows, so I used netstat -ano | findstr :22 and wound up with just the LISTEN entries.

I also cannot reproduce this. My SFTP tests do not result in any unclosed connections. My main backup also uses SFTP and I haven’t noticed any issues.

Hi,

No problem on my side reproducing it with another configuration:

  • new configuration, no encryption
  • SFTP config targeting a Linux box whose sshd is set up to handle 10 connections max (connection OK, fingerprint imported, path created properly since the last folder does not exist yet), no other option — a hedged sketch of where that cap might live follows this list
  • source files are a mix of text/binary files ranging from 1 KB to 2 MB on a mechanical hard drive (total size < 20 MB this time); no other options, no filters/exclusions
  • no schedule for this second configuration; it is launched manually
  • default remote volume size, so 50 MB, but smart retention enabled; no other option
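For reference, where such a cap usually lives on the server side (I’m not asserting which directive this particular box uses; checking is a one-liner):

  # Hedged sketch: MaxStartups caps concurrent unauthenticated connections,
  # MaxSessions caps sessions per connection; check which one carries the 10.
  grep -iE 'maxstartups|maxsessions' /etc/ssh/sshd_config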

A single manual execution works fine like that.

If you want to reproduce the problem (it’s not what happened with my first configuration, but I can reproduce the handle leak with this “method”): instead of clicking Start now once, monkey-test it by clicking 4-5 times in a row. Several processes are probably then trying to send/verify data, ultimately failing with an error message saying that the server did not answer correctly, or with timeouts (sorry, I have the French version, so the error message is probably not exact).
Once done, you can check the number of sshd processes (or netstat on Windows).
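A quick way to count those on the Linux side (a sketch; the [s] in the pattern keeps grep from matching its own process):

  # Count leftover sftp-server children after the monkey test.
  ps aux | grep -c '[s]ftp-server'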

From my understanding, the Duplicati architecture is pretty complex. Could a synchronization issue between the reader and the block preparator (encrypt/compress) be the cause? (That’s why I mentioned the mechanical hard drive.) Or is it strictly a problem of parallel connections? Finally, to defeat monkey testers, would it be possible to lock a configuration that is running, to prevent a second run? (Maybe there is a good reason to avoid that.)

Thx

That message is not present in Duplicati code but is found numerous times on the Internet in non-Duplicati contexts, so it might be a Windows or .NET Framework message (you could look; most of the top hits are in French).

Google Translate:

A connection attempt failed because the connected party did not respond properly beyond a certain time, or an established connection failed because the connection host did not respond

Although I have not adapted my test to exactly your specs, clicking fast and often didn’t turn up a problem.
My Duplicati is on Windows; it sounds like yours is too. I don’t have a Linux SFTP server set up for a test.
My netstat shows just the SFTP server LISTENING, plus closed connections in TIME_WAIT. What does your netstat show you? Looking at the Linux remote may also be interesting, to see if the two views line up in a reasonable way.

C:\tmp>netstat -ano | findstr :22
  TCP    0.0.0.0:22             0.0.0.0:0              LISTENING       3624
  TCP    [::]:22                [::]:0                 LISTENING       3624
  TCP    [::1]:22               [::1]:52343            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52345            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52346            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52347            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52356            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52357            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52358            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52359            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52367            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52368            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52369            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52371            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52379            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52380            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52384            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52392            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52394            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52395            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52397            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52404            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52405            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52407            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52408            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52414            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52415            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52416            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52417            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52423            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52424            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52425            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52426            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52432            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52433            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52434            TIME_WAIT       0
  TCP    [::1]:22               [::1]:52435            TIME_WAIT       0

C:\tmp>

I’d note that I’m not changing the source, but I do have the --upload-unchanged-backups option checked, so it looks like I put up 9 backup versions (though just the dlist file) from my rapid clicks. Each one does go through Verifying backend files after the upload (and I notice one of your errors was in the TestHandler). --backup-test-samples=0 will turn off the verification, if you want to test whether that avoids all of the problems.

  "FilesUploaded": 1,
  "FilesDownloaded": 3,

then becomes

"FilesUploaded": 1,
"FilesDownloaded": 0,

but if you have source file changes, you might get 3 files uploaded (dblock and dindex due to changes).

What would help would be to see whether the connection failure or the retries are an essential part here.
Is your SFTP server on a reliable (including any WiFi) local network? It might be either the network or the server.

A useful log would be to see what’s happening at the Retry level, e.g. About --> Show log --> Live --> Retry, or, if you prefer a log on disk with more setup effort, the options --log-file and --log-file-log-level=Retry.
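A sketch of the on-disk variant (the storage URL, source, and log path here are placeholders):

  rem Illustrative only: substitute your own storage URL, source, and log path.
  Duplicati.CommandLine.exe backup "ssh://myserver/backup" "C:\Data" --log-file="C:\tmp\duplicati.log" --log-file-log-level=Retry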

Yes, I guess it’s happening once all the SSH handles are consumed, so it’s a pretty common error. BTW, one question: how does Duplicati determine the maximum number of parallel connections to the server? Is it related to the remote volume size of 50 MB?

Yes, sorry, I forgot to mention the most obvious things for me :slight_smile: :
Duplicati is on Windows, yes; the Linux box is on my home wireless network, which seems OK (no problem sending some gigs over it) and has no saturation. So far the server works properly; it was installed maybe a month ago, a TurnKey Linux on a Proxmox server, no firewall.

Here are, after the error, my netstat on Windows and ps + netstat on Linux:

C:\Temp>netstat -ano | findstr :22

C:\Temp>

(yes, nothing…)

root       301  0.0  0.1  69956  5324 ?        Ss   03:01   0:00 /usr/sbin/sshd -D
root     20626  0.0  0.1  97052  5948 ?        Ss   21:25   0:00 sshd: duplicati [priv]
duplicati  20642  0.0  0.0  97052  4056 ?        S    21:25   0:00 sshd: duplicati@notty
duplicati  20643  0.0  0.0  14924  1824 ?        Ss   21:25   0:00 /usr/lib/openssh/sftp-server
root     21266  0.0  0.1  97052  5864 ?        Ss   21:52   0:00 sshd: duplicati [priv]
duplicati  21282  0.0  0.0  97052  3916 ?        S    21:52   0:00 sshd: duplicati@notty
duplicati  21283  0.0  0.0  14924  1716 ?        Ss   21:52   0:00 /usr/lib/openssh/sftp-server
root     21302  0.0  0.1  97052  5828 ?        Ss   21:52   0:00 sshd: duplicati [priv]
duplicati  21318  0.0  0.0  97052  3784 ?        S    21:52   0:00 sshd: duplicati@notty
duplicati  21319  0.0  0.0  14784  1744 ?        Ss   21:52   0:00 /usr/lib/openssh/sftp-server
root@nas /var/log# netstat -ano | grep 22
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      off (0.00/0/0)
tcp        0      0 192.168.42.100:22       192.168.42.180:50718    ESTABLISHED keepalive (6723.44/0/0)
tcp        0      0 192.168.42.100:22       192.168.42.180:50721    ESTABLISHED keepalive (6724.61/0/0)
tcp        0      0 192.168.42.100:22       192.168.42.180:50177    ESTABLISHED keepalive (5126.94/0/0)
tcp6       0      0 :::22                   :::*                    LISTEN      off (0.00/0/0)

The Linux box was rebooted before the test, so there were no rogue handles in the air.

Well, I’m not sure how to interpret that: connections are actually closed on the Duplicati side (so sorry for the report) but still created on the Linux side, and the server is still blocked after some tests.

Tested with --backup-test-samples=0, clicking about twice per second at some point. I know I’ve blocked SSH when I see the top status bar stuck showing “Verifying backend files”.

I’ve checked whether the test phase is properly disabled; it seems to be the case:

  "FilesUploaded": 0,
  "FilesDownloaded": 0,

(I’m not using your option to force uploading unchanged backups)

Soooo, going for the logs: you’re definitely right on that point,

and here they are:
duplicati - Retry.zip (1.7 KB)

First, I launched the backup on a blocked connection (=> SocketException); next, I rebooted the Linux box, and we can see a different error at that point (around line 50): the remote host is closing the connection.

I’m going to try to dig in this direction (a Proxmox/LXC issue? since ssh does not close connections…), but for now, bed time! :slight_smile:

Thanks for your help and your time,

Any good idea is welcome.

I lied…

I’m not sleeping, and I have a good lead. On the Linux box, a fail2ban process analyzes logs and, depending on the number of failed login attempts per period of time, bans the originating IP for some time.

My users are stored in an OpenLDAP service. What seems to happen is: for each connection, the system first tries to find the user in the local user DB (which is empty) and logs a failure message; then the system looks the user up in the LDAP service and authenticates it properly, no problem.

But in parallel, fail2ban analyzes the logs, catches the first failure from each local attempt, reaches its maximum number of failures, and finally bans the remote IP by adding a temporary firewall rule (which explains why I can still see the sshd processes).

After some fail2ban configuration to prevent banning local IPs, the same monkey test on Duplicati works properly. So, good news for the Duplicati dev team: no handle leak to find :slight_smile: (those are always a plague to trace…)
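For anyone hitting the same trap, the exemption looks roughly like this (a sketch assuming a stock fail2ban install; adjust the subnet to your own LAN):

  # Exempt loopback and the local subnet from bans, then reload fail2ban.
  printf '[DEFAULT]\nignoreip = 127.0.0.1/8 192.168.42.0/24\n' >> /etc/fail2ban/jail.local
  fail2ban-client reload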

Again, thanks for your help @ts678

This time, I can go to bed :smiley:

Fred

  --asynchronous-concurrent-upload-limit (Integer): The number of concurrent
    uploads allowed
    When performing asynchronous uploads, the maximum number of concurrent
    uploads allowed. Set to zero to disable the limit.
    * default value: 4

though it sounds like you found the actual problem. Maybe sshd was cut off from the close request?
I guess it would depend on exactly how the firewall rule cuts things. Clearly, it let sshd get started.

From what I saw, fail2ban is really a service running alongside ssh. It’s not dedicated only to ssh but to all listening services on the box, so it does not try to close a connection cleanly using a protocol-specific action defined for each and every possible protocol. Instead, a simple iptables rule is added to prevent new connections and cut existing ones on a given port.
In my case, it seems that the forked sshd does not even notice that the connection has been cut (or maybe it waits for the client to reconnect; that’s probably another valid hypothesis…) and stays alive. Duplicati, while trying to reconnect, was hitting the firewall hard.