Line speed and time consumption

Hi!

I am backing up some 400 GB over a 1 Gbps line. The initial run takes roughly 4.5 days to finish. Is that to be expected? Usually I get between 30 MB/s and 60 MB/s on a direct sftp transfer. At 30 MB/s it would take 1000/30 ≈ 33 seconds to transfer 1 GB, which holds true when using sftp to test the line speed. 400 GB should then take 400 × 33 / (60 × 60) ≈ 3.7 hours. What is going on? Is this really caused by the source server being unable to handle 1.5 million (1.5×10⁶) files per directory?
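
For reference, the same arithmetic as a quick script (raw payload only, ignoring any per-file or per-volume processing the backup tool adds on top):

```python
# Rough transfer-time estimate for the raw payload only.
payload_gb = 400        # total source data, GB
rate_mb_s = 30          # sustained sftp rate measured on this line, MB/s

seconds = payload_gb * 1000 / rate_mb_s
print(f"{seconds / 3600:.1f} hours")   # ~3.7 hours at 30 MB/s
```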

Hi @9colai, welcome to the forum.

That is a tough question to answer directly, as there are many moving parts in a backup.

Do you have any overview of what the Duplicati performance looks like during the backup?
High CPU usage? High local disk usage? High memory usage? Burst transfers?

From our profiling on various machines, the most expensive operation appears to be compression, so you can set --zip-compression-level=0 to disable compression and store data “as-is”.
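
As a rough, standalone illustration of why that helps (plain Python zipfile on synthetic data, not Duplicati’s actual code path), storing a volume-sized blob versus deflating it shows where the CPU time goes:

```python
import io
import os
import time
import zipfile

# ~50 MB of synthetic, somewhat compressible data standing in for one volume.
data = os.urandom(1024) * 50 * 1000

for name, method, level in [("stored", zipfile.ZIP_STORED, None),
                            ("deflated", zipfile.ZIP_DEFLATED, 9)]:
    buf = io.BytesIO()
    start = time.perf_counter()
    with zipfile.ZipFile(buf, "w", method, compresslevel=level) as zf:
        zf.writestr("volume.bin", data)
    elapsed = time.perf_counter() - start
    print(f"{name:9s} {elapsed:6.2f} s  {len(buf.getvalue()) / 1e6:7.1f} MB")
```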

Hmm… The source is a Xeon-based rack server running Windows Server 2019 with several million files of around 100 kB each. CPU and RAM usage is low while the backup runs at night; the Duplicati process seems to hover around 6% CPU usage. The destination (storage) is an ssh/sftp server on similar hardware running Ubuntu 24.04 LTS. I will try the “no zip” option, as compression only seems to save some 25% space anyhow.

Once the initial copy has been made, the source sees minimal change, and the daily (nightly) backup takes around 80 minutes.

Did you try higher volume sizes? (less overhead for sftp)

Does the receiving end support hardware-accelerated encryption for sftp?

Yup. Doubling the volume size (from 50 MB to 100 MB) causes the initial backup to take nearly half the time. Are you suggesting that doubling the size again will bring the time down by a (total) factor of four?

I really don’t know about hardware support for sftp encryption. If it is built into the Xeon, I’d expect it to kick in by itself. BTW, using no encryption seems to reduce the backup time by a third.

Just try! I use 250 MB (megabytes).
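
As a rough mental model (toy numbers, not measured Duplicati values): if each uploaded volume carries a roughly fixed cost on top of the raw transfer, then bigger volumes shrink the overhead term but not the payload term, so repeated doubling gives diminishing returns rather than a clean factor of four:

```python
# Toy model: total time = raw transfer time + volumes * fixed per-volume cost.
# The per-volume cost is a made-up placeholder, not a measured value.
payload_gb = 400
rate_mb_s = 30
per_volume_cost_s = 2.0    # hypothetical fixed cost per uploaded volume

transfer_s = payload_gb * 1000 / rate_mb_s
for volume_mb in (50, 100, 200, 400):
    volumes = payload_gb * 1000 / volume_mb
    total_s = transfer_s + volumes * per_volume_cost_s
    print(f"{volume_mb:4d} MB volumes: {total_s / 3600:5.1f} h total")
```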

I assume that the Windows side supports hardware encryption (low system load), but I am not sure about the Ubuntu side.
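
A quick way to check the Ubuntu side is whether the CPU advertises the AES-NI flag (a minimal sketch; the cipher that OpenSSH actually negotiates decides whether it gets used):

```python
# Check whether the CPU advertises AES-NI (Linux only).
with open("/proc/cpuinfo") as f:
    flag_lines = [line for line in f if line.startswith("flags")]

has_aes_ni = any("aes" in line.split() for line in flag_lines)
print("AES-NI available" if has_aes_ni else "no AES-NI flag found")
```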

That is interesting! We could barely measure the encryption overhead in our performance tests.

Edit: Are you using Duplicati 2.1.0.2?
There were some issues with encryption not performing well in earlier versions.

The performance setback caused by encryption was observed with an older version, around 8 months ago. Apparently the performance hit increased with the number of source files and the total size. I am now running the latest version, but I haven’t re-checked the encryption issue.

Just to close this… When Duplicati is counting the 2.5 million files, it uses between 1 and 10% of the source (Windows) CPU and less than 1% of the destination (Ubuntu) CPU. (I wonder why it opens the connection while counting.) While Duplicati is actually sending data, it uses at most 15% CPU at the source, and the destination is still way under 1%. To me that (sorta) indicates that the receiving system is not overloaded.

It starts by listing the remote source to check that all files are still there. Could that explain what you see?
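
If you want to see how much of the startup time the listing itself accounts for, you could time a plain sftp listing of the destination folder. A minimal sketch, assuming the paramiko package and with placeholder host, user and path:

```python
import time
import paramiko  # third-party: pip install paramiko

# Time a plain directory listing of the backup destination over sftp,
# roughly comparable to the remote file check at the start of a backup.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("backup.example.com", username="backup")  # placeholder host/user
sftp = client.open_sftp()

start = time.perf_counter()
names = sftp.listdir("/srv/duplicati")  # placeholder destination path
print(f"{len(names)} remote files listed in {time.perf_counter() - start:.1f} s")

sftp.close()
client.close()
```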

There is also the disk, which can be overwhelmed; do you have a view of the disk I/O?
Also, Duplicati does processing while it is transferring.
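
If you want numbers rather than a gut feeling, something like this samples system-wide disk throughput once per second while a backup runs (a sketch assuming the third-party psutil package; iostat on Ubuntu or perfmon on Windows work just as well):

```python
import time
import psutil  # third-party: pip install psutil

# Sample system-wide disk throughput once per second for ten seconds.
prev = psutil.disk_io_counters()
for _ in range(10):
    time.sleep(1)
    cur = psutil.disk_io_counters()
    read_mb = (cur.read_bytes - prev.read_bytes) / 1e6
    write_mb = (cur.write_bytes - prev.write_bytes) / 1e6
    print(f"read {read_mb:6.1f} MB/s   write {write_mb:6.1f} MB/s")
    prev = cur
```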

Thanks. That makes sense. If by “remote source” you are referring to the backup destination, then I suppose counting and checking 2.5 million zipped and encrypted files over a 1 Gbps line on an SSD will take some time.

And yes… while running / sending data to the backup server, the source disk is pretty busy. The source disk is a standard mechanical SAS disk; the destination disk (SSD) is barely at work.