How Much Faster will Duplicati be on a Newer CPU?


#1

I know this may sound obvious, but I want to know how much faster Duplicati will be on a newer CPU. Or rather, how much a newer CPU with (supposedly) more cores and threads, faster AES-NI and/or higher IPC will influence Duplicati’s performance.

I am asking because I think this may be just the data point some of us need when making hardware purchasing decisions. (Personally, I am thinking of upgrading my DIY NAS to something newer, or merging its duties with my workstation and building a much newer, more powerful computer.)


#2

For system performance questions like this, CPU is just one factor. You also have disk read speed at play when checking folders for changed files, then reading through changed files to gather their changes which are then compressed into files (causing disk writes), encrypted, and queued to await upload to destination.

You probably should measure system resource utilizations to see what resource seems to limit the speed, then see how close the next bottlenecks would be if somehow the first limit were relieved, e.g. by upgrade.

There’s maybe no direct way to see whether you’re at the --asynchronous-upload-limit, but you can watch temporary files with names starting with dup- flowing through your temporary file folder. Some will be roughly 50 MB if you’re using the default remote volume size (--dblock-size). If those files pile up faster than they are uploaded, you’re probably network-limited.
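A minimal sketch of watching that queue from a script, assuming Duplicati writes its volumes to the system temporary folder (Duplicati’s `--tempdir` option can point it elsewhere) and that the `dup-` prefix is all we need to match. The function name `pending_volumes` and the 45 MB threshold are hypothetical choices for illustration, not Duplicati settings.

```python
import os
import tempfile


def pending_volumes(temp_dir=None, min_bytes=45 * 1024 * 1024):
    """Return sorted (name, size) pairs for large dup-* files in temp_dir.

    Files close to the remote volume size (--dblock-size, 50 MB by default)
    that linger here are most likely waiting for upload.
    """
    temp_dir = temp_dir or tempfile.gettempdir()
    found = []
    with os.scandir(temp_dir) as entries:
        for entry in entries:
            # Only regular files whose names start with the dup- prefix.
            if not (entry.name.startswith("dup-") and entry.is_file()):
                continue
            try:
                size = entry.stat().st_size
            except FileNotFoundError:
                # The volume was uploaded and deleted while we were scanning.
                continue
            if size >= min_bytes:
                found.append((entry.name, size))
    return sorted(found)


if __name__ == "__main__":
    # Run this a few times during a backup; a list that keeps growing hints at a network bottleneck.
    for name, size in pending_volumes():
        print(f"{name}: {size / 1e6:.1f} MB")
```

A single snapshot says little; what matters is whether the count stays at a ceiling (the upload queue is full) or keeps draining.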

Disk utilization at 100% is a warning sign, but even worse is if there’s a request queue to gain disk access.
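On Linux, both numbers can be derived from `/proc/diskstats`, which is roughly what the utilization and average-queue columns of `iostat -x` report. The sketch below assumes the kernel’s documented column layout (index 12 is milliseconds spent doing I/O, index 13 is the weighted milliseconds); `parse_diskstats`, `disk_pressure` and `sample` are hypothetical helper names.

```python
import os
import time


def parse_diskstats(text):
    """Map device name -> (ms_doing_io, weighted_ms) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue
        # Index 12: ms the device had I/O in flight (drives the busy %).
        # Index 13: weighted ms, i.e. in-flight time multiplied by queue depth.
        stats[fields[2]] = (int(fields[12]), int(fields[13]))
    return stats


def disk_pressure(before, after, interval_ms):
    """Return device -> (busy %, average queue length) between two samples."""
    result = {}
    for dev, (busy0, weighted0) in before.items():
        if dev not in after:
            continue
        busy1, weighted1 = after[dev]
        busy_pct = min(100.0, 100.0 * (busy1 - busy0) / interval_ms)
        avg_queue = (weighted1 - weighted0) / interval_ms
        result[dev] = (busy_pct, avg_queue)
    return result


def sample(path="/proc/diskstats"):
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    t0, first = time.monotonic(), sample()
    time.sleep(1.0)
    t1, second = time.monotonic(), sample()
    for dev, (busy, queue) in sorted(disk_pressure(first, second, (t1 - t0) * 1000).items()):
        print(f"{dev}: {busy:5.1f}% busy, average queue {queue:.2f}")
```

A device sitting near 100% busy with an average queue well above 1 during a backup is the “request queue” case described above.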

CPUs can similarly be fully busy or have backlogs, but they also have multi-core effects. Duplicati now uses concurrency more than it used to, so it takes advantage of multiple cores, but I’m not sure how far that goes.


#3

Hmm. So in other words, I should check my logs for any bottlenecks at my disks?

As for concurrency, so far the logs show only one core being used. I also want to know if increased AES-NI performance in newer CPUs can help Duplicati or rather, does Duplicati use AES-NI instructions for its encryption?


#4

If you have logs. This might depend on the OS performance monitor tools you run. On Windows, Task Manager’s Performance tab offers a live view of various utilizations, including Disk, CPU, and Memory. From that screen, you can also click “Open Resource Monitor” at the bottom, for a more powerful tool which can show things like Disk Queue Length whereas Task Manager can only say if it’s always busy.

The answer appears to be yes, provided you run a newer version of Windows and the referenced comment is accurate.

That covers the encryption Duplicati applies to its own backup files. If your SSL/TLS connection happens to negotiate AES encryption, whether AES-NI is used depends on Windows’ cryptographic services.
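Since you are on Debian, a first step is confirming that the CPU advertises AES-NI at all. On x86 Linux this shows up as the `aes` flag in `/proc/cpuinfo`. This is only a sketch: it checks CPU support, not whether Mono or Duplicati actually uses the instructions, and `has_aes_ni` is a hypothetical helper name (ARM systems use a different `Features` line and are not handled).

```python
import os


def has_aes_ni(cpuinfo_text):
    """True if any 'flags' line in /proc/cpuinfo text lists the 'aes' feature (x86 only)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the whole word 'aes' so flags like 'vaes' don't count on their own.
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("AES-NI supported:", has_aes_ni(f.read()))
```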

You can test this on real operations, either to a trusted destination or with files that aren’t sensitive. Turn encryption off, and you can estimate the speedup you would get if encryption became infinitely fast. This is the same as the general CPU question: it seems like it should help some, but how much overall?
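The “how much overall” part is Amdahl’s law: speeding up one stage only shrinks that stage’s share of the total time. A minimal sketch, using made-up timings (60 minutes with encryption, 50 without) purely as an illustration; `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(total_time, part_time, part_factor):
    """Amdahl's law: speedup of the whole job when only part_time gets part_factor times faster."""
    if not 0 <= part_time <= total_time:
        raise ValueError("part_time must be between 0 and total_time")
    new_time = (total_time - part_time) + part_time / part_factor
    return total_time / new_time


# Hypothetical timings: 60 min with encryption, 50 min with it turned off.
with_enc, without_enc = 60.0, 50.0
enc_cost = with_enc - without_enc
print(f"Encryption 4x faster: {overall_speedup(with_enc, enc_cost, 4):.2f}x overall")
print(f"Encryption free:      {overall_speedup(with_enc, enc_cost, float('inf')):.2f}x overall")
```

With those invented numbers, even infinitely fast encryption caps the gain at 1.20x, and a 4x faster implementation gives about 1.14x. The same arithmetic applies to any other stage a new CPU would speed up.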

This possibly implies that you are not CPU-limited, although that depends on whether the .NET Framework, which runs Duplicati’s code (on Windows, anyway), spreads threads across cores. Doing that needlessly could actually hurt performance due to cache effects, but at some point one core can’t handle the load.


#5

My primary backup system is a Xeon E5405 (8 cores, 2GHz) with 32GB RAM. It rarely uses more than 50% of any single core, rarely uses more than one core for Duplicati, and has very little else running. The OS (Server 2012) and Duplicati only use 1.5-2 GB of RAM at any given time. Backups are read from network drives (SATA3 SSDs in RAID, connected over a 10Gb network link) and stored locally on SATA3 platter drives (RAID5). We have five different backup jobs set up to run at different times, ranging from 80GB to 700GB of files.

Once you figure that the slowest point in this chain is the 100-120 MB/s read/write speed of the local drives, that becomes the limiting factor. Even then, there is plenty of processing and verifying of the files, database, backups, etc.

The initial backup always takes the longest. With 700GB+ on one network drive (some larger files, but mostly medium to small ones), the initial scan, read, catalog, and backup of those files took 5 full days. Now the daily backups for this set take around 20-30 minutes, since few files change.

Another backup covers 200GB of mostly small files with a lot of changed, new, and removed files, and it takes over an hour per day to complete.

So there are a lot of factors that come into play. I plan to test this on my newer system at home but haven’t yet: Ryzen5 2600, SSD for the OS (~420 MB/s r/w), NVMe drive for storage (2100-2400 MB/s r/w), and a standard platter SATA3 drive for storage (80-110 MB/s r/w).


#6

Thanks for the details. Right now I am running Duplicati on Debian, so I’ll have to see whether these work the same way there (I think and hope so!)

As for my “one core” comment, it’s actually more like only 25% of the CPU being used at all times instead of, say, 90%. I’ll need to check again whether there is a way to see each core’s utilisation rather than overall CPU utilisation.


#7

Thanks for the reply! I am indeed thinking of moving to a Ryzen setup, but it seems Duplicati works nicely even on old hardware, mainly because storage speed is the bottleneck. I’ll be interested to see how your new setup performs!


#8

A web search suggests top can do it if you press the 1 key, or sar -P ALL might work. One possible point of confusion is whether the breakdown is by core (common), by physical CPU package (not common), or both. There are certainly other Linux performance tools available if these are missing or inadequate. Linux is probably mainly responsible for what runs on which core, though possibly Mono gives it some hints.
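If you’d rather log per-core numbers yourself, the per-core `cpuN` lines in `/proc/stat` are what top builds its view from. Note that 25% overall on a machine with four logical CPUs is consistent with one core being fully busy, which per-core figures would confirm. A sketch assuming the standard column order (user, nice, system, idle, iowait, irq, softirq, steal) and counting iowait as idle; `parse_proc_stat` and `core_usage` are hypothetical names.

```python
import os
import time


def parse_proc_stat(text):
    """Map 'cpu0', 'cpu1', ... -> (idle_ticks, total_ticks) from /proc/stat text."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines ('cpu0', 'cpu1', ...) and skip the aggregate 'cpu' line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal; guest time is already
        # counted inside user/nice, so the guest columns are left out.
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + ticks[4]  # idle + iowait
        cores[fields[0]] = (idle, sum(ticks))
    return cores


def core_usage(before, after):
    """Percent busy per core between two parse_proc_stat() samples."""
    usage = {}
    for core, (idle0, total0) in before.items():
        if core not in after:
            continue
        idle1, total1 = after[core]
        elapsed = total1 - total0
        usage[core] = 0.0 if elapsed == 0 else 100.0 * (1 - (idle1 - idle0) / elapsed)
    return usage


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(1.0)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    usage = core_usage(first, second)
    for core in sorted(usage, key=lambda name: int(name[3:])):
        print(f"{core}: {usage[core]:5.1f}%")
```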


#9

For an anecdotal data point (no official statistics)…

I’m running 2.0.4.5_beta_2018-11-28 on a 4 core Intel i5-3450S @ 2.8 GHz and Debian 9.

This box hosts 12 TB of disk in various arrays, SMB & AFP fileshares for my home network, media content for Plex and Plex Server, a bunch of usenet automation software, and probably a few other things I can’t remember right now.

I’m encrypting my backups and sending them to Backblaze B2, and I don’t even notice Duplicati doing its thing. When I was using SpiderOak, it would chew up CPU and thrash the disk (and leave behind leftover files that I had to clean up manually every few months, otherwise my root volume would fill up).

SpiderOak would also bog the machine down: if SABnzbd was unpacking a large file while SpiderOak was in the middle of a backup, media was unwatchable due to stuttering and buffering. I haven’t seen these issues at all since moving to Duplicati.


#10

Yeah, I noticed the same issue with SpiderOak, which is why I removed it.


#11

Thanks, I’ll check this, but so far, based on overall stats alone, Duplicati seems to play nicely with all the other processes on my server.

Thanks! I am now testing out Plex too, will see if I get similar results with Duplicati


#12

I’ve found htop quite useful. And though I haven’t used it much, ntop gives “Network TOP” type info.


#13

I did some preliminary, unofficial testing and found that backup times/speeds are not affected much by the newer CPU (Ryzen5 2600) or the newer RAM (DDR4-3200). Drive speed seems to have a bit more impact, but overall the speeds are not that different.

System 1: Ryzen5 2600 (stock 6x 3.4-3.9GHz), 16GB DDR4-3200, 500GB M.2 NVMe (2800MB/s) storage, 256GB SATA SSD for OS, 2TB platter based SATA drive
System 2: Core i3-2120 (stock 2x 3.3GHz), 8GB DDR3-1333, 256GB SATA SSD for OS, 4TB platter based SATA HDD connected via USB 3.0.

Test 1 was 50 personal videos (kids’ sports) totaling 140GB.
Test 2 was a mix of 20 personal videos and hundreds of zip, exe, msi, and other random files and folders, totaling 150GB.

On System 1, I put all the files on the NVMe under “Data”, with Duplicati backing up to “Backup”. I tested backing up from the SSD, NVMe, and HDD to the backup folder on the NVMe and saw almost no speed difference, since most of the time is spent processing. Backing up from the NVMe to the SSD or HDD also made little difference. This applies only to the initial backup, which is the one that typically takes a long time (it still took 7-9 hours).

Between systems 1 and 2, total processing time for that initial backup differed by about 3 hours: 7-9 hours on the Ryzen5 (System 1) versus 9-12 hours on the Core i3 (System 2). That leads me to believe that a few CPU cores at a higher clock (like a Ryzen) perform better than more cores at a slower clock (like a 2.05GHz Xeon). The amount of memory makes little difference: my 32GB system with more, slower cores sees the same or slower speeds.
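A quick way to back up the “mostly processing time” conclusion is to turn those sizes and durations into an average rate. This sketch simply takes the extremes reported for the two tests (140 to 150 GB in 7 to 12 hours) and uses decimal units; `throughput_mb_s` is a hypothetical helper name.

```python
def throughput_mb_s(size_gb, hours):
    """Average rate in MB/s (decimal units) for size_gb processed in `hours` hours."""
    return size_gb * 1000 / (hours * 3600)


# Extremes from the two tests: 140-150 GB in 7 to 12 hours.
fastest = throughput_mb_s(150, 7)
slowest = throughput_mb_s(140, 12)
print(f"{slowest:.1f} to {fastest:.1f} MB/s")  # prints "3.2 to 6.0 MB/s"
```

Roughly 3 to 6 MB/s is far below what even a platter drive can sustain sequentially, which fits the observation that moving the source or destination between HDD, SSD and NVMe barely mattered for the initial backup.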


#14

Wow, thanks for taking the time to test this out! Sounds like Duplicati is more about clock speed (or IPC?) than core count or AES-NI speeds. I wonder what will happen if System 2 uses SATA instead of USB and System 1 has its cores restricted to just 2 cores for Duplicati. Do you think you may see more nuanced differences?


#15

Once you figure that USB 3.0 has a maximum signalling rate of 5 Gbps (gigabits per second), or 625 MB/s raw, versus 6 Gbps (750 MB/s raw) for SATA3, and that platter-based hard drives are limited to 133-150 MB/s (sometimes a bit faster with RAID or a 10-15K RPM HDD, but still nowhere near an SSD or NVMe), the platter-based drive is the limiting factor there, not USB. This is why I love my USB 3.0 external drive dock for work-related stuff: it lets SSDs run at full speed (350-550 MB/s).
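The link-rate arithmetic is worth spelling out: dividing gigabits by 8 gives the raw byte rate, and both USB 3.0 (5 Gbps) and SATA3 (6 Gbps) use 8b/10b line coding, so only 8 of every 10 bits on the wire carry data. A small sketch of that calculation (protocol overhead beyond the line coding is ignored, so real-world figures come in a bit lower); `usable_mb_s` is a hypothetical helper name.

```python
def usable_mb_s(line_rate_gbps, payload_bits=8, coded_bits=10):
    """Approximate payload bandwidth in MB/s for a link whose line code sends
    payload_bits of data per coded_bits on the wire (8b/10b by default)."""
    return line_rate_gbps * 1000 / 8 * payload_bits / coded_bits


for name, gbps in [("USB 3.0", 5), ("SATA3", 6)]:
    raw = usable_mb_s(gbps, 1, 1)  # no line-coding overhead
    print(f"{name}: {raw:.0f} MB/s raw, about {usable_mb_s(gbps):.0f} MB/s after 8b/10b")
```

That works out to about 500 MB/s for USB 3.0 and 600 MB/s for SATA3, both still several times what a single platter drive delivers, so the conclusion about the HDD being the bottleneck holds.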

System 1 with the Ryzen and 2TB platter based drive is connected via SATA port and sees the same raw data read/write speeds (avg 130MB/s) as the other system connected via USB 3.0.

I doubt there would be much speed difference with a core-limited setup. The system/OS spreads most of the workload among the 12 threads (6 physical cores, each running 2 threads via SMT), which should be close to the same as limiting it to just 2 cores.
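If someone wants to actually try restricting System 1 to two cores, Linux CPU affinity can do it without touching BIOS settings. A sketch, assuming you know the PID of the Mono process running Duplicati: because `os.sched_setaffinity()` acts on one thread, it walks every thread under `/proc/<pid>/task`, which is roughly what `taskset -a -cp 0-1 <pid>` does. The helper names `parse_cpu_list` and `pin_process` are hypothetical, and this needs Linux plus permission to change the target process.

```python
import os


def parse_cpu_list(spec):
    """Turn a taskset-style CPU list such as '0-1,4' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_process(pid, spec):
    """Restrict every existing thread of pid to the CPUs in spec (Linux only)."""
    cpus = parse_cpu_list(spec)
    # sched_setaffinity() applies to a single thread, so visit each one;
    # threads created afterwards inherit the mask of the thread that creates them.
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
        except ProcessLookupError:
            continue  # the thread exited while we were iterating
    return cpus
```

Usage would be something like `pin_process(duplicati_pid, "0-1")` before starting a backup; comparing run times with and without pinning would answer the two-core question directly.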