Duplicati backup slow/taking-forever while using very few system resources

Greetings!

My apologies if this is answered someplace else, but I wasn’t able to find anything quite like this (though it feels basic :slight_smile: )

I’m attempting to run a backup of about 3TB of data to an S3 (not AWS) bucket. It’s a large number of files (around 150 million), so I expect it might take a long time to complete, but the problem I’m seeing is that it’s taking a long time while hardly using any system resources.

The setup:
Duplicati installed as a service on an Ubuntu 20.04 VM
8 Intel Xeon cores
4GB of RAM
25GB local VM disk space (the disk image is stored on a local all-flash hardware RAID6 array; real-world test transfers are pretty steady at ~500MB/s)
1G network connection to all network shares, including the S3 bucket
Files are connected to the VM through an NFS share from a TrueNAS device (which routinely provides >80MB/s file transfer speeds to the VM)
Other backup software running on the TrueNAS sees a steady >20MB/s transfer rate when backing up

Duplicati settings I’ve changed to possibly improve the upload speeds:
asynchronous-upload-limit: 0
blocksize: 500KB
concurrency-block-hashers: 8
concurrency-compressors: 8
dblock-size: 100MB
no-encryption: true (I trust the endpoint and the pipe, but plan to add encryption if I can fix the performance issues)
thread-priority: highest

What I’ve observed:
CPU Utilization: Rarely goes above 30% and never above 50%
RAM: after a couple of days the VM hits about 50% but never goes above that
Disk usage: paltry… the indicator is near idle pretty much all the time with occasional spikes up to about 40MB/s before crashing to idle again
Network activity sits around 20Mbps, occasionally bobbing up to ~50Mbps (again, I’ve clocked the slowest connection in the chain at a steady 160Mbps)
Merely counting all the files takes literally days, followed by a steady 1.5MB/s transfer rate after that

Duplicati has the perfect feature set for what I want to do with it, but if even checking for changes takes more than a day, the performance is unworkable for a nightly backup. This is a VM doing nothing but backups and running in a test environment. I’m willing to make it bleed.

What can I do to convince Duplicati to push the ‘hardware’?

Let me know if you need any more info.

Thanks!

Welcome to the forums, BurningCopper.

Just a guess, but I’d imagine the source files are not local enough for Duplicati to be able to “work with them” in a proper fashion. I don’t know all the details as to why, but Duplicati really wants its source files to be local, truly local. I like the idea, but I don’t think you’ll be able to use Duplicati in this way reliably.

Have you looked at just running Duplicati directly on your NAS?

I have a setup backing up a NAS from a PC running Win 10 and performance seems all right. Snapshots are of course impossible, but generally backups don’t take forever. It’s a very different configuration than the OP’s, though; everything is done over Windows CIFS. NFS is another thing entirely, and it’s possible that the library used by Duplicati doesn’t work well in this case. From a quick Internet search, NFS is prone to slow file listing in some contexts, especially when RAM is limited.
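If you want to rule NFS listing speed in or out before blaming Duplicati, a rough sketch like this might help (the two paths are placeholders: your NFS mount and the same subtree copied to local disk):

# Time bare enumeration of a representative subtree over NFS,
# then the same subtree on local disk, and compare.
time find /mnt/nfs/subtree -type f | wc -l
time find /srv/local/subtree -type f | wc -l

If the NFS run is dramatically slower, the slow scanning is mostly the share rather than Duplicati itself.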

Technically there’s more going on than counting; processing runs in parallel, so it’s hard to isolate a bottleneck.

Channel Pipeline says what goes on inside. About → Show → Log → Live → Verbose shows file work, but for easier large-scale use (and timing to the second), it’s worth setting up a log-file, maybe with log-file-log-level=verbose, to see if the time per file times 150 million is in the right ballpark for waiting days.
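As a rough sketch of that (assuming the Linux duplicati-cli wrapper; the target URL and source path are placeholders, and the same two options can instead be added as advanced options on the job in the web UI):

# Run the backup with file logging at verbose level.
duplicati-cli backup "s3://my-bucket/backups?auth-username=..." /mnt/nfs/data \
  --log-file=/var/log/duplicati-backup.log \
  --log-file-log-level=verbose
# Then follow per-file timing while it runs:
tail -f /var/log/duplicati-backup.log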

I have to say I worry about checking that many files over NFS, which Duplicati accesses through the OS.
snapshot-policy is probably the fastest way to identify changed files, but it requires LVM to be available, and TrueNAS may or may not be willing. But even without that, I’d think a local file scan would be faster.

Interesting that NFS might not play nice with Duplicati. I can say that sftp gives similar performance. I could easily share the same files over smb/cifs and see if there’s a difference.

I wonder what would cause it? I don’t think RAM is the issue here, since the VM never consumes more than 50%.

Thanks for the tip. I’ll have to test it out.

Interesting… I found that the Duplicati process was niced to 19 (the lowest scheduling priority), which seems odd considering I set the preferences in Duplicati to give it the highest thread priority. I suspect that option isn’t actually doing anything.

I manually reniced the root Duplicati process down to -20 and now I’m seeing >15MB/s transfer rates during transfer, and processor utilization is regularly jumping up to 70% or higher. Processing seems to be at least part of the bottleneck. No idea if the gains will continue throughout the entire backup.

It’s set in the systemd unit:

sudo systemctl cat duplicati
# /lib/systemd/system/duplicati.service
[Unit]
Description=Duplicati web-server
After=network.target

[Service]
Nice=19
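If you want to change that persistently rather than renicing by hand after each restart, a systemd drop-in override is the usual route (a sketch; Nice=0 here is just an example value, not a Duplicati recommendation):

sudo systemctl edit duplicati
# in the editor that opens, add:
#   [Service]
#   Nice=0
sudo systemctl daemon-reload
sudo systemctl restart duplicati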

I vaguely recall having seen on Windows that some of the priority stuff only applies during the operation. This might be worth a close look on Linux, for both the initial (idle) nice value, and how backup moves it.

Right, it is doing so in a bit too simple a way:

Note that there is no error checking (contrary to the Windows and macOS parts of the code doing the same thing), so if something goes wrong when setting the priority, nothing at all will be reported. Also, ionice is not the same as renice, so if the process is processor bound, it will remain so, it seems.

ionice(1) - Linux man page

ionice - get/set program io scheduling class and priority

renice(1) - Linux man page

renice - alter priority of running processes

I’m not a C# dev and haven’t looked hard at cited code, but I’d guess that’s use-background-io-priority.
thread-priority might be in code that handles System.Threading.Thread.CurrentThread.Priority

Thread.CurrentThread Property
Thread.Priority Property
ThreadPriority Enum

Decrease server’s CPU and IO scheduling priority #2722

Setting low priority helps to run server process truly in background and provides CPU cycles to Duplicati only if no other process requires them.

Apparently the devs thought this was a better setting. I’m not sure, but it’s certainly not in line with

which might be outside ordinary usage, or at least amusingly stated in the opposite direction.
The question remains whether this priority plan sticks or is changed by other options during the actual work.
What I actually had in mind was someone looking at nice values at idle versus during the backup.

EDIT:

Preferably look at the thread level. Figuring out which threads are relevant might be difficult though.
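For what it’s worth, a quick sketch of how that comparison might be done (assuming the process command line matches the pattern “duplicati”; run once at idle and once mid-backup and compare the NI column):

# Nice value and CPU share per thread of the Duplicati process(es).
ps -L -o pid,lwp,ni,pcpu,comm -p "$(pgrep -f -d, duplicati)"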

Yes, doing things with a VM has a side effect that is often underappreciated: the host OS doesn’t know anything about the guest OS priorities and schedules (or doesn’t schedule) the whole VM as a single process.

Edit: I should of course have written ‘generic host OS’. Dedicated hypervisors (such as VMware) exist for a reason.

Update:
I’ve transferred the files to a test server and attempted running the same backup with the same settings from there. The files are now local and the server should be well up to the task (24 threads, 64G RAM, storage is on a local ZFS array).

Results:
The counting phase still takes 3 days to complete (Yes, I know it’s doing more than just counting files :wink: ) The transfer rate after several days is ~580KB/s (even though I’ve confirmed that the target can easily handle 15MB/s). RAM usage, processing, and other resource usage appear to be minimal.

If the storage is local, IOPS and bandwidth shouldn’t be a problem. Duplicati should be packaging up files and sending them to storage at something like the maximum transfer rate. I know the files are small and already compressed, which I’d guess would make processing the limit, but the CPU is sitting at ~2% utilization.

Any suggestions of any additional troubleshooting I can do to find out where things are going wrong?

Thanks!

Another update:
I installed cockpit so I could monitor the data transfer. Network traffic is pretty well zero, with occasional 100Mbps squirts of data to the S3 share.

Meanwhile, processor utilization continues to be negligible.


I’m at a total loss as to what the issue could be. It’s a lot of files, but I’m kinda shocked that Duplicati would fall all to pieces like this over a 3T backup.

Did you follow this advice?

I haven’t tried that yet. I assumed that moving to a local file system from NFS would make it irrelevant. I’ll give it a try though. :slight_smile:
At this point, I’m more worried about the 700KB/s transfer rate, which seems really low. No matter how many files there are, with a local file system I would think that, after it’s done with the counting phase, it should be able to package the files quickly and send them up in 100MB chunks, or at least stress some system resource (disk IO, processor, RAM, etc.). Though it’s possible I’m misunderstanding how Duplicati works.

From what your monitoring tool describes, that seems to be what happens (except for the ‘quickly’ part).

While it’s wild speculation, maybe the Duplicati zip library is single threaded? Maybe take a look with htop to check on that?
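One way to check that (a sketch; pidstat comes from the sysstat package, and the lowercase process pattern assumes the usual /usr/lib/duplicati install path):

# Per-thread CPU usage sampled every 5 seconds; if only one thread ever
# shows meaningful %CPU, compression (or hashing) is effectively serial.
pidstat -t -u -p "$(pgrep -f duplicati | head -n1)" 5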
In this case, if your files are already compressed, it’s often a worst case for compression, and if a single core is used, it could be rather slow. It may not apply though, since there is a file (default_compressed_extensions.txt) listing the file extensions that should not be compressed again (unless all your files are compressed and have an unusual extension, or none).

Hmmm… I can say it’s definitely using multiple threads, though it takes some serious priority adjustment to get it much above 50% utilization of all cores.

The point about already compressed files is interesting, though. The data is from an MRI machine. So, it’s almost entirely images (compressed with JPEG2000 or LZW, I think), but there would be no hint of this from the extensions.

Maybe I could see if I can turn off compression entirely and see if the speed improves…
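If I go that route, I think it would look roughly like this (assuming the zip-compression-level advanced option, where 0 means store-only; the target URL and source path here are placeholders, and the option can also be set on the job in the web UI):

# Same backup, but with zip compression effectively disabled, to see
# whether re-compressing already-compressed images is the bottleneck.
duplicati-cli backup "s3://my-bucket/backups?..." /data/mri \
  --zip-compression-level=0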

There’s not really a counting phase. It’s pipelined as described. Is this initial backup or an update?
Update is probably dominated by searching for changes among 150 million files. Initial is probably dominated by packaging and uploading all the easy-to-find new data. This is parallelized per your:

and it’s not clear writing randomly to 8 dblock-files-to-be will speed things up. Hardware dependent.
Do you have any measurements of things like disk utilization (maybe from %util in sar) or iowait?
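For example (both tools come from the sysstat package; the intervals are arbitrary):

# Per-device utilization, queue size and await, every 5 seconds:
iostat -dxm 5
# CPU breakdown including %iowait over the same window:
sar -u 5 12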

“Quickly” may depend on the random-writes speed of the tempdir folder. Is this now on hard drives?
Regardless, you should be able to see the files grow. Note that if source files are smaller than value

then you get one block of the file size (plus attribute metadata). Eventually it hits 100 MB and is sent.
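To watch that happen, something simple is enough (a sketch; /tmp is just the default guess for where tempdir points on your system):

# List the newest files in the temp directory every 10 seconds and watch
# the in-progress volume files grow toward the 100 MB dblock size.
watch -n 10 'ls -lht /tmp | head -n 15'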

is I think increasing the random-reads parallelism of the source side. Is this good with your hardware?
If the ZFS array is mechanical, it’s probably easy for it to be slower at multiple random read/write than network uplink can move data, and there are limited ways to avoid this. Maybe a RAM disk might help because you definitely don’t look CPU limited, but were there some stats on how utilized the array is?
The OS probably has some ability to make up for this, but there’s a bottleneck somewhere to discover.
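If you want to try the RAM disk idea, a rough sketch (the mount point and 8G size are arbitrary; tempdir is a standard advanced option, also settable in the web UI):

# Create a RAM-backed temp area and point Duplicati's tempdir at it.
sudo mkdir -p /mnt/duplicati-tmp
sudo mount -t tmpfs -o size=8g tmpfs /mnt/duplicati-tmp
# then add to the backup job (or server defaults):
#   --tempdir=/mnt/duplicati-tmp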

IMO it depends on the storage. My PC mechanical drive does 4KB random reads and writes at 1 MB/s.
If you happen to have an all-flash array, then ignore much of the above. If not, probably need metrics…

EDIT:

How to Check Disk Queue Length in Linux discusses iostat, and shows another way that sar can help.

I guess I’ll also mention that the database can grow and get slow at over a million or so blocks tracked, meaning your blocksize is too low. However, if you’re on the initial backup (are you?) it might not be slowing yet, because you would likely see that as high CPU usage (not happening) and maybe high disk use (not known).
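For a rough sense of scale, a back-of-the-envelope sketch using the numbers from this thread (3TB and ~150 million files at a 500KB blocksize):

# Blocks implied by data volume alone at a 500 KB blocksize:
echo $(( 3 * 1000 * 1000 * 1000 / 500 ))   # ~6,000,000 blocks
# But every source file needs at least one block (plus metadata), so
# ~150 million files means well over 150 million blocks tracked in the
# local database, far past the "million or so" guideline above.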