Performance Very Poor on Large Files and Archives

Hi there,

I’ve started using Duplicati to back up my NAS offsite, and I very much appreciate the native Google Drive integration. I’ve been working through my data case by case; all that’s left now is my video media, and for everything else performance was quite OK.

It’s two backup folders: one is 3TB with files of 500-2000MB, the other is 10TB with files of 4GB to 20GB.

I’m using the latest beta. The environment is a QNAP NAS with an 8-core AMD Ryzen CPU and 8GB of memory: SATA drives for the backup data, and an SSD for the Docker environment running Duplicati.

The symptom I see is that performance starts out at 20-30MB/s, drops to 10MB/s within an hour, and is down to 5MB/s after two days. I seem to be running into the same issues as described two years ago here: Poor performance - #17 by Bogmonster. This is with a remote volume size of 1GB / 4GB, encryption enabled, and otherwise default settings.

Nothing seems maxed out; maybe IOPS/latency, but at 150/s that is not really anything for an SSD.

I tried different settings, including larger block sizes (4MB, 64MB, 256MB), which didn’t seem to have any impact on performance except to consume more memory; I had to kill the process after it took up 6GB.

I’m now running performance measurements, iterating through different settings for encryption, compression, and dblock size, but it feels a bit like poking in the dark. Is there any advice on what settings make sense to end up with a performant Duplicati for the size and kind of backup described above? I’d like to get something like 20-30MB/s, which would be acceptable and seems possible, since the maximum I could get on smaller backups was 40MB/s.

Thanks,
Marcus

Welcome to the forum @mcole

I think there’s too much system dependency and too few volunteers with equipment. General advice is at Choosing sizes in Duplicati, but I’m not sure what (if any) actual lab testing (no central lab) backs that up.

My own rule of thumb (based on very little testing) is to try not to have more than 1 million blocks per backup, implying a 10 MB blocksize for the 10 TB backup, but you’re already testing in that range and even beyond.
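If it helps, the arithmetic behind that rule of thumb is just total source size divided by the target block count. A quick sketch in Python (decimal units, purely illustrative, and the 1-million-block target is my own rule of thumb rather than anything official):

    # Back-of-the-envelope arithmetic for the "about 1 million blocks per
    # backup" rule of thumb above. Decimal units, purely illustrative.
    TB = 10 ** 12
    MB = 10 ** 6
    TARGET_BLOCKS = 1_000_000

    for label, total in (("3 TB", 3 * TB), ("10 TB", 10 * TB)):
        blocksize = total / TARGET_BLOCKS
        print(f"{label}: ~{blocksize / MB:.0f} MB blocksize keeps it near "
              f"{TARGET_BLOCKS:,} blocks")
    # 3 TB:  ~3 MB blocksize keeps it near 1,000,000 blocks
    # 10 TB: ~10 MB blocksize keeps it near 1,000,000 blocks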

Other general advice is to use fast storage in all storage areas, such as source, temporary, and database storage (which for Docker systems should be mapped through to host storage, since data kept only inside the container does not persist).

What does the slash mean? You tried both? I’m not sure if very large remote volumes help backup speed, but restore speed might suffer if you have to download multiple large volumes to obtain all required blocks.
4GB is also pushing the limit of standard ZIP. Duplicati can do ZIP64, but I’m not sure how transparent it is.

How the restore process works

Do you have per-CPU-core and per-drive info for all drives? Aggregate data won’t show the bottlenecks.
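If you don’t already have a tool for that, a quick sampler along these lines would show the per-core and per-drive breakdown (this assumes Python with the psutil package is available on the host; not Duplicati-specific):

    # Per-CPU-core and per-drive sampler (assumes the psutil package).
    # Aggregate numbers hide bottlenecks; this prints the breakdown.
    import psutil

    prev = psutil.disk_io_counters(perdisk=True)
    while True:
        cores = psutil.cpu_percent(interval=5, percpu=True)  # 5-second window
        cur = psutil.disk_io_counters(perdisk=True)
        print("CPU per core:", [f"{c:.0f}%" for c in cores])
        for disk, now in cur.items():
            before = prev.get(disk)
            if before:
                rd = (now.read_bytes - before.read_bytes) / 5 / 1e6
                wr = (now.write_bytes - before.write_bytes) / 5 / 1e6
                print(f"  {disk}: read {rd:.1f} MB/s, write {wr:.1f} MB/s")
        prev = cur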

concurrency-block-hashers could possibly be raised from its default of 2 to keep more CPU cores busy, but source file read performance might get worse because the read pattern could become more random. Lowering it to 1 might make source reads more sequential, which could help if you’re drive-seek limited.

Note that Google Drive has a 750 GB per day upload limit. I’m surprised it’s not throwing errors at you…

If running 24 hours per day, I’m calculating that upload limit should hit at somewhat below 9 MB/second.
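For reference, the back-of-the-envelope calculation behind that number:

    # 750 GB/day upload cap spread evenly over 24 hours (decimal units).
    limit_per_day = 750 * 10 ** 9          # bytes
    seconds_per_day = 24 * 60 * 60         # 86,400
    print(limit_per_day / seconds_per_day / 10 ** 6)  # ~8.7 MB/s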

Do you have anything that graphs network upload speeds over time? Ideally, only look at Duplicati’s use.
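If nothing on the NAS graphs that already, a throwaway sampler in the same psutil vein could give a rough picture (note it measures the whole network interface, not just Duplicati’s traffic):

    # Crude upload-rate sampler via psutil (whole machine, not just Duplicati).
    import time
    import psutil

    prev = psutil.net_io_counters().bytes_sent
    while True:
        time.sleep(10)
        cur = psutil.net_io_counters().bytes_sent
        print(f"{time.strftime('%H:%M:%S')}  upload {(cur - prev) / 10 / 1e6:.2f} MB/s")
        prev = cur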

  --asynchronous-concurrent-upload-limit (Integer): The number of concurrent
    uploads allowed
    When performing asynchronous uploads, the maximum number of concurrent
    uploads allowed. Set to zero to disable the limit.
    * default value: 4

helps keep the upload going at closer to its maximum speed (what is yours, and what else is uploading?).
Do you have any fast somewhat spacious local destination so you can see if speed is limited by upload?

tempdir is where Duplicati builds the .zip files with your data, makes an encrypted version, and uploads it.
You can probably watch progress with ls -lcrt --full-time dup-*. Files that are growing are the .zip files being built. Ones sitting might be seen by ls -lurt --full-time to be read out, e.g. during upload.

asynchronous-upload-limit should allow production to get ahead of upload – if production is able to do that.
You could check whether there are lots of static full-size files sitting in that area; if not, it might be a production limit.

How the backup process works describes the process.

Hi,

thanks for the reply.

Yes, I tried both sizes: 4GB for the large media, 1GB for the smaller media files, without any effect.

The info was per drive and per CPU core; I forgot to mention that. Increasing parallel hashers as well as compressors did not have any effect either in my tests (though I did that with the default block size only, which in retrospect may explain the limited impact, since other bottlenecks could have been in play).

Ah, sorry, that was me throwing in a random fact about Duplicati that I appreciated. All the performance tests were done locally: Duplicati running in Docker, with source and destination being SATA drives mounted via volumes, and the Duplicati container (database, temp files) running on an SSD. So everything you mentioned regarding the network does not apply.

[Some background in case it’s interesting: when writing the post I only had smaller backups running into Google Drive. The big ones never completed due to the performance issues. I don’t have the bandwidth at home to upload 10TB of initial backup data to Google Drive within a reasonable amount of time; the plan is to upload those from the office, where I have more bandwidth and can split the work over multiple days given the 750GB limit. Incremental backups will then go via my home line.]

Thanks for the tips on the temp dir and files. I used that, together with profiling-level debug output, to locate the bottleneck and see in which phase of generation it happens. Zipping was never the issue: according to iotop, throughput during zipping went up close to the SSD’s limit (250 MB/s) when testing on the SSD alone. The same goes for copying the temp files over, which happens at the maximum pace of the connected USB SATA drive (60 MB/s).

I tried to put all of this together, and as far as I could get, the only logical conclusion was that block hashing is the bottleneck. To isolate that, I fixed everything else, set the number of concurrent hashers, compressors, etc. to the CPU core count, and then created a test bed of scripts that each ran for 2 hours before being killed, using the same maxed-out config with varying block sizes, on both the large media archive (10GB average file size, 10TB in total) and the smaller media archive (2GB average file size, 3TB in total).
After letting it burn overnight and looking at how much data each run pushed (block sizes of 512k, 1M, 2M, 4M, 8M, 16M, 32M, 64M, 128M, 256M), 16MB was the winner by a significant margin. I then re-ran both full backups with the 16MB block size and concurrent hashers set to the CPU core count; that gave a continuous throughput of around 22MB/s over the whole backup, with an average CPU usage of 60%. The CPU pattern is 100% for a while, then down to 40% for a few minutes, then back to 100%. I guess the dips happen when it hits the --asynchronous-concurrent-upload-limit and the CPU isn’t fully used, probably running into some limit on I/O or elsewhere that isn’t obvious to spot.
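For anyone curious, here is a simplified sketch of what the test bed boiled down to (the actual scripts were messier; the duplicati-cli invocation, paths, passphrase, and option values below are illustrative placeholders for my setup):

    # Simplified sketch of the block-size sweep: run each config for a fixed
    # time, kill it, and compare how much data reached the local destination.
    # Executable name, paths, and passphrase handling are illustrative only.
    import shutil
    import subprocess
    from pathlib import Path

    SOURCE = "/srv/media/large"              # archive being backed up
    DEST_ROOT = Path("/srv/bench")           # local destination per run
    RUNTIME = 2 * 60 * 60                    # 2 hours per configuration
    BLOCKSIZES = ["512kb", "1mb", "2mb", "4mb", "8mb", "16mb",
                  "32mb", "64mb", "128mb", "256mb"]

    for bs in BLOCKSIZES:
        dest = DEST_ROOT / f"bs-{bs}"
        dest.mkdir(parents=True, exist_ok=True)
        cmd = [
            "duplicati-cli", "backup", f"file://{dest}", SOURCE,
            f"--blocksize={bs}",
            "--dblock-size=1GB",
            "--concurrency-block-hashers=8",   # maxed to CPU core count
            "--concurrency-compressors=8",
            "--passphrase=benchmark-only",
        ]
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=RUNTIME)         # most runs won't finish in time
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        pushed = sum(f.stat().st_size for f in dest.glob("*"))
        print(f"blocksize {bs}: {pushed / 1e9:.1f} GB written in 2 h")
        shutil.rmtree(dest)                    # start the next run clean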

I still think it should be possible to get more, but the performance is fine for my use case. The incremental backups I tested achieve the same throughput, and since upload is the bottleneck there anyway, I don’t mind.

Thanks for the tips and clarifications; they helped me set up a simplified test bed and rule out different bottlenecks.

Best,
Marcus


Interesting, and I have no idea why larger block sizes would be slower. I just expected diminishing returns.

Thanks for the great dive into it so far; you’ve gone wider and deeper than most. There’s always a wish for improved performance if you have any interest in chasing this further, but the core staff is kind of swamped. There’s a current push to move to a faster framework such as .NET 5 or 6, which could add some speed.

SHA512 as standard hash #2419 was an interesting proposal, though I have no idea if C# will show gains.
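As a very rough data point (and only that, since Python’s hashlib goes through OpenSSL and may not reflect what C#/.NET would do), one could compare raw hash throughput like this:

    # Quick-and-dirty SHA-256 vs SHA-512 throughput check. Python's hashlib
    # uses OpenSSL, so this is only a hint and may not match C#/.NET behavior.
    import hashlib
    import time

    data = b"\0" * (64 * 1024 * 1024)        # 64 MB of zeros

    for name in ("sha256", "sha512"):
        start = time.perf_counter()
        hashlib.new(name, data).hexdigest()
        elapsed = time.perf_counter() - start
        print(f"{name}: {len(data) / elapsed / 1e6:.0f} MB/s")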

I’d love to dig in and see how to speed things up, but my current forte is Scala and Python. .NET and C# are 8 years in my past, so I’m probably no help =)