I’m moving from zbackup+rclone to Duplicati, as well as rolling out Duplicati to a couple servers that were using other backup techniques in the past.
Unfortunately, the backup speed is proving to be slow when using B2 as the destination: I'm seeing 200–1000 KB/s as reported by Duplicati, while both servers can upload to B2 much faster using rclone alone.
I initially assumed compression was the bottleneck, since Duplicati was pinning a single core at 100%. However, the ~120GB dataset I'm backing up now is already deduplicated, compressed and encrypted, so I set --zip-compression-method=None and --zip-compression-level=0. CPU use now averages around 1%, yet the upload is still slow.
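For reference, this is roughly the command I'm running (bucket name, prefix and source path are placeholders, and I've left the B2 auth options out):

```
# bucket, prefix and source path are just examples; B2 credentials omitted
duplicati-cli backup "b2://my-bucket/backups" /data/already-compressed \
  --zip-compression-method=None \
  --zip-compression-level=0
```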
I started digging around the forums and found other reports of B2 being slow, apparently because Duplicati uploads single-threaded. Fair enough; I understand it's not a trivial change to start uploading multiple files at once.
Is the rclone option any faster than using B2 directly?
rclone implements parallel uploads via --transfers, and it can both upload multiple files at once and split a single large file into chunks. B2 allows chunks as small as 5MB but recommends at least 100MB per chunk, with smaller files not being split at all. Testing with rclone's --transfers and --b2-chunk-size switches basically confirms that uploading a single 50MB file in one part vs. 5MB chunks makes no measurable difference.
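In case anyone wants to reproduce it, the test looked roughly like this (remote and bucket names are placeholders; --b2-upload-cutoff is only there to force the chunked case):

```
# single-part upload: 50MB is below rclone's default B2 upload cutoff
rclone copy test-50M.bin B2remote:my-bucket/speedtest -P

# forced multi-part upload in 5MB chunks
rclone copy test-50M.bin B2remote:my-bucket/speedtest -P \
  --b2-upload-cutoff 5M --b2-chunk-size 5M
```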
Based on this, I wonder whether Duplicati places multiple files into a directory for rclone to pick up, or whether it still just passes a single file at a time. If it hands rclone several files at once (maybe up to --asynchronous-upload-limit?), then switching to the rclone backend might have some benefit; but if Duplicati still passes one file at a time, I'd rather have Duplicati talk to B2 directly.
Obviously, if I had enough local disk space for Duplicati to work against a local destination, I could use rclone or any other tool I like for the final upload. Unfortunately not all of my environments have enough space for that, but where I do have the space I will definitely take this approach, roughly as sketched below.
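Something like this two-step setup, with paths and bucket names as placeholders:

```
# 1. Back up to a local staging folder (no compression, since the data
#    is already deduplicated, compressed and encrypted)
duplicati-cli backup "file:///mnt/staging/duplicati" /data/already-compressed \
  --zip-compression-method=None --zip-compression-level=0

# 2. Push the staging folder to B2 with parallel transfers
rclone sync /mnt/staging/duplicati B2remote:my-bucket/backups --transfers 8 -P
```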
Are there any other options or optimizations I have missed? Or do I just need to be patient with the initial upload and then evaluate whether this is really a day-to-day issue, given the (unknown) rate of changed data?