The process of listing files, checking files, extracting blocks, hashing blocks, hashing files, compression, encryption, etc., is done in multiple threads. It does not upload multiple files concurrently, because some of the logic is hard to get right if one file is uploaded and another fails.
In other words, everything runs concurrently, but the uploads are still sequential.
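A minimal Python sketch of that split, assuming a simplified model: a pool of worker threads hashes and compresses blocks in parallel, while a single loop packs the results into volumes and uploads them one at a time. `process_block`, `upload` and `run_backup` are hypothetical names for illustration only, not Duplicati's actual code.

```python
from __future__ import annotations

import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor


def process_block(block: bytes) -> tuple[str, bytes]:
    """Per-block work that can run in parallel: hash the block, then compress it."""
    return hashlib.sha256(block).hexdigest(), zlib.compress(block)


def upload(volume_id: int, volume: list[tuple[str, bytes]], log: list[int]) -> None:
    """Hypothetical stand-in for a backend upload; it only records the upload order."""
    log.append(volume_id)


def run_backup(blocks: list[bytes], blocks_per_volume: int = 2) -> list[int]:
    upload_log: list[int] = []
    volume: list[tuple[str, bytes]] = []
    volume_id = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Blocks are hashed/compressed concurrently; map() yields results in input order.
        for result in pool.map(process_block, blocks):
            volume.append(result)
            if len(volume) == blocks_per_volume:
                # Uploads happen one at a time, on this single thread.
                upload(volume_id, volume, upload_log)
                volume_id += 1
                volume = []
    if volume:  # upload the last, partially filled volume
        upload(volume_id, volume, upload_log)
    return upload_log
```

For example, `run_backup([b"x" * 1000] * 5)` processes five blocks on the pool but uploads volumes 0, 1 and 2 strictly in sequence.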
If Duplicati uses a single thread for uploads, what does this parameter do, then?
--asynchronous-upload-limit = 4
When performing asynchronous uploads, Duplicati will create volumes that can be uploaded. To prevent Duplicati from generating too many volumes, this option limits the number of pending uploads. Set to zero to disable the limit.
Is this just the number of volumes to create in advance for the separate, single-threaded upload?
I believe that sets how many archive files "ahead" of the current upload are created. For example, if your archive (dblock) size is set to 100M and the upload limit parameter is set to 4, then Duplicati will queue up no more than 4 archive files awaiting their turn to be sequentially uploaded.
I recall at least one post where a user asked why Duplicati needed 2 GB of local temp storage, until they realized they had the upload limit set to 4 and the archive size set to 500M.
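The behaviour described above is essentially a bounded producer/consumer queue. Here is a hedged Python sketch under simplified assumptions: one thread builds volumes, one thread uploads them, and `queue.Queue(maxsize=...)` plays the role of the upload limit. The function names are hypothetical, and the temp-space figure is only a rough estimate, since the volume currently being built or uploaded may take space on top of the queued ones.

```python
from __future__ import annotations

import queue
import threading
import time


def worst_case_temp_mb(upload_limit: int, volume_size_mb: int) -> int:
    """Rough temp space held by volumes waiting in the upload queue.
    Assumption: the volume being built and the one being uploaded may add more."""
    return upload_limit * volume_size_mb


def simulate(num_volumes: int, upload_limit: int, upload_seconds: float = 0.01) -> int:
    """Create volumes faster than one uploader can send them; return the deepest queue seen."""
    # queue.Queue treats maxsize=0 as unbounded, similar to "set to zero" in the help text.
    pending: queue.Queue = queue.Queue(maxsize=upload_limit)
    max_depth = 0

    def uploader() -> None:
        while True:
            volume = pending.get()
            if volume is None:  # sentinel: no more volumes to upload
                return
            time.sleep(upload_seconds)  # pretend the upload is slow

    worker = threading.Thread(target=uploader)
    worker.start()
    for volume in range(num_volumes):
        pending.put(volume)  # blocks while `upload_limit` volumes are already waiting
        max_depth = max(max_depth, pending.qsize())
    pending.put(None)
    worker.join()
    return max_depth
```

With the numbers from that post, `worst_case_temp_mb(4, 500)` gives 2000 MB, i.e. about 2 GB of queued volumes, and `simulate(10, 4)` never reports more than 4 volumes waiting.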
OK, thank you for re-interpreting this… On first read I took it as 4 concurrent uploads, but after learning a bit more about how Duplicati works, I realized that this is just an upload queue size… I think the parameter name is more confusing than its description.
OneDrive/Google still the champs - solid averages with no outliers
pCloud still strong but with a couple of (not so bad) outliers.
Backblaze is good but there was one major outlier (more on that below)
Box over WebDAV looking pretty good (but with a slightly smaller sample size, since it started later)
I examine averages as well as the variability of backup times. Jotta is a fail (and likely to be dropped from the test suite soon) since there are so many outliers.
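The exact outlier rule used in these tests isn't stated, so here is one hedged way to summarize a provider's backup times: mean and standard deviation for the average and variability, plus Tukey's IQR fences (more than 1.5 × IQR beyond the quartiles) to flag outliers. The rule and the `summarize` name are assumptions for illustration; `statistics.quantiles` needs Python 3.8+ and at least two runs.

```python
from __future__ import annotations

import statistics


def summarize(durations_min: list[float]) -> dict:
    """Average, variability and outliers for one provider's backup durations (minutes)."""
    # Quartiles of the runs; the middle value (median) is not needed here.
    q1, _, q3 = statistics.quantiles(durations_min, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences (assumed rule)
    return {
        "mean": statistics.mean(durations_min),
        "stdev": statistics.stdev(durations_min),
        "outliers": [d for d in durations_min if d < low or d > high],
    }
```

For a made-up series like `[10, 11, 10, 12, 11, 10, 60]`, only the 60-minute run is flagged, which is the kind of single blow-up that drags an otherwise good average around.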
For one particular run with Box over WebDAV, I was actually watching the status bar, and the outlier for Box (which has otherwise been very good and consistent) was due to the 'compacting data' operation (and post-verification). Not sure what happens in that operation, but it is certainly very costly, and it may be worth the devs looking into how to optimize the transfers to reduce the impact.
Obviously the best result is local storage, which is much better than all the cloud providers. One particular feature of pCloud is a virtual drive, which may give the best of both worlds (the speed of local storage, with later syncing to the cloud for safety, and without the cost of local storage space).
This needs more research, since the pCloud virtual drive is (transparently) cached and so does consume some local storage. But assuming pCloud's cache strategy is sound, the impact on speed may not be high, and the cache is otherwise invisible to the user when it comes to managing local storage. Will update that in the next round.
You’re doing such a great job so I don’t want to ask too much, but if you don’t mind, could you post your updated results directly in the actual post? It would make it much easier for people to find and read those results and they would be preserved as long as this forum exists.
If you are backing up moderate data sets (less than 1 TB), then B2 would be the most economical. Speed is not bad (aside from one major blow-up in my testing), with a pay-as-you-use model. You need to be careful with the restore cost, though.
For large data sets, the winner is… OneDrive. With the MS Family Pack mentioned in the post (Experiences with OneDrive as a backup target), it is very economical for anything up to 5 TB, with good performance (and you get MS Office as a freebie).
If speed is the only priority, you can also consider pCloud using its virtual local drive: the speed of a local backup, with separate cloud syncing happening. The firm is running an interesting 2 TB lifetime plan (at a rather high upfront cost, but it breaks even after a couple of years compared to other paid services).
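The cost reasoning above can be sketched as a small calculation. This is a hedged illustration, not a price comparison: all prices are placeholder parameters to fill in from each provider's current price list, and the function names are hypothetical. It models a usage-billed provider (storage per TB per month plus restore/download traffic) and the number of months after which a one-time "lifetime" price has paid for itself against a monthly alternative.

```python
from __future__ import annotations

import math


def pay_as_you_go_monthly(stored_tb: float, price_per_tb_month: float,
                          restored_tb: float = 0.0, egress_per_tb: float = 0.0) -> float:
    """Monthly bill for a usage-billed provider: storage plus any restore (download) traffic."""
    return stored_tb * price_per_tb_month + restored_tb * egress_per_tb


def breakeven_months(one_time_price: float, monthly_alternative: float) -> int:
    """Months of paying the alternative after which a one-time plan has paid for itself."""
    if monthly_alternative <= 0:
        raise ValueError("monthly cost must be positive")
    return math.ceil(one_time_price / monthly_alternative)
```

Keeping `restored_tb` and `egress_per_tb` as separate inputs makes the restore-cost caveat visible: a month with a large restore can cost far more than a normal storage-only month.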
[[ Update ]]
Further testing was done on S3; see the link below.
In terms of conclusions, the above holds unless cost is not an issue and speed is the highest priority; in that case, S3 is the clear winner (with caveats as noted in the post).
Not too sure about the providers having hashes as part of the API. Is there any way I can find out or test for that?
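One practical way to test this, sketched under assumptions: upload a known file, look at whatever checksum the provider shows for it in its API listing or web UI (B2, for instance, stores a SHA-1 per file), and compare it against locally computed digests. Which algorithm a given provider uses, if any, still has to be confirmed in its API docs; `local_digests` and `matching_algorithm` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib


def local_digests(path: str) -> dict[str, str]:
    """Hex digests of a local file with algorithms providers commonly report."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def matching_algorithm(reported_hash: str, digests: dict[str, str]) -> str | None:
    """Name the algorithm whose local digest equals the provider's reported hash, if any."""
    reported = reported_hash.strip().lower()
    for name, value in digests.items():
        if value == reported:
            return name
    return None
```

If no algorithm matches, the provider either uses a different scheme or doesn't expose a content hash for that file.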
Right now I am pretty much retiring the testing, as the results are pretty stable and conclusive. I would consider testing again if there is a new backend (does anyone want to give me an invite for free storage somewhere?). As for Tahoe, it just looks a little too advanced/manual to set up, so I haven't tested it.