I come from Mozy and I am testing Duplicati on 32,500 files and 34.5 GB (including two PST files of 15.4 GB and 7.4 GB respectively).
The remote storage is SFTP.
Upload volume size set to 50 MB.
I understand that I am using a local database; I am not sure where to double-check that.
My backup routine is to close all applications and run a backup.
Part of my test is to run that routine with both Mozy and Duplicati (one after the other!) on the same data set (both backups are always executed, in order to avoid a changed-files bias in my testing).
Mozy takes 1 minute and 20 seconds to complete the process.
Duplicati takes 15 minutes and 35 seconds. Duplicati's logs are at the end of the post.
I am finishing an initial upload to Backblaze to see if the protocol has any impact, with the upload volume size set to 300 MB.
To complete the test, I will do the same with SFTP and an upload volume size of 300 MB, plus a fourth test with 50 MB on Backblaze, and I will share my results if there are any findings.
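In case it matters: my understanding is that the "Upload volume size" setting corresponds to the --dblock-size option when running from the command line, so the 300 MB SFTP test would look roughly like this (the host, remote folder and source folder below are placeholders, not my real setup):
`duplicati-cli backup "ssh://example-host/backup-folder" "C:\Data" --dblock-size=300MB`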
I am guessing that the execution time difference is due to the lack of server-side software and to the fully encrypted files. (Mozy encrypts file by file, excluding the file name.)
I am wondering if there is any way to optimize the execution time.
The major difference is that Mozy supports watching for changed files while Duplicati does not. It’s been discussed in multiple places but not implemented yet.
In essence, Mozy only has to check a couple of files while Duplicati has to check all of your files for changes. This can be sped up with `--check-filetime-only`, which will reduce the time it takes to check each file, but each file will still be checked.
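For reference, a rough sketch of how that option could be passed on the command line (the SFTP URL and source path are just placeholders, and the GUI should expose the same setting under the advanced options):
`duplicati-cli backup "ssh://example-host/backup-folder" "C:\Data" --check-filetime-only=true`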
I couldn’t find them in a quick search on GitHub either, so I’m wondering where I read it. But I know it’s been mentioned at least in passing in a couple of places…
I’ll take a proper look later, but if there is no proper issue on GitHub, perhaps we should create one to track it.
Therefore, the size of the upload volume has no (or little) impact?
Because the remote storage log shows that Duplicati lists files, and that process takes 8 minutes, I was assuming that fewer but bigger files would speed up the backup process. (Maybe the list task over SFTP implies hashing the files; in that case it would not have any impact.)
I have searched GitHub for FileSystemWatcher but have not found any open issue. Do you know if it is an active issue?
On the protocol side, do you know if there are any performance differences between SFTP, the Backblaze API, and WebDAV?
I have seen this post but it does not answer the question: https://forum.duplicati.com/t/webdav-vs-s3-vs-sftp
Hmm, 8 minutes of listing files does not sound right. I don’t believe listing files even has to download anything if it’s just running from the local DB. Are you seeing large CPU usage during the listing? Any network usage?
I notice it says 24 million files; that may be causing quite a bit of work for the database due to some not-that-optimized queries being run when listing files. I think this may be the culprit. I’m not super familiar with those queries, but 24 million files produce quite a lot of rows to sort through when listing files.
!!
But as far as I understand, that is not so much.
To answer your questions: there is no noticeable CPU usage and very little network usage.
Here is the high-level remote log:
Mar 13, 2018 7:06 PM: get duplicati-b94f71aafa0484a85bb601680d6cfaffd.dblock.zip.aes
Mar 13, 2018 7:06 PM: get duplicati-ia6a66e88dfc8401ebe8f003cd0b1d92b.dindex.zip.aes
Mar 13, 2018 7:06 PM: get duplicati-20180313T175103Z.dlist.zip.aes
Mar 13, 2018 6:59 PM: list
Mar 13, 2018 6:59 PM: put duplicati-20180313T175103Z.dlist.zip.aes
Mar 13, 2018 6:59 PM: put duplicati-i8542208148ab428a9504e89403fea746.dindex.zip.aes
Mar 13, 2018 6:59 PM: put duplicati-be2ad8420c00e4365850959a841ccb06b.dblock.zip.aes
Mar 13, 2018 6:51 PM: list
Attached is a zip with the detailed log for “Mar 13, 2018 6:51 PM: list”: listlog.zip (36.6 KB)
It is too long to paste, and I am not able to attach txt files.
Looking a bit closer at the log info today, it seems half the time is spent on verification of remote data:
Verification: 7:51
So about 8 minutes are spent on actually checking all the files and backing up the 5 MB that changed. And no compacting or deletions were done, so it should only have to upload those 5 MB and download maybe 100 MB for verification.
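If the verification download turns out to be the expensive part, it can be tuned; if I remember the option names correctly, --backup-test-samples controls how many remote volume sets are downloaded and checked after each backup, and --no-backend-verification skips that check entirely (at the cost of not noticing damaged remote files). Something like this, with placeholder URL and source path:
`duplicati-cli backup "ssh://example-host/backup-folder" "C:\Data" --no-backend-verification=true`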
Those speeds are not great for either verification or backup. My laptop, with a fairly dated 4288U, completes a slightly larger backup to Google Cloud in 5 to 6 minutes, and in 3 minutes to a local Minio S3 bucket.
Slow download speed (either at your location or at Backblaze) could explain some of those ~8 minutes spent on verification if it was something like 1-2 Mbit/s, but the remaining 8 minutes for the backup could hardly be an upload speed issue, so it would have to be slow file counting or another process impacting the resulting time.
I’m not entirely sure what it could be, but you could try the --check-filetime-only option to see if it helps with the file counting.