Slow backup Mozy vs Duplicati

Hi there,

I come from Mozy and I am testing Duplicati on 32,500 files totalling 34.5 GB (including two PST files of 15.4 and 7.4 GB respectively).

  • The remote storage is SFTP.
  • Upload volume size set to 50 MB.
  • I understand that I use a local database. Not sure where to double-check that.
  • My backup routine is to close all applications and run a backup.

Part of my test is to run that routine with both Mozy and Duplicati (one after the other!) on the same data set; both backups are always executed in that order to avoid a changed-files bias in my testing.

  • Mozy takes 1 minute and 20 seconds to complete the process.
  • Duplicati takes 15 minutes and 35 seconds.
    Duplicati’s logs are at the end of the post.

I am finishing an initial upload to Backblaze to see if the protocol has any impact, with the upload volume size set to 300 MB.

To complete the test, I will do the same with SFTP (upload volume size set to 300 MB) and a fourth test with 50 MB on Backblaze, and I will share my results if there are any findings.

I am guessing that the difference in execution time is due to the lack of server-side software and the fully encrypted files. (Mozy encrypts file by file, excluding the file name.)

I am wondering if there is any way to optimize the execution time.

Thanks!


DeletedFiles: 4
DeletedFolders: 0
ModifiedFiles: 37
ExaminedFiles: 32510
OpenedFiles: 46
AddedFiles: 9
SizeOfModifiedFiles: 24516821075
SizeOfAddedFiles: 5460932
SizeOfExaminedFiles: 37424583560
SizeOfOpenedFiles: 24522563832
NotProcessedFiles: 0
AddedFolders: 1
TooLargeFiles: 0
FilesWithError: 0
ModifiedFolders: 0
ModifiedSymlinks: 0
AddedSymlinks: 0
DeletedSymlinks: 0
PartialBackup: False
Dryrun: False
MainOperation: Backup
CompactResults: null
DeleteResults:
    DeletedSets: []
    Dryrun: False
    MainOperation: Delete
    CompactResults: null
    ParsedResult: Success
    EndTime: 13/03/2018 18:59:45 (1520963985)
    BeginTime: 13/03/2018 18:59:42 (1520963982)
    Duration: 00:00:03.0938055
    BackendStatistics:
        RemoteCalls: 8
        BytesUploaded: 50824279
        BytesDownloaded: 55828999
        FilesUploaded: 3
        FilesDownloaded: 3
        FilesDeleted: 0
        FoldersCreated: 0
        RetryAttempts: 0
        UnknownFileSize: 0
        UnknownFileCount: 0
        KnownFileCount: 1101
        KnownFileSize: 28547665209
        LastBackupDate: 13/03/2018 18:51:03 (1520963463)
        BackupListCount: 3
        TotalQuotaSpace: 0
        FreeQuotaSpace: 0
        AssignedQuotaSpace: -1
        ReportedQuotaError: False
        ReportedQuotaWarning: False
        ParsedResult: Success
RepairResults: null
TestResults:
    MainOperation: Test
    Verifications: [
        Key: duplicati-20180313T175103Z.dlist.zip.aes
        Value: [],
        Key: duplicati-ia6a66e88dfc8401ebe8f003cd0b1d92b.dindex.zip.aes
        Value: [],
        Key: duplicati-b94f71aafa0484a85bb601680d6cfaffd.dblock.zip.aes
        Value: []
    ]
    ParsedResult: Success
    EndTime: 13/03/2018 19:06:38 (1520964398)
    BeginTime: 13/03/2018 18:59:47 (1520963987)
    Duration: 00:06:51.2778378
ParsedResult: Success
EndTime: 13/03/2018 19:06:38 (1520964398)
BeginTime: 13/03/2018 18:51:03 (1520963463)
Duration: 00:15:35.8945829
Messages: [
    No remote filesets were deleted
]
Warnings: []
Errors: []

Thanks for Duplicati:

The major difference is that Mozy supports watching for changed files while Duplicati does not. It’s been discussed in multiple places but not implemented yet.

In essence, Mozy only has to check a couple of files while Duplicati has to check all of your files for changes. This can be sped up with `--check-filetime-only`, which will reduce the time it takes to check each file, but each file will still be checked.
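For intuition, here is a minimal Python sketch (not Duplicati’s actual code) contrasting the cheap metadata check that `--check-filetime-only` relies on with a full content scan; both function names are my own:

```python
import hashlib
import os


def changed_by_filetime(path, last_mtime, last_size):
    """Cheap check: compare stat() metadata only.
    One syscall per file, no reading of file contents."""
    st = os.stat(path)
    return st.st_mtime != last_mtime or st.st_size != last_size


def changed_by_content(path, last_digest, chunk=1024 * 1024):
    """Expensive check: read and hash the entire file.
    Cost grows with file size (think of a 15 GB PST file)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() != last_digest
```

Even the cheap variant still has to visit all 32,500 files; it just avoids reading their contents.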


Sorry if it has been discussed before. I had a quick look and did not find those posts.
Thank you !

I couldn’t find them in a quick search on github either, so I’m wondering where I read it. But I know it’s been mentioned at least in passing a couple of places…

I’ll take a proper look later, but if there is no proper issue on github, perhaps we should create it to track it.

Therefore, the size of the upload volume has no (or little) impact?
Because the remote storage log shows that Duplicati lists files and that process takes 8 minutes, I was assuming that fewer but bigger files would speed up the backup process. (Maybe the list task over SFTP implies hashing the files; then it would not have any impact.)
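To get a feel for it, here is a rough estimate (my own arithmetic, not Duplicati’s accounting) of how the upload volume size changes the number of remote files, using the KnownFileSize reported in the log:

```python
def remote_volumes(backend_bytes, volume_mb):
    """Rough remote file count: one dblock per volume-sized chunk,
    plus one dindex per dblock (ignores dlist files and overhead)."""
    dblocks = backend_bytes // (volume_mb * 1024 * 1024) + 1
    return dblocks * 2  # dblock + dindex pairs


known = 28_547_665_209  # KnownFileSize from the log above

print(remote_volumes(known, 50))   # → 1090 (log reports KnownFileCount: 1101)
print(remote_volumes(known, 300))  # → 182
```

So 300 MB volumes would cut the remote file count roughly sixfold, which shortens a remote `list` but should not change how long the local file scan takes.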

I have searched on GitHub for FileSystemWatcher but have not found any open issue. Do you know if it is an active issue?
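For illustration only: .NET’s FileSystemWatcher delivers change events as they happen, whereas a scanner like the simplified stdlib sketch below (my own code, not Duplicati’s) must walk every file on each run to find out what changed:

```python
import os


def snapshot(root):
    """Record (mtime, size) for every file under root.
    This full walk is the cost a backup pays on every run
    when there is no event-based file watching."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished mid-walk
            state[path] = (st.st_mtime, st.st_size)
    return state


def diff(old, new):
    """Return (added, removed, modified) paths between two snapshots."""
    added = [p for p in new if p not in old]
    removed = [p for p in old if p not in new]
    modified = [p for p in new if p in old and new[p] != old[p]]
    return added, removed, modified
```

A watcher would hand you the `diff` result directly, skipping the walk entirely.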

On the protocol side, do you know if there are any performance differences between SFTP, the Backblaze API, and WebDAV?
I have seen this post but it does not answer the question:
https://forum.duplicati.com/t/webdav-vs-s3-vs-sftp

Thanks

Oops, I was writing my post while you were answering me.

Definitely !

Some mentions of file monitoring are here:


https://forum.duplicati.com/t/file-backup-order-for-du

A closed GitHub issue for Duplicati 2:

And some old and closed GitHub issues for Duplicati 1:




Hmm, 8 minutes of listing files does not sound right. I don’t believe listing files even has to download anything if it’s just running from the local DB. Are you seeing high CPU usage during the listing? Any network usage?

I notice it says 24 million files, which may be causing quite a bit of work for the database due to some not-that-optimized queries being run when listing files. I think this may be the culprit. I’m not super familiar with those queries, but 24 million files produces a lot of rows to sort through when listing files.

If you refer to:
SizeOfModifiedFiles: 24516821075
SizeOfOpenedFiles: 24522563832

I understand this is the size of the modified files. That is due to my two PST files (15.4 and 7.4 GB).

Haha, I must be tired. I read SizeOfExaminedFiles instead of ExaminedFiles :)

:) !!
But that is not so much, as far as I understand.
To answer your question:

There is no CPU usage either, and very little network activity.

Here is the high-level remote log:


Mar 13, 2018 7:06 PM: get duplicati-b94f71aafa0484a85bb601680d6cfaffd.dblock.zip.aes
Mar 13, 2018 7:06 PM: get duplicati-ia6a66e88dfc8401ebe8f003cd0b1d92b.dindex.zip.aes
Mar 13, 2018 7:06 PM: get duplicati-20180313T175103Z.dlist.zip.aes
Mar 13, 2018 6:59 PM: list
Mar 13, 2018 6:59 PM: put duplicati-20180313T175103Z.dlist.zip.aes
Mar 13, 2018 6:59 PM: put duplicati-i8542208148ab428a9504e89403fea746.dindex.zip.aes
Mar 13, 2018 6:59 PM: put duplicati-be2ad8420c00e4365850959a841ccb06b.dblock.zip.aes
Mar 13, 2018 6:51 PM: list

Attached is a zip with the detailed log for the Mar 13, 2018 6:51 PM list: log.zip (36.6 KB)

It is too long to paste, and I am not able to attach txt files.

That’s very reasonable.

Looking a bit closer at the log info today, it seems half the time was spent on verification of the remote data:
Verification: 7:51

So about 8 minutes were spent on actually checking all the files and backing up the 5 MB that changed. And since no compacting or deletions were done, it should only have had to upload those 5 MB and download maybe 100 MB for verification.
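Pulling the phase durations straight from the log above, the split works out roughly like this:

```python
from datetime import timedelta

# Durations as reported in the log
total  = timedelta(minutes=15, seconds=35.89)  # overall Duration
verify = timedelta(minutes=6,  seconds=51.28)  # TestResults Duration
delete = timedelta(seconds=3.09)               # DeleteResults Duration

# What remains is the scan + upload phase of the backup
remainder = total - verify - delete
print(remainder)  # roughly 8 minutes 41 seconds
```

That remainder is the part that scanning 32,500 files plus uploading ~50 MB has to account for.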

Those speeds are not great for either verification or backup. My laptop with a fairly dated 4288U completes a slightly larger backup to Google Cloud in 5 to 6 minutes, and in 3 minutes to a local Minio S3 bucket.

Slow download speed (either at your location or at Backblaze) could explain some of those ~8 minutes spent on verification if it was around 1-2 Mbit/s, but the remaining 8 minutes for the backup can hardly be an upload speed issue, so it would have to be slow file scanning or another process impacting the resulting time.
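As a sanity check on that bandwidth guess (my own arithmetic, based on the ~56 MB the log says was downloaded for verification):

```python
def transfer_minutes(size_mb, link_mbit_per_s):
    """Minutes to move size_mb over a link of the given speed
    (megabytes to megabits, then seconds to minutes)."""
    return (size_mb * 8) / link_mbit_per_s / 60


# BytesDownloaded in the log is ~56 MB
print(round(transfer_minutes(56, 2), 1))  # → 3.7 min at 2 Mbit/s
print(round(transfer_minutes(56, 1), 1))  # → 7.5 min at 1 Mbit/s
```

So a link in the 1-2 Mbit/s range could indeed eat most of the verification time on its own.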

I’m not entirely sure what it could be, but you could try the `--check-filetime-only` option to see if it helps with the file scanning.