Slow backup of VHD file

Hi all,

I am running 2.0.3.9 and I am seeing extremely slow backup performance.
I have a VHD file (not a running instance) and I have set Duplicati to back up this single file to a local NAS box across a 1GB LAN.

I currently see speeds of 745 bytes/s, so this backup will still be running at the end of this millennium!

I have set the volume size to 200MB, and .vhd files are in the compression extension file (so it shouldn't compress the VHD?).

Not sure what else I can do to get this running at an acceptable speed.

Just to confirm, the VHD file is a completely separate copy of the guest that Duplicati is running on (a script on the host shuts down the guest and copies the VHD file to another folder, and that folder is the one Duplicati is backing up).

If I use Windows Explorer to copy the file across to the NAS box, it goes at full wire speed (1GB/s) and completes in a reasonable time, so this has nothing to do with the underlying hardware, network or NAS - this is Duplicati getting something completely wrong.

Any suggestions? Because at the moment this is an unusable product (in terms of speed) compared to almost everything else I've used.

thanks

Steve

Welcome to the forum @StevenJohns. Are you able to provide some more information, like the OS version and the size of the VHD? Are you willing to forgo the deduplication features of Duplicati to get this backup going?

Hi SamW,

Yes, of course - what would you like to know?
The server is Windows Server 2008 R2, fully patched. The VHD file is 381GB.
Deduplication is obviously a huge issue, but I'm willing to try anything - how do I disable it?

thanks

Steve

Has the initial backup finished yet, or is it still running? Duplicati is much slower during the initial backup than during most subsequent ones.

If the initial backup is done and you're seeing this issue with later ones, then my guess is you're running into the situation where so much of the VHD file has changed that it's almost the same as an initial backup.

First things first. To deduplicate a 381GB VHD file into 100KB chunks, roughly 4 million hashes have to be calculated. For each hash, Duplicati must look in the database and check whether it already exists. If it doesn't exist, Duplicati must write it to the database before adding the block to the volume to be uploaded; if it does exist, the chunk can be skipped. The database lookups get progressively slower as hashes are added, until all unique hashes are in the table.
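For anyone curious, here's a rough sketch of what that fixed-block deduplication loop looks like in principle. It's plain Python with an in-memory set standing in for Duplicati's SQLite block table, not the actual implementation:

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # Duplicati's default block size of 100KB

def dedupe_blocks(path):
    """Split a file into fixed-size blocks and count new vs. already-seen blocks."""
    seen = set()            # stands in for Duplicati's block table in SQLite
    new_blocks = duplicate_blocks = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).digest()
            if digest in seen:            # the "does this hash exist?" lookup
                duplicate_blocks += 1     # duplicate: the block's data is skipped
            else:
                seen.add(digest)          # new: record the hash, then the block would be compressed and uploaded
                new_blocks += 1
    return new_blocks, duplicate_blocks
```

For a 381GB file that loop runs roughly 4 million times, and the real lookups hit a database on disk rather than a set in memory.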

Comparing copying a file over a 1GB network connection against calculating roughly 4 million hashes, making roughly 4 million database lookups, and then copying the data over the network is going to leave you a little disappointed :slight_smile:
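To put some rough numbers on it - the file size and block size come from this thread, everything else is an assumption just to show the shape of the problem:

```python
# Back-of-envelope estimate; the wire speed and per-block overhead are assumptions.
size_bytes = 381 * 1024**3          # the 381GB VHD
block_size = 100 * 1024             # default 100KB blocks
blocks = size_bytes / block_size    # ~4 million blocks

wire_speed = 110 * 1024**2          # assume ~110 MB/s over the LAN
copy_hours = size_bytes / wire_speed / 3600

per_block_overhead = 0.005          # assume 5 ms per hash + database lookup
dedupe_hours = blocks * per_block_overhead / 3600

print(f"blocks: {blocks:,.0f}")                                # ~4,000,000
print(f"plain copy: ~{copy_hours:.1f} h")                      # ~1 hour
print(f"hash + lookup overhead alone: ~{dedupe_hours:.1f} h")  # ~5.5 hours
```

The real per-block cost depends on disk and database performance, but the point is that the per-block work scales with the number of blocks, not with the wire speed.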

On repeat backups Duplicati checks the file and, if it has changed, calculates the hashes again, ideally getting to skip most chunks because they're duplicates… except when they're not.

Duplicati doesn't do variable chunking, so if you insert one byte at the beginning of your 381GB VHD, every block after it is misaligned and, in the worst case, you'll actually need to re-upload the whole 381GB (plus one byte).
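Here's a tiny illustration of that misalignment effect, using a small random buffer in place of a real VHD:

```python
import hashlib, os

BLOCK_SIZE = 100 * 1024

def block_hashes(data):
    """Hash the fixed-size blocks of an in-memory buffer."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]

original = os.urandom(10 * BLOCK_SIZE)   # pretend this is the old VHD contents
appended = original + b"X"               # a byte added at the end
inserted = b"X" + original               # a byte inserted at the start

old = set(block_hashes(original))
print(sum(h in old for h in block_hashes(appended)))  # 10 of 11 blocks already known -> little to upload
print(sum(h in old for h in block_hashes(inserted)))  # 0 blocks match -> effectively a full re-upload
```

Append-only changes leave every existing block boundary (and hash) intact, while an insert near the start shifts every boundary after it and produces a completely new set of hashes.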

This usually isn't an issue when backing up normal files because, spread over the entire dataset, the duplicate chunks heavily outweigh the misaligned blocks, and the misalignment only affects the specific file where content was inserted rather than appended at the end.

I would strongly discourage you from doing what you're trying to do. Even without knowing what the VHD is used for, I'd almost be willing to bet money that every backup will re-upload 50+ GB of data.

The issue of dealing with virtual machine disks is tracked on GitHub here: VM virtual disk files - not deduping · Issue #3255 · duplicati/duplicati · GitHub
