Proper settings for backing up large files without excessive downloading?

Hi, I’m hoping someone can help me figure out what settings I should be using for this, because I suspect that I am doing it wrong.

What I want to back up are several large files (full disk image incremental backups in the 10-20GB range) and one even larger file (~100GB). I do not want versioning (the files are already versioned), just a single backup of this data. The target is OneDrive Business.

The nature of the data means that daily, one new file will be uploaded and one old one deleted, and some subset of the larger file changed. I believe Duplicati can back this up without needing to re-upload the entire 100GB file, sending only the changed bits - that’s why I want to use it.

I set the remote volume size to 1GB, and left the block size at the default.

What I’m finding is that very soon Duplicati wants to do a Compact, and to do this, for some reason, it starts downloading gigabytes worth of data. I don’t want it to download anything (other than the verification file after a backup) unless I’m doing a restore. It shouldn’t need to; it has all the data locally. I just want it to send the new data and delete the old.

If I disable compact with --no-auto-compact, is my backup just going to keep growing in size without deleting old data? Should I have chosen different block or volume sizes? What’s the proper way of backing up large files without wasting a lot of remote space and without unnecessarily downloading a lot of data that I just uploaded, only to re-upload it again?

Compaction does download volumes. It will take the non-expired blocks from multiple volumes and repackage them so they are stored more efficiently on the back end.
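To sketch what that repackaging looks like (a toy model I made up for illustration, not Duplicati’s actual code or data structures): partially wasted volumes have to be downloaded so their still-live blocks can be salvaged and repacked, while fully dead volumes can simply be deleted with no download.

```python
def compact(volumes, live_blocks, volume_capacity=4):
    """Toy compaction. volumes: dict of name -> list of block ids.
    Returns (volumes that had to be downloaded, resulting volume layout)."""
    downloaded = []   # partially wasted volumes fetched from the backend
    survivors = []    # live blocks salvaged from those volumes
    kept = {}
    for name, blocks in volumes.items():
        live = [b for b in blocks if b in live_blocks]
        if len(live) == len(blocks):
            kept[name] = blocks          # fully live: left alone
        elif live:
            downloaded.append(name)      # partially live: download and salvage
            survivors.extend(live)
        # fully dead volumes are deleted outright, no download needed
    # repack salvaged blocks into new, densely filled volumes
    for i in range(0, len(survivors), volume_capacity):
        kept[f"compacted-{i // volume_capacity}"] = survivors[i:i + volume_capacity]
    return downloaded, kept
```

Note that only the partially live volumes generate download traffic; that is exactly the traffic you are seeing.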

You can disable it, but your backup will most likely grow larger on OneDrive: Duplicati can’t delete a remote volume until all blocks within it have expired.

Ah, so even with auto-compact disabled, it can still delete data once an entire volume has expired? In that case, the correct settings would be a smaller volume size (100MB might be OK without producing too many thousands of files) plus auto-compact disabled. That way it won’t need to download data to compact; it will just delete whole volumes from OneDrive.
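To illustrate the idea (again a toy model, nothing to do with Duplicati’s internals): with compaction off, a volume is only removable once none of its blocks are referenced, and smaller volumes reach that state much sooner than big ones.

```python
def deletable_volumes(volumes, live_blocks):
    """Volumes whose blocks are all unreferenced; with auto-compact
    disabled, only these can be removed from the backend."""
    return [name for name, blocks in volumes.items()
            if not any(b in live_blocks for b in blocks)]

# The same eight blocks, with block 1 still referenced by a backup:
one_big = {"v1": [1, 2, 3, 4, 5, 6, 7, 8]}
four_small = {"v1": [1, 2], "v2": [3, 4], "v3": [5, 6], "v4": [7, 8]}
```

With one big volume, block 1 pins the whole thing and nothing can be deleted; with four small volumes, three of them can go immediately.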

Yep, using a smaller remote volume size may help you in this situation.

How efficient are the subsequent backups? I’m curious how well the deduplication is working for full disk image files.

I will gather some information over the next few days (backup runs daily), but my gut feel is that deduplication won’t be doing much because the images are already deduplicated. I don’t need to keep multiple backup versions because the images are already multiple versions.

What I am backing up is the output of Veeam full disk backup, which produces one full backup and then incremental backups. As each new incremental backup is created, the oldest is merged with the full, and deleted. So the daily Duplicati backup will consist of adding the new incremental (which should already be quite deduped, by its nature), deleting the oldest incremental, and applying the changed blocks of the full.

The reason I am using two backup products is because Veeam can’t back up to OneDrive, and Duplicati can’t back up full disk images. After once spending days restoring a system after a failed disk, I’m never doing that again, even though I didn’t lose any personal files. So now I consider any solution that doesn’t let me easily restore the whole system to a working state unacceptable.

Yes, I also use multiple backup products. In my case I use Macrium Reflect for image level but this is just at home. I agree that image level is mandatory for quick recoveries! Veeam is an excellent product if you work in a virtualized environment. I do really like Duplicati for my user data as it can very efficiently store many versions.

If you don’t care about versioning with Duplicati, and you suspect its deduplication won’t work too well, then maybe Duplicati is not the best choice. Have you looked at rclone? It is an awesome product for synchronizing files to/from cloud providers, and I believe OneDrive is supported. It will keep your backups in native format (unlike Duplicati), thereby making your recovery process quicker, too.

I’m honestly open to suggestions, but from my understanding rclone does not do block-level sync, so every day I would need to re-upload the entire 100GB full backup (in addition to the latest incremental). If there was a tool that was like rclone, but could upload only the changed 10GB or so of a 100GB file then that would be ideal. Duplicati is the closest I’ve found.
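Conceptually, what I’m after looks something like this (a simplified sketch I put together, not how Duplicati actually implements it; real block sizes are ~100KB or more, not 4 bytes): hash fixed-size blocks of each version of the file, then upload only the blocks whose hashes changed.

```python
import hashlib

BLOCK_SIZE = 4  # bytes, tiny for illustration; real tools use ~100KB+

def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash of every fixed-size block of the file contents."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def changed_blocks(old, new):
    """Indices of blocks that must be re-uploaded for the new version."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]
```

This works well for files that change in place, like disk images; a format that inserts bytes mid-file would shift every later block and defeat fixed-block deduplication.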

My environment is not virtualised, but Veeam has still been the best full disk backup I have found (except for not supporting an off-site copy to OneDrive, of course!).

Yes, I believe that is true, but if Duplicati’s deduplication doesn’t work very well it may not be much different. That’s why I was asking how efficient your subsequent backups are (how much data was uploaded vs. the size of “new” files).

It’s been a while since I looked into delta sync, but of the drive-sync services, Dropbox was the only one that supported it; OneDrive, Google Drive, etc. do not natively. Veeam targeting a Dropbox-synced folder might be interesting to test.

As requested, here are some stats on backup efficiency. I’m using the 100MB volume size now, threshold at 90, and 1 verification file.
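For reference, I believe those settings map onto Duplicati 2 options roughly like this (the backend URL, authid, and paths are placeholders; the same options are available as advanced options in the web UI):

```shell
# Remote volume size, compaction threshold, and post-backup verification samples
duplicati-cli backup \
  "onedrivev2://Backups/veeam?authid=PLACEHOLDER" \
  "D:\Veeam\OffsiteCopy" \
  --dblock-size=100MB \
  --threshold=90 \
  --backup-test-samples=1
```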

Size-wise, Duplicati reports Source: 255 GB, Backup: 276 GB.

Today’s Veeam full disk backup resulted in deletion of about 14GB of data, addition of 11GB of new data, and modification of around 10GB of an 89GB file.

Replicating those changes off-site to OneDrive Business using Duplicati resulted in (according to the Duplicati backup log): 21GB uploaded, 5GB deleted, 0.1GB downloaded.

After the backup, compaction reports a total of 3.97% wasted space (10.96 GB of 275.78 GB).
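That percentage checks out; it’s just wasted space over total backup size:

```python
# Wasted-space percentage as reported by compaction: wasted / total * 100
wasted_gb, total_gb = 10.96, 275.78
pct = round(wasted_gb / total_gb * 100, 2)  # -> 3.97
```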

Backup operation took about 40 minutes.

I’m happy with all of that. As long as I don’t see wasted space increasing unreasonably, or excessive downloading of volumes to perform compaction, I’m satisfied with this solution.

Dedupe is working pretty well it seems! Glad this solution is working for you. Thanks for following up.