How should one determine the size of those chunks?

Now that I’m able to actually start a backup (thanks to those who helped, you know who you are :slight_smile: ) I have a related question that I think I asked when I was doing my original experiments on an older Synology.

What’s the scoop with the default 50MB chunks? Does the best value depend on how many files are being backed up, or on their total size? If so, might it be better to wait until you see how many files (and how much total data) have to be backed up before deciding on the appropriate size for these chunks?


This article should help you out.


50MB is just a safe default size that reduces the likelihood of running into bandwidth caps.

A “quick” summary is that Duplicati chops files up into smaller blocks, and those blocks get compressed into dblock (archive) files. So size vs. number of files doesn’t matter much: 1,000 x 1MB files and 1 x 1,000MB file will both end up looking about the same at the destination (assuming no de-duplication occurs).
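To make the block vs. dblock relationship concrete, here’s a minimal Python sketch of the general idea (fixed-size chunking plus hash-based de-duplication). This is not Duplicati’s actual code, and the 100KB block / 50MB volume sizes are just illustrative assumptions:

```python
import hashlib

# Illustrative only -- not Duplicati's actual implementation.
BLOCK_SIZE = 100 * 1024          # assumed block ("chunk") size
DBLOCK_SIZE = 50 * 1024 * 1024   # assumed remote volume (dblock) size

def chunk_file(path, block_size=BLOCK_SIZE):
    """Yield (hash, data) pairs for each fixed-size block of a file."""
    with open(path, "rb") as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            yield hashlib.sha256(data).hexdigest(), data

def pack_into_volumes(paths, volume_size=DBLOCK_SIZE):
    """Group unique blocks into volumes of roughly volume_size bytes."""
    seen = set()                  # hashes already stored (de-duplication)
    volume, volume_bytes = [], 0
    for path in paths:
        for digest, data in chunk_file(path):
            if digest in seen:
                continue          # duplicate block: stored once, referenced again
            seen.add(digest)
            if volume_bytes + len(data) > volume_size and volume:
                yield volume      # this is what would get compressed and uploaded
                volume, volume_bytes = [], 0
            volume.append((digest, data))
            volume_bytes += len(data)
    if volume:
        yield volume
```

Whether you feed something like this 1,000 x 1MB files or a single 1,000MB file, it produces roughly the same set of ~50MB volumes at the end.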

Since the dblock file is what’s built and transferred around, you’ll find you need enough local temp space to handle a few dblock files while they’re being built, sent to the destination, and downloaded / verified. Downloads also happen during version cleanup: if you say you only want to keep the 5 most recent versions of files, then when a 6th version is backed up, the dblock holding version #1 gets downloaded and re-compressed (though maybe not right away; see below for more detail).
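For a rough feel for that temp space, here’s a back-of-the-envelope calculation. The number of in-flight volumes is an assumption and depends on your settings; the point is just that it scales with the dblock size:

```python
# Crude estimate, not a guarantee: volumes being built or queued for upload,
# plus volumes downloaded for verification or cleanup, all live in temp space.
def temp_space_estimate_mb(dblock_mb=50, pending_uploads=4, verify_downloads=1):
    """Rough local temp space (MB) for in-flight dblock volumes."""
    return dblock_mb * (pending_uploads + 1 + verify_downloads)

print(temp_space_estimate_mb())            # ~300 MB with the assumed defaults
print(temp_space_estimate_mb(1024, 4, 1))  # ~6 GB if you use 1 GB dblocks
```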

One or more dblock files are also downloaded and verified with each backup, so if your destination (or pipe) has bandwidth limits, very large dblock sizes could run you into bandwidth quota issues.

Low-quality (frequently dropping) connections between source and destination are another reason to stick with a smaller dblock size (less data to re-send when a drop occurs). However, high-quality and/or fast connections (such as LAN or even local disks) can benefit from larger dblock sizes. I’ve heard of people going as large as 2GB, though that’s an extreme case - mostly the larger ones seem to be clustered around 500MB or 1GB.
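As a toy model of why drops favor smaller volumes (assumptions: a drop forces the whole volume to be re-sent, and the link drops with a fixed probability per megabyte transferred), you can see how the re-send penalty grows with volume size:

```python
# Toy model, not a benchmark. Charging a full volume per attempt makes this
# a crude upper bound on the data actually sent.
def expected_upload_mb(volume_mb, p_drop_per_mb):
    """Expected MB sent per volume, counting restarts after drops."""
    p_success = (1 - p_drop_per_mb) ** volume_mb   # chance a volume uploads cleanly
    expected_attempts = 1 / p_success              # geometric distribution
    return volume_mb * expected_attempts

for size in (50, 500, 2048):
    print(size, round(expected_upload_mb(size, 0.001), 1))
# Roughly: 50 MB volume -> ~53 MB sent, 500 MB -> ~825 MB, 2 GB -> ~15.5 GB
```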

Pretty much all of the above can be enabled, disabled, or adjusted with advanced parameters. For example, you can turn off downloading/testing of dblock files or adjust the number of “pending” dblock files that get created while waiting for the current one to finish uploading.

Note that the dblock size CAN be changed after backups have been created, but the new size is only applied to new files. Dblocks using the older size won’t be touched until they are deemed “sparsely” filled (such as when deleting older historical versions), at which point they’ll be downloaded, combined with other sparse dblocks, and re-compressed at the newer dblock size.

However, you can NOT change the block (file chunk) size once backups have been created. For that one you might want to review your content and consider having multiple backups with varying block sizes.

For example, a backup of JUST video files that don’t change often might work well / run faster with a larger block size. The same would apply to infrequently changed music files, though a smaller block size than for video might be more appropriate. Keep in mind that the block size is the smallest chunk of processing that happens - so if you have a 500MB block size (crazy, don’t do it) and change a single character (maybe fix a typo in some metadata), then the entire 500MB block gets reprocessed.
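To put numbers on that last point, here’s a tiny illustration using the sizes mentioned above (a 100KB block size is an assumption for comparison):

```python
# An edit is re-processed at block granularity: every block the changed
# byte range overlaps gets re-read, re-hashed, and re-uploaded in full.
def blocks_touched(change_bytes, block_size, offset_in_block=0):
    """Return (blocks overlapped by the edit, bytes re-processed)."""
    span = offset_in_block + change_bytes
    blocks = -(-span // block_size)    # ceiling division
    return blocks, blocks * block_size

print(blocks_touched(1, 100 * 1024))          # (1, 102400): ~100 KB re-done
print(blocks_touched(1, 500 * 1024 * 1024))   # (1, 524288000): the full 500 MB
```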

For the official summary, check out this page:

Edit: Woops - missed hitting “post” by THAT much… :slight_smile:

Very helpful. I would suggest, though, that for users who are either not technical or, like me, technical in other areas but not system administration, your system propose a suitable size after seeing how much data has to be backed up. In fact, you could get very cute and measure the bandwidth of the connection (just like speed tests do) and use the combined information to recommend the right size.

For what it’s worth, I mostly live in a “usability” world; my concern is always “how do we make this as easy as possible for the end user?”


I believe that suggestion has been made before - and if it hasn’t, it’s because I forgot to post it. :slight_smile:

Unfortunately, we have more functional changes needed than developers at this time, so usability features like that tend to fall on the back burner. But as an open source tool, anybody is free to try to speed up development on specific features by checking out the source code and working (or offering a bounty) on them over at GitHub…hint, hint… :wink:

Hint taken!

(or offering a bounty)


Hmmm, I just noticed that over 10% of my contribution went somewhere other than to Duplicati. That sucks. Can I pull back my contribution (I know I can get my CC company to do that if I have to) and contribute DIRECTLY to Duplicati?

I don’t know. I guess I should have paid more attention when I put out my bounty… If you do find a way, please let me know.

Note that while a direct contribution would be appreciated, a bounty (even at 90%) MIGHT get a feature done faster simply because more developers are likely to look at it…

Actually, I didn’t do it in expectation of features being prioritized on my behalf. You guys already provided enough support to get me up and running and I’ve already done a test restore (backups, like insurance companies, are useless until you know they can deliver when you have a catastrophe!). My suggestions were really made in an effort to save others from experiencing similar issues.
