Block size for local backup

Like others who are being abandoned by Code42 (CrashPlan), I’m looking for an alternative to back up a bunch of machines (Mac, Windows, Linux, spread out over multiple locations). My plan is to back up everything to a local Synology NAS file server using Duplicati, and then back up the Synology to the cloud, probably using something like Backblaze B2.
One of my machines has about 5 TB of data on it (video and audio), but if I use the default remote volume size, it seems that Duplicati will produce a huge number of files in a single directory on my Synology, and I’m unclear how many files can be stored in a single directory on the Synology. The default size of 50 MB seems like it will produce about 100,000 files. Is that too many?
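Here’s the back-of-the-envelope math behind that estimate, assuming the 50 MB default refers to the remote volume (dblock) size and ignoring the small dindex/dlist files Duplicati creates alongside the dblocks:

```python
# Rough estimate of how many dblock files a backup will create,
# assuming no savings from compression or deduplication (worst case).
total_data = 5 * 1024**4      # 5 TB of source data, in bytes
volume_size = 50 * 1024**2    # 50 MB default remote volume size

volumes = total_data / volume_size
print(f"~{volumes:,.0f} dblock files")   # ~104,858 files
```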

There are many tradeoffs in choosing the size; I have written a detailed explanation here:

Hi,
Your article about sizes is clear and very comprehensive. But does anyone have some real-life figures and experiments? Of course I understand it depends a lot on the machine, the storage provider, the link, and the characteristics of the data files. But I imagine pictures, music, or video have specific characteristics that may suggest particular sizes that work well.
Cheers

I use the defaults for my backups (images, source code, etc.) to S3. I do not have any experimental data from others, but maybe someone will share what settings they use?

I’ve created a small storage calculator, which will give you basic information about how much storage you need and how many files approximately will be created at the backend.
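For anyone curious about the math behind it, here is a minimal sketch of the kind of estimate involved. The variable names and the flat 30% compression assumption are illustrative only, not taken from the tool:

```python
# Sketch of a backend storage/file-count estimate for a Duplicati backup.
# The compression ratio is a guess; real savings depend on the data.
source_bytes = 1.5 * 1024**4    # total size of the source data (example: 1.5 TB)
compression = 0.7               # assumed stored/source ratio (0.7 = 30% savings)
volume_size = 50 * 1024**2      # remote volume (dblock) size, 50 MB default

stored_bytes = source_bytes * compression
dblock_files = -(-stored_bytes // volume_size)  # ceiling division
# Each dblock is paired with a small dindex file; dlist files (one per
# backup run) are ignored here.
backend_files = 2 * dblock_files

print(f"storage needed: ~{stored_bytes / 1024**3:,.0f} GB")
print(f"backend files:  ~{backend_files:,.0f}")
```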


This is 100% anecdotal, but I’ve backed up my full media collection (modest sized, ~700 GB) to a local backup drive using volume sizes of 2 GB, and it’s worked well. For B2 I use much smaller volumes (~200 MB) so as not to instantly exceed the daily download limit when Duplicati does a download-and-verify, but there are no such constraints on a local HDD. I’d be curious to see if someone has some benchmark data on relative speeds, though.
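For the curious, here’s the back-of-the-envelope reasoning. The ~1 GB/day free download allowance is my assumption; check B2’s current pricing:

```python
# Rough check: does one verification pass blow through B2's free daily downloads?
# Assumes the default backup-test-samples=1, i.e. roughly one full dblock volume
# (plus a small dindex and dlist) is downloaded after each backup run.
free_daily_download = 1 * 1024**3          # assumed ~1 GB/day free allowance

for volume_size_mb in (200, 2048):
    downloaded = volume_size_mb * 1024**2  # about one dblock per verification
    verdict = "exceeds" if downloaded > free_daily_download else "fits within"
    print(f"{volume_size_mb} MB volumes: one verify {verdict} the free allowance")
```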

I have read the block sizing article and tried using the storage calculator and am still confused about what block sizes to use. (No offense intended, kenkendk and kees-z. The article and tool are very well written. This is clearly a PEBCAK problem.) I did an analysis of the files I need to back up and here’s what I came up with:

[image: distribution of the files to be backed up, by size range]

Three quarters of the files are under 1 MB in size, with almost all the rest between 1 MB and 500 MB. I am willing to submit to the wisdom of Duplicati’s default settings for block & dblock sizes, but I am hoping one of you “pros” might see something in the file/size distribution that would steer me in another direction.

I am also considering using the backup-test-samples option to verify backups, but my back-end provider (B2) charges for download bandwidth, so I obviously want to minimize the amount of data downloaded to verify backups. Does using that option change how I would set the block and dblock sizes?
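Here’s how I’m thinking about the verification cost, as a rough sketch. As I understand it, verification fetches whole volumes, so the dblock size drives the download and the blocksize itself doesn’t; the per-GB price below is just a placeholder I’d replace with B2’s actual download rate:

```python
# Estimate of monthly verification download, assuming one backup per day and
# that each run downloads about backup-test-samples full dblock volumes.
test_samples = 1        # backup-test-samples setting
dblock_mb = 50          # remote volume size under consideration, in MB
runs_per_month = 30     # daily backups

gb_downloaded = test_samples * dblock_mb * runs_per_month / 1024
price_per_gb = 0.01     # placeholder; substitute B2's actual download rate
print(f"~{gb_downloaded:.1f} GB/month, roughly ${gb_downloaded * price_per_gb:.2f}")
```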

I’m about to “fully commit” to Duplicati and begin a cloud backup of almost 1.5 TB of data, so I want to be sure I get it right the first time. Any advice would be greatly appreciated.

I’m no expert, but it looks to me like a 50 or 100 MB dblock size might be best for you.

Note that this topic was originally about a local, not cloud, backup, so you might be better off asking in a new cloud-specific topic…

Your table gives me an idea, though. Would it make sense to offer a “review my source files, destination type, and bandwidth, then suggest an appropriate size” feature…

(Apparently I’m a few months slow on the uptake, as a “block size suggestion tool” has already been discussed on the old GitHub forum. 🙂)

JonMikeIV,

I had thought about posting a new topic, but I’m new to the group and wasn’t sure of the “extend this topic vs. new post on a similar topic” sentiment on this forum. Every board has its own culture on this matter. I’ll post a new topic on cloud-specific sizing.

I liked kees-z’s calculator, but it would be nice to have one that does the reverse: enter information about the files, sizes, destination, & bandwidth and have the model suggest an “optimal” block size. Unfortunately, I’m not smart enough about Duplicati block handling to figure out how such a model would work. Perhaps if I get more knowledgeable on the subject I can take a stab at one.
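If anyone wants a starting point, here’s a toy sketch of the shape I imagine such a tool taking. Every threshold below is a placeholder for illustration, not a recommendation:

```python
# Toy heuristic for suggesting a Duplicati remote volume (dblock) size.
# All thresholds are made up for illustration; a real tool would tune them.
def suggest_dblock_size_mb(total_gb: float, metered_downloads: bool,
                           upload_mbit: float) -> int:
    """Suggest a remote volume size in MB from a few backup characteristics."""
    if metered_downloads:
        # Keep verification downloads cheap on providers that charge for egress.
        return 50
    if upload_mbit < 10:
        # Smaller volumes fail and retry faster on slow links.
        return 50
    if total_gb > 1000:
        # Fewer, larger files for big backups to fast local destinations.
        return 500
    return 100

print(suggest_dblock_size_mb(total_gb=1500, metered_downloads=True, upload_mbit=50))
```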
