Recommended block size for 750GB and cloud storage?

Hi all,
I recently got the dreaded ‘we expected X files but found Y files’ error. I tried a repair, to no avail, and eventually did a database delete/recreate. I watched the machine spin for about 10 days before a brief power outage killed it. I am considering starting over with more manageable chunking.

I have a roughly 750GB archive with about 250k files. This is mostly cold-ish storage: family digital junk accumulated over several years. New data streams in on a fairly constant basis as new email and pics come in, and docs etc. are periodically updated, but for the most part I am not worried about any significant ‘file edit’ churn.

I am on a pretty stable line of about 20MB (the power outage above is probably the second in 5 years), using Wasabi and B2 as offsite providers.

My observation over those 10 days was that Duplicati spent about 80-90% of the time pegging the I/O of my old file server (a pair of mirrored spinning platters), hammering on the SQLite file; then it would spin the CPU for a while and go back to pegging I/O. I have read several articles on block and dblock sizes, and from these I am surmising that the problem was my block size being too small, i.e. too many rows in the SQLite database.

So, that is my situation. A few questions, if I may. I think the main goal here is to alleviate some of the local load (mostly SQLite I/O) so that operations run more smoothly:

  1. Am I correct that there is no way to alter the block size after a backup has already been created? (In any case, changing it now would probably just blow up my SQLite row count further.)
  2. As I am considering starting over from scratch, does anybody have any suggestions on the local block size? I see the default is 100KB. I was thinking maybe 2MB? 5? (Rough block-count math is sketched just below this list.)
  3. I actually think 50MB seems like a fine remote volume (dblock) size after reading Choosing Sizes in Duplicati • Duplicati. Maybe, as Block Size for Cloud Backup suggests, I’ll go to 200MB to avoid too many pagination re-requests. Open to thoughts here.
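
For reference, here’s the back-of-the-envelope math behind those numbers, as a quick shell sketch (it assumes no deduplication and no compression, so real counts would be somewhat lower):

```
# Rough block counts for ~750 GiB of source data at various block sizes.
# Each block becomes at least one row in the local SQLite database.
echo $(( 750 * 1024 * 1024 / 100 ))    # 100 KiB blocks (default) -> 7,864,320 blocks
echo $(( 750 * 1024 * 1024 / 2048 ))   # 2 MiB blocks             -> 384,000 blocks
echo $(( 750 * 1024 * 1024 / 5120 ))   # 5 MiB blocks             -> 153,600 blocks
```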

Thanks for your time

Edit to add one more question: what do we want for file retention on the remote (B2/Wasabi)? For Arq Backup, their support told me it was a requirement that I didn’t enable versioning support on the destination. I don’t think it should matter, really, but does Duplicati have any requirements here?

Hi @cornasdf, welcome to the forum!

Correct: there is no way to change the block size once the backup has been created.

Tough call - larger block sizes will reduce the database record count, but they will likely increase hashing CPU load. It sounds like, with your data, you don’t have to worry about large uploads for small changes.

I’d run the math - 750GB (source size) / 200MB (dblock size) = 3,840 dblock files on the initial backup (assuming no deduplication, no compression, and a 100% fill ratio). Consider that number if your provider has any maximum file-count limits. (All very rough estimates, of course.)
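
If it helps, here’s the same arithmetic as a quick shell sketch comparing the 50MB default against the 200MB you’re considering (same no-deduplication, no-compression, 100% fill assumptions):

```
# Rough remote volume (dblock) counts for ~750 GiB of source data.
echo $(( 750 * 1024 / 50 ))    # 50 MiB dblocks  -> 15,360 remote files
echo $(( 750 * 1024 / 200 ))   # 200 MiB dblocks -> 3,840 remote files
```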

[quote="cornasdf, post:1, topic:6531"]
their support told me it was a requirement that I didn’t enable versioning support on the destination
[/quote]
It wouldn’t help anyway, as Duplicati likely won’t be able to use destination-provided versioning: anything pulled from the provider’s old versions won’t match your local database. (Unless you pull a complete folder version and use a direct destination restore.)

For estimating how many remote files are created, the average number of block files, etc., there’s this Google sheet I created some time ago. Maybe it can help with choosing the block size that fits your needs.


The best fix, in my experience, is to just delete the one problematic backup version. Don’t try to repair or recreate your database.

I am backing up about 500GB on two computers (1TB total) and use the standard 50MB remote volume (dblock) size, for what it’s worth.

How do you find and delete one backup version while leaving the others untouched?

To find the backup version you want to delete, just open the Restore files window and list the available versions from there. The most recent version is version 0.

To see what’s changed between 2 backups, use the COMPARE command.
To delete a backup version, use the DELETE command.

You can invoke these commands from the command line, or by clicking your backup job and choosing Commandline.

The version you need to delete is reported in the error message itself: “Unexpected difference in fileset XX…”
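
For illustration only, the two invocations might look roughly like this (hypothetical bucket and folder names; the easiest way to get your real target URL and options is the job’s Commandline screen or Export > As Command-line, and `help compare` / `help delete` show the exact syntax for your version):

```
# Hypothetical example: see what changed between versions 5 and 6, then delete version 5.
# On Linux installs, duplicati-cli replaces Duplicati.CommandLine.exe.
Duplicati.CommandLine.exe compare "b2://my-bucket/my-folder" 5 6
Duplicati.CommandLine.exe delete "b2://my-bucket/my-folder" --version=5
```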