Database recreate performance

It seems to me that raising the default -blocksize to a larger value would be a good thing moving forward; I'm thinking 100KB just isn't enough for the amounts of data people are now backing up.

@gpatel-fr Thanks for the effort on a fix.

This is in the developer section, so I'll toss a bit more onto the fire…

1- As mentioned above, changing the default -blocksize to a larger value (YTBD) is likely a simple change that could be implemented in the next release (I know, also YTBD). Beyond recompiling with the new default there shouldn't be anything else required: no rewriting of code, no conflicts. It's just a default change, and it would likely prevent a lot of user frustration going forward, since this sort of thing seems to be coming up more and more often.

If that's not in the cards for the foreseeable future, could we create a "hot fix" of sorts, something users could run post-install to adjust the -blocksize setting to a larger default value? This leads into item 3, but stop by item 2 first.
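In the meantime, the workaround that already exists is simply to set it explicitly when a backup is created. A minimal sketch of that, assuming the duplicati-cli wrapper (Duplicati.CommandLine.exe on Windows) is on the PATH; the 1MB value and the paths are placeholders I made up, not a recommendation:

```python
# Minimal sketch: force a larger --blocksize on a *new* backup from the CLI
# instead of relying on the 100KB default. Target URL, source path and the
# 1MB value are placeholders, not a recommendation.
import subprocess


def backup_with_larger_blocksize(target_url: str, source: str, blocksize: str = "1MB") -> None:
    cmd = [
        "duplicati-cli", "backup",
        target_url, source,
        f"--blocksize={blocksize}",  # has to be picked at creation; can't be changed later (see item 3)
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    backup_with_larger_blocksize("file:///mnt/backup-target", "/home/me/data")
```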

2- Change the wording in the Duplicati documentation around -blocksize to emphasize how it should be scaled up based on data size. Maybe a note like: if you're backing up more than 0.5TB you absolutely should increase the value to "xyzKB/xyzMB" or more. I started re-writing the manual page on the topic but lost the edit in a stray reboot; I've been trying to get back to it, but things have been busy as of late.
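To put a number on that, the thing the docs note would really be driving home is that the block count is what the local database (and a database recreate) has to deal with, and it scales directly with source size divided by -blocksize. A quick back-of-the-envelope, with sizes I picked purely for illustration:

```python
# Back-of-the-envelope: how many blocks Duplicati has to track for a given
# source size and -blocksize. The block count is what the local database
# (and a recreate) has to chew through, which is what the docs note would
# be trying to keep reasonable. Sizes below are arbitrary examples.
SIZES_TB = [0.5, 1, 2, 5, 10]
BLOCKSIZES_KB = [100, 500, 1024, 5120]  # 100KB default, then some larger picks

for size_tb in SIZES_TB:
    source_bytes = size_tb * 1024**4
    cells = []
    for bs_kb in BLOCKSIZES_KB:
        blocks = source_bytes / (bs_kb * 1024)
        cells.append(f"{bs_kb}KB -> {blocks / 1e6:,.2f}M blocks")
    print(f"{size_tb}TB source: " + ", ".join(cells))
```

At the 100KB default, 0.5TB is already over 5 million blocks; bump it to 1MB and that drops to roughly half a million.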

3- Create a routine that can change the -blocksize for an existing backup set. Now, I haven't really looked into this all that hard, but I know there are many threads around here on the topic, and at the end of the day it can't currently be done. Holy moly, would it be nice, though.

I get that the whole DB and backup set would probably have to be recreated and that storage use would likely increase following the process. I also expect the process would require at least 2x the storage space up front to make it even close to safe, but I don't think that's out of reach for most users. I think this should be a local process rather than processing directly against the remote destination.

Storage is cheap and time is money… An external 12TB USB3 drive is only a few hundred bucks and getting cheaper by the day. If the process has to download the existing backup set to a sufficiently large local drive, convert/verify it, remove the old local copy (with an option to keep either version locally), purge the old backup set from the destination, then upload the new set, that doesn't seem like the worst thing. You can at least reuse the external drive afterwards to create an additional local backup.
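Very roughly, I picture the flow looking like the sketch below. To be clear, none of these helpers exist in Duplicati today; every name is made up, and it's just pseudocode-level Python to pin down the steps I described above:

```python
# Hypothetical re-blocking flow for an existing backup set. None of these
# helpers exist in Duplicati today; they are placeholders that map onto the
# steps described above.
from pathlib import Path


def download_backup_set(remote_url: str, staging_dir: Path) -> Path:
    """Pull every dblock/dindex/dlist file from the destination to local disk."""
    raise NotImplementedError("placeholder: fetch the whole remote backup set")


def reblock_backup_set(old_set: Path, new_set: Path, new_blocksize: str) -> None:
    """Re-chunk every version in the old set at the new -blocksize, keeping history."""
    raise NotImplementedError("placeholder: the actual conversion work")


def verify_backup_set(converted: Path) -> bool:
    """Test-restore / hash-check the converted set before touching the remote."""
    raise NotImplementedError("placeholder: verification pass")


def replace_remote(remote_url: str, converted: Path) -> None:
    """Purge the old files from the destination and upload the converted set."""
    raise NotImplementedError("placeholder: purge then upload")


def convert_blocksize(remote_url: str, staging_dir: Path,
                      new_blocksize: str = "1MB", keep_old_local: bool = True) -> None:
    old_set = download_backup_set(remote_url, staging_dir / "old")
    new_set = staging_dir / "new"
    reblock_backup_set(old_set, new_set, new_blocksize)
    if not verify_backup_set(new_set):
        raise RuntimeError("converted set failed verification; remote left untouched")
    replace_remote(remote_url, new_set)
    if not keep_old_local:
        pass  # placeholder: delete staging_dir / "old" (the "keep either version" option)
```

The verification pass before the remote gets touched is the part that would make it "close to safe" in my mind.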

I'm by no means saying that's a simple set of tasks to complete, but there has to be a way it could be done, and if so it would probably really help with user retention.

If the user's backup set is too large to be converted locally, then they would need to create a new backup with a "better" -blocksize value. That brings up another thing: in a changing data set, the "best" -blocksize could be different next week, and most certainly after a few years. Chances are your data is going to grow, not shrink, so being able to move up to a new -blocksize at some point seems like a really valuable feature.

Sure, if you'd rather just make a new backup set then by all means, but I really think that option only appeals to a very small percentage of users when faced with that reality. One of Duplicati's best features, in my mind, is its versioning, and to lose years of versions would be a huge hit for many, if not most.

I’ll do some more reading on the subject…

4- I'll try to get some tests set up to see if I can catch TM and Duplicati clashing.
