Better to configure multiple backups or one giant one?

I have a lot of media, projects, Linux ISOs, etc. spread across multiple folders and hard drives. Is it recommended to just select all of these folders into one main backup, or would creating multiple backup configurations be better?

Welcome to the forum @Dericious

How giant is “one giant one” in terms of size? Large file counts might matter too. Both can slow things down.
You can compensate somewhat by using a larger blocksize; the 100 KB default is small for a big backup.
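
As a rough illustration (back-of-the-envelope only; real counts depend on deduplication, compression, and file sizes):

```python
# Back-of-the-envelope block counts; actual numbers depend on dedupe and file sizes.
TB, KB, MB = 1000**4, 1000, 1000**2

source = 1 * TB              # example: a 1 TB backup
print(source // (100 * KB))  # 10,000,000 blocks at the 100 KB default
print(source // (1 * MB))    # 1,000,000 blocks at a 1 MB blocksize
```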

Another thing to consider is that one backup has only one schedule. Do you want the same schedule for all of your files?
A big backup will have a longer run, a longer compact when it’s done, and a longer repair if the backup breaks.
Total times might not be very different (if blocksize is under control), but do you prefer the work in one big chunk or several smaller ones?

Thank you for your response @ts678

I have a lot of files, but my three main folders are:

  • Projects - 40 GB (140k files)
  • Games - 565 GB (280k files)
  • Media - 600 GB (1.2k files, each very large)

I did an initial backup and it took 2+ days to run. I got an error at the end of the backup, so it’s running a database recreation job, and it’s been stuck at ~90% for a day now. I’m really considering scrapping this config and using a much larger block size and remote volume size.

What block size and remote volume size do you recommend? I don’t mind having large overhead, as I will run backups overnight. I have an i7-8700K CPU and 950 Mbps / 25 Mbps upload/download.

The 90% to 100% range is doing an exhaustive search for info that’s missing, maybe due to a prior error.
Ideally it doesn’t get past 70%. You can watch About → Show log → Verbose to see where a Recreate is.
After all its work, it might still find an issue at the end. You can try to note it, or start with a fresh backup.

I aim for about a million blocks, which for a roughly 1 TB backup means a 1 MB blocksize if it’s one big backup.
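
Applying that rule of thumb to the sizes listed above (~40 GB + 565 GB + 600 GB; rough arithmetic only, ignoring growth and deduplication):

```python
# Sketch of the "about a million blocks" target applied to ~1.2 TB of source data.
GB, MB = 1000**3, 1000**2

source = (40 + 565 + 600) * GB                  # sizes from this thread, ~1.2 TB total
target_blocks = 1_000_000
print(f"{source / target_blocks / MB:.1f} MB")  # ~1.2 MB, so a 1 MB blocksize fits the target
```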

Choosing sizes in Duplicati gives advice in some areas. Your upload is faster than your download? Usually the opposite is true, especially on a cable modem; however, if upload is 25 Mbps, it’s more like a 4+ day upload.
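
That estimate is just the transfer arithmetic (ignoring compression, deduplication, and protocol overhead):

```python
# Rough initial-upload time for ~1.2 TB of source data at 25 Mbps.
TB = 1000**4
seconds = (1.2 * TB * 8) / 25e6       # bits to send / bits per second
print(f"{seconds / 86400:.1f} days")  # ~4.4 days
```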

I worry less about the optimal dblock-size, also settable as Remote volume size on the Options screen.
You can also change it later (but doing so may make compact run soon, or not run when you’d prefer it).
Starting with a larger size should be fine, but it’s probably not as important. The article discusses the tradeoffs.
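
For scale, the remote volume size mostly determines how many dblock files end up at the destination (rough arithmetic, assuming ~1.2 TB of stored data and little deduplication):

```python
# Approximate count of remote dblock volumes for ~1.2 TB of stored data.
TB, MB = 1000**4, 1000**2
stored = 1.2 * TB
print(int(stored / (50 * MB)))   # ~24,000 files at the 50 MB default
print(int(stored / (200 * MB)))  # ~6,000 files at 200 MB
```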


Oh, I had that backwards, my bad. It’s 950 Mbps / 25 Mbps download/upload.

I did some reading around and I saw people saying they use 50/100/200 MB block sizes for the really large backups. I’m in the 1 TB+ category, so I’m going to try 100 MB. That should be fine, right?

Also, is it fine to cancel my current job and switch block sizes? Do I have to delete my whole backup to set a new block size?

Yours is large, but not what I’d call really large. Some backups are tens of terabytes.
Large blocks may see worse deduplication, but maybe better in-block compression.
The article discusses some downsides of large blocks. You can certainly try them…
I don’t see any dramatic dangers, but 100 MB might be a little bigger than is optimal.

If you’re in the middle of a Recreate, I’m not sure if there’s a cancel. You usually don’t want to do process kills because they can affect the database, but in this case you’re going to delete the DB and all remote files anyway. Running the backup again will start from scratch, as if for the first time, except I hope it won’t be seeing an error.

OK, I’m going to try a 50 MB block size and a 100 MB remote volume size. Thank you so much for your help!


I’ve also split backups into several sets, because different data types require different settings. Retention requirements might also differ between data types. Generally it’s not great to mix mostly passive and very active data if you can avoid it; especially with large files, that makes compaction a super slow operation when it happens. Smaller backups are easier and faster to manage.


Can you share which settings you change per backup?

I agree with Sami.

I’m storing my files in different data sets.
Primarily for performance - I don’t need to run a backup of my photos or music frequently, because they rarely change. The backup retention is set to a long time.
My Desktop backup runs separately, because it changes constantly (it’s my “temporary work space”), and it has a short backup retention of 3 months.


Sure, but these should be pretty obvious ones.

  1. Retention - for example, some data MUST be deleted, and other data MUST be retained
  2. Volume / block size - huge zero-deduplication files that are just stored for extended periods without changes, versus files that are generated and deleted daily, versus files that are modified slightly every day, etc.
  3. Encryption - if the data is already encrypted, there’s no reason to re-encrypt it
  4. Compression - if the data is already compressed (or encrypted), there’s no reason to compress it

But in general, it doesn’t make any sense to store sensitive logs, which must be deleted, together with, for example, full disk images, which are kept for a long time. Now the logs get stored alongside the images, compaction happens rarely, and the deleted data is still lingering around, breaking laws. Also, the remote volumes are probably configured to be huge, so the compaction process will be super slow and run (hopefully) rarely. Yet if this is done over a slow(er) network, it could easily take several days.

Another bonus of separating such data is that when a volume contains only blocks that all expire at the same time, the whole dblock file can be deleted as soon as the data expires, without running compaction.
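
A minimal sketch of what differing settings per backup can look like. The paths, bucket URL, retention values, and passphrase are placeholders, and the option names are Duplicati 2 advanced options as I recall them, so check them against duplicati-cli help for your version:

```python
# Illustrative only: two separately configured Duplicati jobs, built as command
# lines and just printed here. Everything job-specific is a placeholder.
import shlex

# Long-lived, already-compressed media: big blocks, thinned long-term retention,
# and no recompression of data that won't compress further.
media_job = [
    "duplicati-cli", "backup", "b2://example-bucket/media", "/data/media",
    "--blocksize=1MB",
    "--dblock-size=200MB",
    "--retention-policy=1W:1D,12M:1M",  # keep history for a long time, thinned out
    "--zip-compression-level=0",        # media is already compressed
    "--passphrase=CHANGE-ME",
]

# Active work files with a short retention requirement: mostly defaults, but
# old versions must disappear after 3 months.
work_job = [
    "duplicati-cli", "backup", "b2://example-bucket/work", "/home/user/work",
    "--keep-time=3M",                   # delete versions older than 3 months
    "--passphrase=CHANGE-ME",
]

for job in (media_job, work_job):
    print(shlex.join(job))
```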

I hope I made some sense.
