Should I use Duplicati to back up a few TB?

Hey everyone

I’ve used Duplicati at a small client with success and it’s been running smoothly for almost a year now. So, first off, thank you very much for this beautiful piece of software.

The need for backups grew at another client, though, who is bigger and where things aren’t how they should be right now. I finally got the green light to start backing up their data.

They have a PACS server, so a lot, and I mean A LOT, of small files that together add up to a huge amount of data (probably 11TB, not on a single drive, though, which means they will probably be separate jobs).

I read somewhere that Duplicati’s DB could get messy with that many files. Even after splitting jobs, I’ll probably have to upload around 3TB of data per job at some point, while keeping versions. I could probably do a new job each month, but I’d like something more “set it and forget it”.

Will Duplicati handle it well enough, or will I run into problems? I’d like to think my setup would be as easy as at the first client, but this is a more complex setup on its own.

For large backup sets I think the most important factor is your deduplication block size. The default is 100KiB, which is too small (IMO) for a backup larger than a few hundred GB. A small dedupe block size leads to a large, slow local job database due to the sheer number of blocks that must be tracked. Increasing the block size keeps the database small and fast, at the expense of reduced deduplication.

Splitting your backup into multiple jobs could help, as well.

If you split the 11TB dataset into 3TB jobs, then a dedupe block size of 2MB-5MB should be OK. If you had a single 11TB dataset, I’d look at 10MB for the dedupe block size.
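To put rough numbers on that, here’s a back-of-the-envelope sketch (plain Python, nothing Duplicati-specific) of how many blocks the local database would have to track at different block sizes. It ignores compression, deduplication hits, and metadata blocks, so treat it as an order-of-magnitude estimate only:

```python
# Rough estimate: the local database tracks roughly one entry per unique block,
# so the block count is about total_data / block_size.

def blocks(total_bytes: int, block_bytes: int) -> int:
    return total_bytes // block_bytes

TB = 1024 ** 4
KiB = 1024
MB = 1000 ** 2  # decimal MB; the exact base barely matters at this scale

for label, total, block in [
    ("3TB job, 100KiB default",  3 * TB, 100 * KiB),
    ("3TB job, 5MB blocks",      3 * TB, 5 * MB),
    ("11TB job, 100KiB default", 11 * TB, 100 * KiB),
    ("11TB job, 10MB blocks",    11 * TB, 10 * MB),
]:
    print(f"{label}: ~{blocks(total, block):,} blocks")

# 3TB job, 100KiB default:  ~32,212,254 blocks
# 3TB job, 5MB blocks:      ~659,706 blocks
# 11TB job, 100KiB default: ~118,111,600 blocks
# 11TB job, 10MB blocks:    ~1,209,462 blocks
```

Going from the 100KiB default to a few MB cuts the number of tracked blocks by one to two orders of magnitude, which is where the database size and speed win comes from.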

Don’t confuse block size with remote volume size (which defaults to 50MiB). Remote volume size is usually fine left at the default unless you are using a restrictive back-end that limits the number of files you can store. (See this for more info on choosing sizes.)
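For completeness, the same kind of rough estimate for the remote side (again just arithmetic, assuming little compression or dedup savings, i.e. a worst case):

```python
# Rough count of remote dblock volumes for one ~3TB job at the default volume size.
TB = 1024 ** 4
MiB = 1024 ** 2

uploaded = 3 * TB       # worst case: everything is unique and incompressible
volume = 50 * MiB       # default remote volume size

print(f"~{uploaded // volume:,} dblock files")   # ~62,914 dblock files
```

That’s a lot of files, but it’s only a problem if your destination has trouble listing or storing that many objects.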

When you say “A LOT of small files” how many are you talking about? The number of files can affect the database size as well.

In any case, I say give it a shot with the larger dedupe block size. You should be able to “set it and forget it” with Duplicati even with large data sets like this. Make sure you enable some sort of monitoring, whether it is email notifications or duplicati-monitoring.com or something else.

Thanks for your insight.

Looking at last month, this server generated 900k files totaling 169GB. That could be an outlier since things are going back to normal(-ish) after the pandemic, but let’s go with it. It’s enough for around 21 months of files on this hard drive, which would be around 19 million files at some point.
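Spelling out the arithmetic from those numbers (a rough sketch that assumes every new file turns into brand-new blocks, i.e. no dedup savings):

```python
# Per-month growth from the figures above: ~900k new files, ~169GB of new data.
GB = 1000 ** 3
MB = 1000 ** 2
KiB = 1024

files_per_month = 900_000
data_per_month = 169 * GB
months = 21

print(f"files after {months} months: ~{files_per_month * months:,}")
# files after 21 months: ~18,900,000

for label, block in [("100KiB default", 100 * KiB), ("5MB blocks", 5 * MB)]:
    print(f"new blocks per month at {label}: ~{data_per_month // block:,}")
# new blocks per month at 100KiB default: ~1,650,390
# new blocks per month at 5MB blocks:     ~33,800
```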

With this info, any further tips on what would be the best way to handle it?

That’s a couple orders of magnitude higher than any backup job that I personally run. I’m not sure how well Duplicati would handle that number of files. If you have the time and inclination, you could certainly give it a shot.

Welcome to the forum @mazz

I think the posted record is at least 26 million, here. It seemingly worked well enough to be worth a bug fix.

YMMV, and please also test administrative actions like how long it takes to recreate the DB if the drive or DB dies.
I’ve seen Duplicati take a while to read through dlist files, one per version, and yours are likely very large.

If you need to keep all versions, that’s lots of versions on top of a big file per version. Can’t you thin them?

I’m not sure what the turnover rate is for file content, but typically a new job means a lot more data saved.
I’m also not sure about the client’s attitude toward backups. They had none before? Ideally, have several…

You might also find less performance than you like in GUI operations like browsing files in a Restore tree.

EDIT:

Actually, I’m not sure what “new job” means. It’s more space if it’s a clone of the old job to a different storage area.
This method does add some historical safety: if the new job breaks, maybe the old job can still restore older files.
If it’s a fresh redo of the original job, it’s not more storage, but it is a big upload, and it conflicts with keeping versions.

Yeah, didn’t think about that. Recreating the DB to test my restore might be a problem, but I’ll have to give it a try.

About retention, I’ll do that, probably with the same policy I used at the other client I mentioned. There will still probably be around 40 versions.
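I don’t know my exact policy off-hand, but as a rough sanity check on that version count, here’s a small sketch that estimates the maximum number of versions a Duplicati-style retention-policy string (comma-separated “timeframe:interval” rules) can keep. The policy string in the example is made up, not my actual policy:

```python
# Rough upper bound on retained versions for a retention-policy string.
# Unit lengths are approximations (a month is taken as 30 days).
UNIT_DAYS = {"D": 1, "W": 7, "M": 30, "Y": 365}

def days(span: str) -> int:
    """'4W' -> 28 (days)."""
    return int(span[:-1]) * UNIT_DAYS[span[-1]]

def max_versions(policy: str) -> int:
    total = 0
    for rule in policy.split(","):
        timeframe, interval = rule.split(":")
        total += days(timeframe) // days(interval)
    return total

# Hypothetical policy: daily for a week, weekly for a month, monthly for 3 years.
print(max_versions("7D:1D,4W:1W,3Y:1M"))   # 47 -> "around 40" is in the right ballpark
```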

Not an ideal scenario to test things, but I think I’ll have to do that. I’ll report back if shit hits the fan lol. Thanks everyone.