Configuration tips for large (200TB) SME cloud backup

I think you might hold the record for "large," but Google supports numeric range searches, so you can try
"duplicati" "1..999 tb". Unfortunately that misses cases where the number and unit run together, such as 200TB.

The rough rule of thumb for Duplicati scaling is to stay below about 1 million blocks, else the SQL database gets slow. Sometimes raising the blocksize from the rather small default of 100 KB (good for roughly 100 GB of source data) solves that;
however, if there are lots of files, the tiny blocks (about a hundred bytes each) of per-file metadata ruin the plan.
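To put rough numbers on that rule of thumb (my own back-of-envelope arithmetic, not anything official):

```python
# Back-of-envelope block math for the "stay under ~1 million blocks" guideline.
# The 1 million figure is a rough forum rule of thumb, not a hard limit.

def blocks_needed(source_bytes: int, blocksize_bytes: int) -> int:
    """Approximate data-block count, ignoring metadata blocks and deduplication."""
    return -(-source_bytes // blocksize_bytes)  # ceiling division

KB, MB, TB = 1024, 1024**2, 1024**4

# Default 100 KB blocksize on 100 GB of data lands right around 1 million blocks.
print(blocks_needed(100 * 1024**3, 100 * KB))   # 1,048,576

# The same default on 200 TB of data gives roughly 2 billion blocks.
print(blocks_needed(200 * TB, 100 * KB))        # 2,147,483,648

# To stay near 1 million blocks at 200 TB you would need a ~200 MB blocksize,
# which trades away dedup granularity and bloats every small change.
print((200 * TB) // 1_000_000 / MB)             # ~209.7 (MB)
```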

Although available volunteers and their equipment are too scarce to focus on performance testing, I recently tried a test with a larger backup than my usual rather selective one. I hit this issue by forgetting that I had set up a folder of 10 million 0-byte files, so that's an extreme small-file setup.
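That test also illustrates why file count alone can blow past the block budget: each file contributes at least one small metadata block no matter how large the blocksize is. A sketch of the idea (not Duplicati's exact internal accounting):

```python
# With very many files, the file count sets a floor on the block count,
# because each file's metadata becomes its own tiny block.

file_count = 10_000_000          # the 0-byte-file test folder
metadata_block_bytes = 100       # "about a hundred bytes" of metadata per file

min_blocks = file_count          # at least one metadata block per file
print(min_blocks)                                     # 10,000,000 -- 10x the ~1 million guideline
print(file_count * metadata_block_bytes / 1024**2)    # ~954 MB of metadata, all in tiny blocks
```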

Basically (as near as I can tell) it bogged down in the database despite a very generous blocksize. I observed this both at the drive level (the backup went to an external drive) and in the pattern of file accesses.

Recently one trick was added, which is to increase the database cache; I haven't tried it yet. Duplicati was running as a Windows service, and it's harder to play with the environment variable there. Possibly, after we gain confidence in that new option, it will become available as a regular Duplicati option.
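For anyone who wants to experiment with it on a Windows service anyway: a service can be given its own environment variables through the per-service `Environment` registry value, so something like the sketch below should work. The service name and the variable name/value here are placeholders, not the real option; check the release notes for the actual name before trying this, and run it as administrator.

```python
# Sketch: add an environment variable for a Windows service via its per-service
# "Environment" REG_MULTI_SZ value, then restart the service to pick it up.
# "Duplicati" as the service name and NEW_VAR below are assumptions/placeholders.
import winreg

SERVICE_KEY = r"SYSTEM\CurrentControlSet\Services\Duplicati"   # assumed service name
NEW_VAR = "EXAMPLE_DB_CACHE_OPTION=200MB"                      # placeholder, not a real Duplicati name

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SERVICE_KEY, 0,
                    winreg.KEY_READ | winreg.KEY_SET_VALUE) as key:
    try:
        current, _ = winreg.QueryValueEx(key, "Environment")   # existing list of "VAR=value" strings
    except FileNotFoundError:
        current = []
    if NEW_VAR not in current:
        winreg.SetValueEx(key, "Environment", 0, winreg.REG_MULTI_SZ, current + [NEW_VAR])
```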

My motivation for this test was that I had been using Macrium Reflect Free for occasional PC image backups; however, it's going away, so I was looking for a replacement and wondered whether any file backup could perform as fast. The answer, at least for Duplicati, is that it can't, but I'm happy enough with an image provided I can restore specific files from it. Basically, that plus frequent selective file backups works for me.

You've got both problems (lots of data and lots of files), so I think you should be looking elsewhere.
"Maximum usable size of the repository? Petabyte scale possible?" is a Kopia forum thread on large cases.

My personal opinion is that scaling well to a petabyte, or even 200 TB, will not be possible for Duplicati; however, reliable backups at smaller sizes (reliability being an issue for any solution) are still a worthwhile goal.

Kopia is a somewhat newer entrant. You'll see the thread mention restic, which I think is older, but it reportedly didn't scale to huge sizes well either. Finding a mature free solution may be hard. Good luck…

If this is a bet-the-business situation, choose carefully and consider having several backups available.

You probably also want something that deduplicates the additional data well. Many solutions do that.

There is certainly less expensive storage, even from major vendors, but there is usually a catch such as a minimum retention period. Some tiers are cold storage, which can be a pain if you want restores. I'm pointing this out simply because storage at this scale might get costly, and its selection should go hand in hand with backup software selection.

If you go image-based (I guess this was also referred to as "block-based"), then the question is how well deduplication works if you need frequent backups too. Some people have tried combining a base image backup with frequent backups done a different way; you can find them here asking for backups of recent file changes.

That’s about all I can suggest for generalities, but there are forums where large backups get discussed.