Specifying multiple drives as backup destination?

I have about 30TB of data stored on two drives which I would like to back up to a QNAP TL-D800S enclosure (8-drive JBOD storage connected via 2x SFF-8088 cables), which would be mounted only for the purpose of making the backup and subsequently stored offsite. I know relying on disk as backup isn’t ideal, but it’s what I have available at present, and it’s better than no backup.

For clarity, the drives containing the 30TB to be backed up as well as the QNAP would be connected to the same headless server that Duplicati is installed on.

The 30TB source is mounted as:

/data/d1
/data/d2

The QNAP target drives are mounted as:

/qnap/d1
/qnap/d2

/qnap/d8

The operating system sees the 8 drives in the QNAP as 8 discrete drives (and I want to keep it that way rather than risk one huge JBOD span or bother with a RAID array, which isn’t backup to begin with).

  1. Is it possible to have Duplicati write the backup out to the QNAP drives, filling one, then the next until the backup is completed?

  2. On the assumption (1) is possible … As some of the drives in the QNAP contain data I’d ideally prefer not to overwrite, is it possible to point Duplicati to a folder on each of the 8 drives into which to write the backup?

Welcome to the forum @duplo97

Can you explain a little more how you would like these used?
Are you looking for redundancy, rotation, app-level pool, etc.?

EDIT:

How the backup process works might give useful background.

Apologies, it was late when I posted. I’ve edited it to clarify - hopefully what I’m looking to do makes more sense now?

The developer documentation explains the simple storage model that lets it support a large number of storage types:

The backends encapsulate the actual communication with a remote host with a simple abstraction, namely that the backend can perform 4 operations: GET, PUT, LIST, DELETE. All operations performed by Duplicati relies on only these operations, enabling almost any storage to be implemented as a destination.

But the aim of requiring only minimal support works against your wish for very specialized behavior. That part needs outside help.
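
To make that concrete, here is a rough conceptual sketch, in Python purely for illustration. Duplicati’s real backends are C# classes, and none of the names below are its actual API:

```python
# Conceptual sketch only: illustrates the four-operation storage abstraction
# quoted above. Class and method names are invented for illustration and do
# not match Duplicati's actual backend interface.
import os
import shutil
from typing import List


class FolderBackend:
    """A 'destination' that only has to support GET, PUT, LIST, and DELETE."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, name: str, local_path: str) -> None:
        shutil.copyfile(local_path, os.path.join(self.root, name))

    def get(self, name: str, local_path: str) -> None:
        shutil.copyfile(os.path.join(self.root, name), local_path)

    def list(self) -> List[str]:
        return os.listdir(self.root)

    def delete(self, name: str) -> None:
        os.remove(os.path.join(self.root, name))
```

Notice that none of the four operations knows anything about which of eight drives has room, so the fill-one-then-the-next logic has to live below Duplicati (in the filesystem or a pooling layer) rather than inside it.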

Backup to multiple cloud is a similar question that got some long-shot options, which you could test and likely confirm fail. Fortunately local storage is a little easier, and UnionFS might be possible, although it likely adds its own risks, including drive failure and setup or operation accidents. Different RAID levels add different protections.

Setting up mergerfs on JBOD’s (or a poor man’s storage array) is (I think) a step removed from kernel support, since mergerfs uses FUSE. I’m not sure whether it can aggregate folders easily, or at all, but it may be “light”.
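
If its branches can be plain directories rather than whole drives (my reading, but please verify against the mergerfs docs), a mount along these lines would pool a folder from each QNAP disk and fill them in listed order. Everything here is an untested sketch; the folder names, headroom figure, and policy choice are placeholders:

```
# Untested sketch: pool a folder from each QNAP drive into one mount point.
# category.create=ff ("first found") fills branches in the order listed;
# minfreespace keeps some headroom before spilling to the next drive;
# moveonenospc relocates a file that hits out-of-space onto another branch.
mkdir -p /mnt/qnap-pool
mergerfs -o allow_other,category.create=ff,minfreespace=25G,moveonenospc=true \
  /qnap/d1/duplicati:/qnap/d2/duplicati:/qnap/d3/duplicati:/qnap/d4/duplicati:/qnap/d5/duplicati:/qnap/d6/duplicati:/qnap/d7/duplicati:/qnap/d8/duplicati \
  /mnt/qnap-pool
```

Duplicati would then just see /mnt/qnap-pool as an ordinary local folder destination, which keeps its side simple, but any pooling misconfiguration now affects the whole backup set.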

On the heavy end of the scale is another system or maybe a VM on this one. Not my area, but a post:

Explain to me like I’m 5: Why would I use TrueNAS Scale over Unraid?

(rattles off some names that I’m not familiar with – but other forum users use some and “might” assist)

You might also be able to find or build a “better than nothing” file copier program that fills one drive, then moves on to the next. Keeping up with changes to the backup files would be the challenge.
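
To be clear about what such a copier has to do (and what it skips), here is a hypothetical minimal sketch; none of the names come from a real tool, it ignores previously copied files, deletions, and retries, and the headroom figure is arbitrary:

```python
# Hypothetical "better than nothing" fill-then-move copier sketch.
# Copies every file from a source folder into the first destination drive
# that still has enough free space. It does NOT track changes, deletions,
# or files already copied on a previous run.
import os
import shutil

HEADROOM = 10 * 1024**3  # assumption: keep 10 GiB free on each drive


def free_space(path: str) -> int:
    """Free bytes on the filesystem containing path."""
    return shutil.disk_usage(path).free


def fill_then_move(source: str, destinations: list[str]) -> None:
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = os.path.join(dirpath, name)
            size = os.path.getsize(src)
            rel = os.path.relpath(src, source)
            for dest in destinations:
                if free_space(dest) - size > HEADROOM:
                    target = os.path.join(dest, rel)
                    os.makedirs(os.path.dirname(target), exist_ok=True)
                    shutil.copy2(src, target)
                    break
            else:
                raise RuntimeError(f"No destination has room for {src}")


# Example (paths are placeholders for this thread's layout):
# fill_then_move("/data/d1", [f"/qnap/d{i}/backup" for i in range(1, 9)])
```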

If you find a way to give Duplicati a file or network destination, all I’ll say is to beware of performance with that amount of data. Raising --blocksize so there are no more than a few million blocks helps the SQL speed.
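
For rough numbers (assuming the long-standing 100KB default for --blocksize; newer releases may default higher, so check yours), the block-count arithmetic looks like this:

```python
# Rough block-count arithmetic for a 30TB source at a few --blocksize values.
# 100KB has long been the Duplicati default; confirm what your version uses.
source_bytes = 30 * 1000**4
for blocksize_kb in (100, 1_000, 5_000, 10_000):
    blocks = source_bytes // (blocksize_kb * 1000)
    print(f"{blocksize_kb:>6} KB blocksize -> ~{blocks:,} blocks")
# roughly 300 million, 30 million, 6 million, and 3 million blocks respectively,
# so getting down to "a few million" means a blocksize in the 5MB-10MB range.
```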

I tried a quick Google search of Reddit, where people worry about larger data sets, and even found a 30TB case:

Backing up a 30TB dataset on to multiple 8TB disks? (102 comments – maybe something there helps)

EDIT:

What is the correct syntax for an on-the-fly union remote? and Union and --backup-dir might help with the storage side, although whether it would work with the Duplicati Rclone backend (for the 4 operations quoted at the top) is unclear. Tapping into the interface rclone uses with your own scripting would allow some dangerous do-it-yourselfing.
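
If you want to experiment with that route, an rclone union remote built from a folder on each drive might look something like the sketch below. The remote name, folder paths, and the ff (first found) create policy are assumptions on my part, and whether Duplicati’s Rclone backend then behaves with it is exactly the open question above:

```
[qnap-union]
type = union
upstreams = /qnap/d1/duplicati /qnap/d2/duplicati /qnap/d3/duplicati /qnap/d4/duplicati /qnap/d5/duplicati /qnap/d6/duplicati /qnap/d7/duplicati /qnap/d8/duplicati
create_policy = ff
```

A small test copy with plain rclone against qnap-union: would show whether the fill behavior is what you want before pointing anything else at it.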

Personally, I’d just do this far more simply, with the least amount of problems and potential problems, by making multiple backup jobs, each with selected folders that fit on one drive with extra room. Once you pick the split, it’s going to be good.

Why overcomplicate it lol. It will use up to the max space of each drive. In the end it’s the same without all the crazy, and with some benefits on top.

Actually cross that out. I wouldn’t put 30TB inside of containers lol. That’s a recipe for disaster. It better all go well and stay well. Duplicati is more of a beta-state product, and with some of its possible issues I’d recommend avoiding that regardless.

What does container mean? If it meant the UnionFS idea, your initial proposal was to not go that way.
Estimating sizes might get difficult, but for a rough guess one could take the sizes of the source trees.
The downsides include figuring out what goes where and administering several smaller backups, which also have advantages: if one of them breaks, it’s probably faster to fix, and if it won’t fix, then there’s less of a loss…
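
For the rough guess, a planning sketch along these lines (folder-level granularity only, with the 8-drive and 8TB-per-drive figures and the paths as placeholders) could total up the top-level source folders and greedily assign them to drives:

```python
# Hypothetical planning sketch: measure top-level source folders and greedily
# assign each to the destination drive with the most remaining room.
# The 8-drive / 8TB-per-drive figures are placeholders for the real hardware.
import os


def tree_size(path: str) -> int:
    """Total size in bytes of all files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip unreadable or vanished files
    return total


def plan(folders: list[str], drive_count: int = 8,
         drive_bytes: int = 8 * 1000**4) -> dict[str, int]:
    remaining = [drive_bytes] * drive_count
    sizes = {f: tree_size(f) for f in folders}
    assignment: dict[str, int] = {}
    for folder in sorted(folders, key=sizes.get, reverse=True):  # biggest first
        drive = max(range(drive_count), key=lambda i: remaining[i])
        if remaining[drive] < sizes[folder]:
            raise RuntimeError(f"{folder} ({sizes[folder]:,} bytes) fits nowhere")
        remaining[drive] -= sizes[folder]
        assignment[folder] = drive + 1  # 1-based, i.e. /qnap/dN
    return assignment


# Example: plan([os.path.join("/data/d1", d) for d in os.listdir("/data/d1")])
```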

One potential pain point of a large backup is that database recreation (needed after damage or loss in a disaster) works very hard to locate referenced blocks. Typically the dindex files say which dblock contains what. Loss of a dindex can leave an unresolved reference, causing a search through perhaps all of the dblocks.

Raising blocksize means fewer blocks per dblock, a smaller database, and faster SQL, but 30TB of source means roughly that much in dblock files, unless there’s a lot of compressible or redundant data to reduce the size.
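
To put a number on “roughly that much in dblock files” (assuming the usual 50MB default remote volume size; treat the exact default as something to verify):

```python
# Rough count of dblock (remote volume) files for ~30TB of mostly unique data.
# 50MB is the usual default --dblock-size / "Remote volume size"; adjust to taste.
data_bytes = 30 * 1000**4
for dblock_mb in (50, 500, 2000):
    files = data_bytes // (dblock_mb * 1000**2)
    print(f"{dblock_mb:>5} MB volumes -> ~{files:,} dblock files")
# roughly 600,000 / 60,000 / 15,000 files respectively; larger volumes mean
# fewer files to track, at the cost of bigger downloads during restore or repair.
```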

For whatever it’s worth, there are some people wondering how large they can push Kopia, a newer tool which at least seems to be in active development. One can decide on their own about its maturity level.

Maximum usable size of the repository? Petabyte scale possible?

EDIT:

I did a Google search on “TB” to see what the record was for Duplicati backup size. Unclear, but found:

Duplicati for large archive? (asking about 14TB and growing, and getting back ideas on how to handle)