How does Duplicati handle a change in storage media?

Sorry if that title makes no sense. Here’s what I’ve got:

Duplicati server running on Linux. Set to back up the data directory on my server, about 100 GB. Backups are being saved onto portable media mounted in the FS.

I want to cycle the portable media so I can store at least some physical backups off-site. On the OS side, my solution is to copy the same UUID onto the different portable media, so that my “backup media” always exists on the same mount point/directory. [I know I will need to manually re-mount when I do the actual swap]
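
For context, here’s roughly what I have in mind on the mounting side; the device names, label, and mount point below are just placeholders, and a shared filesystem label instead of a cloned UUID would work the same way (this assumes ext4 on the portable drives):

```
# Give both portable drives the same filesystem label (ext4 assumed)
sudo e2label /dev/sdb1 BACKUP
sudo e2label /dev/sdc1 BACKUP   # run this when the second drive is the one attached

# /etc/fstab entry so whichever drive is plugged in lands on the same path
LABEL=BACKUP  /mnt/backup  ext4  defaults,nofail  0  2
```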

How will Duplicati react to this? On backup media #1 right now, there is the initial backup (so, very large) and then my incremental backups, which are small as the data doesn’t change much.

If I swap out the backup media, will Duplicati recognize that the pointed-to backup folder (i.e. what is actually backup media #2, but mounted in the same directory) no longer contains the initial archive and will re-create it? [This is ideal]

Will it just continue to store the incremental backups on the new backup media, unaware that this folder no longer has the initial archive? If so, can I just copy the initial archive from backup media #1 to backup media #2 manually?

Or will Duplicati throw errors and no longer properly do any backing up?

No. Before starting the backup, Duplicati scans the destination directory, and if a mismatch occurs (e.g. a file was deleted) the process will abort. IMHO you should simply back up to only one drive and use that to synchronize the other drives (I use WinSCP under Windows, but it exists under Linux too!). Bonus: synchronization should be quicker than a backup.


Storing some physical backups off-site might be important if a local disaster happens and takes out the only backup on the on-site media.

A variation on the WinSCP idea is rclone SFTP (or some other protocol). This only propagates changes.
Questions about destinations, rclone, scripts and jobs talks about this and other possibly relevant topics.
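
Just as a rough sketch (the remote name and paths are examples; the “offsite” remote would be set up beforehand with `rclone config`), the sync itself is a one-liner:

```
# One-way sync of the local Duplicati destination folder to an offsite SFTP host.
# rclone sync makes the destination match the source, including deletions.
rclone sync /mnt/backup offsite:duplicati-backup -v
```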

If you’re sure you want this, the basic need is to keep the local database correctly describing the destination, because the database says what has been backed up, and this is how later backups can write less data.

Although it’s possible to create backup jobs in pairs and figure out which job goes with the current drive, the drive-rotation setup I cited (keeping the job database on the portable drive itself) was an experiment which possibly loses a little performance but gives you lots of leeway on plugging in a drive and being sure that the database matches the destination files – and only one job has to be created.
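
As a sketch only (placeholder paths, and I’m showing the command-line client here; I believe the web UI has an equivalent local database path setting on the job’s Database screen):

```
# Keep the job's local database on the portable drive itself, so the database
# always travels with (and matches) the destination files on that drive.
duplicati-cli backup file:///mnt/backup/duplicati /data \
  --dbpath=/mnt/backup/duplicati-job.sqlite \
  --passphrase=REPLACE_ME   # placeholder; use your real passphrase
```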

Be sure you don’t unplug a drive in the middle of a backup – another problem the network method avoids.


Thanks. Instead of setting up a process to “synchronize” a second backup media, can’t I just copy it completely to another drive?

How about creating another backup job with the same data set being backed up? For example, backup job 1 for data set A is scheduled Monday to Thursday on drive 1, and backup job 2 for the same data set A is scheduled to run on Friday on drive 2, which you can rotate off-site.

If I were you I would run a second backup job directly to the cloud.
I use ZFS replication for local site backup and Duplicati to back up to the cloud.

I’m not sure, but Duplicati should ignore the metadata information, so you can simply copy every file every time, but that’s tedious. During synchronization the program copies only the modified files, which means it’s quicker and puts less stress on the drives.

Yes, it doesn’t care about details of the destination files such as timestamps. It just needs the data.

Depending on the chosen options, it may also mirror deletions made on the original, which is a good thing.
Leaving should-be-gone files around not only wastes space, but has the potential to cause trouble.
Certainly one can empty the mirror destination first, but then the full copy is slow, and worse if it’s remote.

robocopy for example has a /purge option that’s implied by the /mir option to mirror a directory tree.
rclone sync would also work, and (though it doesn’t matter here), I think it’s also good with metadata.
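
For example (drive letters and folder names are placeholders):

```
:: Mirror the Duplicati destination folder to a second drive.
:: /MIR mirrors the directory tree and implies /PURGE, so deletions propagate too.
robocopy D:\DuplicatiBackup E:\DuplicatiBackup /MIR
```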

Lots of ways to go offsite. Great discussion. The OP might want to consider process, convenience, cost, the possibility of both drives being local at the same time (thus no offsite copy), and the risk of a backup getting damaged (which favors two actual backups, such as the DB-on-portable-drive one, rather than one backup + mirror).

@st678 yes, you are right, but as you say, bad sync behavior can certainly cause trouble: what happens if a dblock is modified in ABC1234.zip.aes and it gets copied without overwriting the existing file (e.g. the new file is renamed ABC1234(1).zip.aes)? The backup will become inconsistent, and I don’t know whether Duplicati would get an error* on backup or restore; Duplicati probably checks only the *.zip.aes structure, not the dblock “structure”. So the only way to synchronize multiple drives correctly is to overwrite files according to the file system timestamp and (optionally?) delete the files that aren’t in the backup directory of the “master” drive.

*the worst-case scenario: Duplicati carries on regardless and a corrupted file gets “restored” :roll_eyes: mmh, over these Christmas holidays, if I remember, I can try to simulate this scenario…

I’m not sure if I’m following your scenario exactly, but I thought it’s important to mention that Duplicati never modifies files after uploading them. It may delete files, and during a compaction it may take parts of existing files to make a new file (with a completely new name).

mmh…

What do you mean by “file”? A dblock file, a .zip(.aes) file, or both?

If Duplicati uses the policy you have mentioned, my scenario should be impossible to make happen.

All files Duplicati places on the back end: dindex, dblock, and dlist, regardless of zip, aes/unencrypted, etc. Once a file is placed on the back end it isn’t modified (besides deletion).

Don’t let anything make up names. Names matter. Extra files (as mentioned before) can cause trouble. Missing files can cause trouble. Duplicati checks for the names it expects and complains on mismatch.

Deletions should be synced. You want an exact duplicate, except that timestamps don’t matter to Duplicati. Typically they matter to the sync program though, unless “sync” is a copy of all the files to a cleared area.
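
On Linux, plain rsync (just an illustration, not something mentioned above; paths are placeholders) is one way to get that exact-duplicate behavior:

```
# Mirror the Duplicati destination to the second drive, deleting files
# on the copy that no longer exist on the original.
rsync -a --delete /mnt/backup/ /mnt/backup2/
```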

Backup directly to two locations is a post from @drwtsn32 describing actual use of the sync strategy, replacing a two-independent-backup approach, and explaining a dislike of running two separate backups.

The drive rotation idea I cited above, where the DB is on its drive, is only possible because it’s a drive, but it avoids the backup-twice and backup-differences issues – at the cost of having only one current backup at a time. An offsite drive is good for disaster recovery, but it’s slightly stale. A network offsite is easier to keep current.

Files at the backup destination aren’t changed due to source changes. Any new data goes into new files.

There might be one exception (untested) to the no-changes design. A purge might overwrite a current dlist. Alternatively, maybe it writes a slightly different name, e.g. incremented by 1 second as an upload retry does.