Backing up to multiple destinations?

Kahomono · October 3, 2017, 10:11am

You can sort of achieve this now by maintaining your filters in a text file and then pointing different jobs to it.
No UI, but more flexible.

I do this by maintaining a directory containing only symlinks to what I want backed up. I set my symlink policy to “follow” and point my backups at those directories. Then I can change what’s in each backup set by just adding/removing links in the one dir of what to back up.

(Windows users, you have my sympathy. But you can replicate this with Junctions)
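
A minimal sketch of the link-directory idea in Python, assuming a POSIX-like system where symlinks can be created without special privileges. The helper name `build_backup_set` and its arguments are hypothetical and not part of Duplicati; the backup job would still be pointed at the link directory with the symlink policy set to “follow”.

```python
import os
from pathlib import Path


def build_backup_set(link_dir, sources):
    """Make link_dir contain exactly one symlink per source directory.

    Hypothetical helper: link names come from the last path component of
    each source, so two sources with the same basename would collide.
    """
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    wanted = {Path(s).name: Path(s).resolve() for s in sources}

    # Drop links that are no longer part of the backup set.
    for entry in link_dir.iterdir():
        if entry.is_symlink() and entry.name not in wanted:
            entry.unlink()

    # Add (or repoint) links for everything that should be backed up.
    for name, target in wanted.items():
        link = link_dir / name
        if link.is_symlink():
            if Path(os.readlink(link)) == target:
                continue  # already points at the right place
            link.unlink()
        link.symlink_to(target, target_is_directory=True)

    return sorted(p.name for p in link_dir.iterdir() if p.is_symlink())
```

Calling it again with a shorter or longer list of sources is what “adding/removing links” amounts to; the real files are never touched, only the links.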

JonMikelV · October 3, 2017, 1:12pm

That’s a pretty clever way to do it! I know Windows users can use the ‘mklink’ command but I prefer to use the Link Shell Extension (http://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html). Note that “linkability” on Windows varies based on the drive formatting (NTFS, FAT32, etc.).

I’m curious, have you tested what happens if you do a restore to somewhere other than the original path? I assume you get a complete file tree, rather than recreated symlinks with the files restored over their original locations…

Kahomono · October 3, 2017, 4:40pm

If you restore this way it shows you

+root
|
+-> Symlink 1
|   |
|   +-> First Linked Dir
|   |    {and all its subdirs}
|
+-> Symlink 2
|   +-> Next Linked Dir
|   |    {etc.}

etc.

Very workable.

magnust · October 16, 2017, 9:08pm

I’m in the same boat as many of you here. I have a number of backup jobs that back up to a local computer, and then the “same” jobs run again to a remote computer.

It’s really suboptimal to do the exact same work of hashing, compressing and so on twice.

I pondered on running local jobs first and then a simple file sync from the huge pile of backup files to a remote location. But I came to the same conclusion as above, that there is a risk of copying the backup files “mid job” and then I’m in trouble…

JonMikelV · October 16, 2017, 10:08pm

There’s also the hassle of getting your synced files into an appropriate location for restores if your primary backup destination goes bad.

Of course the mid-job-sync issue MIGHT be worked around with a --run-script-after triggered sync. I assume it’d be something like the following combined with a --run-script-before semaphore check:

create semaphore file
trigger sync
when sync done remove semaphore file
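
Those three steps could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function names and the semaphore path are made up, `sync` stands in for whatever tool does the actual copying, and how a pre-backup script tells Duplicati to wait or skip is not covered here.

```python
import os
from pathlib import Path


def sync_in_progress(semaphore):
    """Check for a pre-backup script: True while a sync holds the semaphore."""
    return Path(semaphore).exists()


def run_sync_with_semaphore(semaphore, sync):
    """For a post-backup script: create the semaphore, run the sync, remove it.

    `sync` is any zero-argument callable that performs the real copy
    (an rsync call, a cloud client, ...); it is a placeholder here.
    """
    # O_EXCL makes creation fail if another sync already holds the semaphore.
    fd = os.open(semaphore, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    try:
        sync()
    finally:
        # Remove the semaphore even if the sync raised, so a failed sync
        # does not leave a stale file behind.
        os.remove(semaphore)
```

Removing the semaphore on failure is a design choice: it keeps later backups from being held up, at the cost of not recording that the last sync was incomplete.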

That’s a very good point I hadn’t really considered. As long as your multiple destinations have the same sources and block / dblock sizes there’s little reason not to just send the resulting dblock file to both destinations.

sanderson · October 16, 2017, 10:26pm

I know this is kind of KISS, but could you not just back up the destination from 1st local backup? Disable hashing, disable compression, can even disable encryption if you want to (and any other unnecessary process given what is being done) assuming the job is only being used to replicate an existing backup off-site.

There would still be a little overhead generating dblocks, but this might actually be useful. You could use large 1GB dblocks for the local backup, then use more appropriate dblock sizes based on internet connectivity etc. for offsite replication.

The fact that jobs run serially and not in parallel would actually be helpful in this instance. Ideally you’d stagger the start times appropriately, but even if you didn’t the second job wouldn’t run until the first one finishes.

JonMikelV · October 16, 2017, 10:38pm

That would work (assuming you want local-and-remote, and not remote-and-remote) but the restore process would be a bear due to having to restore basically the ENTIRE backup somewhere, then restore the actual files from the restored backup.

If Duplicati had a sync or “clone destination” feature that could be triggered at the end of a backup to “make destination B look just like recently completed destination A” that could go a ways towards what people seem to be wanting. Though we’re still stuck with double transfers if destination A is remote…

An additional benefit of “clone destination” is that, if written flexibly enough, it could be used by people who are wanting to move their destinations from one provider to another.

kees-z · October 16, 2017, 10:42pm

That approach would make restore operations from the backup-of-the-backup much more difficult.
To restore a single file from it, you would have to restore all files from your destination to local storage.
Then you have to start a restore operation from the folder containing your restored backup files, requiring a (partial) rebuild of the local database.
Probably this is a very time consuming and storage demanding procedure.

/Edit: @JonMikelV: you beat me with this one!

magnust · October 17, 2017, 8:20am

What if…

You just have the possibility to have two destinations in a job. The two backup filestores should be kept 100% identical. As long as nothing goes wrong it’ll be a pretty quick task to check that they are in sync and update with the files to each location. Just upload/delete each new/changed file from the temp directory to both targets as you go, running the hashing and compressing once.
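
A rough Python sketch of that “process once, upload to both” idea, with local folders standing in for real backends. The names `publish_volume` and `in_sync` are hypothetical, and comparing file names and sizes is only a cheap stand-in for a real consistency check.

```python
import shutil
from pathlib import Path


def publish_volume(volume, destinations):
    """Copy one finished backup volume to every destination folder.

    The volume is produced (hashed, compressed) only once and then
    fanned out to all targets.
    """
    volume = Path(volume)
    written = []
    for dest in destinations:
        dest = Path(dest)
        dest.mkdir(parents=True, exist_ok=True)
        written.append(shutil.copy2(volume, dest / volume.name))
    return written


def in_sync(dest_a, dest_b):
    """Cheap check that two destinations hold the same file names and sizes."""
    def listing(d):
        return {p.name: p.stat().st_size for p in Path(d).iterdir() if p.is_file()}
    return listing(dest_a) == listing(dest_b)
```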

IF something goes wrong, i.e. the two targets are not identical when starting the job (or when finished…) you could either issue a warning and fail or give the option to sync. Choosing whether to sync target a to b or b to a would be done by:

if one target is corrupt -> copy non corrupt target over corrupt target
if the targets are ok but don’t have the same last job run -> copy newest over to oldest
if both are corrupt… well that’s obvious
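
The three rules above could be written down like this. It is a sketch only: `TargetState` and `choose_sync` are invented names, how “corrupt” gets detected is left open, and treating the both-corrupt case as an error is my reading of “that’s obvious”.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TargetState:
    name: str
    corrupt: bool
    last_run: datetime


def choose_sync(a, b):
    """Return (source, destination) names for a repair copy, or None if aligned.

    Raises RuntimeError when both targets are corrupt, since there is no
    good copy left to sync from.
    """
    if a.corrupt and b.corrupt:
        raise RuntimeError("both targets are corrupt; nothing to sync from")
    if a.corrupt:
        return (b.name, a.name)  # copy the intact target over the corrupt one
    if b.corrupt:
        return (a.name, b.name)
    if a.last_run == b.last_run:
        return None  # same last job run, nothing to do
    newer, older = (a, b) if a.last_run > b.last_run else (b, a)
    return (newer.name, older.name)  # copy newest over oldest
```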

What I see from testing is that my backup jobs run for many hours each day, even when just doing updates and not full backups. A full backup takes over a week. So what I’m VERY interested to know is how much time could be saved doing it this way instead of running each job twice, currently exactly doubling the total time compared to running only one destination.

A really cool thing about this approach is that you can choose to have some backup jobs run to one location and some to two; you can decide when setting it up depending on your type of data, size and so on. Yes, in two-target jobs you’d have to have identical block sizes, retention, job frequency and so on. But that sounds extremely ok to me.

Kahomono · October 18, 2017, 11:45am

I’m also concerned about this issue. I have backups to local destinations, and the destination folders syncing to pCloud.

The volatility of the files in the local destinations is pretty heavily overworking my sync. I want the sync’ed dir to be usable. I plan to restore-test from all destinations… if the sync EVER finishes.

Maybe I should stop the daily backups until all the local dirs sync one time? Then I might get the “catch-up” syncs to finish inside a day.

Niels_Hoogenhout · October 18, 2017, 12:19pm

Maybe I should stop the daily backups until all the local dirs sync one time? Then I might get the “catch-up” syncs to finish inside a day.

That would indeed help since fewer new files will be created for a while. After that you should just test if your sync can keep up with the daily updates it has to handle.

I’m not familiar with pCloud, but did you find the reason why sync can’t keep up? Is it the limited upload speed of your connection, or is it the limited upload speed to pCloud? In case it’s the second option, maybe you can backup directly to pCloud (if that’s possible) and let the sync only handle the desired non backup files you want to sync. That way you might be able to force a faster total upload speed to pCloud.

drwtsn32 · October 18, 2017, 3:20pm

Does this approach actually work with Duplicati? Reason I ask is that I believe it won’t run more than one backup set at a time with the default scheduler (I could be wrong). And I don’t think there is a way to prioritize backup sets. Say you add 100GB of movies to your low priority backup set… that will run for a while and your high priority set won’t start until it’s complete, right?

With CrashPlan this approach DID kind of work because you could set priorities on backup sets. If a long-running lower priority backup was still in progress when a high priority backup was scheduled to run, it would stop that lower priority backup to run the high priority one.

I used to do 3 tiers just like you when I was on CrashPlan but with Duplicati I’m trying to keep it simple and have just one backup set. The lower priority data rarely changes so in reality it doesn’t hurt to have backups run frequently as high priority data… it doesn’t do anything with data that doesn’t change.

Nelvin · October 18, 2017, 8:24pm

You’re right, the backups don’t run in parallel, but this usually only matters for the initial run. Now my low-prio backups are usually done within a few minutes as there’s rarely something new to back up, and even if it’s a GB here or there it’s done within an hour even when uploading to a cloud service.

Initially I thought about a single backup set too, but I noticed that the local databases for the backup sets grow kind of linearly with each run. My high-prio backup (170 versions) is already at about 2GB, so I guess this would become huge quite fast if I only used a single backup set. Also, given I want redundancy, I’d need the whole set at least twice. I guess the database will stop growing when I don’t keep versions forever. I might change that config at some point.

JonMikelV · October 18, 2017, 8:33pm

If you’re interested in reducing backup density (thinning versions) over time, the latest canary build (2.0.2.10) has initial support for a --retention-policy parameter as described here.

drwtsn32 · October 18, 2017, 9:11pm

I look forward to trying it out but will wait for it in the next “beta” release.

Nelvin · October 19, 2017, 12:05am

Wow, that sounds like a great feature that will help manage backups quite a bit, especially in the long term. Looking forward to using this in the future.

Sami_Lehtinen · May 13, 2018, 6:12am

I’ve also been thinking about this option. But usually it’s good to also use separate backup technologies for different destinations for reliability reasons. If you only use Duplicati, all of your backups could be totally ruined.

Yet in situations where I need to have files in multiple locations, I’ll usually run a backup to a local storage drive, and then mirror the data set to the destinations using whatever tool works best with that destination: rsync, robocopy, ssh, webdav, sync (with whatever destination having its own sync app). This works very well.

But it’s always important to have that alternate backup, in case Duplicati doesn’t work. Also, having a local backup set (Duplicati or not) is very important for restore speed. The remote backups are only for total-loss-of-site cases. If something needs to be restored, using a (more or less) local source, like a NAS, is a much better and faster option.

Nelvin · May 13, 2018, 8:56am

Yep, I’ve added Duplicacy as a second backup tool for that reason and, in addition to the cloud targets, I use 2 local external hard disks, of which one is always at my parents’ house (about 200km away), and I just swap them whenever I’m there for a visit.

drwtsn32 · May 13, 2018, 9:43pm

Just curious … do you use the free command-line version or did you pay for the GUI version?

Currently I have 2 other backup programs: CrashPlan (which I will drop when my license term runs out) and CloudBerry (something I experimented with and bought a license for before I found Duplicati). I also have a license for Macrium Reflect for doing image level backups.

Nelvin · May 14, 2018, 6:39am

I bought a license but it’s not required and you’re actually much more limited by the GUI as the CLI version has more options/can be used in a more flexible way (the GUI is just a frontend, but not supporting all the options). I use both, the GUI and the CLI, the CLI for manually handled small backups and the GUI for a big, all including set.