Backing up to multiple destinations?

alex_london · September 1, 2017, 11:55pm

Hi,

I’ve been evaluating Duplicati as a replacement for CrashPlan Home (yes, another future refugee here), and so far I like what I see, great job, with a lot of potential too!

Is it possible to configure the same backup set to target multiple destinations? I would like both a local and an offsite backup from the same selections, using the same encryption key, schedules, etc.

I know that in theory I could just back up locally and then sync that to a cloud provider, but there’s no guarantee the synced copy will maintain the integrity of the data: files could be uploaded out of order, or another backup might run while the previous one is still syncing, etc. I feel Duplicati would be better suited to handle this than a 3rd-party sync solution.

I also know I could just have two backups configured, one for local and one for remote, but keeping their configurations in sync would then become a manual process.

Or maybe a hybrid approach, where we have two backup jobs configured, and a way to “copy” or “sync” everything from job A to job B except the destination info?

Thanks!
-A

drakar2007 · September 2, 2017, 4:57pm

I agree with your use case (I back up my work laptop to B2 and to Google Drive, and it’s been a bit of work learning how to keep their configurations manually synced). However, I’ve learned it’s not too hard to keep the sources and filters, at least, synced: when you make changes to one, just switch to “edit as text” mode, ctrl-A / copy, then go into the other one and paste. It is a bit of a hassle (and there’s potential to mess up, as the backup you’re currently editing isn’t shown very clearly), but doable at least.

saviodsouza · September 2, 2017, 5:31pm

There is an open feature request on GitHub. Can’t find it now, but it’s something like …raid…

RAID 0 : Cluster different online storage to make a single repository.

RAID 1 : Mirror Duplicati repository to 2 or more destinations.

Raid ?? : more ideas

dgcom · September 2, 2017, 9:34pm

I think this feature has been requested several times before, but it might not be very easy to implement, especially if the two destinations become out of sync…
Running two separate backups can be inefficient - each will spend resources on compression and encryption.

For a local/off-site model you can try backing up locally first and then syncing the destination to the cloud with tools like rclone…

-D

apet · September 3, 2017, 12:13am

Yet another refugee evaluating your great solution! Hi everyone:)

My use case:

one backup set (all my files i want to backup)
one local destination (hdd on separate pc)
one backup location (remote pc)

Both PCs are available at different times, and while one has already backed up my weekend pictures, the other one does it later - in my case, they are always out of sync.

The feature is basically almost there - you described the workarounds. If one backup set could create a child backup set that is in sync with the parent e.g. when you add another destination path, would that be it?

Would be great to have a feature like this:)

Thanks!
-A

alex_london · September 4, 2017, 11:37pm

That’s what I’m testing right now (Duplicati backup to a share on my Synology NAS, then use Cloud Sync to upload/sync that share to OneDrive).

But my concern is that the sync process is not deterministic (or unknown to Duplicati), and obviously much slower than Duplicati’s local backup. With a mere 2-3 Mbps upload bandwidth, it is a very real scenario that sync of the previous backup has not completed by the time of the next local backup. At which point, new files will appear for sync, and they may get prioritised over older ones. Which will leave the Duplicati sync copy in OneDrive in a potentially inconsistent/corrupt state.

Hence why I think doing this all from within Duplicati itself would be more reasonable - as I guess it can handle restoring from an interrupted backup much better than restoring from a bad/incomplete sync.

-A

dgcom · September 5, 2017, 12:27am

If you have a slow upload, Duplicati’s own backup to the cloud will be very slow and you will not be protected at all until it completes…
But if you first back up to a local destination, you at least have a full local backup in case you need it.
The size of the incremental chunks depends on how much your data changes… YMMV, but my incremental portions are very small…

In any case, setups like this are a double-edged sword - you have to decide what the priority is for you: a complete local backup with a slow sync to the cloud, or a less complete but in-progress cloud backup…

-D

alex_london · September 5, 2017, 12:39am

I agree about the slow upload, but the use case I’m trying to describe is slightly different (and perhaps I’m mistaken about all this and I’m just making a storm in a teacup):

1. Initial backup is completed, and fully synced
2. New incremental backup starts
3. Cloud sync starts for the new backup files (but does not complete)
4. Another incremental backup starts
5. Sync is ongoing, but now has even more files to process…
6. Go to step 2

You can imagine steps 2 through 5 could go on forever, with no guarantee that sync will upload the files generated in (2) before processing the ones generated in (4) and so on… which could leave the cloud backups unusable (beyond the initial backup).
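
To put rough numbers on the loop above, here is a minimal sketch that tracks how much data is still waiting to be synced after each backup run. Everything in it is an assumption for illustration: the function name, the fixed increment size, and the idea that sync gets the full upload bandwidth for the whole interval between backups. It says nothing about the order in which a sync tool uploads files, only whether the backlog can ever drain.

```python
def simulate_backlog(upload_mbps, interval_hours, increment_mb, runs):
    """Return the un-synced backlog (in MB) at the end of each backup interval.

    All parameters are hypothetical inputs, not measurements from this thread.
    """
    # Mbit/s -> MB uploaded per interval: /8 for bytes, *3600 for seconds per hour
    capacity_mb = upload_mbps / 8 * interval_hours * 3600
    backlog = 0.0
    history = []
    for _ in range(runs):
        backlog += increment_mb                    # a new incremental backup lands locally
        backlog = max(0.0, backlog - capacity_mb)  # sync uploads what it can before the next run
        history.append(backlog)
    return history


# 2 Mbps upload, backup every 6 hours -> 5400 MB can be uploaded per interval
small = simulate_backlog(2, 6, 1000, 3)   # increments fit: backlog stays at zero
large = simulate_backlog(2, 6, 8000, 3)   # increments too big: backlog keeps growing
print(small, large)
```

With these made-up figures the sync catches up every time as long as each increment stays under 5400 MB; above that, the backlog grows by the difference on every run and never drains, which is the scenario in steps 2 through 5.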

IIRC the way CP handles backups to multiple destinations is it prioritises local/faster destinations over slower ones. So it will interrupt backup to a slow destination if it’s time to “refresh” a faster one.

-A

dgcom · September 5, 2017, 12:56am

Yes, in the scenario you describe, when the incremental portion is so big that it cannot be uploaded to the cloud in time, you will end up with an inconsistent cloud backup…
As I said - it all depends on the usage: I usually run an incremental a couple of times per day and the delta can be uploaded very quickly.

-D

Nelvin · September 5, 2017, 7:23am

If your ongoing changes are bigger than what your upload speed can handle, no setup will make it work.

My setup consists of 3 different backups.
1 - HighPrio (work/development projects) - backup every few hours to 2 different OneDrives
2 - LowPrio (Photos and many other mostly static stuff) - backup every 2-3 days to 2 different OneDrives
3 - High+Low (everything combined) - backup every 2-3 days to local hard disks

This way, important backups run every few hours. I use intervals of prime numbers of hours for all of them, so they usually don’t run at the same time and collide only rarely, which I prefer over running them all at the very same time of day. Also, if I know I made some important changes to any of my datasets, I’ll just trigger a backup manually.
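
A quick way to see why prime intervals help is to count how often two jobs would start in the same hour. This is only a sketch: the function name is made up, the intervals 5 and 7 are example values rather than the ones actually used in this setup, and it assumes whole-hour intervals with all jobs starting together at hour 0.

```python
def collisions(intervals_hours, horizon_hours):
    """Hours (1..horizon) at which two or more jobs are scheduled to start together.

    Assumes every job starts at hour 0 and repeats at a fixed whole-hour interval.
    """
    hits = []
    for hour in range(1, horizon_hours + 1):
        starting = [i for i in intervals_hours if hour % i == 0]
        if len(starting) >= 2:
            hits.append(hour)
    return hits


prime_schedule = collisions([5, 7], 70)    # coprime intervals
round_schedule = collisions([6, 12], 48)   # one interval divides the other
print(prime_schedule, round_schedule)
```

Intervals of 5 and 7 hours only line up every 35 hours, while 6 and 12 hours coincide on every single run of the 12-hour job.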

The setup of multiple backups as well as the first initial runs of the bigger (LowPrio) to the cloud did take some time but now each additional run usually only requires a few minutes. Also if some of my backups have to be edited, I may need to edit them multiple times, but if you organize your files on your drives, you mostly don’t have to do this even if you add/remove files/folders (as long as you backup their parent folders).

kenkendk · September 5, 2017, 12:21pm

The Github issue is here:

dgcom · September 5, 2017, 7:09pm

Since this issue has been open for 3 years already, it tells me that the proposed configuration is not easy to implement…
And yes - if targeting the general case of “RAID” for destinations, it may become very complicated…
I’d rather have something simpler that works reliably for uploading to two destinations simultaneously.
Even that may require special handling if the two destinations become out of sync or one of them is unavailable during backup…

Another option is to implement a smart “copy” command from the primary destination to a secondary one, executed immediately after the backup stage, optionally caching files locally.

kenkendk · September 6, 2017, 9:45am

Yes, this would be easy to implement, and could somewhat ignore out-of-sync errors, but what would happen if the upload to the secondary storage fails? Does the backup fail?

dgcom · September 6, 2017, 8:43pm

This depends on the implementation, again… If you define all destinations as equivalent, then you’ll need an option to toggle between failure and warning if one of the destinations fails.

However, a better option might be to designate one destination as primary and the others as secondary.
In that case the backup will fail only if writing to the primary destination fails. If a secondary fails, it could be only a warning.
However, you may want to count how many backups ran while the secondary destination was unavailable, and if the count goes over a threshold, fail the entire job…

And then there is the question of how to recover a failed destination - should the job download files from the primary and re-upload them? Or keep a local cache? Or just delete the revision with missing files?
A local destination as primary is probably the simplest case, since missing chunks are easily available, but with S3 as primary the re-download may carry a sizable cost…

As I said - even just a copy to another destination is not very simple, but it can be made much more manageable if designed properly.

dgcom · September 7, 2017, 4:16am

Thinking about this a bit further… Primary/Secondary destinations could be called Hot/Cold or Online/Offline, for example…
But the idea is the same - the hot destination is fast, local, easily accessible storage which can serve as a cache for the cold destination.
Backup to one or more hot destinations has to be successful for the job to be considered successful.
If uploading to the cold destination fails (and it is always done after a successful upload of the chunk to the hot destination), a warning is logged and remembered. After a specified number of runs where cold storage fails, the backup is considered failed even if hot storage is fully consistent.
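
The hot/cold rule above could be sketched roughly like this. It is not anything Duplicati implements: the class, the `max_cold_failures` threshold and the choice to count consecutive (rather than total) cold failures are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class ColdFailureTracker:
    """Hypothetical job-status policy for one hot and one cold destination."""
    max_cold_failures: int              # made-up threshold, not a Duplicati option
    consecutive_cold_failures: int = 0  # assumption: only consecutive failures count

    def record_run(self, hot_ok: bool, cold_ok: bool) -> str:
        if not hot_ok:
            # Hot upload failed: the job fails outright. The cold upload is never
            # attempted in that case, so the cold counter is left unchanged.
            return "failed"
        if cold_ok:
            self.consecutive_cold_failures = 0
            return "success"
        self.consecutive_cold_failures += 1
        if self.consecutive_cold_failures > self.max_cold_failures:
            return "failed"   # cold has been unavailable for too many runs
        return "warning"      # hot is consistent, cold is lagging


tracker = ColdFailureTracker(max_cold_failures=2)
results = [tracker.record_run(hot_ok=True, cold_ok=False) for _ in range(3)]
print(results)
```

With a threshold of 2, the first two cold failures are reported as warnings and the third fails the job; a single successful cold upload resets the count.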

Hot destination would usually be some local location on usb/share/ssh/webdav/etc…
Cold would be mostly cloud storage with download costs…

This is just an example. I have set up something similar with Duplicati and rclone right now - hot is on usb and rclone syncs to Backblaze B2, checking only file sizes to protect from partial uploads.
I plan to protect local copy further with either ssh or webdav (or, at least, authenticated share).
And rclone check will be run after each upload job with report sent out to make sure there is nothing corrupting local copies.

This also adds the flexibility to schedule local backups more often and do the upload overnight…

-D

drakar2007 · September 7, 2017, 3:18pm

A bit of a departure, but upon thinking about it, it seems to me that we could get a lot of additional functionality out of simply separating “backup sets” and “backup locations”.

Say I have 3 main backup sets: “Personal Documents”, “Pictures”, and “Downloaded Movies”.
“Personal Documents” comprises my “my documents” folder and a carefully-selected and disparate grouping of other files including certain application setup files, cherrypicked items in my downloads folder, and other things that I might want to update or add things to often.
“Pictures” simply points to my “pictures” folder, which is medium-large and grows at a mild pace.
“Downloaded Movies” is huge but low-priority, since most of the stuff in it could be replaced externally (with some effort) in case of catastrophic loss.

I want to back up to 2 targets: B2 and a local USB drive. B2 should get “personal documents” and “pictures”, whereas the USB drive would back up both of those with the addition of “Downloaded Movies”.

Currently if I want to do this backup scheme and keep the shifting items in “personal documents” (and others) synched, it requires doubling the manual labor of whatever additions or filters I do to the B2 backup set.

Now imagine if all 3 of those backup sets were configurable independently with no need to set a backup target. And then when adding a backup target, instead of (or perhaps in addition to) pointing it to specific files on its own, it could be pointed to “Backup Set: Personal Documents”, etc, and all changes made to the backup set would automatically be reflected in the Target itself. I feel that using a method like this could make keeping different backups (superficially) in sync, much much easier.
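
As a rough illustration of that split, here is a sketch of what the data model might look like. None of these classes exist in Duplicati; the names, fields, paths and destination strings are all hypothetical placeholders. The point is only that targets hold references to shared backup sets, so an edit to a set shows up in every target that uses it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackupSet:
    """What to back up: defined once, with no destination attached."""
    name: str
    sources: List[str]
    filters: List[str] = field(default_factory=list)


@dataclass
class BackupTarget:
    """Where to back up: points at shared BackupSet objects instead of copying them."""
    name: str
    destination: str          # placeholder string, not a real storage URL
    sets: List[BackupSet]

    def effective_sources(self) -> List[str]:
        return [path for s in self.sets for path in s.sources]


docs = BackupSet("Personal Documents", ["~/Documents"])
pictures = BackupSet("Pictures", ["~/Pictures"])
movies = BackupSet("Downloaded Movies", ["~/Movies"])

b2 = BackupTarget("B2", "cloud-placeholder", [docs, pictures])
usb = BackupTarget("USB drive", "usb-placeholder", [docs, pictures, movies])

# One edit to the shared set is reflected in both targets.
docs.sources.append("~/Downloads/installers")
print(b2.effective_sources())
print(usb.effective_sources())
```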

dgcom · September 7, 2017, 3:33pm

You can sort of achieve this now by maintaining your filters in a text file and then pointing different jobs to it.
No UI, but more flexible.
See help on filter and parameters-file.

drakar2007 · September 7, 2017, 4:24pm

Even with this hint I’m not finding any clue as to where to look or how to do this – where is “help” located? I’ve scoured the UI, the forum, and the main website - maybe I’m missing something. But if the feature is this hidden then surely it’s not doing very many people much good

dgcom · September 7, 2017, 4:32pm

Open a command prompt, navigate to the folder where Duplicati is installed, and run:

Duplicati.CommandLine.exe help
Duplicati.CommandLine.exe help parameters-file
Duplicati.CommandLine.exe help filters

and so on…

drakar2007 · September 7, 2017, 5:12pm

Oops, thanks, that’s an approach I wouldn’t have thought of.