Complex backup sequence?

I’m sorry if this question has already been asked; I couldn’t find any similar topic.

Is it possible to set up a complex backup sequence, with dependencies?
I mean a set of backed-up directories, each with its own set of filters, that need to be processed in a specific order. E.g.:

A
|
B
|
C     D
 \   /
  \ /
   E

Dir A is backed up first, and only then dir B, then dir C.
In parallel with that sequence, dir D (located on another drive) is archived, and finally, only after all of them, dir E is processed.

I know I can set up multiple backups, but I have no control over the order in which they’re executed.
Or I can set up a single backup with all those directories included and all the filters lumped together, but, again, I can’t control the order in which the folders are processed or have some of them processed in parallel.
The best I can do is estimate how long each backup will take and set their start times accordingly. But if there are a bunch of folders, each variable in size (even if their total size stays about the same), I have to plan for the worst-case scenario for each folder. Say each folder takes anywhere from 10 minutes to 4 hours to back up: with 6 of those, backups are scheduled across the whole day and the PC is permanently in a “could be busy” state, even though the actual total time is only about 6 hours.

I guess this is a feature request.
Or maybe this feature is already implemented? If so, could you point me to the docs page on how to accomplish that?

Welcome to the forum!

The short answer is no, you cannot define complex sequences and dependencies like this. I’m curious about your use-case and why this would be important.

I understand that you’d need to know how long the backup job will take, but keep in mind that Duplicati is a type of “incremental forever” backup. The first backup will scan and process all files, but subsequent backups will only scan new/changed files. For changed files, only changed blocks within those files are backed up again. So you may find that your second backup and beyond are quite fast.
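In case a toy illustration of the “only changed blocks” part helps, here is a minimal sketch. It is not Duplicati’s actual engine or storage format, the file names are made up, and real deduplication matches blocks by hash anywhere in the backup rather than by position; it just shows why a small edit to a huge file doesn’t mean re-uploading the whole file:

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # matches Duplicati's default 100KB block size

def block_hashes(path):
    """Hash a file in fixed-size blocks, the way block-level dedup sees it."""
    hashes = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

# Hypothetical file names: compare yesterday's and today's version of a big file.
old = block_hashes("scene_yesterday.exr")
new = block_hashes("scene_today.exr")
changed = sum(1 for i, h in enumerate(new) if i >= len(old) or h != old[i])
print(f"{changed} of {len(new)} blocks differ and would need to be stored again")
```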

If you have concerns about when exactly files will be backed up in relation to each other, a possible mitigation would be to use filesystem snapshots (like VSS on Windows). When enabled, a snapshot is taken at the start of the backup and all files will be backed up exactly as they were at that moment in time. You don’t have to worry about files being modified during the backup window.
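In the GUI that’s the snapshot-policy advanced option; from a script it would look roughly like this minimal sketch, where the destination URL, source folder and passphrase are placeholders:

```python
import subprocess

# Hypothetical job: destination URL, source folder and passphrase are placeholders.
# --snapshot-policy=on asks Duplicati to snapshot the filesystem (VSS on Windows)
# before reading files; "required" makes the backup fail if no snapshot can be taken.
# Taking a VSS snapshot generally requires administrator rights.
subprocess.run([
    "Duplicati.CommandLine.exe", "backup",
    "ssh://backup-server/home-pc/projects",  # placeholder destination
    r"D:\Projects",                          # placeholder source
    "--passphrase=CHANGE-ME",                # placeholder
    "--snapshot-policy=on",
], check=True)
```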

Thanks for your reply.
I’ve already had some experience with Duplicati at work (a few months) and have read the docs, so yes, I have an idea of how deduplication works in Duplicati.

About my use-case:
TL;DR: many drives, many folders, unpredictable amounts of daily changes, and different change rates for different folders (up to hundreds of gigabytes per day).

More detailed description:
I’ve decided to set up Duplicati on my home PC, too.
But since it’s a home PC that I also use for work, the amount of changes on my HDDs (yeah, plural) is unpredictable from day to day. And since my work is related to CG, multi-gigabyte files are also common. Some of them are pure cache that can safely be ignored, but some are intermediate files that should stay there until a task is finished, or it will take a lot of time to re-render them. So it’s a good idea to back those up daily too, but to keep only a few versions of them.
In other words, one day I may have a folder with hundreds of gigabytes of changes; another day that folder is pretty much unchanged, but a different one has a similar amount of new/changed data. There are about 10 such folders, and I’d like each to have its own backup. They’re also spread over 6 HDDs, so it makes sense to run some backups in parallel while running others sequentially (say, folders A, B and C from the example above are located on one drive while D is on another).

Of course, I could put them all into a single directory (with softlinks) and set up one all-folders-included mega backup job, but then the filter list would turn into an unmanageable mess, and I wouldn’t be able to set the number of retained versions individually for each folder. A much better solution would be a way to define separate backup jobs but with a single manager that starts them.

But I guess the best I can do right now is stick with the “assume the max duration” approach and set up a backup cycle that’s longer than a day.

Would a custom script using the Duplicati command-line interface work?


Hm… I hadn’t thought of it that way. Indeed, it should.
Not as flexible as a native setup in the GUI, but it should do the job.
Thanks for the advice, I’ll look into Duplicati’s CLI.

One way to start is with an Export As Command-line from the GUI until you get the hang of what’s required. CLI jobs run independently, which can get you parallelism. The GUI serializes jobs, and that makes me wonder why the original post works so hard to make sure jobs of variable duration don’t step on each other.
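For what it’s worth, here is a minimal sketch of what such a wrapper could look like, assuming you’ve used Export As Command-line to get one full command per job. The commands below are placeholders, not real exported ones; the script runs A, B and C in order, D in parallel with that chain, and E only after everything else has finished:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder commands: paste the real output of Export As Command-line here,
# one full Duplicati.CommandLine.exe invocation per job (each exported command
# already carries that job's own destination, filters and options).
JOBS = {
    "A": 'Duplicati.CommandLine.exe backup "ssh://server/A" "C:\\DirA" --passphrase=...',
    "B": 'Duplicati.CommandLine.exe backup "ssh://server/B" "C:\\DirB" --passphrase=...',
    "C": 'Duplicati.CommandLine.exe backup "ssh://server/C" "C:\\DirC" --passphrase=...',
    "D": 'Duplicati.CommandLine.exe backup "ssh://server/D" "E:\\DirD" --passphrase=...',
    "E": 'Duplicati.CommandLine.exe backup "ssh://server/E" "C:\\DirE" --passphrase=...',
}

def run(name):
    print(f"starting backup {name}")
    # check=True stops a branch on a non-zero exit code; adjust if you want to
    # handle warnings or failures differently.
    subprocess.run(JOBS[name], shell=True, check=True)

def run_chain(names):
    for n in names:  # strictly sequential: each job waits for the previous one
        run(n)

with ThreadPoolExecutor() as pool:
    abc = pool.submit(run_chain, ["A", "B", "C"])  # one drive: A, then B, then C
    d = pool.submit(run, "D")                      # other drive: runs alongside the chain
    abc.result()                                   # wait for both branches to finish
    d.result()

run("E")                                           # E only after everything else
```

The script only decides ordering; the jobs themselves stay exactly as configured when they were exported.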

Do you have an upload link that’s a lot faster than your drives? If not, perhaps omit parallelism and just make a load of jobs with settings for the various categories of files, and let them loose to run until done. Folders without changes will be basically instant (timestamp didn’t change = no need to even scan file).

As a side note, your backup sounds large, so Choosing sizes in Duplicati might be worth reading, since Duplicati can slow down when it has to keep track of a huge number of tiny blocks. The default block size is 100KB.
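Just to give a made-up sense of scale: 2 TB of source data at the default 100KB block size is on the order of 20 million blocks to track, while a 1MB --blocksize brings that down to roughly 2 million. As far as I know, the block size has to be chosen before the first backup of a job and can’t be changed for that job afterwards.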


The backup server is on my local network, with a gigabit connection… It’s not as fast as the HDDs themselves, but I believe it’s enough considering all the archiving work Duplicati needs to do before uploading a file over SFTP. I also installed another Duplicati instance on the server itself (Linux) to back up its own data.
I might be wrong, but from my research Duplicati seems a much better solution than the outdated (discontinued, AFAIK) Deja Dup, as well as much more flexible than rsync-based GUI tools like TimeShift (those are mostly for system snapshots). So, Duplicati on the server, too.
That’s also a reason to run backups in parallel, but I guess setting up multiple Duplicati instances to act in sync is a whole different topic that I’m only going to scratch once the jobs on my main PC are set up.

Also, I’m in Russia, and internet here is extremely cheap compared to US prices; a 500 Mbit upload link is a normal speed here. So I guess even if I decide to back up to cloud storage, parallel backups would still be a thing for me.

Thanks for the links, man! I already read those when I was configuring our office server, but I need to refresh my memory of those docs anyway.