Backup job "Meta"

Each of us who has created a separate backup job to back up the local databases, and only them, already knows how simple it is to define such a job. Some aspects of the setup, however, might need to be locked into Duplicati itself to ensure the integrity of the backup set. Still, there is no urgency for such a new feature.

This seems like an ideal topic to try a crowdsourcing approach: how about sharing snippets of our specific solutions, evaluating together their weak points, and thereby gradually working towards a setup that eventually may find its way into Duplicati?

I’ll begin the evaluation by making a few notes on the merits of having a special backup job for the purpose.

Recently, the internal disk on my Windows laptop broke down and I had to restore my backup sets without access to any of the local databases. Until then I had not bothered to back up the local databases at all, and as I set out to perform the lengthy restore, I realized that even if I had included the databases in some backup job, I would not have had access to any of them until I had completed at least one restore. Therefore, if I want to include the local databases in a backup job, I want to be able to restore that particular backup set quickly, which means the job has to include only the local databases and nothing else. The local database of this metajob itself need not and should not be included, as it is the one in use while the job runs.

If I can somehow ensure that the databases are backed up only after successful backups, keeping just one version of the backup set is quite enough. I just can’t see any reason to go further back into the history of these databases. If I have not done anything else to ensure the integrity, however, then keeping a few more versions is obviously safer than just one. In any case, I expect a small number of versions to also speed up the restore operation in a disaster situation.

Thanks to easy-to-use Duplicati, the restoration I needed was successful. I have since created a metajob on both of my laptops (Windows and Ubuntu), following the principles above. Next, I could export my job definition as a command and/or JSON file (with some degree of anonymization) to share here. Then someone else could share their different approaches to ensuring that only consistent databases are backed up, and so on.

By sharing some simple, even trivial, templates with each other, we can invite more viewpoints and refine whatever needs refining. At some point down the road, maybe we could write a polished proposal on what settings such a metajob needs the user to fill in and Duplicati to support, and why it would be better to integrate the feature into Duplicati instead of just sharing templates.

Here is the command line from my Ubuntu machine (with username modified and without password):

mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup \
  file:///media/USER/Duplicati/meta/ \
  /home/USER/.config/Duplicati/ \
  --backup-name=meta \
  --dbpath=/home/USER/.config/Duplicati/meta.sqlite \
  --encryption-module=aes \
  --compression-module=zip \
  --dblock-size=50mb \
  --keep-versions=1 \
  --disable-module=console-password-input \
  --exclude="/home/USER/.config/Duplicati/control_dir_v2/" \
  --exclude="/home/USER/.config/Duplicati/updates/" \
  --exclude="/home/USER/.config/Duplicati/meta.sqlite" \
  --exclude="/home/USER/.config/Duplicati/*.sqlite-journal"

Surely this is no surprise to anyone who has already done it. The job on my Windows machine only differs from this one by having fewer excludes; on Windows, the local folder does not have the updates subdirectory, nor does it have any journal files during Duplicati runs.
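For comparison, the Windows counterpart could look roughly like the sketch below. The install path, the destination drive (D:), and the profile folder are assumptions rather than an export from my machine, and the paths are left unquoted on the assumption that they contain no spaces; by default the local databases live under the user profile’s AppData\Local\Duplicati folder (or under the service account’s profile if Duplicati runs as a Windows service).

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" backup ^
  file://D:\DuplicatiMeta\ ^
  C:\Users\USER\AppData\Local\Duplicati\ ^
  --backup-name=meta ^
  --dbpath=C:\Users\USER\AppData\Local\Duplicati\meta.sqlite ^
  --encryption-module=aes ^
  --compression-module=zip ^
  --dblock-size=50mb ^
  --keep-versions=1 ^
  --disable-module=console-password-input ^
  --exclude=C:\Users\USER\AppData\Local\Duplicati\control_dir_v2\ ^
  --exclude=C:\Users\USER\AppData\Local\Duplicati\meta.sqlite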

Any comments?

My approach is to define and configure the database backup job in the Duplicati Web UI just like a normal backup job. The database backup job is NOT automatically scheduled, and I set retention to 10 versions (overkill, since I’d most likely restore the most recent database backup, but the extra versions are just an added precaution…)

On the primary backup job, I configure the --run-script-after option to launch a batch file. The batch file triggers the database backup job ONLY after a successful backup operation. Instead of the standard Duplicati.CommandLine.exe tool, I use the excellent duplicati-client program to trigger the database backup job as if I had started it manually from the Web UI. This lets me see the database backup job running in the Web UI and tray icon, and it also updates the backup job stats in the Web UI. (The standard command line tool doesn’t integrate with the Web UI at all - which can be a good thing or a bad thing depending on need.)

Pic of the Web UI on one of my systems showing both backups:
[screenshot]

Here are the contents of my --run-script-after batch file (Windows system):

pushd "%~dp0"

if /i "%DUPLICATI__OPERATIONNAME%" neq "Backup" goto end
if /i "%DUPLICATI__PARSED_RESULT%" neq "Success" goto end

.\duplicati_client.exe login
.\duplicati_client.exe run 2
.\duplicati_client.exe logout

:end

popd

Here’s what the script looks like on my Linux systems:

#!/bin/bash

# Change to the folder where duplicati_client lives.
pushd /volume1/_Duplicati

# Only continue if the finished operation was a backup and it succeeded.
if [[ "$DUPLICATI__OPERATIONNAME" == "Backup" && "$DUPLICATI__PARSED_RESULT" == "Success" ]] ; then
  # Trigger the database backup job (job id 4) through duplicati-client.
  ./duplicati_client login
  ./duplicati_client run 4
  ./duplicati_client logout
fi

popd

I have every backup job (including database backup jobs) configured to email me on Warning/Error/Fatal, but I also use the wonderful Duplicati Monitoring web site. Every backup job reports status there and I get a nice single daily report. Also I’ll get alerted if a backup job hasn’t run at all within the past X days. (Duplicati’s built-in email alerting of course does nothing if a job doesn’t run at all…)

Thank you. This is just what I had in mind.

I also define all of my backup jobs in the web GUI. Originally I planned to export the metajob as a JSON file, but then realised that it contains details such as scheduling, which would be difficult to share as such, so I thought the exported command line would be easier to share.

You have one single primary job. On Windows, I have two primary jobs, and on Ubuntu, three (one for my home directory, another for files and folders that hold only non-private stuff such as certain downloaded files, and the third comes up in my next comment). My approach to timing is to schedule them all to start early in the morning, when both of my laptops are in sleep or hibernate state, and then let them run while I have my breakfast. Only after that do I start opening the applications that should be closed during the backup. In this kind of setup, I have the metajob start one minute after the last primary job, which in practice means that it runs as the last job every morning.

Ah, I had not understood the role of the duplicati-client yet. Thank you for pointing it out. I have one backup that I want to run every time I close the encrypted file(s) in its backup set. For now, I have used the duplicati-cli command. I really need to RTFM some more.

I wish the export page in web gui would give more options as to which type of command line to export; that would be an excellent way to educate the users.

Oh, I see. It’s not included, but a separate app. Thank you again for pointing it out.

Let’s take a look at which aspects of our two solutions are different and how these differences must be accounted for in a solution for all.

You have only one primary backup job, and therefore the --run-script-after option works fine for you. I have multiple primary jobs, and a general solution cannot be based on knowing their schedules. If the metajob is triggered right after any successful backup, it may get queued behind another job, and that other job may fail. My knee-jerk reaction to this is to have each primary job set a flag (in --run-script-before) to signal that a job is running, and remove the flag after success (in --run-script-after). This way the metajob can check (in --run-script-before) that no flags are left. Since Duplicati does not run jobs in parallel, this would work in all cases, whether the metajob is scheduled or triggered by some or all of the primary jobs. Personally, I prefer scheduling. The setting and removing of flags is exactly the kind of thing Duplicati could handle under the hood.
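To make the flag idea concrete, here is a minimal sketch of what the scripts could look like on Linux. The flag directory, job name, and file names are made up, and the "don't run the operation" exit code follows the convention in Duplicati's example run-script, so verify it against your own version before relying on it.

primary_before.sh (set as --run-script-before on every primary job):

#!/bin/bash
# Raise a flag while this job runs; the flag stays behind if the job fails.
FLAGDIR=/home/USER/.config/Duplicati/flags   # hypothetical location
JOBNAME=home                                 # edit per job
mkdir -p "$FLAGDIR"
touch "$FLAGDIR/$JOBNAME"
exit 0

primary_after.sh (set as --run-script-after on every primary job):

#!/bin/bash
# Clear the flag only after a successful backup.
FLAGDIR=/home/USER/.config/Duplicati/flags
JOBNAME=home
if [ "$DUPLICATI__OPERATIONNAME" = "Backup" ] && [ "$DUPLICATI__PARSED_RESULT" = "Success" ]; then
  rm -f "$FLAGDIR/$JOBNAME"
fi
exit 0

meta_before.sh (set as --run-script-before on the metajob):

#!/bin/bash
# Skip the metajob if any primary job is still running or has failed since its last success.
FLAGDIR=/home/USER/.config/Duplicati/flags
if [ -n "$(ls -A "$FLAGDIR" 2>/dev/null)" ]; then
  # Exit code 1 is "OK, but don't run the operation" according to Duplicati's
  # example run-script; check this against your version.
  exit 1
fi
exit 0

With something like this in place, it would not matter whether the metajob is scheduled or triggered, because it simply declines to run until every primary job has finished successfully.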

Emailing backup results is something I definitely do not want, but obviously that need not be a part of the metajob setup as such.

Duplicati cannot come with a predefined metajob that works out of the box. The user has to make some choices, such as whether to have a metajob at all, a password (obviously), and scheduling, unless the metajob is triggered by all primary jobs. The target location is also something Duplicati cannot decide for the user. If there is only one target server or local disk already in use, Duplicati might be able to propose a default, but if there are multiple targets in use, there is no obvious choice.

Did I miss something?

Of course I missed something: the whole restore side!

If Duplicati is aware of the metajob, it can propose restoring it first right after the user has pointed out the target location.

Not sure I follow… why would that job failure affect the queued meta job? Wouldn’t the meta job then start normally after the failure and back up the database of the earlier, successful backup job?

You are right that my approach gets less convenient the more primary backup jobs there are. And honestly, I can’t shake the idea that this entire meta backup is just a kludge to work around the problem where database recreates sometimes take too long. If that problem is solved, perhaps meta backups won’t be needed at all.

Honestly I’d rather development time be spent fixing the root cause instead of trying to make Duplicati do meta backups automatically out-of-the-box.

Interesting thought though…

I thought it was obvious that the idea is to avoid backing up databases after a failure, and to back them up only if the backups have succeeded. If the feature were part of Duplicati, it would not need any before and after scripts.

Set1 succeeds and the run-script-after triggers Set1Metadata backup. But Set1Metadata backup gets queued due to Set2 job already pending. Set2 job runs and fails, so run-script-after never triggers Set2Metadata backup. Set1Metadata backup then starts to run since it was next in the job queue.

I’m not seeing the problem, am I misunderstanding? Set2 backup failing shouldn’t stop Set1Metadata backup from proceeding.

Duplicati does not back up individual files, but whole backup sets. Do you really want to define a separate data set for each database? Do you think the Duplicati team would define separate sets for this feature?

Ok I see what you mean. I don’t mind keeping a Db backup set for each of my main backup sets, but I only define 1 or maybe 2 main backup sets. Others may define a lot more. Also, I don’t do a separate Db backup unless the dataset is quite large (at least a few hundred gigs). For my smaller machines the Db recreate process is fine and doesn’t take too long.

Ultimately my preference is to not need separate Db backups at all!

If there are two jobs whose only difference is that one backs up to a local disk and the other to the cloud, each job can include the other job’s local database, and there is no need for a separate metajob (see the sketch below). However, a general solution cannot expect this to always be possible. If there is to be a feature for backing up and restoring local databases, it has to allow for more variance.

Edit: Including local databases in primary backup jobs, though, has the disadvantage that way too many versions of the databases (and their metadata) are kept in storage.
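To illustrate the cross-inclusion idea (job names, paths, and the cloud target are hypothetical, and most options are left out), the two jobs could be defined along these lines, each listing the other job’s database as an extra source path:

# Job A: the data to a local (external) disk, plus job B's database
mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup \
  file:///media/USER/Backups/data/ \
  /home/USER/data/ \
  /home/USER/.config/Duplicati/jobB.sqlite \
  --backup-name=jobA \
  --dbpath=/home/USER/.config/Duplicati/jobA.sqlite

# Job B: the same data to the cloud, plus job A's database
mono /usr/lib/duplicati/Duplicati.CommandLine.exe backup \
  <cloud-target-url> \
  /home/USER/data/ \
  /home/USER/.config/Duplicati/jobA.sqlite \
  --backup-name=jobB \
  --dbpath=/home/USER/.config/Duplicati/jobB.sqlite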

In addition, in the case where you lose the entire local drive you won’t be able to rebuild the database quickly. The nice thing about a separate, database-only backup with a short retention is that it’s quick to do a database recreate.

You seem to think that backing up to a local disk means backing up to the only local disk, which you cannot assume. I have a local but external disk to which I make the backups (plus I prefer to have the whole disk or partition encrypted instead of encrypting individual backup jobs).

But yeah, basically, if the databases are in a primary set, restore is slow not only because of the extra versions that are kept, but also because of the other content, which most probably is bigger. Plus all the metadata of that large content.

You’re right, that phrasing was ambiguous. By “lose the entire local drive” I meant the drive where the databases are stored by default. I wasn’t really talking about any particular backup destination, or about whether you had other drives local to the system.

If you lose the location that contains the databases, you have to either recreate the database (a lengthy process for large backup sets) or restore it from a secondary backup… (as you are well aware)