[Idea] Back up the Duplicati database to avoid a recreate

While reading some of the (many, many) postings and bug reports on the problems and performance issues imposed by the requirement of recreating the local database in case of a full disaster recovery, I came up with this idea:

What about saving the local backup database file (.sqlite) to the backup destination?

Of course this would have to be done after the existing backup steps and after all writes to the database are committed and the file is closed. But then it could be compressed, encrypted and stored in the backup destination.

On recovery Duplicati would check for this backup of the database, thus eliminating the need to recreate it.


What I do is schedule an SQLite backup set a few minutes after the other backup sets.

This way I back up my data and the SQLite databases as separate backup jobs.

In case of a disaster…

1. Import all JSON configs.
2. Recreate the database of the SQLite backup set.
3. Restore the databases of the other backup sets first.
4. Run a repair to confirm each database is in sync.
5. Begin data restoration.

This way you can save the days or weeks it would take to rebuild the main databases.
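For reference, here is a rough command-line sketch of steps 2-4, assuming the stock Duplicati.CommandLine.exe is used and that the destination URLs, database paths and passphrases shown are placeholders (the same operations can be done from the GUI):

REM Step 2: recreate the small local database of the job that backs up the SQLite files.
Duplicati.CommandLine.exe repair "b2://bucket/sqlite-backup" --dbpath="%LOCALAPPDATA%\Duplicati\SQLITEJOB.sqlite" --passphrase="..."

REM Step 3: restore the per-job database files from that backup set.
Duplicati.CommandLine.exe restore "b2://bucket/sqlite-backup" "*.sqlite" --restore-path="%LOCALAPPDATA%\Duplicati" --passphrase="..."

REM Step 4: check that each restored database is in sync with its remote files.
Duplicati.CommandLine.exe repair "b2://bucket/main-backup" --dbpath="%LOCALAPPDATA%\Duplicati\MAINJOB.sqlite" --passphrase="..."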

I think this leaves your password unencrypted at the backup destination…

In my opinion this is something Duplicati should handle itself properly.

In another discussion about this topic, it was mentioned that backing up the database after every job is not very practical: the database can grow pretty large and has a high change rate, meaning that most of it will be repacked and uploaded most of the time.

Also, Duplicati was initially designed to make DB (re)creation easy, but in practice, as the number of blocks increases, the amount of processing required grows disproportionately, which is why DB recreation takes so long.

I believe the solution is in some re-engineering of the database so that it can be less prone to corruption and faster to recreate.

Could you clarify what exactly you have in mind, e.g. which password, and how you think it can happen?

Agreed this is a concern if you just uploaded the server database, but an encrypted backup is encrypted.

And it’s filed as a feature request. Quite reasonable to ask, but meanwhile you manage however you can. Aside from the performance overheads mentioned below, this might be a possible near-term workaround.

Did you find the post for that? I recalled it, but couldn't find it. I'm also curious whether anyone has set things up so the SQLite page size (default 4096) and the Duplicati --blocksize (default 102400) line up better. Possibly (any SQLite wizards out there?) that could help reduce the amount of data uploaded each time, and that would be good to know whether it's a do-it-yourself database backup or Duplicati doing the backups for you.
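In case anyone wants to experiment, here is a minimal sketch using the sqlite3 command-line shell on a spare copy of a job database (the path is a placeholder; run it with Duplicati stopped, and note the new size only takes effect after the VACUUM, and not at all while the database is in WAL journal mode):

REM Show the current page size (4096 by default):
sqlite3 "D:\dbcopies\EXAMPLEJOB.sqlite" "PRAGMA page_size;"

REM Request a larger page size and rebuild the file so it takes effect:
sqlite3 "D:\dbcopies\EXAMPLEJOB.sqlite" "PRAGMA page_size=32768; VACUUM;"

REM Confirm the change:
sqlite3 "D:\dbcopies\EXAMPLEJOB.sqlite" "PRAGMA page_size;"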


Hello, SamSirry

You are right, this calls for some re-engineering of the database. There are other backup tools that make use of SQLite, and they do not run into this problem.

I'll mention one: Cloudberry uses SQLite. I use two clients and have never had these kinds of problems.

Syncovery uses Firebird, which is very stable and fast by the way, and I do not have these problems there.

Duplicati is a great tool, but the developers need to look at this database issue and work hard on it as a priority; otherwise Duplicati will stay stuck in this endless loop instead of becoming a stable and reliable tool to work with.

I believe a lot in the tool and in the feedback here.

Anderson

Yes, I have a second backup set configured whose job is to back up ONLY the database from the first backup set. I use the --run-script-after option on the first backup job and point to a batch file. The batch file checks to make sure the previous backup was successful, and if so it triggers the second backup job.

I use this approach on 2 of the 10 computers I back up, the ones with the largest data sets (over 500GB).

Cloudberry doesn’t do deduplication. The back end has files in their native directory structure. It could rebuild all available files and versions just by doing a directory listing on the back end storage. Duplicati with its deduplication requires a much more involved process.


Isn’t the password for the backup stored in the database?
It must be stored somewhere for the scheduled runs, and I just guessed it might be the local SQLite database, as I'm not aware of any other place. But I'd be happy to be corrected…

Yes and no. It's in the small server database Duplicati-server.sqlite (along with other secret-but-needed option values such as remote login credentials). The sometimes-huge per-backup databases are safer: there is a salt and hash of the password there, useful for verifying password use, but that is relatively safe. The per-backup databases are also the ones that Recreate works on and that are sometimes big and too slow. Their names are <semi-random-digits-or-letters>.sqlite, and the name for any backup can be seen on its Database screen.

The key claim I dispute is that the password stays unencrypted at the remote if you encrypt the backup. Use a good password, especially if your backup backs up Duplicati-server.sqlite (so that you suffer less pain if you lose a drive, need to recreate jobs, and find that you don't have job exports saved anywhere).

One other note on database backup: backing up a database that is actively in use (e.g. by an active job) risks being unable to read the database (or its associated journal file) at just the wrong instant. --snapshot-policy (most feasible on Windows) can avoid the access issues, but an active database will still be obsolete almost instantly after the backup. The database for a job should be backed up after that job's backup is finished. Keeping remote files in sync with the database that stores information about those remote files is important…
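If someone does need a consistent copy of a database that might be open, SQLite itself can produce one; a rough sketch with the sqlite3 shell (paths are placeholders, and for the sync reason above it is still best run only after the job has finished):

REM Transactionally consistent copy via SQLite's online backup command:
sqlite3 "%LOCALAPPDATA%\Duplicati\EXAMPLEJOB.sqlite" ".backup 'D:\dbcopies\EXAMPLEJOB.sqlite'"

REM Alternative on SQLite 3.27 or later, which also compacts the copy:
sqlite3 "%LOCALAPPDATA%\Duplicati\EXAMPLEJOB.sqlite" "VACUUM INTO 'D:\dbcopies\EXAMPLEJOB-compact.sqlite';"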


I don’t suppose sqlite has a transaction log feature such that a DB snapshot could be done infrequently followed by transaction log backups.

Temporary Files Used By SQLite is where I assume such a transaction log would be, and I don’t see one. Databases are not really in my expertise area though, and what I learn is more for SQLite and Duplicati… Maybe someone with more expertise in backing up large databases can say where this idea would apply.

I did find a report of an SQL Server install that managed to grow a transaction log to the point where it consumed the Windows drive (whereas I think all of Duplicati's files are the well-known ones plus some temporaries that don't last long…).

While arguably one might say that logs SQLite uses for transactions are transaction logs, they’re not likely candidates for what you’re thinking about, which I suppose I would view as a rather raw differential backup.

sqldiff would be a way to do do-it-yourself differential backups, but I don’t know if it would be better than the backup Duplicati would do after its attempt at deduplication (said by someone, maybe you, to not be really effective, at least with default settings, where a 4KB page change can make a 100KB block get uploaded).
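For anyone curious, a rough sketch of what that do-it-yourself differential could look like with the sqldiff utility from the SQLite tools package (file names are placeholders):

REM Generate SQL that transforms last week's copy into today's copy:
sqldiff "EXAMPLEJOB-lastweek.sqlite" "EXAMPLEJOB-today.sqlite" > delta.sql

REM At restore time, replay the delta onto the older copy:
sqlite3 "EXAMPLEJOB-lastweek.sqlite" < delta.sql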

Hi drwtsn32

I understand your technical point of view, but that does not make the slightest difference when you need your data and the database rebuild can take days.

Imagine a disaster situation: a business stalled because the database needs five or more days to be rebuilt.

I use Duplicati in parallel with other tools, believing that the project will evolve in the near future.

At this point I do not feel safe using it as the only backup tool.

Maybe you misunderstood… I was just commenting on a major technical difference between Cloudberry and Duplicati. I’m not trying to say the database rebuild process in Duplicati shouldn’t be improved. It definitely should be!

In the meantime there are some things that can be done to help mitigate this risk, but I understand if you and others aren't comfortable with those mitigations.


@drwtsn32, could you post the batch file you used with --run-script-after to back up the database? I’d like to get that set up on my machine and my DOS scripting skills are limited…

Sure, here’s the batch file I run. I am using the excellent duplicati-client command line program to trigger the database backup. You’ll need to adjust the number for your setup. This batch file is configured to only run after a successful backup operation, as you can see from the first two tests:

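REM DUPLICATI__OPERATIONNAME and DUPLICATI__PARSED_RESULT are environment
REM variables that Duplicati sets for --run-script-after scripts; skip
REM everything unless a Backup operation just finished with result Success.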
if /i "%DUPLICATI__OPERATIONNAME%" neq "Backup" goto end
if /i "%DUPLICATI__PARSED_RESULT%" neq "Success" goto end

%LOCALAPPDATA%\Duplicati\duplicati_client.exe login
%LOCALAPPDATA%\Duplicati\duplicati_client.exe run 6
%LOCALAPPDATA%\Duplicati\duplicati_client.exe logout

:end

At first glance, I thought that several answers were somehow beside the point. Then I realized that my viewpoint was more specific than the original question. I had already created a separate job that backs up only the local Duplicati databases, but the original question seems to suggest the databases would be included in other jobs.

I came here to see if someone has already presented the idea of Duplicati providing a template for a separate backup job for its local databases. It really needs to be a template, not a ready-defined job, because there are many details each user may want to tailor to their own needs, such as which target destination and which schedule the job would have.

Once we limit the focus to such a separate backup job, many arguments here become moot. For one, the argument about the local databases being too big is moot, because you only ever need one version of each in the destination. Also, since Duplicati does not run jobs in parallel, the only local database that is active during this separate job's run is its own database, which need not be included in the set anyway. When a disaster happens and the local databases need to be restored from this set, you do not have access to any database anyway, so you have to let Duplicati rebuild the database for this separate job first. And then, if Duplicati provides the template, you can set it up again easily.
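To make the retention point concrete, here is a rough command-line sketch of such a database-only job (the destination URL, paths and passphrase are placeholders, not an official template; on a default per-user Windows install the databases live under %LOCALAPPDATA%\Duplicati, and keeping this job's own --dbpath outside that folder keeps it out of the set):

REM One version of each database is enough, hence --keep-versions=1:
Duplicati.CommandLine.exe backup "b2://bucket/duplicati-databases" "%LOCALAPPDATA%\Duplicati\" --keep-versions=1 --passphrase="..." --dbpath="D:\dbcopies\databases-job.sqlite"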

So how about it, could you consider providing this kind of a template?