[Idea] Backup Duplicati database to avoid recreate

While reading some of the (many, many) postings and bug reports about the problems and performance issues caused by the need to recreate the local database during a full disaster recovery, I came up with this idea:

What about saving the local backup database file (.sqlite) to the backup destination?

Of course this would have to be done after the existing backup steps and after all writes to the database are committed and the file is closed. But then it could be compressed, encrypted and stored in the backup destination.

On recovery Duplicati would check for this backup of the database, thus eliminating the need to recreate it.
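
For illustration, here is a minimal sketch of how that could be approximated today with --run-script-after (the database name and the staging folder are placeholders; the DUPLICATI__* environment variables are the ones Duplicati sets for run-scripts, as shown in the batch file later in this thread, and whether the database is fully settled at this point is one of the caveats discussed further down):

rem Only act after a backup operation that finished successfully
if /i "%DUPLICATI__OPERATIONNAME%" neq "Backup" goto end
if /i "%DUPLICATI__PARSED_RESULT%" neq "Success" goto end

rem Copy the job database (name from the job's Database screen) to a staging
rem folder that a second backup job, or any other upload tool, picks up later
copy /Y "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite" "D:\DuplicatiDbCopies\"

:end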


What I do is schedule an SQLite database backup set a few minutes after the other backup sets.

This way I back up my data and the SQLite databases as separate backup jobs.

In case of a disaster…

1. Import all JSON configs.
2. Recreate the database of the SQLite backup set.
3. Restore the databases of the other backup sets from it.
4. Run a repair to confirm each database is in sync.
5. Begin data restoration.

This way you can save the days or weeks it would take to rebuild the main databases.

I think this leaves your password unencrypted at the backup destination…

In my opinion this is something Duplicati should handle itself properly.

In another discussion about this topic, it was mentioned that backing up the database after every job is not very practical, as the database can grow pretty large and it has a high change rate, meaning that most of it will be repacked and uploaded most of the time.

Also, Duplicati was initially designed to make DB (re)creation easy, but in practice, as the number of blocks increases, the amount of processing required grows disproportionately, and this is why DB recreation takes so long.

I believe the solution is in some re-engineering of the database so that it can be less prone to corruption and faster to recreate.

Could you clarify what exactly you have in mind, e.g. which password, and how you think it can happen?

Agreed this is a concern if you just uploaded the server database, but an encrypted backup is encrypted.

And it’s filed as a feature request. Quite reasonable to ask, but meanwhile you manage however you can. Aside from the performance overheads mentioned below, this might be a possible near-term workaround.

Did you find the post for that? I recalled that, but couldn’t find it. I’m also curious whether anyone has set things up so the SQLite page size (default 4096) and Duplicati --blocksize (default 102400) line up better. Possibly (any SQLite wizards out there?) that could help reduce the amount of data uploaded each time, and that would be good to know whether it’s a do-it-yourself database backup or Duplicati doing the backup for you.
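
For anyone who wants to experiment with that, the page size of a copy of a job database can be inspected and changed with the sqlite3 command-line shell. A sketch with placeholder paths (stop Duplicati and work on a copy first; changing the page size requires a VACUUM and won’t take effect while the database is in WAL mode):

rem Show the current page size (SQLite default is 4096)
sqlite3.exe "D:\DuplicatiDbCopies\XXXXXXXXXX.sqlite" "PRAGMA page_size;"

rem Rebuild the copy with a larger page size (the SQLite maximum is 65536)
sqlite3.exe "D:\DuplicatiDbCopies\XXXXXXXXXX.sqlite" "PRAGMA page_size=32768; VACUUM;"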


Hello, SamSirry

You are right, this calls for a re-engineering of the database. There are other backup tools that make use of SQLite, and they do not run into this problem.

I’ll mention one: Cloudberry uses SQLite. I use two of its clients, and I’ve never had these kinds of problems.

Syncovery uses Firebird, which is very stable and fast by the way, and I do not have these problems with it.

Duplicati is a great tool, but the developers need to look at this database issue and work hard on it as a priority; otherwise Duplicati will never break out of this loop and become a stable and reliable tool to work with.

I believe a lot in the tool and in the people behind it.

Anderson

Yes, I have a second backup set configured whose job is to back up ONLY the database from the first backup set. I use the --run-script-after option on the first backup job and point to a batch file. The batch file checks to make sure the previous backup was successful, and if so it triggers the second backup job.

I use this approach on 2 of the 10 computers I back up, the ones with the largest data sets (over 500GB).

Cloudberry doesn’t do deduplication. The back end has files in their native directory structure. It could rebuild all available files and versions just by doing a directory listing on the back end storage. Duplicati with its deduplication requires a much more involved process.


Isn’t the password for the backup stored in the database?
It must be stored somewhere for the scheduled runs, and I just guessed it might be the local SQLite database, as I’m not aware of any other places. But I’d be happy to be corrected…

Yes and no. It’s in the small server database Duplicati-server.sqlite (along with other secret-but-needed option values such as remote login credentials). The sometimes-huge per-backup databases are safer: there is a salt and hash of the password in them, useful for verifying password use, but that’s relatively harmless. The per-backup databases are also the ones that Recreate works on, and the ones that are sometimes big and too slow. Their names are <semi-random-digits-or-letters>.sqlite, and the name for any backup can be seen on its Database screen.

The key claim I dispute is that the password stays unencrypted at the remote if you encrypt the backup. Use a good password, especially if your backup includes Duplicati-server.sqlite (so that you suffer less pain if you lose a drive, need to recreate jobs, and find that you don’t have job exports saved anywhere).

One other note on database backup is that backing up a database actively in use (e.g. by an active job) risks an inability to read the database (or its associated journal file) at just the wrong instant. --snapshot-policy (most feasible on Windows) can avoid access issues, but an active database will still be obsolete almost instantly after the backup. Backing up a job’s database should be done after that job’s backup is finished. Keeping remote files in sync with the database that stores information about those remote files is important…
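
For anyone doing a do-it-yourself copy of a database that might be open, SQLite itself offers safer options than a plain file copy, e.g. the sqlite3 shell’s online backup command or VACUUM INTO. A sketch with placeholder paths (this gives a consistent copy, but as noted above, a copy taken during an active job will be stale moments later anyway):

rem Consistent copy via the SQLite online backup API
sqlite3.exe "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite" ".backup D:\DuplicatiDbCopies\XXXXXXXXXX.sqlite"

rem Alternative (SQLite 3.27 or later): VACUUM INTO also compacts the copy
rem (the target file must not exist yet)
sqlite3.exe "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite" "VACUUM INTO 'D:\DuplicatiDbCopies\XXXXXXXXXX.sqlite'"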


I don’t suppose sqlite has a transaction log feature such that a DB snapshot could be done infrequently followed by transaction log backups.

Temporary Files Used By SQLite is where I assume such a transaction log would be, and I don’t see one. Databases are not really in my expertise area though, and what I learn is more for SQLite and Duplicati… Maybe someone with more expertise in backing up large databases can say where this idea would apply.

I did find a report of an SQL Server install that managed to grow a transaction log to where it ate Windows (whereas I think all Duplicati’s files are the well-known ones plus some temporaries that don’t last long…).

While arguably one might say that logs SQLite uses for transactions are transaction logs, they’re not likely candidates for what you’re thinking about, which I suppose I would view as a rather raw differential backup.

sqldiff would be a way to do do-it-yourself differential backups, but I don’t know if it would be better than the backup Duplicati would do after its attempt at deduplication (said by someone, maybe you, to not be really effective, at least with default settings, where a 4KB page change can make a 100KB block get uploaded).
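
If anyone wants to try that, here is a rough sketch of such a do-it-yourself differential backup using the sqldiff utility from the SQLite project (placeholder paths; sqldiff ships in the sqlite-tools bundle alongside sqlite3, and it emits SQL that turns the first database into the second):

rem Produce SQL that transforms yesterday's copy into today's copy
sqldiff.exe "D:\DuplicatiDbCopies\job-yesterday.sqlite" "D:\DuplicatiDbCopies\job-today.sqlite" > "D:\DuplicatiDbCopies\job-diff.sql"

rem To roll the older copy forward later, apply the diff to it
sqlite3.exe "D:\DuplicatiDbCopies\job-yesterday.sqlite" < "D:\DuplicatiDbCopies\job-diff.sql"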

Hi drwtsn32

I understand your technical point of view, but that does not make the slightest sense when you need your information and the database rebuild can take days.

Imagine a disaster situation: a business stalled because the database needs five or more days to be rebuilt.

I use Duplicati in parallel with other tools, believing that the project will evolve in the near future.

At this point I do not feel safe using it as the only backup tool.

Maybe you misunderstood… I was just commenting on a major technical difference between Cloudberry and Duplicati. I’m not trying to say the database rebuild process in Duplicati shouldn’t be improved. It definitely should be!

In the meantime there are some things that can be done to help mitigate this risk, but I get it if you and others aren’t comfortable with those mitigations.


@drwtsn32, could you post the batch file you used with --run-script-after to back up the database? I’d like to get that set up on my machine and my DOS scripting skills are limited…

Sure, here’s the batch file I run. I am using the excellent duplicati-client command line program to trigger the database backup job. You’ll need to adjust the job number (“6” below) for your setup. This batch file is configured to only run after a successful backup operation, as you can see from the first two tests:

if /i "%DUPLICATI__OPERATIONNAME%" neq "Backup" goto end
if /i "%DUPLICATI__PARSED_RESULT%" neq "Success" goto end

%LOCALAPPDATA%\Duplicati\duplicati_client.exe login
%LOCALAPPDATA%\Duplicati\duplicati_client.exe run 6
%LOCALAPPDATA%\Duplicati\duplicati_client.exe logout

:end

At first glance, I thought that several answers are somehow aside the point. Then I realized that my viewpoint was more specific than the original question. I already had created a separate job that backs up only the local Duplicati databases, but the original question seems to suggest the databases would be included in other jobs.

I came here to see if someone had already presented the idea of Duplicati providing a template for a separate backup job for its local databases. It really needs to be a template, not a ready-defined job, because there are many details each user may want to tailor to their own needs, such as which target destination and which schedule the job would have.

Once we limit the focus to such a separate backup job, many arguments here become moot. For one, the argument about the size of the local databases being too big is moot, because you only ever need one version of each in the destination. Also, since Duplicati does not run jobs in parallel, the only local database that is active during this separate job’s run is its own, which need not be included in the set anyway. When a disaster happens and the local databases need to be restored from this set, you do not have access to any database anyway, so you have to let Duplicati rebuild the database for this separate job first. And then, if Duplicati provides the template, you can set it up easily again.

So how about it, could you consider providing this kind of a template?
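
Until something like that template exists, here is a rough sketch of what such a databases-only job could look like today via Duplicati.CommandLine (the destination URL, passphrase and paths are placeholders, and the database folder differs if Duplicati runs as a Windows service; pointing --dbpath outside the source folder keeps this job’s own database out of its own backup, and two versions are plenty since stale databases are of little use):

rem Back up the folder holding Duplicati-server.sqlite and the per-job databases
"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" backup ^
  "ftp://backup.example.com/duplicati-databases?auth-username=user&auth-password=secret" ^
  "%LOCALAPPDATA%\Duplicati" ^
  --passphrase="a-strong-passphrase" ^
  --keep-versions=2 ^
  --dbpath="D:\DuplicatiDbCopies\databases-job.sqlite"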

I was about to create a new post with EXACTLY the same content, but when I started typing I saw this post and thought it was better to revive it than to create a new one.

I have been working with Duplicati for a few years now and I like it a lot, so much so that it is among the best backup tools I have ever used, mainly due to how simple it is to maintain and its always-incremental backups (you don’t need to do a new full backup every week, for example), which helps a lot for cloud backups.

However, I have recently had some problems where the computer Duplicati was installed on was lost, due to hardware failure as well as ransomware, and the re-creation of the databases took a long time; in one case it took more than a day, since there was a lot of data in the cloud.

So I wanted to suggest that the developers perhaps add another parameter, just as we already have several (such as the database auto-vacuum and things like that), to send a backup of the database to the destination.

Even if this functionality is not used by everyone, in some cases it may be better to send 500MB or more of database than to spend several hours recreating it.

I saw some options here for how to back up this database alongside the main files, but I was not able to fully understand how the restoration process works. What I understood is that I have to import the backup configuration, start the process of recreating the database, stop it, download the old sqlite and replace it manually, which left me a little lost.

What’s the 500 MB in the example? If it’s the entire database, then presumably you’d send it to a remote destination for safekeeping. Assuming that was the idea, the question

would be how to restore the database from its backup, which would come from a secondary job run after the main job (a backup can’t back up its own database, because that database is still changing at the time)…

If you’ve lost the whole original Duplicati, you’d probably bootstrap getting back by either setting up the secondary job again, or just doing Direct restore from backup files, assuming you have the necessary info.

Probably the usual way to reinstall a database is to use the Database management screen to find out where the database belongs, and then either put the restored database there or tell Duplicati about the new database file.

This is space efficient but can slow down the restore of a file that undergoes constant change, because different generations of changes may wind up in different backup files, all of which might need download.

A database is a great example of a potentially big file that undergoes constant change, scattered around, making deduplication less effective. People who have tried backing up the database have found they upload just about the full database size every backup. This, plus the more reliable Recreate in 2.0.5.1, makes DB backup less attractive.

If you really want to try, I’d suggest using a low Backup retention to keep the collecting-the-parts issue somewhat under control, but you’ll still endure frequent automatic compact because of high churn rate…

Given a suitably large blocksize (100 KiB is too small for big backup to be fast), ideally only the dlist and dindex files would download. If you get dblock files, that’s a bad sign. Before 2.0.5.1 it was too common.

If the progress bar gets past 70%, it’s downloading dblock files. After 90%, it’s downloading all the rest…
About → Show log → Live → Verbose will also show you what you’re downloading, and how far you are.

Doing an occasional test of DB Recreate is a good idea to make sure it’s healthy when you really need it. You can copy off the old database for safety, and your new database will also be a lot smaller than it was.
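
A sketch of such an occasional test, assuming the command-line flavor and placeholder paths (the GUI equivalent is copying the database file aside and using the Database screen’s Recreate button):

rem Put the current job database safely aside (its path is shown on the job's Database screen)
move "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite" "%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite.bak"

rem With the database file missing, repair rebuilds it from the remote dlist and dindex files
"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair ^
  "ftp://backup.example.com/duplicati-backup?auth-username=user&auth-password=secret" ^
  --passphrase="a-strong-passphrase" ^
  --dbpath="%LOCALAPPDATA%\Duplicati\XXXXXXXXXX.sqlite"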

Why not backung up latest Database? talks about an exotic DB backup method that eliminates standard Duplicati processing (which, as noted, doesn’t add much for DB backup, and might even make it worse).

I’m not sure how solid the control file code is, but you could pioneer its use if you want to see how it does. You could keep the usual number of versions of the primary backup, and maybe just two of the database backup, because super-stale databases are nearly useless. As dlist files become obsolete, they’re deleted, not compacted.

Hum, I don’t think so. Let me look

Protect: TSMA>BACKUP DB DEVCLASS=db TYPE=full SCRATCH=yes COMPRESS=yes WAIT=no
ANR2280I Full database backup started as process 344.
ANS8003I Process number 344 started.

I can back up the database on my Spectrum Protect (née TSM) server while the server is running. One has been able to do that since forever 🙂

Of course TSM is an expensive piece of high-end backup software that has been around for decades. However, the point is that it is possible for backup software that uses a database, as TSM demonstrates, to back its own database up. Before anyone says “well, TSM uses IBM’s DB2 for its database”: you could back the database up even in pre-6.0 versions, when TSM was using IBM’s equivalent of the Microsoft JET database engine.