DB recreation takes forever

Hi all…

So after a DB problem (Duplicati says there’s a problem with a file and a repair doesn’t resolve it) I used the delete and recreate option…
the DB was 9 GB (5 TB of data) and the recreate has now been running for more than 2 days and isn’t half complete…

For the future, how can I make a backup of the DB after a job completes?


You could use the --run-script-after (or --run-script-before) option to run a script which copies %DUPLICATI__dbpath% to %DUPLICATI__dbpath%.old (or something like that)…

Can you post a complete script example?

Well, it varies depending on things like what OS you’re using but for Windows you could do the following:

  1. Put this in a file (we’ll call it C:\Duplicati-backupDB.bat):
@echo off
copy "%DUPLICATI__dbpath%" "%DUPLICATI__dbpath%.old"
  2. Select the run-script-after entry from the “Add advanced option” selector on step 5 (Options) of editing your backup job

  3. Put C:\Duplicati-backupDB.bat in the run-script-after field

  4. Click the “Save” button

Now whenever the job runs it will copy your current (just finished being updated) .sqlite file to a “.old” file in the same folder.

Of course you can do other things in the batch file such as:

  • check the “success” variable before copying the DB (if you care)
  • copy the DB somewhere else (like to a USB drive if you’re backing up to one)
  • compress the DB (they can get big, so having multiple copies could get “painful”) - note that this could cause your backup job to appear to run longer due to the time spent compressing
  • etc.
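On Linux or macOS, the same idea looks roughly like the sketch below. This is not an official Duplicati script: I’m assuming DUPLICATI__PARSED_RESULT reports “Success” on a clean run (check the run-script documentation for your version), and the default paths here are made up for a dry run.

```shell
#!/bin/sh
# Hypothetical post-backup script (Unix counterpart to the .bat above).
# In a real run, Duplicati sets DUPLICATI__dbpath and DUPLICATI__PARSED_RESULT;
# the defaults and 'touch' below only simulate that for a standalone dry run.
: "${DUPLICATI__dbpath:=/tmp/demo-duplicati.sqlite}"
: "${DUPLICATI__PARSED_RESULT:=Success}"
touch "$DUPLICATI__dbpath"

# Only keep a copy when the backup actually succeeded.
if [ "$DUPLICATI__PARSED_RESULT" = "Success" ]; then
    cp "$DUPLICATI__dbpath" "$DUPLICATI__dbpath.old"
    # Databases get big, so also keep a compressed copy.
    gzip -c "$DUPLICATI__dbpath.old" > "$DUPLICATI__dbpath.old.gz"
fi
```

The success check means a failed or interrupted run won’t overwrite your last good copy of the database.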

Thx… when the DB recreation finishes I’ll try your bat!

Good luck - please let me know how it works out for you!

As an alternative, what do you think about creating a Duplicati job to back up all the DBs to Google Drive?

Yep, that works too. I think I saw another user talking about setting up a 2nd backup job with the sole purpose of backing up the 1st job’s database file.

If you’re storing in the cloud, that makes sense as the de-duplication will cut down on data transfer. The drawback is that if you end up needing to restore the 1st job you’ll first have to restore the 2nd job to get the DB to then use on the 1st job.

It’s not a big deal - having a small backup of just the database file(s) should restore very quickly, but it’s another step to keep in mind in case of disaster recovery.

You got the point.
I use Duplicati for a 12 TB backup on Google Drive (4 jobs).
Adding a 5th job with only the DB files would make me feel secure, with versioning…

How about the recreation speed? Why is it taking so long?

It’s about my 4 TB film collection… and the DB was approximately 8 GB…

I haven’t looked specifically at the database recreate process, but my guess is there’s just a lot of database processing going on. Remember, this process isn’t just restoring a file from a remote destination - it’s reading through every index file on the destination and adding the entire history of every file’s contents in 100 KB (by default) blocks of data.

With a 12 TB data source chopped into 100 KB blocks, that’s 128.8 million block records being recreated in the local database for the initial backup alone. Then add to that however many blocks changed with each new file or historical version of an existing file.
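As a sanity check on that figure (binary units: 12 TB here means 12 × 1024⁴ bytes, and the 100 KB default blocksize is 102,400 bytes):

```shell
# Worked arithmetic for the block count quoted above
tib=$((1024 * 1024 * 1024 * 1024))   # bytes in 1 TiB
blocksize=$((100 * 1024))            # Duplicati default: 100 KB = 102400 bytes
echo $((12 * tib / blocksize))       # 128849018 blocks, i.e. ~128.8 million
```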

Once some of the more commonly used functionality has been updated and finalized (like the pending improvement in restore browsing speed) I’m hoping time can be found to review the database recreate process for performance (as well as reporting) improvements.


Another question… if the recreation job stops halfway, do I have to restart from the beginning?

Hmm…that I’m not sure about.

My guess is that once a recreate is started a database will have been created, at which point a Repair should be enough to continue from any interruption point - but @kenkendk would know better.

Edit: Note that my guess is WRONG and it appears once a “The database was attempted repaired, but the repair did not complete.” message is received a Recreate action is necessary to recover.

So we wait for an answer from @kenkendk!

Up! @kenkendk, can you give me some feedback?

Is this variable available in the current beta release? Is there a full list of variables documented somewhere?

@Bazzu85, have a little patience - kenkendk generally only has time to hit the forums once or twice a week.

@drwtsn32, I don’t know of any list, but pretty much any parameter is exposed with a DUPLICATI__ prefix (yes, two underscores after DUPLICATI) and any “-” replaced with “_”.
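So, for example, the mapping works like this (illustrative only, using one of the option names mentioned in this thread):

```shell
# Illustrative: turn an option name into the env var a script would see
opt="send-mail-subject"
echo "DUPLICATI__$(printf '%s' "$opt" | tr '-' '_')"
# prints: DUPLICATI__send_mail_subject
```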

This post might help:


I ended up putting a ‘set > filename’ in my post-backup script so I could see all the values. Here they are in case others are curious. (Some values have been redacted.) This was from version

DUPLICATI__send_mail_subject=Duplicati %OPERATIONNAME% report for xxx-%backup-name%: %PARSEDRESULT%
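A shell version of the same trick, filtered down to just the Duplicati variables. The output path is arbitrary, and the export only simulates a variable Duplicati would normally set for the script:

```shell
#!/bin/sh
# Simulate one Duplicati-provided variable for a standalone dry run;
# in a real post-backup script these are already set by Duplicati.
export DUPLICATI__dbpath=/tmp/demo.sqlite
# Keep only the Duplicati variables - like 'set > filename', but filtered.
env | grep '^DUPLICATI__' | sort > /tmp/duplicati-vars.txt
```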

Thanks for that list!

However, I should note that the list generated from set won’t include ones with no value, so what you got is probably everything you have values for in the particular job on which you ran the set command.

So there are a lot of optional/advanced/less-commonly-used parameters not represented in the above list.

Good to know! There were a few listed that had no values set (throttle up/down for instance).