Database recreate slowness, Duplicati 2.0.6.3_beta_2021-06-17 status?

Hi,

Complete fool that I am, I tried to get out of an issue by hitting the “delete and recreate database” button. This was about 24 hours ago. The latest update I got from the log is that I’m at Pass 3 of 3, processing blocklist volume 170 of 2019. If this is a linear-rate process, it will finish in about two weeks… if my vpn connection doesn’t get interrupted (spoiler alert: it inevitably will).

PS: what had happened: a backup was interrupted by an operating system shutdown (a normal one, so Duplicati should have had a clean shutdown request from systemd, and all filesystems were unmounted properly). But when the next backup ran, I got the “unexpected difference in fileset” error. So I deleted the version mentioned in the error. And again, and again. After deleting 3 versions and seeing no progress, I wondered: if the issue is that the local database contains different information than the remote storage, maybe I should just recreate the local db from what is on the remote storage, rather than keep deleting entire backup versions (note: these were not the latest versions from the past days, but versions from months ago or even last year).

When searching the forum I see several posts over the past few years from people asking the same question: why so slow?
To answer a recurring question:
Yes, this backup set is already > 2 years old, so despite running 2.0.6.3_beta_2021-06-17 now, it may have been upgraded several times by now; at least versions 2.0.4.23 and 2.0.6.1 have also been used on this system.
Other quantitative info:

  • source backup volume is about 350GB, remote backup storage around 550GB
  • sqlite file is around 3GB
  • on the remote storage: 14 dlist files, 2063 dblock files and 2076 dindex files; dblock files are ~250MB each, dlist 65~80MB, dindex most <200KB, but some up to 1.x or even 30 MB

Why posting a new question:

  1. As stated before, at some point I will not be there to reconnect my vpn in time; a download from remote storage will fail, and the recovery process will most likely do the same.
  2. I prefer not having to wait several weeks before I can run a backup again; at this stage I don’t even know if this backup set is recoverable at all (a backup got interrupted; I didn’t try the delete-and-recreate-database route just for fun).
  3. In the thread Very slow database recreation there’s mention of some different behavior in the then-current canary release, so maybe there’s an interesting change I need to know about. Specifically, older versions of Duplicati could have introduced some error, or at least inefficiency, in the data stored; are there tools to clean up this situation without losing the backup versions (and without having a local database)?
  4. Discuss other options:
  • I have 2 backups of this machine, so I have a week-old copy of the database that is now being recreated, stored in an offline backup; since then a couple of new backups have run against this remote storage… so would restoring this outdated version of the sqlite db help at all? Or will it just throw some other inconsistency error in my face and fail anyway?
  • There’s the RecoveryTool; it can recreate indexes, but would that help at all?

When you do a database delete+recreate, Duplicati should only need to download dlist and dindex files from the remote storage. The operation usually doesn’t take very long but it depends on how many of those files exist, how large they are, etc.

If it’s downloading dblocks, it’s because Duplicati found something inconsistent about the dlist or dindex files. It could be that just a few dblocks are needed, or in the worst case all of them. Unfortunately, in my experience it is not a “linear rate”: each dblock takes a little longer to process than the one before. As such, your two-week estimate is probably not accurate.

My best understanding of the root issue is that some older versions of Duplicati sometimes incorrectly wrote dindex files to back end storage. Simply upgrading Duplicati isn’t enough because that won’t fix the dindex files that were already written.

If you have an intact database, you can fix this pretty easily (after upgrading Duplicati) by deleting all the remote dindex files and letting Duplicati rewrite them. It will use information in the local database to write new files.
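For what it’s worth, here’s a minimal sketch of that step for the case where the remote storage is reachable as a local path (e.g. a mount); the paths are made up, and it moves the dindex files aside into a subfolder rather than deleting them, so you keep a copy:

```python
# Sketch: set aside the remote dindex files so Repair can rewrite them from the
# local database. Assumes the destination is reachable as a local path (e.g. a
# mount of the remote storage); both paths below are made-up examples.
from pathlib import Path
import shutil

destination = Path("/mnt/backup-destination")   # hypothetical mount point
set_aside = destination / "dindex-set-aside"    # keep the old files instead of deleting
set_aside.mkdir(exist_ok=True)

moved = 0
for f in destination.glob("duplicati-*.dindex.*"):
    shutil.move(str(f), str(set_aside / f.name))
    moved += 1
print(f"Moved {moved} dindex files aside; now run Repair (not Recreate).")
```

With the old dindex files out of the way, run Repair (not Recreate) so Duplicati rewrites them from the local database.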

I did this over a year ago on at least a dozen backups and then tested database recreate; it completed quite rapidly without needing to download any dblocks. Recently I retested database recreate and it still completed without downloading dblocks, so I’m confident that Duplicati (for at least a year now) hasn’t had the issue where it writes dindex files incorrectly.

If you’ve already deleted your local database, then you really have no choice but to let the recreate operation complete. Well, your other option is to abandon that effort and just start a new backup from scratch. You could keep the old backup data around for restores in the future if you are ever able to rebuild the database.

This is sad to hear. I tested Duplicati with over 2500 interrupted backups and didn’t get this sort of failure. There were some others, but the two main ones have open issues queued for skilled-developer work.

I’m not sure if you feel like trying to reproduce yours, but if you do, you can file an Issue stating the steps. Although there’s a developer shortage now, perhaps someday a well-written issue will get attention…

number-of-retries and retry-delay can be increased as much as you need to give the connection time to come back up.
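If you end up running the repair from a terminal, a sketch of what that could look like is below; the storage URL, database path, and the values themselves are placeholders, so check duplicati-cli help for the exact syntax on your install:

```python
# Sketch: run the repair from the CLI with more generous retry settings, so a
# dropped VPN link causes a retry instead of a failed operation. The storage URL,
# database path, and values are placeholders, not a tested configuration.
import subprocess

cmd = [
    "duplicati-cli", "repair", "ssh://example-host/backup-folder",
    "--dbpath=/home/user/.config/Duplicati/backupdb.sqlite",
    "--number-of-retries=25",  # plenty of attempts per remote operation
    "--retry-delay=60s",       # give the VPN time to come back between attempts
]
subprocess.run(cmd, check=True)
```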

If you have the destination space, you can run a new backup (or at least a partial one of the most important files) while wrestling with the troublesome backup. You’d need a second copy of Duplicati though. It can be on the same system, using a different port for the web UI (and maybe a different browser, to avoid confusion over a localhost cookie). A second Linux system attempting the DB recreate would be another option, while the current one does the new backup.

If the old backup won’t rebuild a DB, and you have the destination space, you can keep it for the RecoveryTool.

Outdated databases are both potentially helpful and rather dangerous, depending on the exact situation. Backup won’t run: it will find files at the destination that the DB doesn’t know about, so “Repair” will delete those files. Whether or not this helps depends on what has happened since the old database was made. Compacting rearranges destination files, so if you see in the logs that it has run, the old database will also perceive files as missing.

The narrow path to success may be this: if no compact has run, then removing the extra files may be the right fix.

Its index is only for its own use, to restore what it can from a damaged backup. It handles no dindex files at all, building its own block index directly from the dblock files. A dlist file identifies the source blocks; the index finds them.
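Roughly, the idea is something like this toy sketch (not the tool’s actual code); it assumes the usual volume layout where each downloaded, decrypted dblock is a zip archive whose entries are named by block hash:

```python
# Toy sketch of the RecoveryTool idea: build a block index directly from
# downloaded, decrypted dblock files, with no dindex involved. Assumes each
# dblock is a zip archive whose entry names are the block hashes that the
# dlist files refer to.
import zipfile
from pathlib import Path

def build_block_index(dblock_dir: Path) -> dict[str, Path]:
    """Map block hash -> the dblock file that contains that block."""
    index: dict[str, Path] = {}
    for dblock in sorted(dblock_dir.glob("*.dblock.zip")):
        with zipfile.ZipFile(dblock) as z:
            for entry in z.namelist():
                if entry == "manifest":      # skip the volume manifest entry
                    continue
                index[entry] = dblock
    return index

# A restore then walks a dlist, looks up each block hash in this index, and
# reassembles files from the dblocks it points at.
index = build_block_index(Path("/tmp/downloaded-dblocks"))  # hypothetical local copy
print(f"indexed {len(index)} blocks from {len(set(index.values()))} dblock files")
```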

That’s probably what I’ll try then. Start a new backup to the same destination but different folder (still enough disk space left), and test-run the recovery tool to see if it’s still worth anything.

It probably did compact: I use the smart retention scheme, and I usually see a phase like “Deleting unnecessary files” at the end of a backup cycle, which coincides with uploading and downloading blocks of data. I can’t tell for sure from the log; it seems to only have logged failures, not successful backups.

Hello, can you please point me to this information? Was it in the changelog? I have many Duplicati jobs, and a recommendation to delete all dindex files after some Duplicati version, with the goal of fixing an old bug, seems very important :)
Thanks

Check out this post… please proceed with caution…

Got a link to the recommendation and the issue you intend to fix? You don’t want to make things worse.
This is far from a universal fix. It regenerates dindex files from database data – if the database has the info.
Prior problems, and fix attempts by recreating the database, may prevent it. I can test that, if it helps any.

I’m not sure how reliably the database’s condition can be determined. What would the cautious approach be? Sometimes I suggest renaming (changing the duplicati- prefix), or moving to a subfolder, instead of deleting.
Looking inside the database to do a sanity check is possible (but hard), or you can study an upload-verification-file.
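As a very rough illustration of the upload-verification-file idea (the JSON layout assumed here is a guess; the DuplicatiVerify script that ships with Duplicati also checks hashes and is the proper tool for this):

```python
# Coarse check of the destination against duplicati-verification.json (written
# when --upload-verification-file is enabled). Assumes a list of entries with
# "Name" and "Size" fields, which should be confirmed against a real file.
import json
from pathlib import Path

destination = Path("/mnt/backup-destination")   # hypothetical local mount
entries = json.loads((destination / "duplicati-verification.json").read_text())

for entry in entries:
    f = destination / entry["Name"]
    if not f.exists():
        print("missing:", entry["Name"])
    elif f.stat().st_size != entry["Size"]:
        print("size mismatch:", entry["Name"])
```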

The technique mentioned is probably not in the changelog. I can think of at least one fix to a Canary dindex creation bug.
About → Changelog will show some dindex fixes, but I have no idea what problem you’re experiencing.

EDIT:

Tested a tiny clean new backup. Delete the dindex, run Repair, and the dindex comes back. All good so far. Then delete the dindex and run Recreate (delete and repair) while watching About → Show log → Live → Verbose, and notice that it feels the need to download dblock files because it’s searching for blocks not in any dindex.

It does find them, but it leaves the destination without the old dindex. It has no record of its name, because the database and the destination were the only places that used to know it. It doesn’t make up a new name, although it seems like that might be possible. That might be a nice enhancement if some expert volunteers.

I think the above leaves a seemingly functioning local database, because the block information was recovered after some extra searching (which can lead to a slow recreate, especially in the last 10% of the progress bar); however, it’s not functioning well enough to regenerate the dindex file, even under a newly invented name.

If looking at the database with DB Browser for SQLite, one sanity check for this might be to count Blocks (dblock files) in the Remotevolume table and make sure there are at least that many rows in IndexBlockLink, which is where Duplicati records the association between a dindex file and the dblock file that it indexes.
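Something along these lines, run against a copy of the job database (a sketch; the Type = 'Blocks' filter is how I read the schema, so double-check it in DB Browser first):

```python
# Sketch of the sanity check above, run against a *copy* of the job database.
# Counts dblock volumes in Remotevolume and the dindex->dblock associations
# recorded in IndexBlockLink.
import sqlite3

con = sqlite3.connect("/path/to/copy-of-backup-db.sqlite")  # hypothetical path
dblocks = con.execute(
    "SELECT COUNT(*) FROM Remotevolume WHERE Type = 'Blocks'"
).fetchone()[0]
links = con.execute("SELECT COUNT(*) FROM IndexBlockLink").fetchone()[0]
con.close()

print(f"{dblocks} dblock volumes, {links} dindex->dblock links")
if links < dblocks:
    print("Some dblock files have no dindex association recorded.")
```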

The thread Things that make you go, “Uh-oh” gets technical about some cases of mismatching dindex and dblock files; however, I don’t know whether that’s your issue. Missing dindex files can definitely slow DB recreates, though.

Thanks for the reply.
My interest in this “trick” is that I have an old Duplicati backup (from 2018) with a 7GB sqlite DB (after vacuum), and it’s pretty slow. Rebuilding it in case of a problem could be a pain, so the idea that I could prepare for it in a controlled way sounds tempting.
But you’re right that it’s pretty dangerous.

Of course… definitely don’t use Recreate with the recipe mentioned in my post! Only use Repair.

If you follow the recipe exactly, I don’t think it is particularly dangerous. But just to be safe you should keep a backup of your existing local database and keep a backup of the dindex files before you delete them. Do NOT use the Recreate option. It is inherently incompatible with the idea of the recipe: to use the local database to rebuild the dindex files.

I don’t think it’s particularly dangerous, especially if you use the just-to-be-safe precautions just above.
It has limits, but it might help if Restoring files if your Duplicati installation is lost is downloading dblocks.
If everything is “pretty slow” then it might not be a dindex file problem, so fixing dindex files won’t help.