Very slow database recreation

Hi all

I’m running what I believe is the latest beta, 2.0.5.1_beta_2020-01-18. The database of one of my backups became corrupted recently, so I’m having to rebuild it.

As usual, it’s got to 90% quite quickly, and is now putting out a message like this every 10 minutes or so:

29 Sep 2020 13:37: Pass 3 of 3, processing blocklist volume 213 of 7902

Does it really have to process all 7902 volumes? If so, that’s going to take it another 50 odd days at this rate!

The backup is ‘only’ about 350 Gig in size. Should I increase the blocksize above its default of 50 MBytes in future? Would that speed this process up?

Thanks

Andy

About → System information can confirm that. Is the backup old? Older Duplicati versions had more problems.
Also a working database (which you used to have) might unfortunately mask issues at the destination.

Any recollection of its error messages or other symptoms?

Ideally it should get to 70% and be done. 90% to 100% is a last-ditch search for data blocks that weren’t recorded in any dindex file it has seen. Duplicati doesn’t know which dblocks they live in without reading them all.

I think the final-10% search goes all the way through, without an option to stop and without periodic status checks. Clearly there is room for improvement, though it might sacrifice other things, such as getting data back.

Sometimes it gets to the end and is still missing data, due to backup corruption or some other problem. Recording error messages at that time is important, because we don’t want to repeat any lengthy runs.

People who don’t care much about old file versions find it faster to start again. People who do care may attempt to speed things up by adding resources, e.g. using a fast parallel downloader to stage the backup files on a local drive that Duplicati can read from for the Recreate (the backup files are portable), adding ramdisks, SSDs, and so on.
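As a rough illustration of the staging idea, here is a minimal Python sketch; it assumes the remote store is reachable as a mounted path, and the paths and worker count are hypothetical. A dedicated parallel downloader such as rclone is the more usual choice.

# Stage remote backup files onto a fast local disk in parallel so that
# Recreate can read them locally (sketch only; paths are hypothetical).
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import shutil

SRC = Path("/mnt/remote-backup")        # hypothetical mounted destination
DST = Path("/fast-local-disk/backup")   # local folder Duplicati will read
DST.mkdir(parents=True, exist_ok=True)

def copy_one(src_file: Path) -> str:
    shutil.copy2(src_file, DST / src_file.name)
    return src_file.name

with ThreadPoolExecutor(max_workers=8) as pool:
    for name in pool.map(copy_one, SRC.glob("duplicati-*")):
        print("copied", name)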

People who just want their data back have other options for restore that don’t need the usual database build.

Manual efforts to kill the Recreate and take some losses have not always worked, but it can be tried…

blocksize defaults to 100 KB, which might be slightly low for a backup of that size. Raising it may speed up the early part of Recreate that ran quickly, but won’t help with the abnormal search-everything phase you’re in…
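For a sense of scale, here is a quick back-of-the-envelope count of blocks at the default blocksize and at a larger one (a sketch only; the 350 GB figure is from above and the sizes are approximate):

# Approximate block counts for a 350 GB source at two blocksizes.
source_bytes = 350 * 1024**3

for blocksize in (100 * 1024, 1024**2):     # default 100 KB vs. 1 MB
    blocks = source_bytes // blocksize
    print(f"{blocksize // 1024:>5} KB blocksize -> ~{blocks:,} blocks")
# roughly 3.7 million blocks at 100 KB, about 360 thousand at 1 MB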

Choosing sizes in Duplicati

Regardless, blocksize can’t be changed on an existing backup, but what direction would you like to try?

Hi,

Thanks for the response.

That version string is definitely what I’m running; I just wasn’t 100% sure it was the latest beta.

I gave up in the end and aborted the rebuild. I’m in the process of backing up from scratch again. It’s not the end of the world to lose the history in this case.

This is the error I got by email when the database was reported as being corrupted:

Failed: The database disk image is malformed
database disk image is malformed
Details: Mono.Data.Sqlite.SqliteException (0x80004005): The database disk image is malformed
database disk image is malformed
  at Mono.Data.Sqlite.SQLite3.Reset (Mono.Data.Sqlite.SqliteStatement stmt) [0x00084] in <a0e5f0a9e64d4445ba1b60592daf818a>:0 
  at Mono.Data.Sqlite.SQLite3.Step (Mono.Data.Sqlite.SqliteStatement stmt) [0x0003d] in <a0e5f0a9e64d4445ba1b60592daf818a>:0 
  at Mono.Data.Sqlite.SqliteDataReader.NextResult () [0x00104] in <a0e5f0a9e64d4445ba1b60592daf818a>:0 
  at Mono.Data.Sqlite.SqliteDataReader..ctor (Mono.Data.Sqlite.SqliteCommand cmd, System.Data.CommandBehavior behave) [0x0004e] in <a0e5f0a9e64d4445ba1b60592daf818a>:0 
  at (wrapper remoting-invoke-with-check) Mono.Data.Sqlite.SqliteDataReader..ctor(Mono.Data.Sqlite.SqliteCommand,System.Data.CommandBehavior)
  at Mono.Data.Sqlite.SqliteCommand.ExecuteReader (System.Data.CommandBehavior behavior) [0x00006] in <a0e5f0a9e64d4445ba1b60592daf818a>:0 
  at Mono.Data.Sqlite.SqliteCommand.ExecuteDbDataReader (System.Data.CommandBehavior behavior) [0x00000] in <a0e5f0a9e64d4445ba1b60592daf818a>:0 
  at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader () [0x00000] in <9191a157b70249c1b9d6e225e09734a7>:0 
  at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteScalarInt64 (System.Data.IDbCommand self, System.Boolean writeLog, System.String cmd, System.Int64 defaultvalue, System.Object[] values) [0x00061] in <8f1de655bd1240739a78684d845cecc8>:0 
  at Duplicati.Library.Main.Database.ExtensionMethods.ExecuteScalarInt64 (System.Data.IDbCommand self, System.String cmd, System.Int64 defaultvalue, System.Object[] values) [0x00000] in <8f1de655bd1240739a78684d845cecc8>:0 
  at Duplicati.Library.Main.Database.LocalDatabase.GetBlocksLargerThan (System.Int64 fhblocksize) [0x0000c] in <8f1de655bd1240739a78684d845cecc8>:0 
  at Duplicati.Library.Main.Utility.VerifyParameters (Duplicati.Library.Main.Database.LocalDatabase db, Duplicati.Library.Main.Options options, System.Data.IDbTransaction transaction) [0x001f0] in <8f1de655bd1240739a78684d845cecc8>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.RunAsync (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x0009c] in <8f1de655bd1240739a78684d845cecc8>:0 
  at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x00050] in <9a758ff4db6c48d6b3d4d0e5c2adf6d1>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String[] sources, Duplicati.Library.Utility.IFilter filter, System.Threading.CancellationToken token) [0x00009] in <8f1de655bd1240739a78684d845cecc8>:0 
  at Duplicati.Library.Main.Controller+<>c__DisplayClass14_0.<Backup>b__0 (Duplicati.Library.Main.BackupResults result) [0x0004b] in <8f1de655bd1240739a78684d845cecc8>:0 
  at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0011c] in <8f1de655bd1240739a78684d845cecc8>:0 

I’ve set the ‘Remote Volume Size’ to 250 MB to see what effect this has.

It’s strange, as other database rebuilds I’ve done recently haven’t had this issue, but they are all newly seeded backups (in the last month or so I split my large backup set from 2 or 3 TB into smaller chunks, max size around 500 Gig). I’ve had to rebuild one or two of these for one reason or another, and they’ve always completed relatively quickly.

This particular backup is probably quite old though, so that could explain it?

Andy

These errors are bad news. It’s SQLite corruption below the level that Duplicati uses for its records.
These are sometimes fixable to the point where the DB can be opened, but I’m not sure everything inside would be healthy.
How To Corrupt An SQLite Database File gives ways this can happen. One forum report had repeats, seemingly tied (I think) to crashes on Linux. I hope this is a very infrequent headache on your system…
If you can think of anything that might have preceded the corruption, it would add useful data to the forum issue.
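If you still have the damaged file, SQLite’s own integrity scan is a quick first look. A minimal sketch, assuming local access to the file; the path here is hypothetical (the actual path is shown on the job’s Database screen):

# Run SQLite's built-in integrity scan against the (possibly damaged) database.
import sqlite3

db_path = "/home/andy/.config/Duplicati/ABCDEFGHIJ.sqlite"    # hypothetical path
with sqlite3.connect(db_path) as con:
    print(con.execute("PRAGMA integrity_check;").fetchall())  # [('ok',)] when clean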

It’s hard to say, but it’s possible. 2.0.5.1 is much better than 2.0.4.23 (though not perfect) in terms of backup reliability. Usually issues would be more obvious; yours seems like a latent issue in the backup files. Doing an occasional test restore (especially meaningful with no-local-blocks set) can give some assurance.
Another option is to rename the database (as a precaution) occasionally to prove that a Recreate works.
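A minimal sketch of that drill, setting the database aside rather than deleting it so it can be put back if the rebuild misbehaves (the path is hypothetical, and Duplicati should not be running a job on it at the time):

# Set the job database aside before testing Recreate (sketch; hypothetical path).
from datetime import date
from pathlib import Path

db = Path("/home/andy/.config/Duplicati/ABCDEFGHIJ.sqlite")   # hypothetical path
parked = db.with_name(f"{db.stem}-pre-recreate-{date.today()}{db.suffix}")
db.rename(parked)
print("Database set aside as", parked, "- now run Repair/Recreate from the UI")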

I’m new to Duplicati and have run my first backup, with further daily backups for 6 days. I’m now trying to restore to a different machine as I would in a DR situation. The database recreation is taking a very long time. The backup is on OneDrive, and I noticed you said it would be faster to bring the backup local. Should I sync the OneDrive folder locally on my DR machine and then restore from that?

I just want to be sure of the best approach, I do not want to be unsure when in a real DR situation.

Any thoughts or advice would be welcome.

Cheers
David

How big is the backup? What percent is the progress bar at? See the earlier discussion of the 90%-100% slow spot, although I’d certainly hope a new backup is coming out well. If the backup is very large, there are options.
For a better look (as in the original post), use About → Show log → Live → Verbose to see status.

The backup was 360GB; it came down eventually, so the DR strategy is proven. However, I wonder if it might be an idea to back up the database with the backup? That way it can be restored or rebuilt as needed. Just a thought.

Thanks for the help.

David


If Duplicati had to download dblocks during a database recreation, there is a way to fix this, in my experience. The procedure requires an intact, functioning local database.

I have tested db recreation on all my machines, and where dblocks needed to be downloaded, I’ve run through this procedure and then retested. Afterward, db recreation no longer required dblock processing and therefore completed much, much faster.

Procedure:

  • Again, you must have a functioning database. Do not proceed otherwise.
  • Delete all the dindex files on the back end. (For added safety, you could move the files rather than delete them until the procedure has succeeded; see the sketch after this list.)
  • Run “Repair” in the Duplicati Web UI.
  • Duplicati will regenerate and reupload all of the dindex files based on information in the database.
  • Retest database recreation. (For added safety, you could move the database rather than delete it until the procedure has succeeded.)
  • Watch the Live Log at Verbose level to see whether dblocks still need to be downloaded.
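For the “move rather than delete” variant, a minimal sketch, assuming the destination is reachable as a local or mounted folder; paths are hypothetical, and the equivalent move done on the remote side works just as well:

# Park the dindex files next to the destination instead of deleting them
# (sketch only; paths are hypothetical).
from pathlib import Path

dest = Path("/mnt/backup-destination")          # hypothetical destination folder
parked = dest.parent / "dindex-parked"
parked.mkdir(exist_ok=True)

for f in dest.glob("duplicati-*.dindex.*"):
    f.rename(parked / f.name)                   # set aside rather than delete
print("Moved", len(list(parked.iterdir())), "dindex files; now run Repair in the UI")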

On one of my machines the database recreation time went from over 5 days to 15 minutes.

The root cause seems to be bugs in older versions of Duplicati that wrote dindex files incorrectly, but I’m not entirely sure.

Thanks for all the answers, I have the same issue.

“People who don’t care much about old file versions find it faster to start again.”

What is the easiest way to do this?

Delete the local database.
Move the files out of the way in the backup location
Run the backup
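
For a file-based destination, I’m picturing the first two steps as something like this sketch (hypothetical paths; the last step is just running the backup job again from the UI)?

# Sketch of steps 1 and 2 for a file-based destination (hypothetical paths).
from datetime import date
from pathlib import Path

db = Path("/home/andy/.config/Duplicati/ABCDEFGHIJ.sqlite")   # hypothetical path
dest = Path("/mnt/backup-destination")                        # hypothetical path

db.unlink(missing_ok=True)                        # 1. delete the local database

parked = dest.parent / f"old-backup-{date.today()}"
parked.mkdir(exist_ok=True)
for f in dest.glob("duplicati-*"):                # 2. move old files out of the way
    f.rename(parked / f.name)
# 3. run the backup job again from the Duplicati UI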

Andy


I am also experiencing problems with extremely slow database recreation.

Recently one of my backups lost access to its storage partway through a backup, and I decided to try a delete/recreate of the database to clear up some of the resulting problems. As many others have indicated, the initial steps go fairly quickly before slowing down dramatically.

I looked at the profiling logs, and the massive slowdown appears to be due to some database queries that are taking upwards of 20 minutes each to run.

The machine this backup is running on is not resource constrained in any way. It has a relatively fast and modern CPU (12 core Xeon), plenty of RAM, local NVMe SSDs, and gigabit access to the backup destination. Resource availability doesn’t seem to be a problem, because during these queries there is no disk or network activity, and Duplicati appears to be consuming very little CPU power.

I have experienced this issue both on the latest beta release, as well as the latest canary release.

The backup size is about 1TB (14 versions). The block size is 256KB. The database is about 4GB. The dblock (remote volume) size is 50MB.

If it matters, the backup contains a small number of relatively large files, primarily VHDX disk images.

Luckily, I’m not in a position where I need these backups right now.

Is there a way to improve this, or is it just a thing that is hopelessly bottlenecked at the database for now?

If it’s in SQLite, the question (whose answer isn’t clear) is how many of the 12 cores SQLite can use.
If it can’t use many, then “pegged” would look like about 8% usage in Task Manager across all the cores.

Raising blocksize as mentioned earlier helps for large backups. Is 1 TB the destination size or source?
SQL tweaking has been attempted in at least some places before, but we need experts. Any out there?

Initial backup stuck at 100% was slowness of the initial backup due to the query plan doing a SCAN rather than a SEARCH, which was fixable by running ANALYZE. Maybe Recreate has the same slowness inserting all its blocks; however, if you follow that thread to its GitHub PR, the proposal to run ANALYZE for the initial backup was rejected. Seemingly other paths to speed are preferred, but I’m not sure who’ll do the work. Any volunteer SQL experts?
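As a hedged experiment only (it was not adopted as a built-in fix), ANALYZE can be run by hand against the job database so the query planner has statistics; the path below is hypothetical, and Duplicati should be idle while you do it:

# Give SQLite's query planner statistics for the Duplicati job database
# (experiment only; hypothetical path).
import sqlite3

db_path = "/path/to/Duplicati/ABCDEFGHIJ.sqlite"   # hypothetical path
with sqlite3.connect(db_path) as con:
    con.execute("ANALYZE;")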

Meanwhile, I have some tentative benchmarks showing slowdowns at about a million blocks, and it’s a pretty hard wall (roughly the cube of the size). I used a tiny blocksize to test this (I don’t have TBs). Anyone care to test?

The BlocksetEntry table points to the blocks of a file (known as a blockset). I’m guessing your status bar hasn’t gotten to the 90% point, where the pain is mostly downloading dblock files in search of lost blocks.
Issues that cause this search are slowly being identified. Many were fixed in 2.0.5.1, but there are more.

To know for sure where you are, you can look at About → Show log → Live → Verbose for progress…
Profiling logs would, of course, be a superset of that. Look for lines with “Processing” and current/totals.
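A tiny helper along those lines, assuming a --log-file profiling log at a hypothetical path:

# Pull the Recreate progress lines out of a profiling log (hypothetical path).
with open("/path/to/duplicati-profiling.log") as log:
    for line in log:
        if "Processing" in line and "volume" in line:
            print(line.rstrip())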

Pretty sure this part is single-threaded and will never use more than one core, unfortunately.

The problem is that when database recreation requires dblocks, it’s slow. I think this is due to dindex files not being written properly by older versions of Duplicati. My evidence is comparing database recreation times before and after rewriting the dindex files with a recent canary version. Unfortunately, if you lose your database you can’t do the rewrite trick, and I think the only option is to wait for a database recreation.

That’s one cause. Another is simply the volume of block inserts. The problem is telling them apart.
The progress bar is a good indicator, and the Processing messages in the log are better.

This morning I was doing a test Recreate, but Google Drive gave lots of dindex get errors.
I had 0 retries allowed (also a test), so the effect was to force the dblock downloads, like this:

2020-10-14 08:02:51 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingIndexlistVolumes]: Processing indexlist volume 62 of 82
2020-10-14 08:02:51 -04 - [Error-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-IndexFileProcessingFailed]: Failed to process index file: duplicati-i007318f96b63413eb78712cd4d8a3c41.dindex.zip.aes
System.Net.WebException: The remote server returned an error: (403) Forbidden.
2020-10-14 08:02:51 -04 - [Retry-Duplicati.Library.Main.BackendManager-RetryGet]: Operation Get with file duplicati-ib41f94fc6d524443bc36661736178cd1.dindex.zip.aes attempt 1 of 0 failed with message: The remote server returned an error: (403) Forbidden.
System.Net.WebException: The remote server returned an error: (403) Forbidden.
2020-10-14 08:02:51 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingIndexlistVolumes]: Processing indexlist volume 63 of 82
2020-10-14 08:02:51 -04 - [Error-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-IndexFileProcessingFailed]: Failed to process index file: duplicati-ib41f94fc6d524443bc36661736178cd1.dindex.zip.aes
System.Net.WebException: The remote server returned an error: (403) Forbidden.
...
2020-10-14 08:03:28 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingIndexlistVolumes]: Processing indexlist volume 82 of 82
2020-10-14 08:03:40 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: INSERT INTO "BlocksetEntry" ("BlocksetID", "Index", "BlockID") SELECT DISTINCT "H"."BlocksetID", "H"."Index", "H"."BlockID" FROM (SELECT "E"."BlocksetID" AS "BlocksetID", "D"."FullIndex" AS "Index", "F"."ID" AS "BlockID" FROM (  
2020-10-14 08:03:41 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: INSERT INTO "BlocksetEntry" ("BlocksetID", "Index", "BlockID") SELECT DISTINCT "H"."BlocksetID", "H"."Index", "H"."BlockID" FROM (SELECT "E"."BlocksetID" AS "BlocksetID", "D"."FullIndex" AS "Index", "F"."ID" AS "BlockID" FROM (  
2020-10-14 08:03:43 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingBlocklistVolumes]: Pass 2 of 3, processing blocklist volume 1 of 26
2020-10-14 08:03:43 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: INSERT INTO "BlocksetEntry" ("BlocksetID", "Index", "BlockID") SELECT DISTINCT "H"."BlocksetID", "H"."Index", "H"."BlockID" FROM (SELECT "E"."BlocksetID" AS "BlocksetID", "D"."FullIndex" AS "Index", "F"."ID" AS "BlockID" FROM (  

In its full multi-line glory, an example INSERT line looks like:

2020-10-14 08:03:43 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery: INSERT INTO "BlocksetEntry" ("BlocksetID", "Index", "BlockID") SELECT DISTINCT "H"."BlocksetID", "H"."Index", "H"."BlockID" FROM (SELECT "E"."BlocksetID" AS "BlocksetID", "D"."FullIndex" AS "Index", "F"."ID" AS "BlockID" FROM (  
        SELECT
            "E"."BlocksetID",
            "F"."Index" + ("E"."BlocklistIndex" * 3200) AS "FullIndex",
            "F"."BlockHash",
            MIN(102400, "E"."Length" - (("F"."Index" + ("E"."BlocklistIndex" * 3200)) * 102400)) AS "BlockSize",
            "E"."Hash",
            "E"."BlocklistSize",
            "E"."BlocklistHash"
        FROM
            (
                    SELECT * FROM
                    (
                        SELECT 
                            "A"."BlocksetID",
                            "A"."Index" AS "BlocklistIndex",
                            MIN(3200 * 32, ((("B"."Length" + 102400 - 1) / 102400) - ("A"."Index" * (3200))) * 32) AS "BlocklistSize",
                            "A"."Hash" AS "BlocklistHash",
                            "B"."Length"
                        FROM 
                            "BlocklistHash" A,
                            "Blockset" B
                        WHERE 
                            "B"."ID" = "A"."BlocksetID"
                    ) C,
                    "Block" D
                WHERE
                   "C"."BlocklistHash" = "D"."Hash"
                   AND
                   "C"."BlocklistSize" = "D"."Size"
            ) E,
            "TempBlocklist-27BF81D3E2CDA84EB525819B020D8BFE" F
        WHERE
           "F"."BlocklistHash" = "E"."Hash"
        ORDER BY 
           "E"."BlocksetID",
           "FullIndex"
) D, "BlocklistHash" E, "Block" F, "Block" G WHERE "D"."BlocksetID" = "E"."BlocksetID" AND "D"."BlocklistHash" = "E"."Hash" AND "D"."BlocklistSize" = "G"."Size" AND "D"."BlocklistHash" = "G"."Hash" AND "D"."Blockhash" = "F"."Hash" AND "D"."BlockSize" = "F"."Size"  UNION SELECT "Blockset"."ID" AS "BlocksetID", 0 AS "Index", "Block"."ID" AS "BlockID" FROM "Blockset", "Block", "TempSmalllist-B9514307EF75024FA1FA5DFBED5F4B68" S WHERE "Blockset"."Fullhash" = "S"."FileHash" AND "S"."BlockHash" = "Block"."Hash" AND "S"."BlockSize" = "Block"."Size" AND "Blockset"."Length" = "S"."BlockSize" AND "Blockset"."Length" <= 102400 ) H WHERE ("H"."BlocksetID" || ':' || "H"."Index") NOT IN (SELECT ("ExistingBlocksetEntries"."BlocksetID" || ':' || "ExistingBlocksetEntries"."Index") FROM "BlocksetEntry" "ExistingBlocksetEntries" ) took 0:00:00:00.221

which looks similar to the reported log line, and is possibly from FindMissingBlocklistHashes, called here.
If that’s the case, then this is probably the dblock-download case, but user logs would solve such doubts.

So it looks like the latest canary may fall back to dblock downloads if Recreate exhausts its retries on a dindex download. “Interrupting backup can leave hidden damage so Recreate will download all dblocks then fail from missing blocks” #4341 is another way into this, and for Beta it might be new in 2.0.5.1, offsetting the generally large improvement.

EDIT: I’m not saying the user report was on the latest canary (though it doesn’t say). Older releases had their own issues.

I think that was the genesis of my problem. One of my backups (on 2.0.5.1) had the destination disappear partway through and as a result had a 0KB dindex file written in it. Subsequent backups were running, but were throwing warnings about it each time. I might have been able to get away with a Repair rather than a Recreate, but that’s water under the bridge now.

@ts678 mentioned that in experimentation they saw a significant slowdown once there were more than a million blocks. My backup source was roughly 1.1 TB, so with a 256KB blocksize that would put me at least at 4 million blocks. If there really is an n-cubed slowdown versus the number of blocks during database recreation, that would explain why there are reports of it taking weeks to finish.

This time around I gave up and started the backup set from scratch, now with a 1MB blocksize. In my own experimentation this speeds things up considerably since I only have a few very large files, at the cost of losing some of the benefits of deduplication.