Recreating database hanging for days

One of my backups got a "non-empty blocksets with no associated blocks" error, and from searching around, repairing or eventually recreating the database seems to be the solution.

I had done that in the past just fine, but now it's hung the past 5 times I've tried to recreate the database. According to the local live log set to Profiling, it's hanging on a local database query:

ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-######.dindex.zip.aes" took 0:00:00:00.000

What are the next steps to debug and fix this?

Duplicati Version: 2.0.6.3_beta_2021-06-17
System: Unraid with the Duplicati Docker container.
Intel i7-3770, 16 GB RAM
Lots of free CPU and RAM. Little system usage. Database files are on SSD, though I have always run with --use-block-cache=true.

Backend: Google Drive. Can still manually upload/download on the web GUI, so not at a quota.

Network: Wired gigabit ethernet directly to router with 300/300 internet
Backup set:
Source: ~300 GB
Backup: ~800 GB / 150+ Versions

I have several other backup sets running just fine on this same machine.

I created a bug report but the scrubbed db file is 1.1 GB (~ 600 MB compressed). I can upload that if needed, and have attached the system-info.txt by itself for now.

20210713_duplicati_report_info.zip (1.3 KB)

I also have lots of backed-up database files, some from before this recreate-hanging state started, if those are helpful.

For the original “non-empty blocksets with no associated blocks”, I was able to identify the likely culprits using the hints from this thread:

Using a backup of the SQLite database, I found the ones causing the original error. They are real files in the source directory, though they have some special UTF-8 foreign characters. Is it possible to just restore this SQLite database and remove those files?
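
Roughly, I queried a copy of the database for non-empty blocksets that have no associated blocks, something like this sketch (table names are per my version's schema, and the DB path is a placeholder):

import sqlite3

# Run against a COPY of the Duplicati database, never the live file.
con = sqlite3.connect("backup-copy.sqlite")  # placeholder path
rows = con.execute(
    'SELECT "ID", "Length" FROM "Blockset" '
    'WHERE "Length" > 0 AND "ID" NOT IN (SELECT "BlocksetID" FROM "BlocksetEntry")'
).fetchall()
for blockset_id, length in rows:
    print(blockset_id, length)  # non-empty blocksets with no associated blocks
con.close()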

This isn't what's causing the current hanging issue, though. I inspected my current database file, and in the file view or remote view, I'm nowhere near the files that caused the "non-empty blocksets with no associated blocks" error.

What I'm confused about is why I don't see anything in the live profiling log. The last statement is a SELECT that runs very quickly, and it doesn't indicate what is being attempted next. Will running the recreate command via the command line give me more verbose output?

It's not hanging there. It's saying the SELECT completed, and how long it took.
A SELECT like this is very common. It would be useful to see the context around it, for example:

2021-07-13 20:33:47 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingIndexlistVolumes]: Processing indexlist volume 1 of 1
2021-07-13 20:33:47 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-bd248ac5356e442f0abc5fa103d6a783c.dblock.zip"
2021-07-13 20:33:47 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-bd248ac5356e442f0abc5fa103d6a783c.dblock.zip" took 0:00:00:00.000
2021-07-13 20:33:47 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "VolumeID" FROM "Block" WHERE "Hash" = "LsafOYpQ9X5rnNnD/0F89/AJLs/Ej9DiR9UbN3iKy5E=" AND "Size" = 137 
2021-07-13 20:33:47 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "VolumeID" FROM "Block" WHERE "Hash" = "LsafOYpQ9X5rnNnD/0F89/AJLs/Ej9DiR9UbN3iKy5E=" AND "Size" = 137  took 0:00:00:00.001
2021-07-13 20:33:47 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "VolumeID" FROM "Block" WHERE "Hash" = "06CMxdyOi3kfKwq1k4XDtubnkvVDmN5/0AvCmK6+Tt8=" AND "Size" = 17 
2021-07-13 20:33:47 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "VolumeID" FROM "Block" WHERE "Hash" = "06CMxdyOi3kfKwq1k4XDtubnkvVDmN5/0AvCmK6+Tt8=" AND "Size" = 17  took 0:00:00:00.000
2021-07-13 20:33:47 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-i4004eeff82ad4ea0a5d567d0291679d3.dindex.zip"
2021-07-13 20:33:47 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT "ID" FROM "Remotevolume" WHERE "Name" = "duplicati-i4004eeff82ad4ea0a5d567d0291679d3.dindex.zip" took 0:00:00:00.000
2021-07-13 20:33:47 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-CommitRecreateDb]: Starting - CommitRecreatedDb
2021-07-13 20:33:47 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-CommitRecreateDb]: CommitRecreatedDb took 0:00:00:00.079

You'll need to copy or scroll due to width, but that "Processing indexlist volume 1 of 1" would look different in yours, as the above is from a tiny backup with one dblock and one associated dindex. Does yours process all the indexlist volumes then fail to continue to CommitRecreatedDb, or does it stop too early?

Or maybe it does something else, but the above section is the only one in my Recreate that was similar.
This was from a log-file=<path> with log-file-log-level=Profiling. Later entries appear lower in the file (unlike the live log, where the newest entries are on top).

This is an interesting find. How did you determine they’re UTF-8 and not Latin-1 (ISO 8859-1) or similar?

If you mean restore from a backup of the database, it probably doesn’t match the current destination files. Dropping it in can even damage the destination, e.g. Repair may “fix” unknown new files by deleting them.
If compact has run since the old DB was saved, that DB will see missing files as well as new unknown ones.

Are these files with the foreign characters seemingly recently added, e.g. if you look at their times with ls -lc?
Keeping them around would allow further testing after the current crisis passes, to see how they break things.
Are these your files, or do you need to consult with someone else to move them, let you take a copy, etc.?

The PURGE command is what's normally used to purge unwanted files from a backup, but it needs the DB.
That’s probably the best place to work, assuming you’re trying to preserve old file versions, not start fresh.
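
An untested sketch of its shape (placeholders for your storage URL, file path, and database path; check the PURGE documentation for the exact options):

duplicati-cli purge <storage-URL> "<path-to-unwanted-file>" --dbpath=<path-to-database>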

Are the suspected troublesome files located somewhere (e.g. in their own folder) that could be excluded?
This is just thinking ahead to when (if?) the DB gets its Recreate, and if it won't do that, then to a fresh start.

That's why you need to look earlier, to see whether it's in the middle of processing dindex files or finished the last one.
There shouldn’t be a hang either way, but at least it would be nice to find out where it was when it stopped.

By default it’s pretty quiet. You can add logging, but you could add logging to the GUI Recreate just as well.
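
If you do try the command line, something like this (untested; placeholders for your paths and storage URL) should give a full log, since deleting the local DB and then running repair is what triggers a Recreate:

duplicati-cli repair <storage-URL> --dbpath=<path-to-database> --log-file=<path-to-log> --log-file-log-level=Profiling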

I used Python, though apparently it's not definitive. The characters are Korean.

import os
import chardet  # pip install chardet

chardet.detect(os.fsencode(filename))
# {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}

I should correct myself: I found the foreign characters from a later SELECT statement on the surrounding blocksets, since the blockset in question is of course missing its association (found via your SELECT statement).

SELECT * FROM "File" WHERE "BlocksetID" BETWEEN #### AND ####

It's these neighboring files that have the foreign characters. But the important point is that plenty of files with these characters were backed up correctly, so maybe the issue has nothing to do with them. There is a gap of 18 BlocksetIDs missing from the File view, one of which contains the block causing the issue.
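
A quick sketch of how the gap can be enumerated against a copy of the DB (the File view and its BlocksetID column are per my version's schema; I believe the negative IDs are sentinels for folders/symlinks, so they are skipped):

import sqlite3

con = sqlite3.connect("backup-copy.sqlite")  # placeholder path
# BlocksetIDs visible in the File view, skipping negative sentinel IDs
ids = [row[0] for row in con.execute(
    'SELECT DISTINCT "BlocksetID" FROM "File" '
    'WHERE "BlocksetID" > 0 ORDER BY "BlocksetID"')]
# Report any IDs skipped between consecutive entries
for a, b in zip(ids, ids[1:]):
    if b - a > 1:
        print(f"gap: {a + 1}..{b - 1} ({b - a - 1} IDs missing)")
con.close()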

Indeed, the creation date of these is exactly right after the last successful backup time. They are actually from a Google Takeout archive of all their products, so there are a ton of new files introduced. Hopefully, since the BlocksetID looks somewhat linear with respect to directories and filenames, the actual files can be inferred, though the ordering does not match my OS's alphabetical listing. But I am open to doing more testing with these files to see if they break things.

Good point. I did mean to say "after" not "on", and clarified this in my first reply.

Ah, so the issue was that, with the most verbose Profiling output as mentioned, the web interface was flooded, and it only keeps a small buffer. That was entirely filled with the similar SELECT statements, which, as you mention, aren't very helpful. I added an output log file and can now see more useful messages.

Running while logging to an external file now shows me everything. I was getting a surprising number of Google Drive 403 errors. I had previously thought this meant an upload/download quota, but right when I saw that message I checked and verified I can download and upload without error. I'm thinking this also indicates a transactions-per-time-window quota. Other processes are accessing this same Google Drive at the same time. EDIT: I did stop all other processes and still get intermittent 403 Forbidden errors occasionally for some reason, but it succeeds after a few retries.

It hasn't gotten to the point of hanging yet, but I'm guessing that when the retry count and retry delays are exhausted (mine was set to 25 retries, 10 s delay), Duplicati does not error out correctly. Based on this theory, I increased the retries to 30 with a 30 s delay, restarted the database recreation, and it is going strong.
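
For reference, these are the two advanced options involved (with the values from the theory above):

--number-of-retries=30
--retry-delay=30s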

Using the Profiling level was probably a mistake; the profiling output log is up to 20 GB now, though at least it's being compressed by the filesystem.

If this completes successfully, I can see how backing up the files with foreign characters goes, as well as see if I can reproduce the hanging scenario by changing the retry delay/limit.

OK, I may have spoken too soon; it looks like it might be hanging again, over 2 hours since the last Profiling log output.

At the Verbose level:

2021-07-13 23:44:56 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingFilelistVolumes]: Processing filelist volume 100 of 100
2021-07-14 14:30:13 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingIndexlistVolumes]: Processing indexlist volume 99999 of 100000
2021-07-14 14:30:14 -04 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingIndexlistVolumes]: Processing indexlist volume 100000 of 100000
2021-07-14 14:37:01 -04 - [Verbose-Duplicati.Library.Main.Database.LocalRecreateDatabase-ReplacedMissingVolumes]: Replaced blocks for 3 missing volumes; there are now 0 missing volumes

At the Profiling level, it seems like the last command is still running; it was started over 2 hours ago.

2021-07-14 14:30:30 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery:
CREATE TEMPORARY TABLE "SwapBlocks-########" AS
SELECT "A"."ID" AS "BlockID", "A"."VolumeID" AS "SourceVolumeID", "A"."State" AS "SourceVolumeState", "B"."VolumeID" AS "TargetVolumeID", "B"."State" AS "TargetVolumeState" FROM (SELECT "Block"."ID", "Block"."VolumeID", "Remotevolume"."State" FROM "Block", "Remotevolume" WHERE "Block"."VolumeID" = "Remotevolume"."ID" and "Remotevolume"."State" = "Temporary") A, (SELECT "DuplicateBlock"."BlockID",  "DuplicateBlock"."VolumeID", "Remotevolume"."State" FROM "DuplicateBlock", "Remotevolume" WHERE "DuplicateBlock"."VolumeID" = "Remotevolume"."ID" and "Remotevolume"."State" = "Verified") B WHERE "A"."ID" = "B"."BlockID";

UPDATE "Block" SET "VolumeID" = (SELECT "TargetVolumeID" FROM "SwapBlocks-########" WHERE "Block"."ID" = "SwapBlocks-########"."BlockID") WHERE "Block"."ID" IN (SELECT "BlockID" FROM "SwapBlocks-########");

UPDATE "DuplicateBlock" SET "VolumeID" = (SELECT "SourceVolumeID" FROM "SwapBlocks-########" WHERE "DuplicateBlock"."BlockID" = "SwapBlocks-########"."BlockID") WHERE "DuplicateBlock"."BlockID" IN (SELECT "BlockID" FROM "SwapBlocks-########");
DROP TABLE "SwapBlocks-########";

DELETE FROM "IndexBlockLink" WHERE "BlockVolumeID" IN (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks" AND "State" = "Temporary" AND "ID" NOT IN (SELECT DISTINCT "VolumeID" FROM "Block"));
DELETE FROM "DuplicateBlock" WHERE "VolumeID" IN (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks" AND "State" = "Temporary" AND "ID" NOT IN (SELECT DISTINCT "VolumeID" FROM "Block"));
DELETE FROM "RemoteVolume" WHERE "Type" = "Blocks" AND "State" = "Temporary" AND "ID" NOT IN (SELECT DISTINCT "VolumeID" FROM "Block");

2021-07-14 14:37:01 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery:
CREATE TEMPORARY TABLE "SwapBlocks-########" AS
SELECT "A"."ID" AS "BlockID", "A"."VolumeID" AS "SourceVolumeID", "A"."State" AS "SourceVolumeState", "B"."VolumeID" AS "TargetVolumeID", "B"."State" AS "TargetVolumeState" FROM (SELECT "Block"."ID", "Block"."VolumeID", "Remotevolume"."State" FROM "Block", "Remotevolume" WHERE "Block"."VolumeID" = "Remotevolume"."ID" and "Remotevolume"."State" = "Temporary") A, (SELECT "DuplicateBlock"."BlockID",  "DuplicateBlock"."VolumeID", "Remotevolume"."State" FROM "DuplicateBlock", "Remotevolume" WHERE "DuplicateBlock"."VolumeID" = "Remotevolume"."ID" and "Remotevolume"."State" = "Verified") B WHERE "A"."ID" = "B"."BlockID";

UPDATE "Block" SET "VolumeID" = (SELECT "TargetVolumeID" FROM "SwapBlocks-########" WHERE "Block"."ID" = "SwapBlocks-########"."BlockID") WHERE "Block"."ID" IN (SELECT "BlockID" FROM "SwapBlocks-########");

UPDATE "DuplicateBlock" SET "VolumeID" = (SELECT "SourceVolumeID" FROM "SwapBlocks-########" WHERE "DuplicateBlock"."BlockID" = "SwapBlocks-########"."BlockID") WHERE "DuplicateBlock"."BlockID" IN (SELECT "BlockID" FROM "SwapBlocks-########");
DROP TABLE "SwapBlocks-########";

DELETE FROM "IndexBlockLink" WHERE "BlockVolumeID" IN (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks" AND "State" = "Temporary" AND "ID" NOT IN (SELECT DISTINCT "VolumeID" FROM "Block"));
DELETE FROM "DuplicateBlock" WHERE "VolumeID" IN (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks" AND "State" = "Temporary" AND "ID" NOT IN (SELECT DISTINCT "VolumeID" FROM "Block"));
DELETE FROM "RemoteVolume" WHERE "Type" = "Blocks" AND "State" = "Temporary" AND "ID" NOT IN (SELECT DISTINCT "VolumeID" FROM "Block");
 took 0:00:06:30.597
2021-07-14 14:37:01 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: Starting - ExecuteScalarInt64: SELECT COUNT(*) FROM "RemoteVolume" WHERE "State" = "Temporary" AND "Type" = "Blocks"
2021-07-14 14:37:01 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteScalarInt64]: ExecuteScalarInt64: SELECT COUNT(*) FROM "RemoteVolume" WHERE "State" = "Temporary" AND "Type" = "Blocks"  took 0:00:00:00.088
2021-07-14 14:37:01 -04 - [Verbose-Duplicati.Library.Main.Database.LocalRecreateDatabase-ReplacedMissingVolumes]: Replaced blocks for 3 missing volumes; there are now 0 missing volumes
2021-07-14 14:37:01 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: Starting - ExecuteNonQuery: INSERT INTO "Block" ("Hash", "Size", "VolumeID") SELECT "FullHash" AS "Hash", "Length" AS "Size", -1 AS "VolumeID"  FROM (SELECT "A"."FullHash", "A"."Length", CASE WHEN "B"."Hash" IS NULL THEN '' ELSE "B"."Hash" END AS "Hash", CASE WHEN "B"."Size" is NULL THEN -1 ELSE "B"."Size" END AS "Size" FROM (SELECT DISTINCT "FullHash", "Length" FROM (SELECT "BlockHash" AS "FullHash", "BlockSize" AS "Length" FROM (
        SELECT
            "E"."BlocksetID",
            "F"."Index" + ("E"."BlocklistIndex" * 3200) AS "FullIndex",
            "F"."BlockHash",
            MIN(102400, "E"."Length" - (("F"."Index" + ("E"."BlocklistIndex" * 3200)) * 102400)) AS "BlockSize",
            "E"."Hash",
            "E"."BlocklistSize",
            "E"."BlocklistHash"
        FROM
            (
                    SELECT * FROM
                    (
                        SELECT
                            "A"."BlocksetID",
                            "A"."Index" AS "BlocklistIndex",
                            MIN(3200 * 32, ((("B"."Length" + 102400 - 1) / 102400) - ("A"."Index" * (3200))) * 32) AS "BlocklistSize",
                            "A"."Hash" AS "BlocklistHash",
                            "B"."Length"
                        FROM
                            "BlocklistHash" A,
                            "Blockset" B
                        WHERE
                            "B"."ID" = "A"."BlocksetID"
                    ) C,
                    "Block" D
                WHERE
                   "C"."BlocklistHash" = "D"."Hash"
                   AND
                   "C"."BlocklistSize" = "D"."Size"
            ) E,
            "TempBlocklist-#######" F
        WHERE
           "F"."BlocklistHash" = "E"."Hash"
        ORDER BY
           "E"."BlocksetID",
           "FullIndex"
 ) UNION SELECT "BlockHash", "BlockSize" FROM "TempSmalllist-#########" )) A LEFT OUTER JOIN "Block" B ON "B"."Hash" =  "A"."FullHash" AND "B"."Size" = "A"."Length" ) WHERE "FullHash" != "Hash" AND "Length" != "Size"

The Duplicati process is using ~1% CPU, and iotop reports it is at 99% IOWAIT. So maybe this command is just very slow and inefficient? Again, this is on an SSD with use-block-cache=true. Will report back if this changes.
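
In case it's an indexing problem, here's a sketch (against a copy of the DB) that lists the indexes on the Block table, since the statement above joins Block on "Hash" and "Size" and could crawl without an index there:

import sqlite3

con = sqlite3.connect("backup-copy.sqlite")  # placeholder path
# The big INSERT joins "Block" on ("Hash", "Size"); without a composite
# index there, SQLite may fall back to scans that take hours
for name, sql in con.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'Block'"):
    print(name, sql)
con.close()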

EDIT: IOWAIT went down to 0–5%; CPU is still at 1%. These SQL statements are taking several hours each.

EDIT2: And yay, it finished recreating!