I have an older machine where I’ve done a test database Recreate that took about 4.5 days (for a 335M sqlite file) - so I thought I’d take a look at why database recreates take so long.
I started by copying the destination files to another folder pointed to by a test backup created on a faster machine, then running a Recreate there. This took 3.7 days - which I expected to be a lot shorter as this other machine is a lot more powerful and the destination files were local.
Note that all numbers / times here are approximate - all I did was get counts for time ranges then multiply by the middle of the time range. If I get access to a best enough of a machine to open all 2G of the SQL calls log file maybe I can narrow things down a bit…
What I found was that of the 89.5 hours spent doing the recreate, 24.5 of them involved SQL commands with a duration breakdown as follows (percents are of total SQL time, not total recreate):
Pct | Dur | Calls | Duration |
---|---|---|---|
0% | 0.0 hrs | 0 | 0+ days |
0% | 0.0 hrs | 0 | 10-24 hours |
10% | 2.5 hrs | 1 | 1-10 hours |
33% | 8.0 hrs | 27 | 10-60 minutes |
20% | 5.0 hrs | 61 | 1-10 minutes |
20% | 5.0 hrs | 594 | 10-60 seconds |
9% | 2.3 hrs | 1,672 | 1-10 seconds |
6% | 1.5 hrs | 10.468 | 0.1 to 1.0 seconds |
1% | 0.2 hrs | 16,537 | 0.01 to 0.1 seconds |
0% | 0.2 min | 2,206 | .001 to .01 seconds |
0% | 0.9 min | 105,165 | under 0.001 seconds |
(Overlapping ranges assume upper range ends at .999999…)
Some of the slow execs seem to be these…
2.5 hrs / 1 exec STYLE call
2019-01-12 02:28:34 -06 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery:
INSERT INTO "BlocksetEntry" ("BlocksetID", "Index", "BlockID") SELECT DISTINCT "H"."BlocksetID", "H"."Index", "H"."BlockID" FROM (SELECT "E"."BlocksetID" AS "BlocksetID", "D"."FullIndex" AS "Index", "F"."ID" AS "BlockID" FROM (
SELECT
"E"."BlocksetID",
"F"."Index" + ("E"."BlocklistIndex" * 3200) AS "FullIndex",
"F"."BlockHash",
MIN(102400, "E"."Length" - (("F"."Index" + ("E"."BlocklistIndex" * 3200)) * 102400)) AS "BlockSize",
"E"."Hash",
"E"."BlocklistSize",
"E"."BlocklistHash"
FROM
(
SELECT * FROM
(
SELECT
"A"."BlocksetID",
"A"."Index" AS "BlocklistIndex",
MIN(3200 * 32, ((("B"."Length" + 102400 - 1) / 102400) - ("A"."Index" * (3200))) * 32) AS "BlocklistSize",
"A"."Hash" AS "BlocklistHash",
"B"."Length"
FROM
"BlocklistHash" A,
"Blockset" B
WHERE
"B"."ID" = "A"."BlocksetID"
) C,
"Block" D
WHERE
"C"."BlocklistHash" = "D"."Hash"
AND
"C"."BlocklistSize" = "D"."Size"
) E,
"TempBlocklist-C0F15C75B70BE74D98B94D68ECAA5A6B" F
WHERE
"F"."BlocklistHash" = "E"."Hash"
ORDER BY
"E"."BlocksetID",
"FullIndex"
) D, "BlocklistHash" E, "Block" F, "Block" G WHERE "D"."BlocksetID" = "E"."BlocksetID" AND "D"."BlocklistHash" = "E"."Hash" AND "D"."BlocklistSize" = "G"."Size" AND "D"."BlocklistHash" = "G"."Hash" AND "D"."Blockhash" = "F"."Hash" AND "D"."BlockSize" = "F"."Size" UNION SELECT "Blockset"."ID" AS "BlocksetID", 0 AS "Index", "Block"."ID" AS "BlockID" FROM "Blockset", "Block", "TempSmalllist-61B9E4C601A2E24BA94356D21CC3248A" S WHERE "Blockset"."Fullhash" = "S"."FileHash" AND "S"."BlockHash" = "Block"."Hash" AND "S"."BlockSize" = "Block"."Size" AND "Blockset"."Length" = "S"."BlockSize" AND "Blockset"."Length" <= 102400 ) H WHERE ("H"."BlocksetID" || ':' || "H"."Index") NOT IN (SELECT ("ExistingBlocksetEntries"."BlocksetID" || ':' || "ExistingBlocksetEntries"."Index") FROM "BlocksetEntry" "ExistingBlocksetEntries" )
took 0:02:30:27.740
8.0 hrs / 27 execs STYLE call
2019-01-11 23:58:06 -06 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery:
INSERT INTO "Block" ("Hash", "Size", "VolumeID") SELECT "FullHash" AS "Hash", "Length" AS "Size", -1 AS "VolumeID" FROM (SELECT "A"."FullHash", "A"."Length", CASE WHEN "B"."Hash" IS NULL THEN '' ELSE "B"."Hash" END AS "Hash", CASE WHEN "B"."Size" is NULL THEN -1 ELSE "B"."Size" END AS "Size" FROM (SELECT DISTINCT "FullHash", "Length" FROM (SELECT "BlockHash" AS "FullHash", "BlockSize" AS "Length" FROM (
SELECT
"E"."BlocksetID",
"F"."Index" + ("E"."BlocklistIndex" * 3200) AS "FullIndex",
"F"."BlockHash",
MIN(102400, "E"."Length" - (("F"."Index" + ("E"."BlocklistIndex" * 3200)) * 102400)) AS "BlockSize",
"E"."Hash",
"E"."BlocklistSize",
"E"."BlocklistHash"
FROM
(
SELECT * FROM
(
SELECT
"A"."BlocksetID",
"A"."Index" AS "BlocklistIndex",
MIN(3200 * 32, ((("B"."Length" + 102400 - 1) / 102400) - ("A"."Index" * (3200))) * 32) AS "BlocklistSize",
"A"."Hash" AS "BlocklistHash",
"B"."Length"
FROM
"BlocklistHash" A,
"Blockset" B
WHERE
"B"."ID" = "A"."BlocksetID"
) C,
"Block" D
WHERE
"C"."BlocklistHash" = "D"."Hash"
AND
"C"."BlocklistSize" = "D"."Size"
) E,
"TempBlocklist-C0F15C75B70BE74D98B94D68ECAA5A6B" F
WHERE
"F"."BlocklistHash" = "E"."Hash"
ORDER BY
"E"."BlocksetID",
"FullIndex"
) UNION SELECT "BlockHash", "BlockSize" FROM "TempSmalllist-61B9E4C601A2E24BA94356D21CC3248A" )) A LEFT OUTER JOIN "Block" B ON "B"."Hash" = "A"."FullHash" AND "B"."Size" = "A"."Length" ) WHERE "FullHash" != "Hash" AND "Length" != "Size"
took 0:00:37:33.427
Hopefully I’ll get some time to dig into those two above that account for 43% of the current SQL execution time. Of course if somebody else wants to take a look at them that would be OK too…
Oh, and I’m running another Recreate test on a smaller job to see if the same percentages apply to the same call types.