Database recreate performance


#1

I have an older machine where I’ve done a test database Recreate that took about 4.5 days (for a 335M sqlite file) - so I thought I’d take a look at why database recreates take so long.

I started by copying the destination files to another folder pointed to by a test backup created on a faster machine, then running a Recreate there. This took 3.7 days - which I expected to be a lot shorter as this other machine is a lot more powerful and the destination files were local.

Note that all numbers / times here are approximate - all I did was get counts for time ranges then multiply by the middle of the time range. If I get access to a best enough of a machine to open all 2G of the SQL calls log file maybe I can narrow things down a bit… :slight_smile:

What I found was that of the 89.5 hours spent doing the recreate, 24.5 of them involved SQL commands with a duration breakdown as follows (percents are of total SQL time, not total recreate):

Pct Dur Calls Duration
0% 0.0 hrs 0 0+ days
0% 0.0 hrs 0 10-24 hours
10% 2.5 hrs 1 1-10 hours
33% 8.0 hrs 27 10-60 minutes
20% 5.0 hrs 61 1-10 minutes
20% 5.0 hrs 594 10-60 seconds
9% 2.3 hrs 1,672 1-10 seconds
6% 1.5 hrs 10.468 0.1 to 1.0 seconds
1% 0.2 hrs 16,537 0.01 to 0.1 seconds
0% 0.2 min 2,206 .001 to .01 seconds
0% 0.9 min 105,165 under 0.001 seconds

(Overlapping ranges assume upper range ends at .999999…)


Some of the slow execs seem to be these…

2.5 hrs / 1 exec STYLE call
2019-01-12 02:28:34 -06 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery:
INSERT INTO "BlocksetEntry" ("BlocksetID", "Index", "BlockID") SELECT DISTINCT "H"."BlocksetID", "H"."Index", "H"."BlockID" FROM (SELECT "E"."BlocksetID" AS "BlocksetID", "D"."FullIndex" AS "Index", "F"."ID" AS "BlockID" FROM (  
        SELECT
            "E"."BlocksetID",
            "F"."Index" + ("E"."BlocklistIndex" * 3200) AS "FullIndex",
            "F"."BlockHash",
            MIN(102400, "E"."Length" - (("F"."Index" + ("E"."BlocklistIndex" * 3200)) * 102400)) AS "BlockSize",
            "E"."Hash",
            "E"."BlocklistSize",
            "E"."BlocklistHash"
        FROM
            (
                    SELECT * FROM
                    (
                        SELECT 
                            "A"."BlocksetID",
                            "A"."Index" AS "BlocklistIndex",
                            MIN(3200 * 32, ((("B"."Length" + 102400 - 1) / 102400) - ("A"."Index" * (3200))) * 32) AS "BlocklistSize",
                            "A"."Hash" AS "BlocklistHash",
                            "B"."Length"
                        FROM 
                            "BlocklistHash" A,
                            "Blockset" B
                        WHERE 
                            "B"."ID" = "A"."BlocksetID"
                    ) C,
                    "Block" D
                WHERE
                   "C"."BlocklistHash" = "D"."Hash"
                   AND
                   "C"."BlocklistSize" = "D"."Size"
            ) E,
            "TempBlocklist-C0F15C75B70BE74D98B94D68ECAA5A6B" F
        WHERE
           "F"."BlocklistHash" = "E"."Hash"
        ORDER BY 
           "E"."BlocksetID",
           "FullIndex"
) D, "BlocklistHash" E, "Block" F, "Block" G WHERE "D"."BlocksetID" = "E"."BlocksetID" AND "D"."BlocklistHash" = "E"."Hash" AND "D"."BlocklistSize" = "G"."Size" AND "D"."BlocklistHash" = "G"."Hash" AND "D"."Blockhash" = "F"."Hash" AND "D"."BlockSize" = "F"."Size"  UNION SELECT "Blockset"."ID" AS "BlocksetID", 0 AS "Index", "Block"."ID" AS "BlockID" FROM "Blockset", "Block", "TempSmalllist-61B9E4C601A2E24BA94356D21CC3248A" S WHERE "Blockset"."Fullhash" = "S"."FileHash" AND "S"."BlockHash" = "Block"."Hash" AND "S"."BlockSize" = "Block"."Size" AND "Blockset"."Length" = "S"."BlockSize" AND "Blockset"."Length" <= 102400 ) H WHERE ("H"."BlocksetID" || ':' || "H"."Index") NOT IN (SELECT ("ExistingBlocksetEntries"."BlocksetID" || ':' || "ExistingBlocksetEntries"."Index") FROM "BlocksetEntry" "ExistingBlocksetEntries" )
took 0:02:30:27.740
8.0 hrs / 27 execs STYLE call
2019-01-11 23:58:06 -06 - [Profiling-Timer.Finished-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteNonQuery]: ExecuteNonQuery:
INSERT INTO "Block" ("Hash", "Size", "VolumeID") SELECT "FullHash" AS "Hash", "Length" AS "Size", -1 AS "VolumeID"  FROM (SELECT "A"."FullHash", "A"."Length", CASE WHEN "B"."Hash" IS NULL THEN '' ELSE "B"."Hash" END AS "Hash", CASE WHEN "B"."Size" is NULL THEN -1 ELSE "B"."Size" END AS "Size" FROM (SELECT DISTINCT "FullHash", "Length" FROM (SELECT "BlockHash" AS "FullHash", "BlockSize" AS "Length" FROM (  
        SELECT
            "E"."BlocksetID",
            "F"."Index" + ("E"."BlocklistIndex" * 3200) AS "FullIndex",
            "F"."BlockHash",
            MIN(102400, "E"."Length" - (("F"."Index" + ("E"."BlocklistIndex" * 3200)) * 102400)) AS "BlockSize",
            "E"."Hash",
            "E"."BlocklistSize",
            "E"."BlocklistHash"
        FROM
            (
                    SELECT * FROM
                    (
                        SELECT 
                            "A"."BlocksetID",
                            "A"."Index" AS "BlocklistIndex",
                            MIN(3200 * 32, ((("B"."Length" + 102400 - 1) / 102400) - ("A"."Index" * (3200))) * 32) AS "BlocklistSize",
                            "A"."Hash" AS "BlocklistHash",
                            "B"."Length"
                        FROM 
                            "BlocklistHash" A,
                            "Blockset" B
                        WHERE 
                            "B"."ID" = "A"."BlocksetID"
                    ) C,
                    "Block" D
                WHERE
                   "C"."BlocklistHash" = "D"."Hash"
                   AND
                   "C"."BlocklistSize" = "D"."Size"
            ) E,
            "TempBlocklist-C0F15C75B70BE74D98B94D68ECAA5A6B" F
        WHERE
           "F"."BlocklistHash" = "E"."Hash"
        ORDER BY 
           "E"."BlocksetID",
           "FullIndex"
 ) UNION SELECT "BlockHash", "BlockSize" FROM "TempSmalllist-61B9E4C601A2E24BA94356D21CC3248A" )) A LEFT OUTER JOIN "Block" B ON "B"."Hash" =  "A"."FullHash" AND "B"."Size" = "A"."Length" ) WHERE "FullHash" != "Hash" AND "Length" != "Size"
took 0:00:37:33.427

Hopefully I’ll get some time to dig into those two above that account for 43% of the current SQL execution time. Of course if somebody else wants to take a look at them that would be OK too… :slight_smile:

Oh, and I’m running another Recreate test on a smaller job to see if the same percentages apply to the same call types.


Duplicati database rebuild - 2gb client database for 750gb of backup files
#2

For the record, the way this SQL is built dynamically makes it very difficult to optimize things - not just the SQL but also the calling method (I suspect we’re not benefiting from store execution plans on these repeated calls).

Testing (and ultimately final code) for this may entail “hard coded” SQL. If it improves performance enough, that should be a valid reason to suffer through the pain of maintaining it.