Slow Database Delete and Repair Queries

So with the release of the new beta I decided to upgrade from 2.0.3.5. However, after that finished, I tried running one of my backups and got a “Failed: Unexpected difference in fileset version 14: 4/10/2018 2:00:00 AM (database id: 161), found 105124 entries, but expected 105125” error. Repair did nothing, so I tried running the recreate. It’s been running for 3, going on 4, days. The source is only 164GB and the backup is 187GB, with 16 versions. The destination is a NAS on the local network. My CPU usage sits around 15%, and the disk and ethernet usage on my NAS also look pretty negligible overall.

Checking the logs with the level set to Profiling, it looks like it’s running an insert query per chunk (49MB), and each of these queries is taking 30-40 seconds. This is just based on what shows in the logs, so I don’t know if there’s anything else that might influence this. I’ll admit it’s hard to see exactly what this query is trying to do (the general idea looks like making sure each hash isn’t already in the db? see my simplified sketch after the query), and my SQL isn’t the best, but it looks really ugly: SELECT statements nested 7 deep, with a UNION and an outer join.

INSERT INTO "block" ("hash", "size", "volumeid")
SELECT
   "fullhash" AS "Hash",
   "length" AS "Size",
   -1 AS "VolumeID"
FROM (
   -- Pair every candidate block with its match in "block" (if any);
   -- unmatched rows get the placeholders '' and -1.
   SELECT
      "A"."fullhash",
      "A"."length",
      CASE WHEN "B"."hash" IS NULL THEN '' ELSE "B"."hash" END AS "Hash",
      CASE WHEN "B"."size" IS NULL THEN -1 ELSE "B"."size" END AS "Size"
   FROM (
      SELECT DISTINCT "fullhash", "length"
      FROM (
         -- Expand each blocklist entry into the individual block hashes it
         -- references, computing each block's size from its offset
         -- (apparently 102400 bytes per block, 32-byte hashes,
         -- 3200 hashes per blocklist).
         SELECT
            "blockhash" AS "FullHash",
            "blocksize" AS "Length"
         FROM (
            SELECT
               "E"."blocksetid",
               "F"."index" + ("E"."blocklistindex" * 3200) AS "FullIndex",
               "F"."blockhash",
               MIN(102400, "E"."length" - (("F"."index" + ("E"."blocklistindex" * 3200)) * 102400)) AS "BlockSize",
               "E"."hash",
               "E"."blocklistsize",
               "E"."blocklisthash"
            FROM (
               SELECT *
               FROM (
                  -- Compute the expected size of each blocklist from the
                  -- blockset length, so it can be matched to a stored block.
                  SELECT
                     "A"."blocksetid",
                     "A"."index" AS "BlocklistIndex",
                     MIN(3200 * 32, ((("B"."length" + 102400 - 1) / 102400) - ("A"."index" * 3200)) * 32) AS "BlocklistSize",
                     "A"."hash" AS "BlocklistHash",
                     "B"."length"
                  FROM "blocklisthash" A, "blockset" B
                  WHERE "B"."id" = "A"."blocksetid"
               ) C, "block" D
               WHERE "C"."blocklisthash" = "D"."hash"
                 AND "C"."blocklistsize" = "D"."size"
            ) E,
            "tempblocklist-f672c0e9da6ec741ad0420aacddaf83b" F
            WHERE "F"."blocklisthash" = "E"."hash"
            ORDER BY "E"."blocksetid", "fullindex"
         )
         UNION
         -- Plus the blocks of small files, which have no blocklist.
         SELECT "blockhash", "blocksize"
         FROM "tempsmalllist-8f523a800627b14e910898be1249d607"
      )
   ) A
   LEFT OUTER JOIN "block" B
      ON "B"."hash" = "A"."fullhash"
     AND "B"."size" = "A"."length"
)
-- Keep only the rows that had no match, i.e. blocks not yet in the db.
WHERE "fullhash" != "hash"
  AND "length" != "size"

Has there been much investigation into the performance of this? Spending multiple days to recreate the database after a corruption, for a backup of only ~180GB, seems really excessive.
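
One thing I’m wondering about (a total guess given my SQL level, and the index name below is my own invention): whether the tables being probed once per row here have covering indexes. The LEFT OUTER JOIN against "block" on ("hash", "size") is a per-row lookup, so without an index it degenerates into a full scan for every candidate block. In sqlite3 that’s at least easy to check and to try:

   -- Guesswork; this may well already exist in the real schema. List the
   -- indexes on "block", then add a covering one on ("hash", "size").
   PRAGMA index_list("block");
   CREATE INDEX IF NOT EXISTS "blockhashsize" ON "block" ("hash", "size");

With an index like that, each probe should be a B-tree lookup instead of a scan, which, if this is the bottleneck, would be the difference between milliseconds and the 30-40 seconds per query I’m seeing.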


There really are some slow queries throughout the codebase. My SQL is not the best either, so I can’t be of much help for now, but I hope to be able to help improve this in the future.

Got exactly the same thing. Destination is also a Synology NAS on the local network (2x GBit link aggregation). My DBs (two of them) were fairly big (around 2-3 GB each). It took 5 days to recreate ONE DB (on a fast system, I might add, with no significant load).
And the error was still there after that. So it was WAY faster to just upload the whole thing again (8 hrs) than to try to repair it.

What also bugs me is that this seems to be an error connected to the upgrade in the first place…

Is this fixed in canary or is this issue still open?

(It’s really bad)

Am I wrong, or is this not really needed:

   ORDER BY
      "E"."blocksetid",
      "fullindex"

The canary fixes an issue where Duplicati would scavenge for blocks in all the volumes if it couldn’t find the block for an empty file (which doesn’t exist).

This makes a significant difference if you are affected by that issue, but of course the rest of the backup can still take time.