Repairing a big database

As discussed here, we are running Duplicati in an environment without permission to delete (or overwrite) files on the cloud storage. In theory this should work, except when a repair is required, and that is exactly what happened on the first backup run.

The backend being used is rclone and the target is Google Cloud Storage.

1st run rclone log: (Duplicati log didn’t output any errors)
For simplicity, some errors due to user files in use have been removed and file hashes renamed.

2019/08/15 03:21:35 ERROR : Attempt 1/3 failed with 1 errors and: Get (...) duplicati-hash-a.dblock.zip: oauth2: cannot fetch token
2019/08/15 03:21:48 ERROR : Attempt 2/3 succeeded
2019/08/15 03:22:54 ERROR : dup-hash-b: Failed to copy: googleapi: Error 403: backup--rclone@***.iam.gserviceaccount.com does not have storage.objects.delete access to duplicati-hash-c.dblock.zip., forbidden
2019/08/15 03:22:54 ERROR : Attempt 1/3 failed with 2 errors and: googleapi: Error 403: backup--rclone@***.iam.gserviceaccount.com does not have storage.objects.delete access to duplicati-hash-c.dblock.zip., forbidden
2019/08/15 03:22:54 ERROR : Attempt 2/3 succeeded

I tried to run the backup again. 2nd run Duplicati log:

2019-08-23 10:45:38 -03 - [Warning-Duplicati.Library.Main.Operation.Backup.UploadSyntheticFilelist-MissingTemporaryFilelist]: Expected there to be a temporary fileset for synthetic filelist (1, duplicati-20190812T181116Z.dlist.zip), but none was found?
2019-09-01 06:37:00 -03 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingFile]: Missing file: duplicati-hash-d.dblock.zip
2019-09-01 06:37:00 -03 - [Warning-Duplicati.Library.Main.Operation.FilelistProcessor-MissingFile]: Missing file: duplicati-hash-e.dindex.zip
2019-09-01 06:37:00 -03 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-MissingRemoteFiles]: Found 2 files that are missing from the remote storage, please run repair
2019-09-01 06:37:00 -03 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
Duplicati.Library.Interface.UserInformationException: Found 2 files that are missing from the remote storage, please run repair
   at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList(BackendManager backend, Options options, LocalDatabase database, IBackendWriter log, String protectedfile)
   at Duplicati.Library.Main.Operation.BackupHandler.PostBackupVerification()
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__19.MoveNext()

2nd run rclone log:

2019/08/29 09:51:47 ERROR : Attempt 1/3 failed with 1 errors and: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-d.dblock.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: no such host
2019/08/29 09:52:01 ERROR : Attempt 2/3 failed with 1 errors and: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-d.dblock.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: getaddrinfow: This is usually a temporary error during host name resolution and means that the local server has not received a response from an authorized server.
2019/08/29 09:52:13 ERROR : Attempt 3/3 failed with 1 errors and: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-d.dblock.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: getaddrinfow: This is usually (...)
2019/08/29 09:52:13 Failed to copyto: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-d.dblock.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: getaddrinfow: This is usually (...)
2019/08/29 09:52:38 ERROR : Attempt 1/3 failed with 1 errors and: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-e.dindex.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: getaddrinfow: This is usually (...)
2019/08/29 09:52:50 ERROR : Attempt 2/3 failed with 1 errors and: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-e.dindex.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: getaddrinfow: This is usually (...)
2019/08/29 09:53:02 ERROR : Attempt 3/3 failed with 1 errors and: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-e.dindex.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: getaddrinfow: This is usually (...)
2019/08/29 09:53:02 Failed to copyto: Get https://www.googleapis.com/storage/v1/b/***duplicati-hash-e.dindex.zip?alt=json&prettyPrint=false: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp: lookup oauth2.googleapis.com: getaddrinfow: This is usually (...)

I checked, and indeed the duplicati-hash-d.dblock.zip and duplicati-hash-e.dindex.zip files do not exist in the cloud storage.

The database is over 5.4 GB and the commands take a long time to execute, so I would appreciate any tips on how to fix this.

Thanks in advance.

This seems more like an rclone issue, I think?

I'm not sure what happened. In the 1st run, rclone tried to delete duplicati-hash-c.dblock.zip for some reason (maybe a connection issue during upload) and failed the 1st attempt because it doesn't have permission, but then reported success on the 2nd attempt!?
The file still exists on the cloud storage.

Anyway, on the 2nd run Duplicati reported "Expected there to be a temporary fileset for synthetic filelist (1, duplicati-20190812T181116Z.dlist.zip)". I don't quite understand where this temporary fileset should be, so I can't figure out why it wasn't created.

I think I found the cause (or one of the causes) of the slow repair, namely this query:

SELECT COUNT(*) FROM "Block" WHERE "Size" > 102400

My "Block" table has 29,468,226 rows. The repair has been running for more than 3 hours, most of that time spent on this query, which still hasn't finished.

Using DB Browser for SQLite I was able to add an index to a copy of the database, and the query then took only 11 ms to execute.

Is it safe to create this index on the live database?
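
For reference, the statement was along these lines. Indexing the "Size" column of "Block" is my reading of what that WHERE clause needs, and the index name is just one I picked:

CREATE INDEX "BlockSizeIndex" ON "Block" ("Size");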

Stuck on another slow query:

SELECT "A"."Hash",
       "C"."Hash"
FROM   (SELECT "BlocklistHash"."BlocksetID",
               "Block"."Hash",
               *
        FROM   "BlocklistHash",
               "Block"
        WHERE  "BlocklistHash"."Hash" = "Block"."Hash"
               AND "Block"."VolumeID" = 79369) A,
       "BlocksetEntry" B,
       "Block" C
WHERE  "B"."BlocksetID" = "A"."BlocksetID"
       AND "B"."Index" >= ( "A"."Index" * 3200 )
       AND "B"."Index" < ( ( "A"."Index" + 1 ) * 3200 )
       AND "C"."ID" = "B"."BlockID"
ORDER  BY "A"."BlocksetID",
          "B"."Index" 

EDIT
Query finished now, took 0:02:51:05.703
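
For anyone profiling these: SQLite prints its plan for any statement if you prefix it with EXPLAIN QUERY PLAN (works in the sqlite3 shell and in DB Browser's Execute SQL tab). For example, for the counting query above:

EXPLAIN QUERY PLAN
SELECT COUNT(*) FROM "Block" WHERE "Size" > 102400;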

There’s a GitHub issue related to this here:

This one has been running since Friday, over the weekend, and it’s not over yet.

SELECT DISTINCT
   "B"."Timestamp",
   "A"."FilesetID",
   COUNT("A"."FileID") AS "FileCount" 
FROM
   "FilesetEntry" A,
   "Fileset" B 
WHERE
   "A"."FilesetID" = "B"."ID" 
   AND "A"."FileID" IN 
   (
      SELECT DISTINCT "ID" FROM
         (
            SELECT
               "ID" AS "ID",
               "BlocksetID" AS "BlocksetID" 
            FROM "File" 
            WHERE
               "BlocksetID"!= - 100 
               AND "BlocksetID" != - 200 
            UNION
            SELECT
               "A"."ID" AS "ID",
               "B"."BlocksetID" AS "BlocksetID" 
            FROM
               "File" A LEFT JOIN "Metadataset" B 
               ON "A"."MetadataID" = "B"."ID" 
         )
      WHERE
         "BlocksetID" IS NULL 
         OR "BlocksetID" IN 
         (
            SELECT DISTINCT "BlocksetID" FROM
               (
                  SELECT "BlocksetID" FROM "BlocksetEntry" 
                  WHERE "BlockID" NOT IN 
                     (
                        SELECT "ID" FROM "Block"
                        WHERE "VolumeID" IN 
                           (
                              SELECT "ID" FROM "RemoteVolume" 
                              WHERE "Type" = "Blocks"
                           )
                     )
                  UNION
                  SELECT "BlocksetID" FROM "BlocklistHash" 
                  WHERE "Hash" NOT IN 
                     (
                        SELECT "Hash" FROM "Block" 
                        WHERE "VolumeID" IN 
                           (
                              SELECT "ID" FROM "RemoteVolume" 
                              WHERE "Type" = "Blocks"
                           )
                     )
                  UNION
                  SELECT "A"."ID" AS "BlocksetID" 
                  FROM "Blockset" A 
                  LEFT JOIN
                     "BlocksetEntry" B 
                     ON "A"."ID" = "B"."BlocksetID" 
                  WHERE
                     "A"."Length" > 0 
                     AND "B"."BlocksetID" IS NULL 
               )
            WHERE "BlocksetID" NOT IN 
               (
                  SELECT "ID" FROM "Blockset" 
                  WHERE "Length" == 0
               )
         )
   )
GROUP BY "A"."FilesetID"

EDIT: Query plan:

SEARCH TABLE FilesetEntry AS A USING COVERING INDEX FilesetentryFileIdIndex (FileID=?)
LIST SUBQUERY 12
    CO-ROUTINE 2
        COMPOUND QUERY
            LEFT-MOST SUBQUERY
                SCAN TABLE File
            UNION USING TEMP B-TREE
                SCAN TABLE File AS A
                SEARCH TABLE Metadataset AS B USING INTEGER PRIMARY KEY (rowid=?)
    SCAN SUBQUERY 2
    LIST SUBQUERY 11
        CO-ROUTINE 9
            COMPOUND QUERY
                LEFT-MOST SUBQUERY
                    SCAN TABLE BlocksetEntry USING COVERING INDEX
                      BlocksetEntry_IndexIdsBackwards
                    LIST SUBQUERY 4
                        SEARCH TABLE Block USING COVERING INDEX Block_IndexByVolumeId
                          (VolumeID=?)
                        LIST SUBQUERY 3
                            SCAN TABLE RemoteVolume
                UNION USING TEMP B-TREE
                    SCAN TABLE BlocklistHash
                    LIST SUBQUERY 7
                        SEARCH TABLE Block USING INDEX Block_IndexByVolumeId
                          (VolumeID=?)
                        LIST SUBQUERY 6
                            SCAN TABLE RemoteVolume
                UNION USING TEMP B-TREE
                    SCAN TABLE Blockset AS A
                    SEARCH TABLE BlocksetEntry AS B USING PRIMARY KEY (BlocksetID=?)
        SCAN SUBQUERY 9
        LIST SUBQUERY 10
            SCAN TABLE Blockset
        USE TEMP B-TREE FOR DISTINCT
    USE TEMP B-TREE FOR DISTINCT
SEARCH TABLE Fileset AS B USING INTEGER PRIMARY KEY (rowid=?)
USE TEMP B-TREE FOR GROUP BY
USE TEMP B-TREE FOR DISTINCT

While the index is being created, it will block writes globally because of SQLite's very limited concurrency; all writers will wait for the index creation to finish and may or may not time out. The application could cancel activity because it waited too long, I believe, but that should be the extent of it. It won't cause corruption or the like.

(Even enterprise databases do not necessarily support concurrent index creation; features vary. But in that case a table lock is taken, rather than a global lock.)
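
As an illustration of the waiting behavior (PRAGMA busy_timeout is standard SQLite; the 60-second value is arbitrary): while one connection runs CREATE INDEX, a writer on another connection fails with "database is locked" unless it is configured to wait:

PRAGMA busy_timeout = 60000;  -- retry for up to 60 s instead of failing immediately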

Thanks. I made sure no operations were running while the index was being created.

This looks really promising. What are the downsides of indexing the Block table? Increased database size?

This index increased the database size from around 5.4 GB to 6 GB, about 11%.

The list-broken-files command runs this query; this comment from the source code explains which blocksets are considered invalid:

Invalid blocksets include those that:

  • Have BlocksetEntries with unknown/invalid blocks (meaning the data to rebuild the blockset isn’t available)
    • Invalid blocks include those that appear to be in non-Blocks volumes (e.g., are listed as being in an Index or Files volume) or that appear in an unknown volume (-1)
  • Have BlocklistHash entries with unknown/invalid blocks (meaning the data which defines the list of hashes that makes up the blockset isn’t available)
  • Are defined in the Blockset table but have no entries in the BlocksetEntry table (this can happen during recreate if Files volumes reference blocksets that are not found in any Index files); see the simplified extract below
    However, blocksets with a length of 0 are excluded from this check, as the corresponding blocks are not needed.
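
The third bullet, for example, corresponds to this part of the big query above (extracted, returning the affected blockset IDs):

SELECT "A"."ID" AS "BlocksetID"
FROM "Blockset" A
LEFT JOIN "BlocksetEntry" B ON "A"."ID" = "B"."BlocksetID"
WHERE "A"."Length" > 0
  AND "B"."BlocksetID" IS NULL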

A pull request that adds some indices to the Block table has been submitted:
