Duplicati became extremely slow

I’m running Duplicati under Linux from a bash script that cron calls daily, and it had been churning along nicely, taking around 25 minutes to process almost 500 GB of data.
The script makes multiple calls to Duplicati.CommandLine.exe for multiple backups, which means each email from cron contains the combined log for all of those calls. As I was not expecting anything to fail, I became lax in reading those emails, which led to an almost ten-month delay before I figured out that the very first backup was failing with this error message:
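In hindsight, the wrapper script should have failed loudly instead of letting the error drift by in a long combined log. A minimal sketch of that idea (job names and the commented-out Duplicati invocations are hypothetical, adjust to your setup):

```shell
#!/bin/sh
# Run each backup through a helper that aborts on the first failure,
# so cron's email clearly flags the broken job instead of burying it.
run_backup() {
    name="$1"
    shift
    if ! "$@"; then
        echo "BACKUP FAILED: $name" >&2
        exit 1
    fi
}

# Example invocations (placeholders, uncomment and adapt for real use):
# run_backup home mono Duplicati.CommandLine.exe backup "file:///mnt/backup/home" "$HOME"
# run_backup etc  mono Duplicati.CommandLine.exe backup "file:///mnt/backup/etc" /etc
```

With this, a failing backup makes the whole script exit non-zero and the "BACKUP FAILED" line is the last thing in cron's email.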

database disk image is malformed

Now, looking around here, it clearly appeared that the database was corrupted, so I decided to use the .recover command from the SQLite CLI to salvage as much of the database as I could.
And sure enough, after doing that I was able to run a repair on the backup to make sure everything was back in order. The repair took in excess of 20 hours to complete!
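The recovery itself boils down to dumping whatever is still readable and replaying it into a fresh file. A sketch with placeholder file names (the first sqlite3 call only creates a stand-in database so the snippet is self-contained; on Linux, Duplicati's local databases typically live under ~/.config/Duplicati/):

```shell
#!/bin/sh
set -e
# Stand-in for the corrupted Duplicati database; replace with the real path.
DB="broken.sqlite"
sqlite3 "$DB" "CREATE TABLE IF NOT EXISTS t(x INTEGER); INSERT INTO t VALUES (1),(2);"

# Dump everything .recover can still read, then replay it into a new file.
sqlite3 "$DB" ".recover" > recovered.sql
sqlite3 recovered.sqlite < recovered.sql

# Keep the broken original around and swap the rebuilt database in.
mv "$DB" "$DB.corrupt"
mv recovered.sqlite "$DB"
```

Note that .recover requires a reasonably recent sqlite3 (3.29 or later).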

I assumed the slowness was due to the repair work itself, so I ran a normal backup again with the usual command line, and it too took in excess of 20 hours to complete.

I was baffled, but then I remembered that SQLite’s .recover command salvages as much as it can from a broken database, yet may silently leave out objects that are too damaged.
So I ran the .schema command and found that an index was missing:

CREATE UNIQUE INDEX "FileLookupPath" ON "FileLookup" ("PrefixID", "Path", "BlocksetID", "MetadataID");

I went ahead, added it back into the database, and reran the regular backup.
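Adding it back is just a matter of feeding the CREATE INDEX statement to the sqlite3 CLI. In this self-contained sketch the database path is a placeholder and the FileLookup table is created only so the snippet runs on its own; on a recovered database the table already exists and only the index statement is needed:

```shell
#!/bin/sh
set -e
DB="duplicati.sqlite"   # placeholder; use the real database path

# Created here only to make the snippet runnable in isolation.
sqlite3 "$DB" 'CREATE TABLE IF NOT EXISTS "FileLookup" (
    "ID" INTEGER PRIMARY KEY,
    "PrefixID" INTEGER NOT NULL,
    "Path" TEXT NOT NULL,
    "BlocksetID" INTEGER NOT NULL,
    "MetadataID" INTEGER NOT NULL
);'

# Recreate the index that .recover dropped (IF NOT EXISTS added here
# only to make the snippet idempotent).
sqlite3 "$DB" 'CREATE UNIQUE INDEX IF NOT EXISTS "FileLookupPath" ON "FileLookup" ("PrefixID", "Path", "BlocksetID", "MetadataID");'
```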

This time, instead of processing at most 30 files every 5 seconds, it processed around 3000 files in the same time, finishing the entire backup in about 22 minutes.

So, this whole message isn’t actually a request for help; it’s just here to help me or others the next time someone ends up with a corrupted SQLite database from Duplicati.

For reference, here is the schema of a valid database as of today, from version 2.5.0.111 as packaged in the AUR:

CREATE TABLE IF NOT EXISTS "Operation" (
	"ID" INTEGER PRIMARY KEY,
	"Description" TEXT NOT NULL,
	"Timestamp" INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS "Remotevolume" (
	"ID" INTEGER PRIMARY KEY,
	"OperationID" INTEGER NOT NULL,
	"Name" TEXT NOT NULL,
	"Type" TEXT NOT NULL,
	"Size" INTEGER NULL,
	"Hash" TEXT NULL,
	"State" TEXT NOT NULL,
	"VerificationCount" INTEGER NOT NULL,
	"DeleteGraceTime" INTEGER NOT NULL
);
CREATE UNIQUE INDEX "RemotevolumeName" ON "Remotevolume" ("Name", "State");
CREATE TABLE IF NOT EXISTS "IndexBlockLink" (
	"IndexVolumeID" INTEGER NOT NULL,
	"BlockVolumeID" INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS "FilesetEntry" (
	"FilesetID" INTEGER NOT NULL,
	"FileID" INTEGER NOT NULL,
	"Lastmodified" INTEGER NOT NULL,
	CONSTRAINT "FilesetEntry_PK_FilesetIdFileId" PRIMARY KEY ("FilesetID", "FileID")
)  WITHOUT ROWID ;
CREATE INDEX "FilesetentryFileIdIndex" on "FilesetEntry" ("FileID");
CREATE TABLE IF NOT EXISTS "PathPrefix" (
    "ID" INTEGER PRIMARY KEY,
    "Prefix" TEXT NOT NULL
);
CREATE UNIQUE INDEX "PathPrefixPrefix" ON "PathPrefix" ("Prefix");
CREATE TABLE IF NOT EXISTS "FileLookup" (
    "ID" INTEGER PRIMARY KEY,
    "PrefixID" INTEGER NOT NULL,
    "Path" TEXT NOT NULL,
    "BlocksetID" INTEGER NOT NULL,
    "MetadataID" INTEGER NOT NULL
);
CREATE UNIQUE INDEX "FileLookupPath" ON "FileLookup" ("PrefixID", "Path", "BlocksetID", "MetadataID");
CREATE VIEW "File" AS SELECT "A"."ID" AS "ID", "B"."Prefix" || "A"."Path" AS "Path", "A"."BlocksetID" AS "BlocksetID", "A"."MetadataID" AS "MetadataID" FROM "FileLookup" "A", "PathPrefix" "B" WHERE "A"."PrefixID" = "B"."ID"
/* File(ID,Path,BlocksetID,MetadataID) */;
CREATE TABLE IF NOT EXISTS "BlocklistHash" (
	"BlocksetID" INTEGER NOT NULL,
	"Index" INTEGER NOT NULL,
	"Hash" TEXT NOT NULL
);
CREATE UNIQUE INDEX "BlocklistHashBlocksetIDIndex" ON "BlocklistHash" ("BlocksetID", "Index");
CREATE TABLE IF NOT EXISTS "Blockset" (
	"ID" INTEGER PRIMARY KEY,
	"Length" INTEGER NOT NULL,
	"FullHash" TEXT NOT NULL
);
CREATE UNIQUE INDEX "BlocksetFullHash" ON "Blockset" ("FullHash", "Length");
CREATE TABLE IF NOT EXISTS "BlocksetEntry" (
	"BlocksetID" INTEGER NOT NULL,
	"Index" INTEGER NOT NULL,
	"BlockID" INTEGER NOT NULL,
	CONSTRAINT "BlocksetEntry_PK_IdIndex" PRIMARY KEY ("BlocksetID", "Index")
)  WITHOUT ROWID ;
CREATE INDEX "BlocksetEntry_IndexIdsBackwards" ON "BlocksetEntry" ("BlockID");
CREATE TABLE IF NOT EXISTS "Block" (
	"ID" INTEGER PRIMARY KEY,
    "Hash" TEXT NOT NULL,
	"Size" INTEGER NOT NULL,
	"VolumeID" INTEGER NOT NULL
);
CREATE UNIQUE INDEX "BlockHashSize" ON "Block" ("Hash", "Size");
CREATE INDEX "Block_IndexByVolumeId" ON "Block" ("VolumeID");
CREATE TABLE IF NOT EXISTS "DeletedBlock" (
	"ID" INTEGER PRIMARY KEY,
    "Hash" TEXT NOT NULL,
	"Size" INTEGER NOT NULL,
	"VolumeID" INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS "DuplicateBlock" (
    "BlockID" INTEGER NOT NULL,
    "VolumeID" INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS "Metadataset" (
	"ID" INTEGER PRIMARY KEY,
	"BlocksetID" INTEGER NOT NULL
);
CREATE INDEX "MetadatasetBlocksetID" ON "Metadataset" ("BlocksetID");
CREATE TABLE IF NOT EXISTS "RemoteOperation" (
	"ID" INTEGER PRIMARY KEY,
	"OperationID" INTEGER NOT NULL,
	"Timestamp" INTEGER NOT NULL,
	"Operation" TEXT NOT NULL,
	"Path" TEXT NOT NULL,
	"Data" BLOB NULL
);
CREATE TABLE IF NOT EXISTS "LogData" (
	"ID" INTEGER PRIMARY KEY,
	"OperationID" INTEGER NOT NULL,
	"Timestamp" INTEGER NOT NULL,
	"Type" TEXT NOT NULL,
	"Message" TEXT NOT NULL,
	"Exception" TEXT NULL
);
CREATE TABLE IF NOT EXISTS "Version" (
    "ID" INTEGER PRIMARY KEY,
    "Version" INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS "Configuration" (
	"Key" TEXT PRIMARY KEY NOT NULL,
	"Value" TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS "ChangeJournalData" (
    "ID" INTEGER PRIMARY KEY,
    "FilesetID" INTEGER NOT NULL,       
    "VolumeName" TEXT NOT NULL,         
    "JournalID" INTEGER NOT NULL,       
    "NextUsn" INTEGER NOT NULL,         
    "ConfigHash" TEXT NOT NULL  
);
CREATE TABLE sqlite_stat1(tbl,idx,stat);
CREATE TABLE IF NOT EXISTS "Fileset" ("ID" INTEGER PRIMARY KEY,
			     "OperationID" INTEGER NOT NULL,
			     "VolumeID" INTEGER NOT NULL,
			     "IsFullBackup" INTEGER NOT NULL,
			     "Timestamp" INTEGER NOT NULL);
CREATE INDEX "BlockSize" ON "Block" ("Size");
CREATE UNIQUE INDEX "BlockHashVolumeID" ON "Block" ("Hash", "VolumeID");

The statements returned by .schema may come out in a different order, especially in a recovered database, but it’s not that difficult to sort both dumps and run them through a diff tool to find what’s missing.
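Sorting both schema dumps before diffing takes the ordering problem out of the picture entirely. A sketch with placeholder file names:

```shell
#!/bin/sh
# Compare a healthy database's schema with the recovered one.
# Lines prefixed with "<" in the diff exist only in the healthy schema,
# i.e. they are exactly what .recover dropped.
sqlite3 good.sqlite ".schema" | sort > good.schema
sqlite3 recovered.sqlite ".schema" | sort > recovered.schema
diff good.schema recovered.schema || true   # diff exits 1 when they differ
```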

Now I’m off to find a scheduler that lets me run a job made of tasks and sends me an email summary of how the tasks went, along with their output. Under Windows I use CruiseControl.NET, and it should run under Linux with Mono like Duplicati does, but if anyone has another suggestion, I’m all ears.

Hope this helps.