Performance: 30 hours "starting backup" every time since 2.0.5.1 upgrade

ts678 · March 5, 2020, 6:32pm

Pretty-printing that line in poorsql.com makes it a little more readable.

SELECT COUNT(*)
FROM (
	SELECT DISTINCT "Path"
	FROM (
		SELECT "L"."Path"
			,"L"."Lastmodified"
			,"L"."Filelength"
			,"L"."Filehash"
			,"L"."Metahash"
			,"L"."Metalength"
			,"L"."BlocklistHash"
			,"L"."FirstBlockHash"
			,"L"."FirstBlockSize"
			,"L"."FirstMetaBlockHash"
			,"L"."FirstMetaBlockSize"
			,"M"."Hash" AS "MetaBlocklistHash"
		FROM (
			SELECT "J"."Path"
				,"J"."Lastmodified"
				,"J"."Filelength"
				,"J"."Filehash"
				,"J"."Metahash"
				,"J"."Metalength"
				,"K"."Hash" AS "BlocklistHash"
				,"J"."FirstBlockHash"
				,"J"."FirstBlockSize"
				,"J"."FirstMetaBlockHash"
				,"J"."FirstMetaBlockSize"
				,"J"."MetablocksetID"
			FROM (
				SELECT "A"."Path" AS "Path"
					,"D"."Lastmodified" AS "Lastmodified"
					,"B"."Length" AS "Filelength"
					,"B"."FullHash" AS "Filehash"
					,"E"."FullHash" AS "Metahash"
					,"E"."Length" AS "Metalength"
					,"A"."BlocksetID" AS "BlocksetID"
					,"F"."Hash" AS "FirstBlockHash"
					,"F"."Size" AS "FirstBlockSize"
					,"H"."Hash" AS "FirstMetaBlockHash"
					,"H"."Size" AS "FirstMetaBlockSize"
					,"C"."BlocksetID" AS "MetablocksetID"
				FROM "File" A
				LEFT JOIN "Blockset" B ON "A"."BlocksetID" = "B"."ID"
				LEFT JOIN "Metadataset" C ON "A"."MetadataID" = "C"."ID"
				LEFT JOIN "FilesetEntry" D ON "A"."ID" = "D"."FileID"
				LEFT JOIN "Blockset" E ON "E"."ID" = "C"."BlocksetID"
				LEFT JOIN "BlocksetEntry" G ON "B"."ID" = "G"."BlocksetID"
				LEFT JOIN "Block" F ON "G"."BlockID" = "F"."ID"
				LEFT JOIN "BlocksetEntry" I ON "E"."ID" = "I"."BlocksetID"
				LEFT JOIN "Block" H ON "I"."BlockID" = "H"."ID"
				WHERE "A"."BlocksetId" >= 0
					AND "D"."FilesetID" = 307
					AND (
						"I"."Index" = 0
						OR "I"."Index" IS NULL
						)
					AND (
						"G"."Index" = 0
						OR "G"."Index" IS NULL
						)
				) J
			LEFT OUTER JOIN "BlocklistHash" K ON "K"."BlocksetID" = "J"."BlocksetID"
			ORDER BY "J"."Path"
				,"K"."Index"
			) L
		LEFT OUTER JOIN "BlocklistHash" M ON "M"."BlocksetID" = "L"."MetablocksetID"
		)
	
	UNION
	
	SELECT DISTINCT "Path"
	FROM (
		SELECT "G"."BlocksetID"
			,"G"."ID"
			,"G"."Path"
			,"G"."Length"
			,"G"."FullHash"
			,"G"."Lastmodified"
			,"G"."FirstMetaBlockHash"
			,"H"."Hash" AS "MetablocklistHash"
		FROM (
			SELECT "B"."BlocksetID"
				,"B"."ID"
				,"B"."Path"
				,"D"."Length"
				,"D"."FullHash"
				,"A"."Lastmodified"
				,"F"."Hash" AS "FirstMetaBlockHash"
				,"C"."BlocksetID" AS "MetaBlocksetID"
			FROM "FilesetEntry" A
				,"File" B
				,"Metadataset" C
				,"Blockset" D
				,"BlocksetEntry" E
				,"Block" F
			WHERE "A"."FileID" = "B"."ID"
				AND "B"."MetadataID" = "C"."ID"
				AND "C"."BlocksetID" = "D"."ID"
				AND "E"."BlocksetID" = "C"."BlocksetID"
				AND "E"."BlockID" = "F"."ID"
				AND "E"."Index" = 0
				AND (
					"B"."BlocksetID" = - 100
					OR "B"."BlocksetID" = - 200
					)
				AND "A"."FilesetID" = 307
			) G
		LEFT OUTER JOIN "BlocklistHash" H ON "H"."BlocksetID" = "G"."MetaBlocksetID"
		ORDER BY "G"."Path"
			,"H"."Index"
		)
	)

and might be the below in VerifyConsistency() which checks the current backups before adding next. Because every backup builds on the previous ones (only changes are uploaded), this is a good idea.

github.com

duplicati/duplicati/blob/082a14fc22efdef226c1bee956f85009f7104fb3/Duplicati/Library/Main/Database/LocalDatabase.cs#L832-L838


if (verifyfilelists)

{

    using (var cmd2 = m_connection.CreateCommand(transaction))

        foreach (var filesetid in cmd.ExecuteReaderEnumerable(@"SELECT ""ID"" FROM ""Fileset"" ").Select(x => x.ConvertValueToInt64(0, -1)))

        {

            var expandedCmd = string.Format(@"SELECT COUNT(*) FROM (SELECT DISTINCT ""Path"" FROM ({0}) UNION SELECT DISTINCT ""Path"" FROM ({1}))", LocalDatabase.LIST_FILESETS, LocalDatabase.LIST_FOLDERS_AND_SYMLINKS);

            var expandedlist = cmd2.ExecuteScalarInt64(expandedCmd, 0, filesetid, FOLDER_BLOCKSET_ID, SYMLINK_BLOCKSET_ID, filesetid);

If it’s VerifyConsistency, you would see that in the log before and after it goes through all the backups.

might avoid the issue. I don’t know why it’s not in the manual. Possibly was just never noticed. Code:

Added an option to disable filelist consistency checks as it was reported to slow backups with a large number of filesets.

There were new options added to 2.0.4.5 to thin out backup versions as they age. That will speed up.

New retention policy deletes old backups in a smart way