Performance: 30 hours "starting backup" every time since 2.0.5.1 upgrade

Since upgrading from 2.0.3.3 to 2.0.5.1, backup runs have extended from 7 hours each to 30+ hours (it’s on its fourth run now). The vast majority of this time is spent “Starting backup” — I haven’t yet caught the progress bar showing the backup in progress! Looking at the live profiling log, it is executing a series of queries like the below. Seeing that FileSetID and the average query runtime of 3.5 minutes, I’m guessing that the fileset number might refer to each individual historic backup run: there are 470 of them, and 470*3.5 minutes = 27 hours, which would account for the slowdown versus historic performance. Any idea what this is about? Is it avoidable?
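
As a sanity check on that guess (assuming sqlite3 can open the local job database; FilesetEntry is the same table the slow query filters on), counting the distinct fileset IDs should confirm the 470:

SELECT COUNT(DISTINCT "FilesetID") FROM "FilesetEntry";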

Database has been integrity_checked, vacuumed and analyzed.
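
By that I mean the standard SQLite maintenance statements, run with sqlite3 against the job database:

PRAGMA integrity_check;
VACUUM;
ANALYZE;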

5 Mar 2020 14:40: ExecuteScalarInt64: SELECT COUNT(*) FROM (SELECT DISTINCT "Path" FROM ( SELECT "L"."Path", "L"."Lastmodified", "L"."Filelength", "L"."Filehash", "L"."Metahash", "L"."Metalength", "L"."BlocklistHash", "L"."FirstBlockHash", "L"."FirstBlockSize", "L"."FirstMetaBlockHash", "L"."FirstMetaBlockSize", "M"."Hash" AS "MetaBlocklistHash" FROM ( SELECT "J"."Path", "J"."Lastmodified", "J"."Filelength", "J"."Filehash", "J"."Metahash", "J"."Metalength", "K"."Hash" AS "BlocklistHash", "J"."FirstBlockHash", "J"."FirstBlockSize", "J"."FirstMetaBlockHash", "J"."FirstMetaBlockSize", "J"."MetablocksetID" FROM ( SELECT "A"."Path" AS "Path", "D"."Lastmodified" AS "Lastmodified", "B"."Length" AS "Filelength", "B"."FullHash" AS "Filehash", "E"."FullHash" AS "Metahash", "E"."Length" AS "Metalength", "A"."BlocksetID" AS "BlocksetID", "F"."Hash" AS "FirstBlockHash", "F"."Size" AS "FirstBlockSize", "H"."Hash" AS "FirstMetaBlockHash", "H"."Size" AS "FirstMetaBlockSize", "C"."BlocksetID" AS "MetablocksetID" FROM "File" A LEFT JOIN "Blockset" B ON "A"."BlocksetID" = "B"."ID" LEFT JOIN "Metadataset" C ON "A"."MetadataID" = "C"."ID" LEFT JOIN "FilesetEntry" D ON "A"."ID" = "D"."FileID" LEFT JOIN "Blockset" E ON "E"."ID" = "C"."BlocksetID" LEFT JOIN "BlocksetEntry" G ON "B"."ID" = "G"."BlocksetID" LEFT JOIN "Block" F ON "G"."BlockID" = "F"."ID" LEFT JOIN "BlocksetEntry" I ON "E"."ID" = "I"."BlocksetID" LEFT JOIN "Block" H ON "I"."BlockID" = "H"."ID" WHERE "A"."BlocksetId" >= 0 AND "D"."FilesetID" = 307 AND ("I"."Index" = 0 OR "I"."Index" IS NULL) AND ("G"."Index" = 0 OR "G"."Index" IS NULL) ) J LEFT OUTER JOIN "BlocklistHash" K ON "K"."BlocksetID" = "J"."BlocksetID" ORDER BY "J"."Path", "K"."Index" ) L LEFT OUTER JOIN "BlocklistHash" M ON "M"."BlocksetID" = "L"."MetablocksetID" ) UNION SELECT DISTINCT "Path" FROM ( SELECT "G"."BlocksetID", "G"."ID", "G"."Path", "G"."Length", "G"."FullHash", "G"."Lastmodified", "G"."FirstMetaBlockHash", "H"."Hash" AS "MetablocklistHash" FROM ( SELECT "B"."BlocksetID", "B"."ID", "B"."Path", "D"."Length", "D"."FullHash", "A"."Lastmodified", "F"."Hash" AS "FirstMetaBlockHash", "C"."BlocksetID" AS "MetaBlocksetID" FROM "FilesetEntry" A, "File" B, "Metadataset" C, "Blockset" D, "BlocksetEntry" E, "Block" F WHERE "A"."FileID" = "B"."ID" AND "B"."MetadataID" = "C"."ID" AND "C"."BlocksetID" = "D"."ID" AND "E"."BlocksetID" = "C"."BlocksetID" AND "E"."BlockID" = "F"."ID" AND "E"."Index" = 0 AND ("B"."BlocksetID" = -100 OR "B"."BlocksetID" = -200) AND "A"."FilesetID" = 307 ) G LEFT OUTER JOIN "BlocklistHash" H ON "H"."BlocksetID" = "G"."MetaBlocksetID" ORDER BY "G"."Path", "H"."Index" )) took 0:00:03:29.786

(Running on Synology DSM 6.2.2-24922 Update 4 and Mono 5.18.0.240-12)

Pretty-printing that line in poorsql.com makes it a little more readable.

SELECT COUNT(*)
FROM (
	SELECT DISTINCT "Path"
	FROM (
		SELECT "L"."Path"
			,"L"."Lastmodified"
			,"L"."Filelength"
			,"L"."Filehash"
			,"L"."Metahash"
			,"L"."Metalength"
			,"L"."BlocklistHash"
			,"L"."FirstBlockHash"
			,"L"."FirstBlockSize"
			,"L"."FirstMetaBlockHash"
			,"L"."FirstMetaBlockSize"
			,"M"."Hash" AS "MetaBlocklistHash"
		FROM (
			SELECT "J"."Path"
				,"J"."Lastmodified"
				,"J"."Filelength"
				,"J"."Filehash"
				,"J"."Metahash"
				,"J"."Metalength"
				,"K"."Hash" AS "BlocklistHash"
				,"J"."FirstBlockHash"
				,"J"."FirstBlockSize"
				,"J"."FirstMetaBlockHash"
				,"J"."FirstMetaBlockSize"
				,"J"."MetablocksetID"
			FROM (
				SELECT "A"."Path" AS "Path"
					,"D"."Lastmodified" AS "Lastmodified"
					,"B"."Length" AS "Filelength"
					,"B"."FullHash" AS "Filehash"
					,"E"."FullHash" AS "Metahash"
					,"E"."Length" AS "Metalength"
					,"A"."BlocksetID" AS "BlocksetID"
					,"F"."Hash" AS "FirstBlockHash"
					,"F"."Size" AS "FirstBlockSize"
					,"H"."Hash" AS "FirstMetaBlockHash"
					,"H"."Size" AS "FirstMetaBlockSize"
					,"C"."BlocksetID" AS "MetablocksetID"
				FROM "File" A
				LEFT JOIN "Blockset" B ON "A"."BlocksetID" = "B"."ID"
				LEFT JOIN "Metadataset" C ON "A"."MetadataID" = "C"."ID"
				LEFT JOIN "FilesetEntry" D ON "A"."ID" = "D"."FileID"
				LEFT JOIN "Blockset" E ON "E"."ID" = "C"."BlocksetID"
				LEFT JOIN "BlocksetEntry" G ON "B"."ID" = "G"."BlocksetID"
				LEFT JOIN "Block" F ON "G"."BlockID" = "F"."ID"
				LEFT JOIN "BlocksetEntry" I ON "E"."ID" = "I"."BlocksetID"
				LEFT JOIN "Block" H ON "I"."BlockID" = "H"."ID"
				WHERE "A"."BlocksetId" >= 0
					AND "D"."FilesetID" = 307
					AND (
						"I"."Index" = 0
						OR "I"."Index" IS NULL
						)
					AND (
						"G"."Index" = 0
						OR "G"."Index" IS NULL
						)
				) J
			LEFT OUTER JOIN "BlocklistHash" K ON "K"."BlocksetID" = "J"."BlocksetID"
			ORDER BY "J"."Path"
				,"K"."Index"
			) L
		LEFT OUTER JOIN "BlocklistHash" M ON "M"."BlocksetID" = "L"."MetablocksetID"
		)
	
	UNION
	
	SELECT DISTINCT "Path"
	FROM (
		SELECT "G"."BlocksetID"
			,"G"."ID"
			,"G"."Path"
			,"G"."Length"
			,"G"."FullHash"
			,"G"."Lastmodified"
			,"G"."FirstMetaBlockHash"
			,"H"."Hash" AS "MetablocklistHash"
		FROM (
			SELECT "B"."BlocksetID"
				,"B"."ID"
				,"B"."Path"
				,"D"."Length"
				,"D"."FullHash"
				,"A"."Lastmodified"
				,"F"."Hash" AS "FirstMetaBlockHash"
				,"C"."BlocksetID" AS "MetaBlocksetID"
			FROM "FilesetEntry" A
				,"File" B
				,"Metadataset" C
				,"Blockset" D
				,"BlocksetEntry" E
				,"Block" F
			WHERE "A"."FileID" = "B"."ID"
				AND "B"."MetadataID" = "C"."ID"
				AND "C"."BlocksetID" = "D"."ID"
				AND "E"."BlocksetID" = "C"."BlocksetID"
				AND "E"."BlockID" = "F"."ID"
				AND "E"."Index" = 0
				AND (
					"B"."BlocksetID" = - 100
					OR "B"."BlocksetID" = - 200
					)
				AND "A"."FilesetID" = 307
			) G
		LEFT OUTER JOIN "BlocklistHash" H ON "H"."BlocksetID" = "G"."MetaBlocksetID"
		ORDER BY "G"."Path"
			,"H"."Index"
		)
	)

That query might be the one in VerifyConsistency(), which checks the current backups before adding the next one. Because every backup builds on the previous ones (only changes are uploaded), this is a good idea.

If it’s VerifyConsistency, you would see that in the log before and after it goes through all the backups.


The --disable-filelist-consistency-checks option might avoid the issue. I don’t know why it’s not in the manual; possibly it was just never noticed. From the code change that added it:

Added an option to disable filelist consistency checks as it was reported to slow backups with a large number of filesets.
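
As an advanced option on the job (or as a command-line switch) it is just a boolean:

--disable-filelist-consistency-checks=true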

There were new options added in 2.0.4.5 to thin out backup versions as they age. That will speed things up.

New retention policy deletes old backups in a smart way
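
If I remember the syntax right, the policy is a comma-separated list of timeframe:interval pairs, e.g. keep every backup for a week, then at most one per day for up to three months, then one per month for up to five years (the values here are only an example):

--retention-policy=7D:0s,3M:1D,5Y:1M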

Thanks very much for this (and for the poorsql pointer: I had a look on my box for an SQL prettyprinter, but no joy).

That does indeed appear to be the reason for the slow start: adding disable-filelist-consistency-checks has resulted in a backup starting to show progress only half an hour after “Starting”.

However, I’ve tried searching for “check”, “filelist” and “consistency” in the manual and can’t seem to find anything about how to “run regular check commands”. Is this a question of manually clicking “Verify files” in the web GUI, or running “Repair” on the database?

Verify files seems to do it. I think Database Repair only checks if it has to change certain things.
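
If you’d rather script it than click, I think the TEST command does the same sampled verification from the command line (sketch only; the real URL and options come from Export → As Command-line):

mono Duplicati.CommandLine.exe test <storage-url> 5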

So, a final question; after this I’ll try writing a little performance FAQ in case that’s useful to other people, since I seem to have run into (and largely solved, through this forum’s help) quite a wide variety of performance issues:

  • Having disabled file list consistency checks, I’m still getting a distinctly bimodal distribution on backups: two thirds of them take two hours, and one third take twelve hours. Is there another not-always-necessary process that would trigger a file list consistency check or something similar? The log emails unfortunately don’t show all the messages from each run, so I can’t tell for certain, but I have a vague suspicion that the slow runs might involve compaction — could this be the cause?

Maybe try setting --no-auto-compact as a test. I seem to recall that with large sqlite databases, this process can take a while (determining which volumes should be compacted).

Thanks. Will give that a few days & then write up results!

Viewing the log files of a backup job would be a way to see. Look at the amount of source file change too. Under Complete log, there is also a "BackendStatistics" section where you might spot some variations.

Something is slower sometimes, but without info (not even a status bar display), it’s hard to say where.
More logs are at Viewing the Duplicati Server Logs, e.g. About → Show log → Live → Information.

Hah! The logs seem to be waayy more informative now than what I was used to (recent upgrade to 2.0.5, used to 2.0.3). This exactly confirms @drwtsn32’s advice. Thanks @ts678 — again!

Credit to @mikaelmello for the new log page look. The old page had info, but it was hard to follow…

The COMPACT command respects some settings that also affect the possible compaction after a backup. Basically, I think you can compact more often to increase time consistency, or just live with the variability.
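
As far as I know the main knob is the wasted-space threshold (default 25%), so something like the below trades more frequent, smaller compacts for less per-run variability (the value is just an example):

--threshold=20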

On disabling checking: that’s a bit dangerous long-term. I forget if we’ve talked about this, but letting versions build up forever (unless that’s what you want) can slow things down. Screen 5 (Options) has a Backup retention setting for either fixed retention, or progressive thinning as versions grow older.

Choosing sizes in Duplicati can also matter for large backups, otherwise tracking the files as 100 KB deduplication blocks can eventually slow the SQL queries (the one in your post took some minutes). Unfortunately you can’t change --blocksize without starting the backup over again, losing old history.
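
For a backup started fresh, the block size is just an advanced option, commonly raised for sources in the hundreds of GB (the value is only an example; the right choice depends on the backup):

--blocksize=1MB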

Yes, we talked about checking. I’ve made it a monthly event that I trigger manually.

So I’m a little confused about the compaction stuff though. I’ve used the web UI to set auto-compact-interval to 1 week, but there doesn’t seem to have been a compaction in the two weeks since I set it. Previously I’d see a compaction once in every three daily backups. Having set the interval to 1 week, would I now expect in practice to see it only once every three weeks? If so, doesn’t that make the heuristic a bit weird? Really, if there’s usually reason to run a compaction every three days, you’d anticipate it to be almost guaranteed necessary within two weeks, no? I haven’t got --no-auto-vacuum set, by the way!

When I export the backup as a command line, I get these switches (plus a bunch of email related ones, path specifications, etc.):

--disable-filelist-consistency-checks=true
--auto-vacuum=true
--auto-vacuum-interval=1M
--auto-compact-interval=1W
  --auto-compact-interval (Timespan): Minimum time between auto compactions
    The minimum amount of time that must elapse after the last compaction
    before another will be automatically triggered at the end of a backup
    job. Automatic compaction can be a long-running process and may not be
    desirable to run after every single backup.
    * default value: 0m

This is a new option to avoid compacting too often, and it’s possible there’s a problem with it.
I don’t use it, but if you normally compact every three days, I’d have thought you’d get weekly.
Does removing --auto-compact-interval=1W revert back to compacting every three days?

You can get detailed information on the compacting decision if you look at the logs at a sufficient level.

2020-04-19 10:41:30 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2020-04-19 11:41:00 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2020-04-19 12:41:24 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there are 1 fully deletable volume(s)
2020-04-19 12:41:26 -04 - [Information-Duplicati.Library.Main.Operation.CompactHandler-CompactResults]: Deleted 2 files, which reduced storage by 23.09 MB
2020-04-19 13:41:31 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there are 1 fully deletable volume(s)
2020-04-19 13:41:33 -04 - [Information-Duplicati.Library.Main.Operation.CompactHandler-CompactResults]: Deleted 2 files, which reduced storage by 37.56 MB
2020-04-19 14:41:19 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2020-04-19 15:41:25 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required
2020-04-19 16:41:30 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there are 1 fully deletable volume(s)
2020-04-19 16:41:31 -04 - [Information-Duplicati.Library.Main.Operation.CompactHandler-CompactResults]: Deleted 2 files, which reduced storage by 36.48 MB
2020-04-19 17:41:32 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting not required

--log-file and --log-file-log-level=Information can get that. Profiling can get more details, but the log is huge. Verbose is sort of midway. You can also use --log-file-log-filter to focus on certain messages of interest.
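
For example, as extra advanced options on the job (the path is just an example location on the NAS):

--log-file=/volume1/duplicati/backup-job.log
--log-file-log-level=Information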

2020-03-27 20:41:48 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-FullyDeletableCount]: Found 0 fully deletable volume(s)
2020-03-27 20:41:48 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-SmallVolumeCount]: Found 0 small volumes(s) with a total size of 0 bytes
2020-03-27 20:41:48 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-WastedSpaceVolumes]: Found 11 volume(s) with a total of 27.31% wasted space (418.07 MB of 1.49 GB)
2020-03-27 20:41:48 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there is 27.31% wasted space and the limit is 25%
2020-03-29 17:41:19 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-FullyDeletableCount]: Found 1 fully deletable volume(s)
2020-03-29 17:41:19 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-SmallVolumeCount]: Found 0 small volumes(s) with a total size of 0 bytes
2020-03-29 17:41:19 -04 - [Verbose-Duplicati.Library.Main.Database.LocalDeleteDatabase-WastedSpaceVolumes]: Found 14 volume(s) with a total of 29.16% wasted space (509.61 MB of 1.71 GB)
2020-03-29 17:41:19 -04 - [Information-Duplicati.Library.Main.Database.LocalDeleteDatabase-CompactReason]: Compacting because there are 1 fully deletable volume(s)

Think of --auto-compact-interval as prohibiting Duplicati from evaluating and/or performing an automatic compaction. Setting it to 1W doesn’t mean it will definitely compact every 1 week. Instead it’s the minimum amount of time between compactions. 1 week after Duplicati has done its last compaction, it will begin to evaluate if another compaction is warranted (at the end of the next backup job). If it decides it’s not needed, then no compaction will occur.

On the other hand, the --auto-vacuum-interval option will work more as you may expect. According to your setting the vacuum will happen once per month. That’s because Duplicati doesn’t do any other checks to see if a vacuum operation is warranted - it will just do it.
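
To make that concrete with daily backups and your settings: say a compaction ran on April 1. The April 2 to April 7 backups skip the evaluation entirely (hence CompactResults: null), the April 8 backup evaluates and either compacts or logs "Compacting not required", and either way the next evaluation waits another week. The monthly vacuum, by contrast, simply runs once its interval is up, with no evaluation involved.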

Understood, but this is still surprising given the previous once-every-three-daily-backups behaviour. In fact what would happen was that every three (ish) backups it would spend eight hours deciding whether to compact, and then compact or not. If not, it’d say “Compaction not required” or something like that. In my current logs, the only mention of compaction is a "CompactResults": null in every backup. I’ll try removing the flag, see what happens, and then set it to 2 days to see what that does.

Try watching the Live Log. Open a Duplicati Web UI tab and click About -> Show log -> Live -> and set the dropdown to Verbose.

Open a second Duplicati Web UI tab and start a backup job, then quickly switch back to the Live Log tab.

At the beginning of the job, you should see a skipping auto compaction until <timestamp> message, unless of course you are no longer inside that interval window.

If you are no longer inside the interval window, then at the end of the job Duplicati will perform an evaluation to see if compaction is needed. You will see a message like Compacting not required if none is needed, or if Duplicati does think it’s needed, you’ll see Compacting because <reason>. In either case, the interval is reset so the evaluation won’t happen again for 1W (your setting).

CompactResults: null in your log means that you are within the interval window and no evaluation or actual compaction occurred. If you are outside of the interval window, then at a minimum an evaluation would have occurred and CompactResults would not be null (even if Duplicati decided no compaction was needed).

Use the interval end timestamp you discover in your Live Log, subtract your interval setting (1W), and then look closely at that older job’s complete log. You should be able to see either the Compacting not required message or Compacting because <reason>. Here’s one from my system: