Query during backup takes too long, does not finish

I have a large backup, over 300 gigs. The initial backup succeeded. Subsequent backups never complete. They get stuck on this query:

ExecuteScalarInt64: SELECT COUNT(*) FROM (SELECT DISTINCT "Path" FROM ( SELECT "L"."Path", "L"."Lastmodified", "L"."Filelength", "L"."Filehash", "L"."Metahash", "L"."Metalength", "L"."BlocklistHash", "L"."FirstBlockHash", "L"."FirstBlockSize", "L"."FirstMetaBlockHash", "L"."FirstMetaBlockSize", "M"."Hash" AS "MetaBlocklistHash" FROM ( SELECT "J"."Path", "J"."Lastmodified", "J"."Filelength", "J"."Filehash", "J"."Metahash", "J"."Metalength", "K"."Hash" AS "BlocklistHash", "J"."FirstBlockHash", "J"."FirstBlockSize", "J"."FirstMetaBlockHash", "J"."FirstMetaBlockSize", "J"."MetablocksetID" FROM ( SELECT "A"."Path" AS "Path", "D"."Lastmodified" AS "Lastmodified", "B"."Length" AS "Filelength", "B"."FullHash" AS "Filehash", "E"."FullHash" AS "Metahash", "E"."Length" AS "Metalength", "A"."BlocksetID" AS "BlocksetID", "F"."Hash" AS "FirstBlockHash", "F"."Size" AS "FirstBlockSize", "H"."Hash" AS "FirstMetaBlockHash", "H"."Size" AS "FirstMetaBlockSize", "C"."BlocksetID" AS "MetablocksetID" FROM "File" A LEFT JOIN "Blockset" B ON "A"."BlocksetID" = "B"."ID" LEFT JOIN "Metadataset" C ON "A"."MetadataID" = "C"."ID" LEFT JOIN "FilesetEntry" D ON "A"."ID" = "D"."FileID" LEFT JOIN "Blockset" E ON "E"."ID" = "C"."BlocksetID" LEFT JOIN "BlocksetEntry" G ON "B"."ID" = "G"."BlocksetID" LEFT JOIN "Block" F ON "G"."BlockID" = "F"."ID" LEFT JOIN "BlocksetEntry" I ON "E"."ID" = "I"."BlocksetID" LEFT JOIN "Block" H ON "I"."BlockID" = "H"."ID" WHERE "A"."BlocksetId" >= 0 AND "D"."FilesetID" = 2 AND ("I"."Index" = 0 OR "I"."Index" IS NULL) AND ("G"."Index" = 0 OR "G"."Index" IS NULL) ) J LEFT OUTER JOIN "BlocklistHash" K ON "K"."BlocksetID" = "J"."BlocksetID" ORDER BY "J"."Path", "K"."Index" ) L LEFT OUTER JOIN "BlocklistHash" M ON "M"."BlocksetID" = "L"."MetablocksetID" ) UNION SELECT DISTINCT "Path" FROM ( SELECT "G"."BlocksetID", "G"."ID", "G"."Path", "G"."Length", "G"."FullHash", "G"."Lastmodified", "G"."FirstMetaBlockHash", "H"."Hash" AS "MetablocklistHash" FROM ( SELECT "B"."BlocksetID", "B"."ID", "B"."Path", "D"."Length", "D"."FullHash", "A"."Lastmodified", "F"."Hash" AS "FirstMetaBlockHash", "C"."BlocksetID" AS "MetaBlocksetID" FROM "FilesetEntry" A, "File" B, "Metadataset" C, "Blockset" D, "BlocksetEntry" E, "Block" F WHERE "A"."FileID" = "B"."ID" AND "B"."MetadataID" = "C"."ID" AND "C"."BlocksetID" = "D"."ID" AND "E"."BlocksetID" = "C"."BlocksetID" AND "E"."BlockID" = "F"."ID" AND "E"."Index" = 0 AND ("B"."BlocksetID" = -100 OR "B"."BlocksetID" = -200) AND "A"."FilesetID" = 2 ) G LEFT OUTER JOIN "BlocklistHash" H ON "H"."BlocksetID" = "G"."MetaBlocksetID" ORDER BY "G"."Path", "H"."Index" )) 

The same query with FilesetID = 1 completes (after 35 minutes), but this one with FilesetID = 2 will not finish. I left it running while I was on vacation for 10 days and it did not complete.

I am using 2.0.6.1_beta_2021-05-03

Hello and welcome!

What kind of system are you running Duplicati on? I’m curious whether it’s a really low-powered machine, like a Raspberry Pi or something.

It’s an older computer, but nothing so low-powered as a Pi.

Operating System: Windows Server 2008 R2 Standard 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_ldr.170427-1518)
Processor: Intel(R) Pentium(R) CPU G3258 @ 3.20GHz (2 CPUs), ~3.2GHz
Memory: 8192MB RAM

Ok, that should be fine. How many files are in this 300GB of data?

There are about 100,000 files, ranging in size from a few bytes up to roughly 2 to 4 GB.

Ok, that’s not a big deal either. Kind of grasping at straws at this point, but what .NET 4.x version do you have installed?

PS C:\> Get-ChildItem 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP' -Recurse | Get-ItemProperty -Name version -EA 0 | Where { $_.PSChildName -Match '^(?!S)\p{L}'} | Select PSChildName, version

PSChildName                                                 Version
-----------                                                 -------
v2.0.50727                                                  2.0.50727.5420
v3.0                                                        3.0.30729.5420
Windows Communication Foundation                            3.0.4506.5420
Windows Presentation Foundation                             3.0.6920.5011
v3.5                                                        3.5.30729.5420
Client                                                      4.7.03062
Full                                                        4.7.03062
Client                                                      4.0.0.0

Ok, you’re good there… Duplicati requires 4.7 or higher. I believe 4.7 is the latest version available on 2008 R2 / Win7.

Maybe one of the other forum volunteers more familiar with slow/hung SQL queries can offer some advice.

I’ve opened a bug for this.

To confirm what the issue and the topic here suggest: was this just a second run without any known large changes?

The database should be similar unless things were changing. If this is still running, could you check whether it is doing lots of I/O Read Bytes in Sysinternals Process Explorer? You might have to turn that column on to see it.

If it’s already killed, could you post a link to a database bug report (which unfortunately might roll back tables)? Duplicati recently got an offer of help from an SQL expert (I am not one), but a database exhibiting the issue is needed.

Yes, this was just a second run without any known large changes.
SysInternals ProcExp shows I/O Read Bytes at 23785.9 GB. That is not a typo.
I haven’t killed it yet. It is still running while I hope for a resolution. Let me know if you want me to kill it to create a bug report.

AFAIK there’s no way to force it to finish. The main reason to keep it up is to look at it more (see below), but the main drawback of keeping it up is that I think you won’t even be able to copy the database to preserve its data. You can try, using whatever tools you can get. Also see if you can copy any file with a similar name but a different suffix, such as -journal. If you can’t copy them, maybe you can at least get their timestamps. That would be a quick way to see whether anything is being changed, although the slow query looks like reads.

I’m going to ask if @rsevero has any ideas on what to do while it’s up. After it’s down, we can try to grab a database bug report to look at, but it might take some work to get one that actually shows the slowness. Fortunately, knowing the query sometimes makes it possible to find it in the source, and I think this one might be run only on demand, or might be avoidable for testing purposes, so that the second backup can finish and a bug report can be made.

I have had a query running for about 17 days on a DB bug report from here that’s reading about 1 GB every five seconds (which is far above the drive’s abilities, but it acts like Windows is supplying data from cache). It is currently at 218953 GB, which is a ballpark match for the value calculated from the duration and the rate estimate: 17 days is about 1.47 million seconds, and at roughly 1 GB per 5 seconds that works out to somewhere near 290,000 GB, the same order of magnitude.

Watching the process (which is actually DB Browser for SQLite, because this is a manual test of the query) with Sysinternals Process Monitor shows it doing a huge read rate from the DB. If I filter out the ReadFile operations there’s nothing else going on to the drive, which argues against the idea of a temporary-file write. Storing small results in memory is hard to rule out, but the commit size isn’t exploding. Others were larger.
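SQLite itself can report where temporary data goes and how big its page cache is, which helps interpret what Process Monitor shows. Just a sketch, assuming the database file can be opened (ideally read-only) in a tool such as DB Browser for SQLite:

PRAGMA temp_store;   -- 0 = default (compile-time setting), 1 = temporary files, 2 = memory
PRAGMA cache_size;   -- positive = pages of cache, negative = size in KiB

A memory-only temp store or a very large cache would change whether intermediate results ever reach the drive at all.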

Reads are always 4 KB, which is the SQLite default page size. The access pattern is sequential with gaps. I’ve been looking to see if there’s any way to tell exactly what it’s doing, but I’m not sure SQLite offers one…
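The page size of this particular database can be read directly; again just a sketch, assuming the file can be opened in an SQLite tool:

PRAGMA page_size;    -- reports the page size in bytes; 4096 is the current SQLite default
-- Changing it on an existing database would need something like
-- PRAGMA page_size = 65536; followed by VACUUM;

That would confirm whether each 4 KB ReadFile really is a single page fetch.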

Process Explorer’s thread view shows the obvious (for DB Browser): the busy thread has SQLite on its stack. You could check yours to see if it shows a similar view, although that’s already suggested by your non-finishing query.

Although I couldn’t find any other reports of this query not finishing (despite knowing that multi-TB backups and multi-million-file backups sometimes work, according to forum reports), this query did show erratic speed:

Time to complete backup job has trippled in the last week
with a quick me-too (and some experiments with trying to see if a page size boost could speed things) at:
Backups suddenly taking much longer to complete

There were a few other reports, which one can find by searching on parts of the query (I tried with Google).

One strange thing: I think this code (which I believe is here) was probably run at the end of the first backup. The second backup would run it at the start, then add FilesetID = 2 as its work, then get stuck at the end (right?). Why?
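If the database ever becomes inspectable, a quick way to see how much work each fileset represents (a sketch, using the same tables the slow query uses) is:

SELECT "FilesetID", COUNT(*) AS "Entries"
FROM "FilesetEntry"
GROUP BY "FilesetID";
-- Roughly comparable counts for FilesetID 1 and 2 would mean the second run
-- covers a similar amount of data, so a huge slowdown would point at the
-- query plan rather than at the amount of work.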

I’m not an SQL (or SQLite) expert, but The SQLite Query Optimizer Overview explains how the method of getting to the answer can vary hugely, and SQLite tries to pick a good one. Maybe sometimes it fails?
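One way to see which plan SQLite actually picked is to prefix the statement with EXPLAIN QUERY PLAN. Just a sketch here, shown against a heavily trimmed stand-in rather than the real query (which appears in full below), but the idea is the same:

EXPLAIN QUERY PLAN
SELECT COUNT(*) FROM (
    SELECT DISTINCT "A"."Path"
    FROM "File" A
    LEFT JOIN "FilesetEntry" D ON "A"."ID" = "D"."FileID"
    WHERE "A"."BlocksetID" >= 0 AND "D"."FilesetID" = 2
);
-- Each output row says whether a table is reached via SEARCH ... USING INDEX
-- or via a full SCAN; a SCAN of a large table inside a nested loop is the
-- usual suspect when a query like this suddenly takes forever.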

There are apparently combinations of data and query that can loop, but I don’t know if this can do it. Query:

SELECT COUNT(*)
FROM (
	SELECT DISTINCT "Path"
	FROM (
		SELECT "L"."Path"
			,"L"."Lastmodified"
			,"L"."Filelength"
			,"L"."Filehash"
			,"L"."Metahash"
			,"L"."Metalength"
			,"L"."BlocklistHash"
			,"L"."FirstBlockHash"
			,"L"."FirstBlockSize"
			,"L"."FirstMetaBlockHash"
			,"L"."FirstMetaBlockSize"
			,"M"."Hash" AS "MetaBlocklistHash"
		FROM (
			SELECT "J"."Path"
				,"J"."Lastmodified"
				,"J"."Filelength"
				,"J"."Filehash"
				,"J"."Metahash"
				,"J"."Metalength"
				,"K"."Hash" AS "BlocklistHash"
				,"J"."FirstBlockHash"
				,"J"."FirstBlockSize"
				,"J"."FirstMetaBlockHash"
				,"J"."FirstMetaBlockSize"
				,"J"."MetablocksetID"
			FROM (
				SELECT "A"."Path" AS "Path"
					,"D"."Lastmodified" AS "Lastmodified"
					,"B"."Length" AS "Filelength"
					,"B"."FullHash" AS "Filehash"
					,"E"."FullHash" AS "Metahash"
					,"E"."Length" AS "Metalength"
					,"A"."BlocksetID" AS "BlocksetID"
					,"F"."Hash" AS "FirstBlockHash"
					,"F"."Size" AS "FirstBlockSize"
					,"H"."Hash" AS "FirstMetaBlockHash"
					,"H"."Size" AS "FirstMetaBlockSize"
					,"C"."BlocksetID" AS "MetablocksetID"
				FROM "File" A
				LEFT JOIN "Blockset" B ON "A"."BlocksetID" = "B"."ID"
				LEFT JOIN "Metadataset" C ON "A"."MetadataID" = "C"."ID"
				LEFT JOIN "FilesetEntry" D ON "A"."ID" = "D"."FileID"
				LEFT JOIN "Blockset" E ON "E"."ID" = "C"."BlocksetID"
				LEFT JOIN "BlocksetEntry" G ON "B"."ID" = "G"."BlocksetID"
				LEFT JOIN "Block" F ON "G"."BlockID" = "F"."ID"
				LEFT JOIN "BlocksetEntry" I ON "E"."ID" = "I"."BlocksetID"
				LEFT JOIN "Block" H ON "I"."BlockID" = "H"."ID"
				WHERE "A"."BlocksetId" >= 0
					AND "D"."FilesetID" = 2
					AND (
						"I"."Index" = 0
						OR "I"."Index" IS NULL
						)
					AND (
						"G"."Index" = 0
						OR "G"."Index" IS NULL
						)
				) J
			LEFT OUTER JOIN "BlocklistHash" K ON "K"."BlocksetID" = "J"."BlocksetID"
			ORDER BY "J"."Path"
				,"K"."Index"
			) L
		LEFT OUTER JOIN "BlocklistHash" M ON "M"."BlocksetID" = "L"."MetablocksetID"
		)
	
	UNION
	
	SELECT DISTINCT "Path"
	FROM (
		SELECT "G"."BlocksetID"
			,"G"."ID"
			,"G"."Path"
			,"G"."Length"
			,"G"."FullHash"
			,"G"."Lastmodified"
			,"G"."FirstMetaBlockHash"
			,"H"."Hash" AS "MetablocklistHash"
		FROM (
			SELECT "B"."BlocksetID"
				,"B"."ID"
				,"B"."Path"
				,"D"."Length"
				,"D"."FullHash"
				,"A"."Lastmodified"
				,"F"."Hash" AS "FirstMetaBlockHash"
				,"C"."BlocksetID" AS "MetaBlocksetID"
			FROM "FilesetEntry" A
				,"File" B
				,"Metadataset" C
				,"Blockset" D
				,"BlocksetEntry" E
				,"Block" F
			WHERE "A"."FileID" = "B"."ID"
				AND "B"."MetadataID" = "C"."ID"
				AND "C"."BlocksetID" = "D"."ID"
				AND "E"."BlocksetID" = "C"."BlocksetID"
				AND "E"."BlockID" = "F"."ID"
				AND "E"."Index" = 0
				AND (
					"B"."BlocksetID" = - 100
					OR "B"."BlocksetID" = - 200
					)
				AND "A"."FilesetID" = 2
			) G
		LEFT OUTER JOIN "BlocklistHash" H ON "H"."BlocksetID" = "G"."MetaBlocksetID"
		ORDER BY "G"."Path"
			,"H"."Index"
		)
	)
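If experimenting on a copy of the database ever becomes possible, one low-risk thing to try (just a sketch; I don’t know whether Duplicati ever does this on its own) is refreshing the statistics the query planner relies on, then re-running the query:

ANALYZE;                               -- gathers table and index statistics for the planner
SELECT * FROM "sqlite_stat1" LIMIT 10; -- sanity check that statistics now exist

Stale or missing statistics are one way the optimizer can end up choosing a poor join order for a statement as deeply nested as this one.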

The timestamp of the database is not changing, so the DB isn’t being written to.
If I kill the Duplicati process and restart it, the backup job runs to the same point. I’ve done it before, so whatever is happening is replicable.

Replicable is good. Maybe wait a bit to see if an SQL expert arrives; the Duplicati project badly needs one. Meanwhile, if you’re willing, you can see whether the performance-tool info looks like mine from the other issue.

I’m less sure now that my Process Monitor watch-for-writes was working correctly. At the moment, I tell Process Monitor to filter out the reads and, judging from its own stats, it seems to do so, but its memory size still increases. Stopping it briefly lets it catch up, according to the Task Manager Performance tab and the Details Commit size column.
Running it only to peek at reads may be safer than leaving the tool alone to potentially exhaust all memory, and you’ve verified from the timestamp that it’s not writing the DB (but that doesn’t rule out other activity on files).

Meaning it acts as if it discarded ID 1, made ID 2 again, then slowed? Such a rollback is what I feared.
Alternatively, does it slow down before really getting into the backup? The difference between the two checks are