Initial backup stuck at 100%

Hello, I just attempted my first full backup of about 200GB using the GUI. I’m new to Duplicati so apologies if this is covered elsewhere. It appears to have successfully uploaded all of the data. The progress bar says 0 files (0 bytes) to go and progress is 100%.
[Two screenshots attached (Screen Shot 2020-09-18 at 11.40.45 AM, Screen Shot 2020-09-18 at 11.40.01 AM) showing the progress bar at 100% with 0 files (0 bytes) to go.]

I previously tried smaller test runs with a few MB of files and did not see this issue.

I’ve seen a few similar threads in the forum:
Slow Duplicati Bakcup on Mac OS
Slow backups on macOS

Those seem to be somewhat different, as they discuss slowness during the initial compare stage or during data transfer, whereas the problem I’m seeing is at the end of the backup.

The profiling log shows:

Sep 18, 2020 4:59 AM: Uploaded 49.92 MB in 00:04:41.4400100, 181.64 KB/s
Sep 18, 2020 4:58 AM: Uploaded 49.91 MB in 00:02:53.3169290, 294.88 KB/s
Sep 18, 2020 4:58 AM: Uploaded 49.91 MB in 00:05:31.5733560, 154.13 KB/s
Sep 18, 2020 4:57 AM: Uploaded 15.89 MB in 00:01:21.6477720, 199.35 KB/s
Sep 18, 2020 4:56 AM: Starting - ExecuteReader: SELECT "CalcLen", "Length", "A"."BlocksetID", "File"."Path" FROM ( SELECT "A"."ID" AS "BlocksetID", IFNULL("B"."CalcLen", 0) AS "CalcLen", "A"."Length" FROM "Blockset" A LEFT OUTER JOIN ( SELECT "BlocksetEntry"."BlocksetID", SUM("Block"."Size") AS "CalcLen" FROM "BlocksetEntry" LEFT OUTER JOIN "Block" ON "Block"."ID" = "BlocksetEntry"."BlockID" GROUP BY "BlocksetEntry"."BlocksetID" ) B ON "A"."ID" = "B"."BlocksetID" ) A, "File" WHERE "A"."BlocksetID" = "File"."BlocksetID" AND "A"."CalcLen" != "A"."Length"
Sep 18, 2020 4:56 AM: Starting - VerifyConsistency
Sep 18, 2020 4:56 AM: BackupMainOperation took 3:15:02:08.902

There has been nothing logged in 7 hours, and Duplicati is currently using 100% of CPU with little else running. Is there a way to tell whether this is an SQLite issue, as suggested in the other threads? It looks like the query has not completed.

Is it safe to stop the backup and restart? Will I need to reupload everything?

Think you’re “Stuck” on “Verifying backend data”? appears to be another similar thread. The UI issue reported there is not exactly the same but seems similar.

I’m on version 2.0.5.1_beta_2020-01-18

I tried executing the queries from the command line, and they do seem to be quite slow.

SELECT "BlocksetEntry"."BlocksetID", SUM("Block"."Size") AS "CalcLen"
FROM "BlocksetEntry" LEFT OUTER JOIN "Block" ON "Block"."ID" = "BlocksetEntry"."BlockID"
GROUP BY "BlocksetEntry"."BlocksetID"

This query is pretty slow, printing only about 1-2 rows per second.

sqlite> select count(distinct BlocksetID) from BlocksetEntry;
260934

Even at 2 rows per second, that would be about 36 hours just for this part of the query.

SELECT "A"."ID" AS "BlocksetID", IFNULL("B"."CalcLen", 0) AS "CalcLen", "A"."Length"
FROM "Blockset" A LEFT OUTER JOIN (
	SELECT "BlocksetEntry"."BlocksetID", SUM("Block"."Size") AS "CalcLen"
	FROM "BlocksetEntry" LEFT OUTER JOIN "Block" ON "Block"."ID" = "BlocksetEntry"."BlockID"
	GROUP BY "BlocksetEntry"."BlocksetID"
) B ON "A"."ID" = "B"."BlocksetID"

This one hangs for several minutes and pegs one CPU core at 100%. I haven’t had the patience to see whether it keeps running for several hours.

I ended up killing Duplicati since it seemed unlikely to finish. I tested running ANALYZE on the database and the query went from hanging indefinitely to completing in seconds. Doing this changed the query plan from SCAN to SEARCH.
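For anyone hitting the same wall, here is roughly how to see what ANALYZE changes, using Python’s sqlite3 module against a toy copy of the two tables from the slow subquery. The column sets and index here are simplifications for illustration, not Duplicati’s real schema; the last column of each EXPLAIN QUERY PLAN row is where SCAN vs SEARCH shows up.

```python
import sqlite3

# Toy versions of the two tables from the slow subquery; the real Duplicati
# schema has more columns, this is just enough to exercise the planner.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Block (ID INTEGER PRIMARY KEY, Size INTEGER);
    CREATE TABLE BlocksetEntry (BlocksetID INTEGER, BlockID INTEGER);
    CREATE INDEX BlocksetEntryIndex ON BlocksetEntry (BlocksetID, BlockID);
""")
con.executemany("INSERT INTO Block VALUES (?, ?)",
                [(i, 102400) for i in range(1000)])
con.executemany("INSERT INTO BlocksetEntry VALUES (?, ?)",
                [(i // 10, i) for i in range(1000)])

query = """
    SELECT BlocksetEntry.BlocksetID, SUM(Block.Size) AS CalcLen
    FROM BlocksetEntry
    LEFT OUTER JOIN Block ON Block.ID = BlocksetEntry.BlockID
    GROUP BY BlocksetEntry.BlocksetID
"""

# The last column of each EXPLAIN QUERY PLAN row says SCAN or SEARCH.
for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print("before ANALYZE:", row[-1])

con.execute("ANALYZE")  # gathers statistics into sqlite_stat1

for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print("after ANALYZE:", row[-1])
```

On a real, large database the difference in the plan (and in runtime) should be much more dramatic than on this tiny toy dataset.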

I see that Duplicati is already using PRAGMA optimize:

Maybe this needs the chance to run earlier?
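For context, a minimal sketch of how PRAGMA optimize is meant to be used (table name `t` and the data are made up for illustration). Per the SQLite docs it only considers tables the current connection has actually queried, which is why it is normally issued just before closing, and may be why it doesn’t help partway through a backup:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER)")
con.execute("CREATE INDEX t_b ON t (b)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, i % 7) for i in range(5000)])

# PRAGMA optimize only looks at tables this connection has queried, so it
# is usually run right before closing, after the real workload is done.
count = con.execute("SELECT count(*) FROM t WHERE b = 3").fetchone()[0]
print("rows with b = 3:", count)

con.execute("PRAGMA optimize")  # may run ANALYZE if statistics look stale
con.close()
```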

After restarting Duplicati, it verified successfully and did not need to reupload any data.

Welcome to the forum @projeculatort

I’m glad you figured it out.

Run PRAGMA optimize upon closing database connection #3745 was a past issue, and it links to its PR.

I’m not familiar with when the database connection is closed, but I would suspect it’s not during backup.
Initial backup might be particularly hard. Doesn’t ANALYZE need sample data in the DB to guide it? e.g.:
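A quick illustration of that point, using Python’s sqlite3 module (the `files` table here is hypothetical): ANALYZE records nothing for an empty table, so useful statistics only exist once real rows have been loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, hash TEXT)")
con.execute("CREATE INDEX files_hash ON files (hash)")

# With no rows yet there is nothing to sample, so no statistics are stored.
con.execute("ANALYZE")
before = con.execute("SELECT count(*) FROM sqlite_stat1").fetchone()[0]
print("sqlite_stat1 rows before data:", before)

# Once the table has content, ANALYZE has something to measure.
con.executemany("INSERT INTO files VALUES (?, ?)",
                [(i, f"h{i % 50}") for i in range(2000)])
con.execute("ANALYZE")
after = con.execute("SELECT count(*) FROM sqlite_stat1").fetchone()[0]
print("sqlite_stat1 rows after data:", after)
```

So on an initial backup, statistics gathered before (or early in) the run would describe a nearly empty database, not the millions of rows the verification queries eventually face.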

You appear to be more familiar with DB technologies than I am. Any interest in contributing to Duplicati?
SQL fine-tuning progress has been slowing as limits of available expertise and time have been reached.

There are also big speedups possible, such as batch-inserts during Recreate (likely faster than inserting one row at a time).
That topic comes around sometimes, especially with large backups. All I can advise is that a blocksize of 100 KB is inappropriate for large backups, because the Block table (and maybe others) grows so large that INSERTs take too long.
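To put rough numbers on that: the Block table gets on the order of one row per blocksize-sized chunk of unique source data, so for the ~200 GB backup in this thread (a back-of-the-envelope sketch that ignores deduplication):

```python
# Rough row-count arithmetic for the Block table: about one row per
# blocksize-sized chunk of unique source data (deduplication ignored).
source_bytes = 200 * 1024**3  # the ~200 GB backup from this thread

for blocksize_kib in (100, 1024, 5120):
    rows = source_bytes // (blocksize_kib * 1024)
    print(f"{blocksize_kib:>5} KB blocksize -> ~{rows:,} Block rows")
```

At 100 KB that is over two million rows; raising the blocksize shrinks the table proportionally, at the cost of coarser deduplication.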

This was mostly just from Googling and reading the forum and GitHub. There are lots of good discussions on here :slight_smile:
I’m familiar with SQL in general, but have virtually no experience with SQLite specifically. I’d be open to contributing if I find something where I can make progress. I’m also unfamiliar with C#, so I’ll poke through the code some more and see whether I understand enough to contribute.

That is my understanding, but I don’t know for sure. I agree that the initial backup seems like it could be problematic if there is no optimization performed before complex queries are issued.

That makes sense. I may end up redoing the backup, but I’ll wait and experiment some more with other backups first.

Some docs to go with that idea:

How the backup process works
How the restore process works
Developer documentation
Local database format

Thanks!

Is a Contributor License Agreement required? I don’t see one listed anywhere, but I just wanted to verify.

I have not heard of one.

I created:

I haven’t been able to verify that this actually improves anything, so I’m interested in feedback on the best way to validate that it’s worthwhile.