Initial backup stuck at 100%

Hello, I just attempted my first full backup of about 200GB using the GUI. I’m new to Duplicati so apologies if this is covered elsewhere. It appears to have successfully uploaded all of the data. The progress bar says 0 files (0 bytes) to go and progress is 100%.
[Two screenshots attached (Screen Shot 2020-09-18 at 11.40.45 AM, Screen Shot 2020-09-18 at 11.40.01 AM) showing the progress bar at 100% with 0 files (0 bytes) to go.]

I previously tried smaller test runs with a few MB of files and did not see this issue.

I’ve seen a few similar threads in the forum:
Slow Duplicati Bakcup on Mac OS
Slow backups on macOS

Those seem to be somewhat different, as they discuss slowness during the initial compare stage or during data transfer, whereas the problem I’m seeing is at the end of the backup.

The profiling log shows:

Sep 18, 2020 4:59 AM: Uploaded 49.92 MB in 00:04:41.4400100, 181.64 KB/s
Sep 18, 2020 4:58 AM: Uploaded 49.91 MB in 00:02:53.3169290, 294.88 KB/s
Sep 18, 2020 4:58 AM: Uploaded 49.91 MB in 00:05:31.5733560, 154.13 KB/s
Sep 18, 2020 4:57 AM: Uploaded 15.89 MB in 00:01:21.6477720, 199.35 KB/s
Sep 18, 2020 4:56 AM: Starting - ExecuteReader: SELECT "CalcLen", "Length", "A"."BlocksetID", "File"."Path" FROM ( SELECT "A"."ID" AS "BlocksetID", IFNULL("B"."CalcLen", 0) AS "CalcLen", "A"."Length" FROM "Blockset" A LEFT OUTER JOIN ( SELECT "BlocksetEntry"."BlocksetID", SUM("Block"."Size") AS "CalcLen" FROM "BlocksetEntry" LEFT OUTER JOIN "Block" ON "Block"."ID" = "BlocksetEntry"."BlockID" GROUP BY "BlocksetEntry"."BlocksetID" ) B ON "A"."ID" = "B"."BlocksetID" ) A, "File" WHERE "A"."BlocksetID" = "File"."BlocksetID" AND "A"."CalcLen" != "A"."Length"
Sep 18, 2020 4:56 AM: Starting - VerifyConsistency
Sep 18, 2020 4:56 AM: BackupMainOperation took 3:15:02:08.902

There has been nothing logged in 7 hours, and Duplicati is currently using 100% of CPU with little else running. Is there a way to tell whether this is an SQLite issue, as suggested in the other threads? It looks like the query has not completed.

Is it safe to stop the backup and restart? Will I need to reupload everything?

Think you’re “Stuck” on “Verifying backend data”? appears to be another similar thread. The UI issue reported there is not exactly the same but seems similar.

I’m on version 2.0.5.1_beta_2020-01-18

I tried executing the queries from the command line, and they do seem to be quite slow.

SELECT "BlocksetEntry"."BlocksetID", SUM("Block"."Size") AS "CalcLen"
FROM "BlocksetEntry" LEFT OUTER JOIN "Block" ON "Block"."ID" = "BlocksetEntry"."BlockID"
GROUP BY "BlocksetEntry"."BlocksetID"

This query is pretty slow, printing only about 1-2 rows per second.

sqlite> select count(distinct BlocksetID) from BlocksetEntry;
260934

Even at 2 rows per second, that would be about 36 hours just for this part of the query.

SELECT "A"."ID" AS "BlocksetID", IFNULL("B"."CalcLen", 0) AS "CalcLen", "A"."Length"
FROM "Blockset" A LEFT OUTER JOIN (
	SELECT "BlocksetEntry"."BlocksetID", SUM("Block"."Size") AS "CalcLen"
	FROM "BlocksetEntry" LEFT OUTER JOIN "Block" ON "Block"."ID" = "BlocksetEntry"."BlockID"
	GROUP BY "BlocksetEntry"."BlocksetID"
) B ON "A"."ID" = "B"."BlocksetID"

This one hangs for several minutes and pegs one CPU core at 100%. I haven’t had the patience to see whether it keeps running for several hours.

I ended up killing Duplicati since it seemed unlikely to finish. I tested running ANALYZE on the database and the query went from hanging indefinitely to completing in seconds. Doing this changed the query plan from SCAN to SEARCH.
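For anyone hitting the same wall, here is roughly how to see what ANALYZE changes, using Python’s sqlite3 module against a toy copy of the two tables from the slow subquery. The column sets and index here are simplifications for illustration, not Duplicati’s real schema; the last column of each EXPLAIN QUERY PLAN row is where SCAN vs SEARCH shows up.

```python
import sqlite3

# Toy versions of the two tables from the slow subquery; the real Duplicati
# schema has more columns, this is just enough to exercise the planner.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Block (ID INTEGER PRIMARY KEY, Size INTEGER);
    CREATE TABLE BlocksetEntry (BlocksetID INTEGER, BlockID INTEGER);
    CREATE INDEX BlocksetEntryIndex ON BlocksetEntry (BlocksetID, BlockID);
""")
con.executemany("INSERT INTO Block VALUES (?, ?)",
                [(i, 102400) for i in range(1000)])
con.executemany("INSERT INTO BlocksetEntry VALUES (?, ?)",
                [(i // 10, i) for i in range(1000)])

query = """
    SELECT BlocksetEntry.BlocksetID, SUM(Block.Size) AS CalcLen
    FROM BlocksetEntry
    LEFT OUTER JOIN Block ON Block.ID = BlocksetEntry.BlockID
    GROUP BY BlocksetEntry.BlocksetID
"""

# The last column of each EXPLAIN QUERY PLAN row says SCAN or SEARCH.
for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print("before ANALYZE:", row[-1])

con.execute("ANALYZE")  # gathers statistics into sqlite_stat1

for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print("after ANALYZE:", row[-1])
```

On a real, large database the difference in the plan (and in runtime) should be much more dramatic than on this tiny toy dataset.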

I see that Duplicati is already using PRAGMA optimize:

Maybe this needs the chance to run earlier?
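For context, a minimal sketch of how PRAGMA optimize is meant to be used (table name `t` and the data are made up for illustration). Per the SQLite docs it only considers tables the current connection has actually queried, which is why it is normally issued just before closing, and may be why it doesn’t help partway through a backup:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER)")
con.execute("CREATE INDEX t_b ON t (b)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, i % 7) for i in range(5000)])

# PRAGMA optimize only looks at tables this connection has queried, so it
# is usually run right before closing, after the real workload is done.
count = con.execute("SELECT count(*) FROM t WHERE b = 3").fetchone()[0]
print("rows with b = 3:", count)

con.execute("PRAGMA optimize")  # may run ANALYZE if statistics look stale
con.close()
```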

After restarting Duplicati, it verified successfully and did not need to reupload any data.

Welcome to the forum @projeculatort

I’m glad you figured it out.

Run PRAGMA optimize upon closing database connection #3745 was a past issue, and it links to its PR.

I’m not familiar with when the database connection is closed, but I would suspect it’s not during backup.
Initial backup might be particularly hard. Doesn’t ANALYZE need sample data in the DB to guide it? e.g.:
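A quick illustration of that point, using Python’s sqlite3 module (the `files` table here is hypothetical): ANALYZE records nothing for an empty table, so useful statistics only exist once real rows have been loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, hash TEXT)")
con.execute("CREATE INDEX files_hash ON files (hash)")

# With no rows yet there is nothing to sample, so no statistics are stored.
con.execute("ANALYZE")
before = con.execute("SELECT count(*) FROM sqlite_stat1").fetchone()[0]
print("sqlite_stat1 rows before data:", before)

# Once the table has content, ANALYZE has something to measure.
con.executemany("INSERT INTO files VALUES (?, ?)",
                [(i, f"h{i % 50}") for i in range(2000)])
con.execute("ANALYZE")
after = con.execute("SELECT count(*) FROM sqlite_stat1").fetchone()[0]
print("sqlite_stat1 rows after data:", after)
```

So on an initial backup, statistics gathered before (or early in) the run would describe a nearly empty database, not the millions of rows the verification queries eventually face.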

You appear to be more familiar with DB technologies than I am. Any interest in contributing to Duplicati?
SQL fine-tuning progress has been slowing as limits of available expertise and time have been reached.

There are also big speedups possible, such as batch-inserts during Recreate (likely faster than inserting one row at a time).
That topic comes around sometimes, especially with large backups. All I can advise is that a blocksize of 100 KB is inappropriate for large backups, because the Block table (and maybe others) grows so large that INSERTs take too long.
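To put rough numbers on that: the Block table gets on the order of one row per blocksize-sized chunk of unique source data, so for the ~200 GB backup in this thread (a back-of-the-envelope sketch that ignores deduplication):

```python
# Rough row-count arithmetic for the Block table: about one row per
# blocksize-sized chunk of unique source data (deduplication ignored).
source_bytes = 200 * 1024**3  # the ~200 GB backup from this thread

for blocksize_kib in (100, 1024, 5120):
    rows = source_bytes // (blocksize_kib * 1024)
    print(f"{blocksize_kib:>5} KB blocksize -> ~{rows:,} Block rows")
```

At 100 KB that is over two million rows; raising the blocksize shrinks the table proportionally, at the cost of coarser deduplication.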

This was mostly just from Googling and reading the forum and GitHub. There are lots of good discussions on here :slight_smile:
I’m familiar with SQL in general, but have virtually no experience with SQLite specifically. I’d be open to contributing if I find something where I can make progress. I’m also unfamiliar with C#, so I’ll poke through the code some more and see whether I understand enough to contribute.

That is my understanding, but I don’t know for sure. I agree that the initial backup seems like it could be problematic if there is no optimization performed before complex queries are issued.

That makes sense. I may end up redoing the backup, but I’ll wait and experiment some more with other backups first.

Some docs to go with that idea:

How the backup process works
How the restore process works
Developer documentation
Local database format

Thanks!

Is a Contributor License Agreement required? I don’t see one listed anywhere, but I just wanted to verify.

I have not heard of one.

I created:

I haven’t been able to verify that this actually improves anything, so I’m interested in feedback on the best way to validate that it’s worthwhile.