Stuck on Verifying Backend Data [2.0.3.6]

I’ll let you know, though I’m doubtful. The backup set is only 236GB, but the rebuild has been running for five days. Since I’m running Windows 10, at this rate I doubt that I can keep Windows from forcing an update on me before the rebuild completes.

Yeah…Microsoft is doing a great job these days of driving me to Linux. :slight_smile:

And yes - rebuilds are a known slow point that we hope to improve upon, but for now we’re kinda stuck with what we’ve got.

If / when the rebuild finishes, you could also consider trying a run with the --no-backend-verification setting enabled. It’s not really a fix, but it could help track down where the issue is.

I have the same problem.

Backups work fine on my laptop (a Lenovo P50), but not on my PC.

It has been stuck at “checking back-end data” non-stop for 24 hours.
The first time, I killed the process and restarted Duplicati. “Checking back-end data” came right back…
I killed the process again, deleted the database, and started Duplicati; the database was re-created and the backup ran (for many hours, since I had a lot of new data).

The next day a new backup started, and now it has been at “checking back-end data” for 24 hours again.

The computer is an i7-3770K with 16GB RAM, running Windows 7 Pro.
Only the Duplicati process is using the CPU.

:frowning:

Please, do you have an idea?

The first thing I’d suggest is the same - try running the backup with --no-backend-verification enabled. If the job runs without error, then we’ve confirmed the problem is in the verification step.

(Note that this is NOT a fix as disabling backend-verification means if something goes wrong with a backend file you won’t know about it until you try to restore from it.)

The next thing you might want to do is use --log-file=<path> and --log-level=profiling to create an actual text file log of what Duplicati is doing. If the backend-verification process again seems stuck, check the log file to see what the last few commands were and how long ago they ran.

It’s possible the check is working but going very slowly due to bandwidth or reliability issues (such as many failed downloads). Or maybe it’s gotten stuck on something (which would show up as the most recent log entry being many hours old).

Well, it took over two weeks, but the rebuild finally crashed the computer and failed. At least it made a valiant effort. I restored the database from a copy I made and tried, once again, to back up. It got stuck on verifying. I disabled backend verification… and it still got stuck on verifying. OK. I enabled logging as suggested, and it got stuck pretty much right away at this point:

2018-01-26 00:39:56Z - Profiling: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"

Well, at least we’ve got confirmation that the original error is still happening AND that it matches at least two other topics:

I’m going to shout out to @kenkendk to see if he has thoughts on this particular SQL, which seems to be coming from GetBlocklists() in LocalDatabase.cs. (Perhaps he’d be open to including some time-based logging of the read loop, such that if more than X minutes have passed since the last log message we log what record we’re on, or at least that it’s still moving…)
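
For anyone trying to read it, here’s that same query re-indented (identical text, just formatted):

SELECT "A"."Hash", "C"."Hash"
FROM (
    SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", *
    FROM "BlocklistHash", "Block"
    WHERE "BlocklistHash"."Hash" = "Block"."Hash"
      AND "Block"."VolumeID" = ?
) A, "BlocksetEntry" B, "Block" C
WHERE "B"."BlocksetID" = "A"."BlocksetID"
  AND "B"."Index" >= ("A"."Index" * 3200)
  AND "B"."Index" < (("A"."Index" + 1) * 3200)
  AND "C"."ID" = "B"."BlockID"
ORDER BY "A"."BlocksetID", "B"."Index"

As far as I can tell, it’s saying: for each blocklist stored in the given volume, list (in order) the hashes of the blocks that blocklist covers, with a window of up to 3200 BlocksetEntry rows per blocklist.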

Do you have any suggestions as to how to proceed in the meantime? I obviously can’t rebuild the database from scratch, since that crashes after two weeks of waiting. I would prefer not to lose the backup entirely.

This might also be related:

I’m seeing a similar issue. See my post here: Verifying Backend Data for more detail.

My gut is that this is a problem with a large backup set (3.5TB of data) that was backed up using small (50MB) chunks. The result is a sqlite DB that is over 8GB, likely with a huge number of entries that end up in that join.
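
For what it’s worth, you can sanity-check the database size from inside sqlite3 using the standard pragmas (page_count × page_size is the file size in bytes):

sqlite> PRAGMA page_count;
sqlite> PRAGMA page_size;

Multiplying the two should confirm the ~8GB figure.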

The plan for the query looks like:

sqlite> EXPLAIN QUERY PLAN  SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM  "BlocklistHash","Block" WHERE  "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A,  "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND  "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID"  ORDER BY "A"."BlocksetID", "B"."Index";
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SCAN TABLE BlocklistHash USING INDEX BlocklistHashBlocksetIDIndex
0     1              2     SEARCH TABLE BlocksetEntry AS B USING PRIMARY KEY (BlocksetID=? AND Index>? AND Index<?)
0     2              3     SEARCH TABLE Block AS C USING INTEGER PRIMARY KEY (rowid=?)
0     3              1     SEARCH TABLE Block USING INDEX Block_IndexByVolumeId (VolumeID=?)
0     0              0     USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
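
One cheap experiment (just a guess on my part, not a known fix): SQLite picks its join order from internal statistics, and those can be refreshed. On a copy of the database:

sqlite> ANALYZE;

If EXPLAIN QUERY PLAN comes back different afterwards, stale statistics were at least part of the problem. ANALYZE itself can take a while on a database this size, which is another reason to run it on a copy.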

Here are my table sizes for that query:

sqlite> select count(*) from BlocklistHash;
coun
----
21255
sqlite> select count(*) from BlocksetEntry;
coun
----
39639076
sqlite> select count(*) from Block;
coun
----
39821578

I can see why it would take a while. The index scan is only 21k rows, but each of the three index searches probes into tables of ~39M rows. That is a rather large fan-out :frowning:
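
To put rough numbers on it: 21,255 blocklist hashes, each expanding to a window of up to 3,200 BlocksetEntry rows, is a ceiling of about 68 million probes, and each matching row then triggers a rowid lookup into the ~39M-row Block table. The actual average fan-out can be measured directly (my own diagnostic query, not something Duplicati runs):

sqlite> SELECT (SELECT COUNT(*) FROM "BlocksetEntry") * 1.0 / (SELECT COUNT(*) FROM "BlocklistHash");

With the counts above that works out to roughly 1,865 BlocksetEntry rows per blocklist hash. On top of that, the “USE TEMP B-TREE” line in the plan means a large share of those rows gets sorted in a temporary structure as well.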


Just adding my 2c: since moving my clients over from another backup program, I’m seeing this problem at at least three sites… Watching this thread, hoping a fix comes soon.

Is there any commonality you know of among those sites? For example, are they all:

  • high file count
  • high file size
  • high TOTAL size
  • non-standard blocksize or dblocksize (Upload volume size)

We agree there’s an issue, we just haven’t been able to pin down exactly what’s causing it. In fact, I’m not even sure yet if it’s an actual “the program silently crashes while verifying” or simply “it’s running so slowly it might as well have crashed”. :frowning:

I will see if I can get some information for you as I diagnose each client. The first one I noticed it on is our own backup which is a 300GB dataset, backing up to our hosted FTP server.

After posting I ran delete/recreate, which took several hours. I just started the backup job again; about an hour has passed and it looks like it is going through the remote indexes on the FTP.

Yep, that’s what a recreate will do.

It looks like things are now working again, but I have no idea how. I restored an old version of the local sql file. That gave some errors, so I tried repairing the database. That reported that there were no changes. OK, tried backing up again. Eventually I got to the point where it would go through the “verifying” step and get stuck, so I just let it keep running. This time it eventually finished, but I got errors stating that there were unreachable blocks from the index. Ran repair again, same error. Rinse and repeat. Today I bring up the interface and it says that it has successfully backed up.

So who knows. My expert strategy of hitting it until it works seems to have paid off (knock on wood).

Um…hooray? :smiley:

I have the same problem with a backup to Google Drive. The blocksize is 50MB and I have a large database of around 8GB. Duplicati gets stuck for days at the same SQL query (using one CPU core), and I’ve let it run several times for up to a week. A database repair also either crashes or doesn’t finish within a week. In some runs I also saw it upload a block once in a while, but very rarely; it normally just runs that SQL query.

So for me, retrying didn’t help. I even tried using mono to optimize the Duplicati binary (mono --aot=full -O=all), but it could not compile some methods. Any other ideas? (I’m using Linux and Duplicati 2.0.3.6.)

If you downgrade to 2.0.3.5, does the problem go away? A few issues have been found in 2.0.3.6 (including performance-related ones), so this might be related.

I downgraded to 2.0.3.5. With the existing database it would still get stuck in the same query. Then I deleted and recreated the database, and the backup finally finished uploading. It also successfully verified the remote state. No errors, everything seems fine. I even tried restoring an older file to make sure the backup is usable.

Great!

While I don’t recall specific 2.0.3.6 database errors (though @kenkendk might know of some), it sure looks like there might be a database corruption issue with 2.0.3.6 (whether it’s in normal backups or database maintenance is unclear).

I’ve gone ahead and flagged your comment as the solution and updated the title to better reflect that this seems to have been a 2.0.3.6-related issue. Please let me know if you disagree.