Stuck on Verifying Backend Data [2.0.3.6]

I have three backups on this computer. Two of them work fine (when they can run), but the third is now stuck on Verifying Backend Data. If I kill the process and restart it, it immediately goes back to verifying. The last entry in the log (using the profiling setting) is

Dec 30, 2017 5:14 PM: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"

I have allowed the process to run for over a week before restarting it again, but it never completes. Is there any way to save this backup set?

I should add that this is running on Windows 10 and the backend is Google Drive.

Welcome to the forum, @sheepdot - sorry to hear you’re having problems with Duplicati!

When you kill and restart the failing job, does it get stuck on the same “ExecuteReader” command every time? (I’m assuming it does…)

Also, are you using any non-standard blocksize or hashsize values?

I looked through the GetBlocklists() code and I’m not seeing anything that I think would cause it to get stuck in a week-long loop…

Yes, it appears to get stuck on the same command every time.

I did not change any of the blocksize or hashsize values.

As of right now, I have deleted and started to recreate my database. That hasn’t worked on my other computer, but I’m hoping it works this time…

Please let us know if this db recreate helps or not.

I’ll let you know, though I’m doubtful. The backup set is only 236GB, but the rebuild has been running for five days. Since I’m running Windows 10, at this rate I doubt that I can keep Windows from forcing an update on me before the rebuild completes.

Yeah…Microsoft is doing a great job these days of driving me to Linux. :slight_smile:

And yes - rebuilds are a known slow point that we hope to improve upon, but for now we’re kinda stuck with what we’ve got.

If / when the rebuild finishes, you could also consider trying a run with the --no-backend-verification setting enabled. It’s not really a fix, but it could help track down where the issue is.
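For reference, that option can be added on the job’s “Advanced options” screen or tacked onto a command-line run; a minimal sketch (the storage URL and source folder are placeholders, not anything from your actual job):

Duplicati.CommandLine.exe backup <storage-url> <source-folder> --no-backend-verification=true

Just remember to remove it again once we’ve learned what we need to know.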

I have the same problem.

A backup runs fine on my laptop (Lenovo P50), but not on my PC.

It has been stuck on “Verifying backend data” non-stop for 24 hours.
The first time, I killed the process and restarted Duplicati. The verifying step came right back…
I killed the process again, deleted the database, and started Duplicati; the database was re-created and the backup ran (for many hours, since I had a lot of new data).

The next day a new backup started, and now it has been on “Verifying backend data” again for 24 hours.

The computer is an i7-3770K with 16GB of RAM, running Windows 7 Pro.
Only the Duplicati process is using the CPU.

:frowning:

Please, do you have any ideas?

The first thing I’d suggest is the same - try running the backup with --no-backend-verification enabled. If the job runs without error, then we’ve verified where the problem is.

(Note that this is NOT a fix, as disabling backend verification means that if something goes wrong with a backend file, you won’t know about it until you try to restore from it.)

The next thing you might want to do is use --log-file=<path> and --log-level=profiling to create an actual text file log of what Duplicati is doing. If the backend-verification process again seems stuck, check the log file to see what the last few commands were and how long ago they happened.
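Something like this in the job’s “Advanced options” (the log path here is just an example; point it anywhere Duplicati can write):

--log-file=C:\Duplicati\profiling.log
--log-level=profiling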

It’s possible the check is working but going very slowly because of bandwidth or reliability (many failed downloads) issues. Or maybe it’s gotten stuck on something (which would be indicated by the most recent log entry being many hours old).

Well, it took over two weeks, but the rebuild finally crashed the computer and failed. At least it made a valiant effort. I restored the database from a copy I made and tried, once again, to back up. It got stuck on verifying. I disabled backend verification… and it still got stuck on verifying. OK. I enabled logging as suggested, and it got stuck pretty much right away at this point:

2018-01-26 00:39:56Z - Profiling: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"

Well at least we’ve got a confirmation that the original error is still happening AND happens to match at least two other topics:

I’m going to shout out to @kenkendk to see if he has thoughts on this particular SQL, which seems to be coming from GetBlocklists() in LocalDatabase.cs. (Perhaps he’d be open to including some time-based logging of the read loop, so that if more than x minutes have passed since the last log message we could log which record we’re on, or at least that it’s still moving…)

Do you have any suggestions as to how to proceed in the meantime? I obviously can’t rebuild the database from scratch, since that crashes after two weeks of waiting. I would prefer not to lose the backup entirely.

This might also be related:

I’m seeing a similar issue. See my post here: Verifying Backend Data for more detail.

My gut is that this is a problem with a large backup set (3.5TB of data) that was backed up using small (50MB) chunks. The result is an sqlite DB that is over 8GB, likely with a huge number of entries that end up in that join.
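For a rough sense of scale (assuming the default 100KB blocksize, which may or may not apply here): 3.5TB divided into 100KB blocks is roughly 35 million blocks, which lines up with the ~39 million rows in the Block and BlocksetEntry counts further down.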

The plan for the query looks like:

sqlite> EXPLAIN QUERY PLAN  SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM  "BlocklistHash","Block" WHERE  "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A,  "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND  "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID"  ORDER BY "A"."BlocksetID", "B"."Index";
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SCAN TABLE BlocklistHash USING INDEX BlocklistHashBlocksetIDIndex
0     1              2     SEARCH TABLE BlocksetEntry AS B USING PRIMARY KEY (BlocksetID=? AND Index>? AND Index<?)
0     2              3     SEARCH TABLE Block AS C USING INTEGER PRIMARY KEY (rowid=?)
0     3              1     SEARCH TABLE Block USING INDEX Block_IndexByVolumeId (VolumeID=?)
0     0              0     USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

Here are my table sizes for that query:

sqlite> select count(*) from BlocklistHash;
coun
----
21255
sqlite> select count(*) from BlocksetEntry;
coun
----
39639076
sqlite> select count(*) from Block;
coun
----
39821578

I can see why it would take a while. The index scan is only 21k rows but the three index searches hit 39m rows each. That is a rather large fan out :frowning:
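If anyone wants to quantify that fan-out, a rough check (run against a copy of the job’s sqlite database, not the live one; it just reuses the BlocklistHash/BlocksetEntry join conditions from the query above, without the per-volume Block filter) would be:

sqlite> SELECT COUNT(*) FROM "BlocklistHash", "BlocksetEntry" WHERE "BlocksetEntry"."BlocksetID" = "BlocklistHash"."BlocksetID" AND "BlocksetEntry"."Index" >= ("BlocklistHash"."Index" * 3200) AND "BlocksetEntry"."Index" < (("BlocklistHash"."Index" + 1) * 3200);

With ~21k blocklist hashes each covering up to 3200 entries, that intermediate row count can easily run into the tens of millions before the Block lookups even start.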

Just adding my 2c that since moving my clients over from another backup program, I’m seeing this problem at at least 3 sites… Watching this thread, hoping a fix comes soon.

Is there any commonality you know of among those sites? For example, are they all:

  • high file count
  • high file size
  • high TOTAL size
  • non-standard blocksize or dblocksize (Upload volume size)

We agree there’s an issue, we just haven’t been able to pin down exactly what’s causing it. In fact, I’m not even sure yet if it’s an actual “the program silently crashes while verifying” or simply “it’s running so slowly it might as well have crashed”. :frowning:

I will see if I can get some information for you as I diagnose each client. The first one I noticed it on is our own backup which is a 300GB dataset, backing up to our hosted FTP server.

After posting I ran delete/recreate, which took several hours. I just started the backup job again, about an hour has passed and it looks like it is going through the remote indexes on the FTP.

Yep, that’s what a recreate will do.

It looks like things are now working again, but I have no idea how. I restored an old version of the local sql file. That gave some errors, so I tried repairing the database. That reported that there were no changes. OK, tried backing up again. Eventually I got to the point where it would go through the “verifying” step and get stuck, so I just let it keep running. This time it eventually finished, but I got errors stating that there were unreachable blocks from the index. Run repair again, same error. Rinse and repeat. Today I bring up the interface and it says that it has successfully backed up.

So who knows. My expert strategy of hitting it until it works seems to have paid off (knock on wood).

Um…hooray? :smiley: