Verifying Backend Data


#1

My computer has been “Verifying Backend Data” for several hours and does not respond to any sort of “stop” command other than shutting down Duplicati completely. Even restarting my computer had no effect. How can I make it stop?


#2

This has happened to me about 3 times so far. It takes ~24 hours to complete for me. Then, after it completes, it starts uploading data again. So far I’ve spent about as much time verifying the backup as I have uploading data. I hope this means I have a perfect backup :slight_smile:

When it is verifying, I’ve not found any way to make it stop.

david


#3

While it is doing whatever is taking so long, can you try to go to the log area and enable “Profiling” under “Live log”? That should give a hint as to what it is really spending its time on.


#4

I’m seeing the same. My Duplicati hung last night and I gave the VM a reboot. Now it’s been sitting at “Verifying backend data …” for several hours. Any idea what I can do about this? I have backed up 1.5 TiB or so already, so I don’t really want to start all over. Not because of the bandwidth; I currently have 300 Mbps symmetric, though I can’t use all of it for some reason. CPU usage is 100% on a single core. Profiling logs don’t seem to give me anything at all. The backend is Jotta.

Thanks

roy


#5

Could you try looking at the live logs so we can see what it is really doing?


#6

I have the same problem. The program only displays the status “Verify backend data”. I only have one entry in the live log.

Sept. 22nd 2017 18:58: Server started and listens to 127.0.0.1, port 8200

I’m using Duplicati 2.0.1.73_experimental_2017-07-15.

I would be grateful for any suggestions


#7

Have you found a way to fix this? It’s the same here, but even rebooting my PC did not fix it, and it’s been stuck like this for more than a day on a backup job that usually (after the first run, of course) takes about 10 minutes.


#8

Not yet as it only happens to certain people, so we’re still trying to collect enough data to figure out the cause.

Backend verification is a default setting that basically says “when your backup is done, download a dblock (archive) file from the destination, test the file (checksum or hash, I think), then delete the downloaded copy”.

I’m not saying any of these are what’s going on, but they are the types of things that COULD cause the issue:

  • large dblock (archive) size setting [can take a long time to download and checksum / rehash]
  • high disk usage slows testing of the downloaded dblock
  • low memory causes disk thrashing
  • antivirus tools slow down file access
  • a high test file count parameter causes many files to be tested every run
  • the download of the dblock file stalls, so testing never starts
  • the downloaded dblock file is locked open by something else (antivirus, another backup tool), so Duplicati gets stuck waiting for the file to become available
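As a loose illustration of the “download and test” idea in the first bullet, hashing a downloaded archive file boils down to something like this (the file name and expected hash here are hypothetical, and Duplicati’s real verification also covers dindex/dlist files and its own recorded hashes, so this is only a sketch of why a big dblock takes time):

```python
import hashlib

def file_sha256(path, chunk_size=1024 * 1024):
    """Stream a file from disk and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: compare against the hash recorded for the file.
# expected = "..."  # hash stored in the local database
# ok = file_sha256("duplicati-b123.dblock.zip.aes") == expected
```

Streaming in chunks keeps memory use flat, but a multi-GB dblock on a slow disk (or one being scanned by antivirus) can still take minutes, which lines up with several of the benign explanations above.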

If you wouldn’t mind posting your backup exported “As command-line” (with any personal data like user, password, email address, etc. obfuscated), it could help eliminate odd parameter settings as potential causes.

If the issue happens every time, you can try the --no-backend-verification advanced setting to turn this step off altogether and see whether the error goes away.


#9

Similar poor experience:
running Duplicati - 2.0.2.12_canary_2017-10-20
MacOS - El Capitan

Duplicati stalls showing “Verifying backend data …” in status bar.

Live profiling shows last entry is:

Nov 27, 2017 11:46 PM: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"

mono-sgen64 is running at 99% CPU, but nothing is happening…


Stuck on Verifying Backend Data [2.0.3.6]
#10

The “no-backend-verification” option doesn’t seem to affect the “Verifying backend data …” behavior, either.


#11

That query is a consistency check that verifies that the database is in a sane state and that no blocks are missing or dangling.

There is currently no option to disable this verification step.


#12

Another victim here. I had an unexpected reboot while a backup run was finishing up, and now I am stuck in “Verifying backend data” for over an hour.

The chatter here does not give me optimism that it will finish soon. (Ever?)

The sole Profiling log entry is:

Dec 14, 2017 4:38 AM: Server has started and is listening on 0.0.0.0, port 8200

4:38 AM being the time of the reboot. Here is my exported command line:

mono "/usr/lib/duplicati/Duplicati.CommandLine.exe" 
backup "file:///var/duplicati/" 
"/var/log/" "/home/david/" 
--snapshot-policy="Auto" 
--full-result="true" 
--backup-name="DCF_Desktop_To_Var" 
--dbpath="/home/david/.config/Duplicati/YKLEULKMPB.sqlite"
--encryption-module="aes" --compression-module="zip" --dblock-size="64MB" 
--keep-time="3M" --passphrase="<redacted>" --send-http-url="<redacted>" 
--disable-module="console-password-input" 
--exclude="/home/david/.pcloud/" 
--exclude="/home/david/.cache/" 

Line breaks added for readability.


#13

Update: after almost two hours, it finished with a warning:

Warnings: [ Expected there to be a temporary fileset for synthetic filelist 
(79, duplicati-ib0bb3aea22da49ddab20ecd986681f63.dindex.zip.aes), 
but none was found? ] 

#14

I believe the temporary fileset warning is almost expected after a power interruption and not something you need to worry about, as it really is a temp file.


#15

I’m seeing this as well with my largest backup set. With verbose logging enabled, it seems to be stuck at

2018-02-04 12:07:47Z - Profiling: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash"
FROM (
    SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", *
    FROM  "BlocklistHash","Block"
    WHERE  "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A,
"BlocksetEntry" B,
"Block" C
WHERE "B"."BlocksetID" = "A"."BlocksetID" AND
    "B"."Index" >= ("A"."Index" * 3200) AND
    "B"."Index" < (("A"."Index" + 1) * 3200) AND
    "C"."ID" = "B"."BlockID" 
ORDER BY "A"."BlocksetID", "B"."Index"

I think the problem here is that I didn’t change the default 50MB chunk size, and this is a 3.5TB backup set composed of files that average 10GB each. The result is that my sqlite DB for this set is 8.2GB. I’ve since increased the chunk size to 500MB, but that doesn’t help with the existing 3.5TB of backup data.


#16

The chunk (dblock / “Upload volume size”) setting is the approximate maximum size of the files saved to your destination; it has little effect on your local database size. I suspect you’re thinking of the --blocksize parameter, which specifies the size of each chunk a file is broken into. With the default of 100KB, one of your 10GB source files would end up broken into about 104,858 individual blocks, each with its own hash stored in the database.
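The 104,858 figure is just ceiling division (treating KB/GB as binary units here, which is an assumption about how the numbers are being counted):

```python
def block_count(file_size_bytes, blocksize_bytes=100 * 1024):
    """Number of fixed-size blocks a file is split into (last block may be partial)."""
    return -(-file_size_bytes // blocksize_bytes)  # ceiling division

ten_gib = 10 * 1024**3
print(block_count(ten_gib))                  # ~104,858 blocks at the default 100KB
print(block_count(ten_gib, 500 * 1024**2))   # far fewer with a 500MB blocksize
```

Each of those blocks contributes a hash row to the local database, which is why large sets with the default blocksize produce multi-GB sqlite files.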

So this is probably the reason your verification SQL is on the slow side. I don’t know whether that SQL has been targeted for performance improvements yet; if it hasn’t already, it might be a while before it gets looked at.

Unfortunately, --blocksize (similar to the encryption passphrase) is something you can’t change on an existing backup in Duplicati. So assuming this is the actual source of your performance issues, I’m afraid your options are pretty much:

  • live with it and hope a performance improvement gets rolled out sooner rather than later (you could make it sooner by coding it yourself or supporting a bounty for it)
  • start over with a fresh backup using a larger --blocksize (you might want to run some tests first to see how different sizes affect performance)
  • disable backend verification by enabling --no-backend-verification (which should skip the process that includes the slow step above; HOWEVER, this has the side effect of no longer making sure your backend files are all good, meaning if things go missing or get corrupted at the destination you won’t automatically be notified about it)

#17

Thanks for the detailed response. It is “Upload volume size” that I increased to 500MB.

If I understand the docs correctly, the tradeoff of a larger blocksize is that if, say, 1 byte of a 10GB file changes, the blocksize value is the smallest amount of data that must be uploaded to back up that change? With that in mind, in my case, where I’m backing up large files that essentially never change, setting a much larger blocksize is probably a reasonable approach.

Is there an upper limit to blocksize I should be wary of? Is 100MB okay? 250MB?

I figure at this point I’ll just re-backup all that data with a reasonable blocksize; if nothing else, VACUUM on an 8.5GB sqlite DB is not exactly fast :smiley:


#18

Are there other parameters that I should be aware of for large backup sets or when dealing with large files?


#19

We’re getting a little bit off topic here, but I’d recommend looking at this post:

And here Kenkendk says that in THEORY a block size of up to 2GB should be supported (though there’s no mention of whether or not that’s a good idea). :wink:

Overall I’d say a jump from a --blocksize of 100KB to 100MB should work just fine, but might be a bit drastic. At the default 100KB you’re looking at 10.7 million block-hash rows per 1 TB of source data.

Shifting to a more modest 1MB block size would take that down to just over 1 million block-hash rows. That won’t necessarily yield a 10x smaller sqlite file and 10x faster performance, but it should show a fair bit of improvement…
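Those row-count estimates are easy to check yourself (binary units assumed; the real database stores more than just the block-hash rows, so treat this as a rough lower bound):

```python
# Approximate block-hash row counts per TiB of source data at various block sizes.
TIB = 1024**4

for label, blocksize in [("100KB", 100 * 1024), ("1MB", 1024**2), ("100MB", 100 * 1024**2)]:
    rows = TIB // blocksize
    print(f"{label:>6} blocksize -> ~{rows:,} block-hash rows per TiB")
```

At 100KB that works out to roughly 10.7 million rows per TiB, and at 1MB just over 1 million, matching the figures above.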


#20

I’m jumping in here with a largely similar problem. I’ve got a 1.1T backup stalled at “Verifying backend data”. I’ve been through the repair/retry loop a few times.

Profiling gives me a last query of

SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"

The profiling logs stop here, a couple of minutes after the “Verifying backend data” status is posted.

Looking up in the thread shows that this isn’t really new information. However, I don’t think anyone has tried this: I opened the local database in DB Browser and issued the offending query, which completed in a bit over 3 minutes, returning 0 rows.
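For anyone who wants to reproduce that experiment without DB Browser, the same query can be issued with Python’s sqlite3 module. The schema below is a minimal stand-in inferred from the columns the query touches (the real Duplicati database has more tables, columns, and indexes), so this only demonstrates the mechanics; point the connection at a copy of your actual .sqlite file to time it for real:

```python
import sqlite3

# Minimal stand-in schema inferred from the query text; NOT the real Duplicati schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "BlocklistHash" ("BlocksetID" INTEGER, "Index" INTEGER, "Hash" TEXT);
CREATE TABLE "Block" ("ID" INTEGER, "Hash" TEXT, "VolumeID" INTEGER);
CREATE TABLE "BlocksetEntry" ("BlocksetID" INTEGER, "Index" INTEGER, "BlockID" INTEGER);
""")

query = '''
SELECT "A"."Hash", "C"."Hash"
FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", *
      FROM "BlocklistHash", "Block"
      WHERE "BlocklistHash"."Hash" = "Block"."Hash"
        AND "Block"."VolumeID" = ?) A,
     "BlocksetEntry" B, "Block" C
WHERE "B"."BlocksetID" = "A"."BlocksetID"
  AND "B"."Index" >= ("A"."Index" * 3200)
  AND "B"."Index" < (("A"."Index" + 1) * 3200)
  AND "C"."ID" = "B"."BlockID"
ORDER BY "A"."BlocksetID", "B"."Index"
'''

rows = con.execute(query, (1,)).fetchall()  # ? = a VolumeID to check
print(len(rows))  # 0 on an empty database
```

On an empty (or fully consistent) database the query returns no rows, which matches the 0-row result seen in DB Browser.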

After that final log record showing the “offending” query, no new log entries appeared even after running for days. We know the query finishes in about 3 minutes, so it’s not stuck in the query itself; it was asked and answered. Therefore, it’s likely we’re in some sort of tight loop after this “sanity check” explained by @kenkendk earlier.

The hang doesn’t play nice with CPU scheduling, as it’s consuming 100% of a core. Memory impact is light and not growing. I can’t break the process by any means other than killing mono or restarting the machine. While stuck, the GUI still responds to edits, etc.; it just seems to affect the backup engine.

I hope this provides enough clues to someone familiar with the code to figure this out. It seems to be affecting a number of users.

I am happy to provide more information as time allows.