Verifying Backend Data

Duplicati is usually quite good at recovering from interruptions - my guess is your local database just got so large that more time was spent checking for pre-existing blocks than actually backing them up.

A 20MB block size will probably make that issue go away, as the database should have about 20x fewer records now. Note that deduplication becomes less effective as block size increases, so you may see more destination storage usage than before as a result.
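If you want to sanity-check how much of the database the block records account for, here is a rough sketch you could run in a SQLite browser against a copy of the job database (table names are assumed from the standard job database schema that appears in the logs further down this topic):

-- rows that grow with the number of blocks; a larger blocksize means
-- proportionally fewer rows here for the same amount of source data
SELECT
  (SELECT COUNT(*) FROM "Block") AS block_rows,
  (SELECT COUNT(*) FROM "BlocksetEntry") AS blockset_entry_rows;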

Hi, I am seeing similar issues (GUI stuck at Verifying Backend Data, system status at "Phase":"Backup_PreBackupVerify", unresponsive stop button) after going to 2.0.4.5. I should add that this does not happen if you run a backup immediately after the upgrade; it only happens after the machine has been restarted in any manner (be it a power outage or a normal shutdown). I was able to successfully run a Repair operation on the backup's database.

I got these from the logs:

> * Dec 11, 2018 9:29 AM: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = 97426) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"
> 
> * Dec 11, 2018 9:29 AM: ExecuteReader: SELECT DISTINCT "Hash", "Size" FROM "Block" WHERE "VolumeID" = 97426 took 0:00:00:00.000
> 
> * Dec 11, 2018 9:29 AM: Starting - ExecuteReader: SELECT DISTINCT "Hash", "Size" FROM "Block" WHERE "VolumeID" = 97426
> 
> * Dec 11, 2018 9:29 AM: ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (40, "duplicati-icf794f6b7960408882c9c49b7f365651.dindex.zip.aes", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid(); took 0:00:00:00.000
> 
> * Dec 11, 2018 9:29 AM: Starting - ExecuteScalarInt64: INSERT INTO "Remotevolume" ("OperationID", "Name", "Type", "State", "Size", "VerificationCount", "DeleteGraceTime") VALUES (40, "duplicati-icf794f6b7960408882c9c49b7f365651.dindex.zip.aes", "Index", "Temporary", -1, 0, 0); SELECT last_insert_rowid();
> 
> * Dec 11, 2018 9:29 AM: Re-creating missing index file for duplicati-b6026136686564f439025237961c99a48.dblock.zip.aes
> 
> * Dec 11, 2018 9:29 AM: ExecuteReader: SELECT "Name" FROM "RemoteVolume" WHERE "Type" = "Blocks" AND NOT "ID" IN (SELECT "BlockVolumeID" FROM "IndexBlockLink") AND "State" IN ("Uploaded","Verified") took 0:00:00:00.036
> 
> * Dec 11, 2018 9:29 AM: Starting - ExecuteReader: SELECT "Name" FROM "RemoteVolume" WHERE "Type" = "Blocks" AND NOT "ID" IN (SELECT "BlockVolumeID" FROM "IndexBlockLink") AND "State" IN ("Uploaded","Verified")
> 
> * Dec 11, 2018 9:29 AM: ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC took 0:00:00:00.000
> 
> * Dec 11, 2018 9:29 AM: Starting - ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC
> 
> * Dec 11, 2018 9:29 AM: PreBackupVerify took 0:00:14:42.154
> 
> * Dec 11, 2018 9:29 AM: ExecuteReader: SELECT DISTINCT "Name", "State" FROM "Remotevolume" WHERE "Name" IN (SELECT "Name" FROM "Remotevolume" WHERE "State" IN ("Deleted", "Deleting")) AND NOT "State" IN ("Deleted", "Deleting") took 0:00:00:00.013
> 
> * Dec 11, 2018 9:29 AM: Starting - ExecuteReader: SELECT DISTINCT "Name", "State" FROM "Remotevolume" WHERE "Name" IN (SELECT "Name" FROM "Remotevolume" WHERE "State" IN ("Deleted", "Deleting")) AND NOT "State" IN ("Deleted", "Deleting")
> 
> * Dec 11, 2018 9:29 AM: RemoteOperationList took 0:00:14:38.754
> 
> * Dec 11, 2018 9:29 AM: Backend event: List - Completed: (95.15 KB)

I tried again about 24 hours ago but this time I left it alone (went to do other things + sleep). Right now it is back to normal (as in it is backing up the files) but where can I get the logs to see how long the pre-backup verification took?

PS: This seems to me to be more of a UX/UI issue rather than a database issue

This problem is still around and it's definitely not a UI issue. Mine just took 2 days to verify data before resuming normal backups.

The cause should be fixed in 2.0.4.17_canary_2019-04-11, although backups will still need to recreate and upload any index files that got missed; going forward they should no longer get missed.

If the 2.0.4.17 fix refers to "* Fixed an issue where index files were not generated, thanks @seantempleton", then it fixes a bug that got into 2.0.4.16 with the new parallel upload code, so that was a rather temporary issue and the fix probably wouldn't help any older releases. For anyone who found that the issue here got worse in 2.0.4.16, the code may have been rebuilding the missing index files while still displaying the PreBackupVerify phase, which is one of two phases that show "Verifying backup data". It helps when reports describe when the error occurred.

Rebuilding an index file appears to get logged at Information level, as seen in the text from the bug report above, or from the earlier post in this topic, which also very clearly says PreBackupVerify and so reduces guessing. This "Re-creating missing index file" message isn't new, but not all of the reports here have the info needed to identify it. Not every little action gets its own phase, so guesswork may be needed about which code caused an issue…
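For anyone trying to tell whether they are in this situation, a hedged sketch: the count below just wraps the detection query Duplicati itself logs above (SELECT "Name" FROM "RemoteVolume" …) in a COUNT, and can be run in a SQLite browser against a copy of the local job database.

-- dblock volumes with no index file recorded yet; a non-zero result means
-- the backup will spend time re-creating dindex files during PreBackupVerify
SELECT COUNT(*) AS dblocks_missing_index
FROM "RemoteVolume"
WHERE "Type" = "Blocks"
  AND NOT "ID" IN (SELECT "BlockVolumeID" FROM "IndexBlockLink")
  AND "State" IN ("Uploaded", "Verified");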

Does that mean that it’s an intermittent slowdown? I notice most reports don’t say much about frequency.

Not sure. It's currently on "Deleting unwanted files." and has been for the last 24 hours or so. But at least the log is constantly updated with new files it fetches from the server and analyzes for whether to delete or not.

I'll update to 2.0.4.17 and will update this post with whether it goes into another 2-day cycle of Verifying backend data.

I am new to Duplicati and was excited to find a decent backup solution after CrashPlan. But soon after completing the first round of backups I saw that it got stuck on Verifying backend data… and took a long time; for some backup sets it took over a day to verify a couple of GB of data.

After making changes (or adopting new parameters, i.e. changing "block-hash-algorithm" or "blocksize") the existing backup data can't be used, so I initiated a fresh backup, which put me back to square one. Searching on Google didn't help, but I learned that many other users are having the same issue. After careful monitoring and review of the logs, my findings regarding Duplicati being stuck on Verifying backend data… are:

  1. On the destination, Duplicati creates 3 types of files:
  • dlist.zip.aes - a list of the source files and folders
  • dblock.zip.aes - data blocks
  • dindex.zip.aes - indices for quick list and restore
  2. During the initial backup, Duplicati creates the dlist.zip.aes and dblock.zip.aes files but skips creating the dindex.zip.aes files.

  3. On the 2nd backup, just after the initial backup, Duplicati creates the indices from the dblock.zip.aes files: it copies each dblock.zip.aes file to the source machine, loads it, generates the index, and then uploads it to the destination as a dindex.zip.aes file. This is a time-consuming process (not sure why), and it creates a total number of index files matching the number of dblock files.

  4. To verify that Duplicati is creating indices and not hung otherwise, go to 'About' => 'Show Log' => 'Live' => 'ExplicitOnly'.
    A message like the following should appear; look for "VolumeID" = #, which tells you it is currently working on dblock file # and creating its index.

Apr 12, 2019 7:41 PM: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = 277) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 16384) AND "B"."Index" < (("A"."Index" + 1) * 16384) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"

  5. If you need to know how many more indices still need to be created, check the destination and count the dblock.zip.aes files; in the Duplicati messages each dblock.zip.aes file is treated as a Volume (see the query sketch after this list). Also, as it proceeds, new dindex.zip.aes files should appear at the destination.

  6. Once all indices are created, subsequent backups should be quick, subject to how much data changed between backups.

  7. In my understanding, the current Duplicati message is confusing and should be more meaningful, something like "Creating Index # of ##…"; a progress bar for the current process would be even more helpful.
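As a rough way to do the counting from step 5 without listing the destination by hand, a sketch you could run read-only in a SQLite browser against a copy of the job database (this assumes the schema visible in the log messages, where each dblock file is a "Blocks"-type row and each dindex file an "Index"-type row in Remotevolume); comparing the "Blocks" count with the "Index" count gives a rough idea of how many dindex files are still to come:

-- remote files Duplicati knows about, grouped by type and state
SELECT "Type", "State", COUNT(*) AS files
FROM "Remotevolume"
GROUP BY "Type", "State";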

Thanks


Hello @RGupta and welcome to the forum!

Are you running 2.0.4.16 canary? That's the one that inadvertently sometimes missed dindex uploads, thus forcing recreation; it's not a normal thing. Canary releases are bleeding edge and sometimes have bugs… Fortunately, those who run them can report bugs so fixes can be made. 2.0.4.17 should upload all the dindex files. While I'm not familiar enough with the details of the recovery process, your analysis is very nice and seems reasonable.


I'm struggling to figure out how I can see how many more files need to be recreated by the two systems I have that have been verifying for over a week. I tried what @RGupta wrote but the numbers made no sense, or I misunderstood the directions. For example, currently one backup has logged:

16 Apr 2019 08:15: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash","Block" WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = 89642) A, "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID" ORDER BY "A"."BlocksetID", "B"."Index"

There are 23,698 dblock.zip.aes files in the destination.

Can anyone help?

Yes, I was on 2.0.4.16. The current release addressed the index issue. Thanks

No changes for me with 2.0.4.17. After upgrading, it has now been "verifying backend data" for over 12 hours.

This only happens when a backup is interrupted, and backups seem to complete in a normal amount of time once the verification process finally finishes, so I guess I can live with this.


I think that means you should have that many dindex files when you get them all back together. Hopefully quite a few of those are old ones, and so were not affected by 2.0.4.16 not uploading some of them…

A couple of things I need to clarify here:

  1. VolumeID is the index number (it will ONLY increase), mapped to the actual BlockVolumeID.
  2. BlockVolumes are added or removed as data changes over time, but VolumeID ## will only increase.
  3. The above observations are from a fresh backup where no additional blocks were created or removed, so "theoretically" the VolumeIDs are sequentially mapped to the BlockVolumes and are in order.

If you do not have a fresh backup, then you can download 'DB Browser for SQLite':

  1. Open "READ ONLY… " sqlite database (to find local database file, expend job list => Database (in Advance) => Local database path )
    If database is locked, on Windows machine use ‘ Ctrl+C ’ and ‘ Ctrl+V ’ in explorer and use copy of database.
  2. Go to ‘ Execute SQL ’ tab and paste following query (Replace ## with the current VolumeID)
    select (select max(rowid)from IndexBlockLink) as "Total Volumes", rowid as "Current Volume" from IndexBlockLink where IndexVolumeID = ## order by IndexVolumeID asc
  3. Click on execute button ‘Blue Triangle’.
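If the rowid-based numbers come out confusing, an alternative sketch for the same Execute SQL tab (still assuming the schema seen in the log messages, and still on a read-only copy of the database): the first query compares how many dblock volumes exist with how many already have an index file linked, and the second maps a VolumeID from the live log back to its remote file name (## is a placeholder to replace, as above).

-- overall progress: dblock volumes in total vs. dblock volumes that already have a dindex
SELECT
  (SELECT COUNT(*) FROM "Remotevolume" WHERE "Type" = "Blocks") AS total_dblock_volumes,
  (SELECT COUNT(DISTINCT "BlockVolumeID") FROM "IndexBlockLink") AS dblock_volumes_with_index;

-- which remote file a given VolumeID from the live log refers to (replace ##)
SELECT "ID", "Name", "Type", "State" FROM "Remotevolume" WHERE "ID" = ##;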

Ok, I have file counts for the local backup destinations (SMB) for the problem machines telling me:

Indexes: 25,298 Blocks: 23,698
Indexes: 5,717 Blocks: 5,717
Indexes: 12,417 Blocks: 13,651

One of the servers shows as processing a Wasabi-destined backup, so I can't get the counts as easily, but that assumes the GUI is telling me the right thing and it's indeed processing this backup set, as the main log doesn't show me which job it is.

Very good analysis! I'm currently experiencing this myself, where a backup made on 2.0.4.16 did not upload all the dindex.zip.aes files.

Agree it could be clearer, but as this is not supposed to happen, maybe it is not worth worrying about.

I’m not sure if having excess index files is normal, but I’ve seen times where two index files index the same dblock. Not sure how it happens or if it causes a problem, but it sure throws off the how-many-left theory…
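If anyone wants to check for that duplication on their own database, a small sketch (same caveats as the queries earlier in the thread: run it read-only against a copy of the job database, with the schema assumed from the logged queries):

-- dblock volumes referenced by more than one index file
SELECT "BlockVolumeID", COUNT(*) AS index_files
FROM "IndexBlockLink"
GROUP BY "BlockVolumeID"
HAVING COUNT(*) > 1;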

Dear Duplicati authors,

The analysis of user RGupta is correct, and the issue of "Verifying Backend Data" is not resolved in the 2.0.4.17 build.

What is recommended to be changed:

a) create an index:
create index i1 on BlocklistHash(Hash);

b) in LocalDatabase.cs (around line 1301), issue the commands:
PRAGMA optimize;
ANALYZE sqlite_master;
ANALYZE Block;
ANALYZE BlocklistHash;
ANALYZE BlocksetEntry;

c) in LocalDatabase.cs (line 1304):
var sql = string.Format(@"SELECT ""A"".""Hash"", ""C"".""Hash"" FROM " +
@"(SELECT bb1.""BlocksetID"", bb2.""Hash"", * FROM ""BlocklistHash"" bb1, ""Block"" bb2 WHERE bb1.""Hash"" = bb2.""Hash"" AND bb2.""VolumeID"" = ?) A, " +
@" ""BlocksetEntry"" B, ""Block"" C WHERE ""B"".""BlocksetID"" = ""A"".""BlocksetID"" AND " +
@" ""B"".""Index"" >= (""A"".""Index"" * {0}) AND ""B"".""Index"" < ((""A"".""Index"" + 1) * {0}) AND ""C"".""ID"" = ""B"".""BlockID"" " +
@" ORDER BY ""A"".""BlocksetID"", ""B"".""Index""",
blocksize / hashsize);

My development laptop has over 10 million local files.
Even in 2.0.4.17, "Verifying Backend Data" takes over 24 hours before I kill the service.
With these changes it takes a fraction of a second.
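For anyone who wants to check the effect on their own database before and after adding the index, a minimal sketch: prefix the query Duplicati logs (shown earlier in this topic, here with the sample VolumeID 277 and the 16384 multiplier from that log line) with EXPLAIN QUERY PLAN in a SQLite shell. The interesting part of the output is whether "BlocklistHash" shows up as a full table scan or as a search using the new i1 index.

EXPLAIN QUERY PLAN
SELECT "A"."Hash", "C"."Hash"
FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM "BlocklistHash", "Block"
      WHERE "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = 277) A,
     "BlocksetEntry" B, "Block" C
WHERE "B"."BlocksetID" = "A"."BlocksetID"
  AND "B"."Index" >= ("A"."Index" * 16384) AND "B"."Index" < (("A"."Index" + 1) * 16384)
  AND "C"."ID" = "B"."BlockID"
ORDER BY "A"."BlocksetID", "B"."Index";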

Thanks


This also points in the same direction as what I found out: running an ANALYZE command on the SQLite DB while the service is stopped fixed the issue of long-running SQL queries for me too.
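For reference, the minimal version of that is sketched below (run it in DB Browser for SQLite or the sqlite3 shell against the job database while the Duplicati service is stopped; ANALYZE without a table name covers every table, so it includes the Block, BlocklistHash and BlocksetEntry tables listed in the post above):

PRAGMA optimize;
ANALYZE;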

GitHub issue created here:
