The dindex file internally holds the name of the dblock that it indexes, but you’d have to open it up to see it.
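For reference, a minimal Python sketch of that “open it up” step, assuming the dindex has already been decrypted to a plain .zip (the file name is just a placeholder):

import zipfile

# Placeholder name; point this at a decrypted *.dindex.zip file
with zipfile.ZipFile("duplicati-ib3189424484248dfa5bbdce8caa8fcda.dindex.zip") as z:
    for entry in z.namelist():
        # Entries under vol/ are named after the dblock(s) this dindex indexes
        if entry.startswith("vol/"):
            print(entry[len("vol/"):])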
Database repair ideally opens only dlist and dindex files, and it raises an error if a dindex points to a missing dblock.
Or, if you prefer a different way to get a lighter look at the database from a Linux shell, the --upload-verification-file option (which writes duplicati-verification.json to the destination) can give it. What’s missing is the IndexBlockLink information to find what’s unused.
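As a rough illustration, a small Python sketch that tallies dblock/dindex/dlist names found anywhere in duplicati-verification.json; it deliberately avoids assuming a particular JSON layout, so treat it as a sketch rather than a parser:

import json
from collections import Counter

# Path is an assumption; the file is written to the backup destination
with open("duplicati-verification.json") as f:
    data = json.load(f)

counts = Counter()

def walk(node):
    # Visit the whole JSON structure and count remote file names by type,
    # without relying on any particular field layout
    if isinstance(node, dict):
        for value in node.values():
            walk(value)
    elif isinstance(node, list):
        for item in node:
            walk(item)
    elif isinstance(node, str):
        for kind in ("dblock", "dindex", "dlist"):
            if f".{kind}." in node:
                counts[kind] += 1

walk(data)
print(counts)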
I’m not sure an unused dindex is guaranteed to be the one to delete, though. It seems a good guess, but I don’t know what happens on a DB recreate if two dindex files both claim to index the same dblock.
A third option is to delete all the dindex files and press the Repair button to see what Duplicati builds.
Ordinarily it can replace missing dlist and dindex files, but its records might be a little incorrect now.
It’s not clear humans can do better but at least it’s some learning, especially if logs can be checked.
Ideally we would want to identify how things got out of line, so that something can be done to stop it.
I just tested this on one of my backups with a mismatch (100 dblocks and 102 dindexes). Duplicati rebuilt the 102 dindexes, so there’s still a mismatch.
Thanks for the test (which may knock out option 3). Are you or @Kahomono up for a database look?
Posting a database bug report for others (maybe me) is another option, but less learning than the DIY.
Yeah I might be able to generate a database bug report.
For what it’s worth, I did a full database recreate which processed all 102 dindex files without error. I then repeated the earlier test just for kicks where I delete the dindex files and regenerate them. No change.
I don’t know if people organize their backup data like I do (all backups stored in the same main location, with backup sets stored in individual subfolders). But if so, here’s a simple PowerShell script to show the file counts:
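For anyone on another platform, a rough Python sketch of the same idea (the root path and the one-subfolder-per-backup layout are assumptions):

import os
from collections import Counter

# Root folder holding one subfolder per backup set; the path is an assumption
root = "/backups"

for entry in sorted(os.listdir(root)):
    folder = os.path.join(root, entry)
    if not os.path.isdir(folder):
        continue
    counts = Counter()
    for name in os.listdir(folder):
        for kind in ("dblock", "dindex", "dlist"):
            if f".{kind}." in name:
                counts[kind] += 1
    print(entry, dict(counts))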
I’m looking at the database in DB Browser. The “IndexBlockLink” table has 100 entries.
In the “Remotevolume” table, when I filter by dindex I see 102 entries. Interestingly, 2 of them have a size of 541, which from memory is an empty dindex. I need to decrypt and open them to be sure.
Edit to add: confirmed, those two dindexes only have a “manifest” file inside and no “list” or “vol” folders.
Nice remembering. Since your NAS might be Linux, you might be able to fairly easily look inside them to see how many manifest and vol entries you have relative to the number of input files. Ideally they all match…
$ unzip -l "*dindex.zip"
Archive: duplicati-ib3189424484248dfa5bbdce8caa8fcda.dindex.zip
Length Date Time Name
--------- ---------- ----- ----
148 2021-02-08 17:27 manifest
425 2021-02-08 17:27 vol/duplicati-b1c651688cf094802bb08d0593ab0fc81.dblock.zip
--------- -------
573 2 files
...
but knowing the magic sizes is easier. I also just tried making some SQL based on source examples:
SELECT ID FROM Remotevolume WHERE Type = 'Index' AND State = 'Verified' AND ID NOT IN ( SELECT IndexVolumeID FROM IndexBlockLink )
run in the Execute SQL tab might lead you to the “extra” dindex. If that SQL is wrong, I can try again…
I wonder how the bad files looked before they were recreated? Did you happen to save the old dindex?
EDIT:
Not having looked at the code: there might be something that sees a dindex is missing (based on Remotevolume) and looks to see what it should have, but comes up empty because it isn’t hooked into IndexBlockLink.
The Block table records each block’s VolumeID, pointing into the Remotevolume table. Code could reverse-lookup from there, provided IndexBlockLink can map IndexVolumeID to BlockVolumeID, but maybe that entry is missing?
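To make that reverse lookup concrete, a hedged sqlite3 sketch along those lines (the database path is a placeholder, and the 'Blocks' Type value is my assumption, mirroring the 'Index' value used in the SQL above):

import sqlite3

# Placeholder path; point this at a copy of the local Duplicati database
db = sqlite3.connect("backup.sqlite")

# dindex volumes that nothing in IndexBlockLink claims (the "extra" ones)
unlinked_dindex = db.execute(
    "SELECT Name FROM Remotevolume WHERE Type = 'Index' "
    "AND ID NOT IN (SELECT IndexVolumeID FROM IndexBlockLink)").fetchall()

# dblock volumes that no dindex is linked to (where a dindex may be missing)
unindexed_dblock = db.execute(
    "SELECT Name FROM Remotevolume WHERE Type = 'Blocks' "
    "AND ID NOT IN (SELECT BlockVolumeID FROM IndexBlockLink)").fetchall()

print("dindex with no IndexBlockLink entry:", unlinked_dindex)
print("dblock with no linked dindex:", unindexed_dblock)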
Yes, I kept a copy of the back end files before my experiments (and also a copy of the database). I’ll try to do more digging later today and see if the dindex files have that same 541 file size before my testing.
I have six sets here and this was the only mismatch. Unfortunately it’s also the live version of the 8TB monster we were testing for blocksize effects last fall. I think it needs a full re-build.
One of my older backups has 188 (8 extra) dindex files.
When I run a test command with all and --full-remote-verification=true, I don’t get any errors other than these Extra: entries.
This doesn’t seem to pan out, as the original dindex files aren’t 514 bytes; they were a bit larger. But when I deleted them and had Duplicati recreate them, it made 514-byte files (in my case). Not sure how consistent that number is.
Nice job, this worked and pointed to the same 2 files.
That’s all I had time to test today. Will try to dig deeper tomorrow…
I was using my previous SQL query again today in Python, after doing kill testing on Duplicati, which was unfortunately accumulating some extra dindex files with an actual “vol” file pointing to a dblock that was missing. This caused Recreate to error when I finally ran it. I put in a test to try to catch it earlier, but I’m pretty sure what it caught instead was the issue where an error during compact forgets the dindex file deletion, giving a Missing file error on the next run. Because I’m trying to keep the test rolling, I run a Repair on this error, and apparently what gets uploaded is this sort of mostly-empty, just-a-manifest file. That makes sense, as the file was set up for deletion and no longer has an entry in IndexBlockLink to find the dblock filename it’s supposed to index.
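For what it’s worth, a rough sketch of how one might spot that situation from the backend side before Recreate trips over it, assuming an unencrypted local destination folder (the path is a placeholder):

import os
import zipfile

# Placeholder path to the backup destination; unencrypted .zip files assumed
dest = "/backups/test"

files = set(os.listdir(dest))
for name in sorted(files):
    if ".dindex." not in name:
        continue
    with zipfile.ZipFile(os.path.join(dest, name)) as z:
        for entry in z.namelist():
            # Each vol/ entry names a dblock this dindex claims to index;
            # flag it if that dblock is no longer in the destination
            if entry.startswith("vol/") and entry[len("vol/"):] not in files:
                print(name, "points at missing", entry[len("vol/"):])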
Running backup at 2022-05-07 18:12:57.166290
Timed out after 6 seconds
Statistics: started: 14 timeout: 2 missing: 0 extra: 0 marked: 0
Running backup at 2022-05-07 18:13:08.472690
Exit code 100
stderr was:
ErrorID: MissingRemoteFiles
Found 2 files that are missing from the remote storage, please run repair
Running repair at 2022-05-07 18:13:11.702707
Listing remote folder ...
Uploading file (233 bytes) ...
Uploading file (233 bytes) ...
Statistics: started: 15 timeout: 2 missing: 1 extra: 0 marked: 0
Running backup at 2022-05-07 18:13:22.303555
Exit code 0
duplicati-ie7c5f92748ff4b5ea45e1f01f026fa52.dindex.zip
duplicati-i3c359a2d644c45f484c7e7f5d689dc52.dindex.zip
Statistics: started: 16 timeout: 2 missing: 1 extra: 0 marked: 0
These two files don’t bother the Recreate, so I might have to work harder to filter out the noise they add.
EDIT to add profiling log to confirm interrupted compact: