Any way to recover backup (repair and purge-broken-files don't help)?

It was described above as the sanitized version of the File view, but FileLookup is underneath:

CREATE VIEW "File" AS SELECT "A"."ID" AS "ID", "B"."Prefix" || "A"."Path" AS "Path", "A"."BlocksetID" AS "BlocksetID", "A"."MetadataID" AS "MetadataID" FROM "FileLookup" "A", "PathPrefix" "B" WHERE "A"."PrefixID" = "B"."ID"

What I’m wondering is whether this problem is going to pop up again on attempt to delete:

If I try to purge a file on a backup which is missing a dindex file (I just hid it), the output is:

The operation PurgeFiles has started
Starting purge operation
Backend event: List - Started:  ()
  Listing remote folder ...
Backend event: List - Completed:  (7 bytes)
Missing file: duplicati-ibf42734f5f734d1ba14505cc794e6007.dindex.zip
Found 1 files that are missing from the remote storage, please run repair

ErrorID: MissingRemoteFiles
Found 1 files that are missing from the remote storage, please run repair
Return code: 100

The above was the PURGE command run in GUI Commandline with console-log-level=Information.

It is possibly VerifyRemoteList that's complaining, but no-backend-verification avoids that.
But is this what was intended, is it safe, and will we still be left with a missing dindex file after it?

One thing that can get in the way of fixing broken backups is that Duplicati is full of self-checks.
Ordinarily they're there to avoid making a mess worse, but they can interfere with trying to make a mess better.

Here is the query (based on the Duplicati query):

SELECT blocksetid, hash1, COUNT(*) FROM
(
SELECT "A"."BlocksetID", "A"."Hash" hash1, "C"."Hash" hash2 FROM
(SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", "BlocklistHash"."Index"
FROM "BlocklistHash", "Block"
WHERE
"BlocklistHash"."Hash" = "Block"."Hash" AND
"Block"."VolumeID" = 1994) A,
"BlocksetEntry" B,
"Block" C WHERE
"B"."BlocksetID" = "A"."BlocksetID" AND
"B"."Index" >= ("A"."Index" * 3200) AND
"B"."Index" < (("A"."Index" + 1) * 3200) AND
"C"."ID" = "B"."BlockID"
ORDER BY "A"."BlocksetID", "B"."Index"
) GROUP BY blocksetid, hash1

1994 is the volume number. I know the index file name from your post; the RemoteVolume table gives the volume number for the index, and with the IndexBlockLink table I can find the volume number for the block file.
Next is to match 'hash1' with the value given in your post; that gives BlocksetID values (the ID of the Blockset table). Then it's only a matter of matching the blockset IDs found with the values in the FixedFile table.

What's really complicated for me is to understand how these values got here. I don't know at this point. Looking at the block file could help, but you first have to understand the Duplicati file structure.

select name from remotevolume where id=1994;
duplicati-b15f91b7389824149a4b644861e09e685.dblock.zip.aes

To expand on what was said by @ts678, I don't know if it's possible to PURGE (not Delete; Delete is for versions) the file in this case.

Database and destination internals will help if @solf wants to try to follow along with the discussion.
There’s also some sample (non-error) output from the Python destination checker script I mentioned, however if a forecast of recreate success isn’t needed, one can just save the old database then try it.

I’ll assume this refers to the “Internal consistency check” failure originally posted, and discussed here.

So a multi-block file is described by what Duplicati calls a blocklist, which identifies the file's blocks by their hashes concatenated together. A blocklist is itself a block in the Block table, but is also in the BlocklistHash table. Typically the same dual nature is visible in the dindex file, where it appears in both the vol and list folders.
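As a toy illustration of that dual nature (this is not Duplicati code; the file content and the 1 KiB block size are made up to keep the example small), a blocklist is just the concatenated raw SHA-256 hashes of a file's blocks, and that concatenation is itself hashed like any other block:

```python
import hashlib

# Hypothetical file content split into blocks (Duplicati's default block
# size is 100 KiB; 1 KiB here just to keep the example small).
blocks = [b"A" * 1024, b"B" * 1024, b"C" * 100]

# The blocklist: raw SHA-256 hashes of the blocks, concatenated.
block_hashes = [hashlib.sha256(b).digest() for b in blocks]
blocklist = b"".join(block_hashes)            # 32 bytes per block hash

# The blocklist is itself a block, so it gets a hash of its own
# (the kind of value stored in BlocklistHash and the Block table).
blocklist_hash = hashlib.sha256(blocklist).digest()
print(len(blocklist))                         # 96 = 3 blocks * 32 bytes
```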

Here though, we have a missing dindex file, but I guess its dblock file still exists, and has all its data?
Sometimes it’s also helpful to look at the RemoteOperation table or somehow construct a chronology.
Use of the Path field is subject to age-based deletion, but the Data field of a list will show even old dates.

It wasn’t clear if there was any further evidence found in the database to predict reasonable recreate, however sometimes a bug report after recreate (if it works) can be compared to old DB, for more info.

Yes, Duplicati recreates this block from the block hashes present in the database, calculates its hash, and compares it to what is stored also in the database. Obviously this points at a bug, it should not happen (unless there is a hardware problem of course). Without access to the backend data itself it’s difficult to be sure. Given that this kind of error does not seem to be frequent on this forum, a hardware problem is not unthinkable.

The RemoteOperation table doesn't contain either the index or dblock remote file names in the Path column.
I did not think to explore the blob field. Indeed, the file can be found in the 2023-10-26 06:06:04 record, following what is obviously the removal of an old version; then there is a sequence of get operations that looks like a verification. The next access is on 2023-10-31 16:49:30, a list, and then the index file is missing. It's difficult to understand how this file disappeared without Duplicati having sent a DELETE operation.
Unless… something else (other than Duplicati) deleted the file?

This can probably be sanity-checked, for example are the block counts in BlocksetEntry correct? Those influence both the length of the blocklist and its hash. The blocklist length can be found in Block table to divide by 32 bytes per SHA-256, then see if BlocksetEntry for the blockset has the right number of rows.

As an example from one of my databases, hash iY95060E28d3aMM8xBaRWHcZ0j5pyZdP9VWBFE/EPt4= has Block length of 61792, so 1931 blocks, which matches the BlocksetEntry row count for the blockset.
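That check is simple arithmetic; here is a quick sketch using the numbers from this example:

```python
# Sanity-check sketch: the stored blocklist length divided by the 32-byte
# SHA-256 size should equal the BlocksetEntry row count for the blockset.
blocklist_length = 61792   # Block.Size for the blocklist hash above
hash_size = 32             # bytes per SHA-256 hash

assert blocklist_length % hash_size == 0, "length must be a hash multiple"
expected_rows = blocklist_length // hash_size
print(expected_rows)       # 1931, matching the BlocksetEntry row count
```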

When a DB transaction rolls back, one can’t totally trust that the database has the latest information in it. Unfortunately, it’s almost never the case that the user has a good enough file from a --log-file option.

These are possibly separable questions. One might be how the dindex vanished. Another is how internal information got sufficiently messed up that a blocklist couldn’t be rebuilt, so alarm raised over hash error.

Typically a lost dindex is rebuilt and replaced by repair from good database information, but not this time.

Apologies, other things got in the way, so I haven’t replied for a while.

I am uncertain about this, as it was a while ago and I didn't write it down at the time. I think the problems with this backup started with complaints about some other file, which I then deleted; I did purge/repair, and then it got stuck where it is now.

But honestly I’m very uncertain about this, I might be conflating issues I had with different backups.

I realize this is unhelpful, just throwing it out in case it might matter.

On the subject of 'what next', it sounds like rebuilding the database might be the most realistic way to try to recover this backup.

However if @gpatel-fr or @ts678 are interested in trying to figure out what went wrong, I’m willing to try.

The way I see it, there are two sort-of-separate problems:

  • Why did the backup get corrupted? Given the hardware issues I've experienced, they might well have been the cause.
  • Why are repair/purge failing to repair the backup, and what can be done to fix them? This seems to me to be the more important question now.

For further investigation I’m willing to try to look into specific things if I’m told what to look for; or it might be far easier to hop on some kind of Zoom/meeting with a screen share where I can run SQLs against the original DB (assuming this is helpful) or whatever else.

Please let me know if that’s something you’re interested in.

Also, logging was mentioned. I do have logs enabled (and stored in a file) for all the operations, but it doesn't seem to record much useful info anyway (I couldn't really find anything particularly helpful myself):

It might help us decide that it’s too messy to figure out. Fortunately, (or not?), there’s still hope.

One thing that's being improved (but not out yet) is to put more in the job log. For now, check
About → Show log → Stored, because fatal errors tend to divert from the job log into the server log.

Job log sometimes has clues too, but not in the bug report (I hope) due to the privacy sanitizing.
One person with a tough problem and SQL skill just posted a file with the LogData table values.
That filled in a few blanks, as did server log check, but we still haven’t been able to find a cause, therefore I’ve been trying a repro based on the little we could get. So far, no luck with that either.

Importance does not give us Duplicati and SQL wizards, so while it's important, it can take a while.
Feel free to PM me the post-VACUUM database (I hope there’s a bug fix planned), and I can try…
Any additional clarity about what and when that you can glean from logs, etc. might also be handy.

Apologies, I didn't mean this in a 'negative' way; I realize that this is purely a volunteer effort and nobody owes me anything (and I appreciate all the people who are willing to try and help).

I only wanted to say that it feels like repair/purge failing is a more important issue than the original backup breaking, because the original breakage is always a possibility due to hardware failures and the like.

Sent.

This is what it currently has for the backup in question (which is set up to run daily):

Nov 27, 2023 12:59 AM: Failed while executing "Backup" with id: XX

Duplicati.Library.Interface.RemoteListVerificationException: Found 1 files that are missing from the remote storage, please run repair
   at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList(BackendManager backend, Options options, LocalDatabase database, IBackendWriter log, IEnumerable`1 protectedFiles)
   at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify(BackendManager backend, String protectedfile)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at CoCoL.ChannelExtensions.WaitForTaskOrThrow(Task task)
   at Duplicati.Library.Main.Controller.<>c__DisplayClass14_0.<Backup>b__0(BackupResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Backup(String[] inputsources, IFilter filter)
   at Duplicati.Server.Runner.Run(IRunnerData data, Boolean fromQueue)

Thanks for letting me know about this log, didn’t even know it exists.

I can’t find anything there about my repair/purge attempts though; it does, unfortunately, contain a lot of ‘spam’ for me, because of:

System.IO.IOException: Cannot create "XXXX" because a file or directory with the same name already exists.

   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
..

This is an artifact of how I tend to block Duplicati backups when I need to (by preventing it from writing to the job's log file), and I did a bunch of this during the past month due to all the issues I've been having.

Didn’t know it generates and stores all those errors.

I did the easier part first, because there’s documentation (sort of) on using SQLite JSON functions.

sqlite> WITH
   ...> OldList AS (SELECT value FROM json_each((SELECT Data FROM RemoteOperation WHERE RemoteOperation.ID = 12831))),
   ...> OldName AS (SELECT json_extract(value, '$.Name') FROM OldList),
   ...> NewList AS (SELECT value FROM json_each((SELECT Data FROM RemoteOperation WHERE RemoteOperation.ID = 12862))),
   ...> NewName AS (SELECT json_extract(value, '$.Name') FROM NewList)
   ...> SELECT * FROM OldName EXCEPT SELECT * FROM NewName;
duplicati-i4539710a65934bffa2f3522ac1bbbcab.dindex.zip.aes
duplicati-i45c97c53bff44ee4b2d17f190417040e.dindex.zip.aes
sqlite> WITH
   ...> OldList AS (SELECT value FROM json_each((SELECT Data FROM RemoteOperation WHERE RemoteOperation.ID = 12831))),
   ...> OldName AS (SELECT json_extract(value, '$.Name') FROM OldList),
   ...> NewList AS (SELECT value FROM json_each((SELECT Data FROM RemoteOperation WHERE RemoteOperation.ID = 12862))),
   ...> NewName AS (SELECT json_extract(value, '$.Name') FROM NewList)
   ...> SELECT * FROM NewName EXCEPT SELECT * FROM OldName;
sqlite>
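The same diff can be done outside the sqlite3 shell, e.g. with Python's json module. A sketch (the helper name and database path are hypothetical; the operation IDs are the ones used above):

```python
import json
import sqlite3

# Hypothetical helper: read the JSON file listing stored in
# RemoteOperation.Data and return the set of remote file names.
def listed_names(con, op_id):
    row = con.execute(
        "SELECT Data FROM RemoteOperation WHERE ID = ?", (op_id,)).fetchone()
    return {entry["Name"] for entry in json.loads(row[0])}

# Usage against a real job database would look like this (path is made up):
# con = sqlite3.connect("backup.sqlite")
# old, new = listed_names(con, 12831), listed_names(con, 12862)
# print(sorted(old - new))   # files listed before but missing later
```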

So basically, two dindex files vanished mysteriously without a recorded Duplicati delete. This compares the listing at the end of the October 26, 2023 4:54:35 AM UTC backup (ID 281, at the end of a nice series) against the listing at the start of the Repair (ID 285) on October 31, 2023 4:49:30 PM. In between were 282, 283, and 284, which did not make any job log, so they probably failed and possibly left server log errors for when I go to review them.

Moving ahead though, and now looking at put rather than list in RemoteOperation table, Repair 285 replaced duplicati-i45c97c53bff44ee4b2d17f190417040e.dindex.zip.aes (which is the expected result).
Not replacing duplicati-i4539710a65934bffa2f3522ac1bbbcab.dindex.zip.aes was an unexpected error.

I modified the hard-for-me SQL cited earlier to use the relevant-here volume ID of 1994. After minutes, 59543 rows came out, but the last 35802 had the same first column, which seemed a little too high, but right when one outputs 13 sets of 2754. The 13 comes from the posted list; the 2754 comes from there too, due to size.

A blocklist is limited in size to a block, so a 102400-byte block can hold 3200 hashes, covering 327680000 bytes. For a 1920401408-byte file, this takes 5.8606 blocklists, so 6, with the last one handling 282001408 bytes, i.e. 2753.92 blocks. Round up and you get 2754 blocks, and I'd guess the second column has the hash.
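That arithmetic can be checked mechanically; a quick sketch using the numbers from the post:

```python
import math

block_size = 102400   # default 100 KiB block
hash_size = 32        # bytes per SHA-256 hash

hashes_per_blocklist = block_size // hash_size            # 3200
bytes_per_blocklist = hashes_per_blocklist * block_size   # 327680000

file_size = 1920401408                                    # file from the post
blocklists_needed = math.ceil(file_size / bytes_per_blocklist)   # 6
last_bytes = file_size - (blocklists_needed - 1) * bytes_per_blocklist
last_blocks = math.ceil(last_bytes / block_size)          # 2754
print(blocklists_needed, last_blocks)
```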

GetBlocklists looks like it looks for changes in first column. Is it maybe confused by the 13 repetitions?

EDIT 1:

This was my sampling in result rows to support the idea that result ends in 13 repetitions of 2754 rows:

23742	8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=	aXU9w4m0Q+F+acfrUBcE4DrQUjXe63sleLQVDwGYpSo=
26496	8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=	aXU9w4m0Q+F+acfrUBcE4DrQUjXe63sleLQVDwGYpSo=
29520
32004
34758
37512
40266
43020
45774
48528
51282
54036	8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=	aXU9w4m0Q+F+acfrUBcE4DrQUjXe63sleLQVDwGYpSo=
56790	8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=	aXU9w4m0Q+F+acfrUBcE4DrQUjXe63sleLQVDwGYpSo=
59544 (one past last of 59543 rows)

EDIT 2:

Looking at the last two of the 13 blocksets, they are different (so no rule violation), but have same end:

sqlite> SELECT BlocksetID,Hash FROM BlocklistHash WHERE BlocksetID IN (2450029,2456499);
2450029|u7fEuVAS14L0RGfnD462XtiRUsEU8fB7iiTfryH+DSo=
2450029|HCDzMlYT/zjVnea/ImTs1VaDxuqUyo5r6caxTJrX43A=
2450029|/L5Hgl6mfT+DullSqKJQb5ja1ki5QEEf9+YAticXO1o=
2450029|tNvWtgOzwcNsrYAOnwNV9mAxqDgXf/T6W1aOrrpsmnw=
2450029|QsimL455vKO8BLsycbCXhUzVOV0MKX48MJkb4QGPaXU=
2450029|8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=
2456499|d0LgdQg6VmrmubOyuYBYN+ZmvqKV1CNfjQ7nI1Fk8hw=
2456499|Kt7EDdt9Mj3Uy4Cnzpt6YVumMa+SNwDpYGw6CRwHz5E=
2456499|Tw7XNIWJoDP/fy0CGvmQocSJKdQ6xfZvYTQPmmapn8s=
2456499|8YnAGbk6//+00W8CdiHCCGN9lWGQ2LIjBKVtP4mOaV8=
2456499|ny/qNhOaIajHuBEFMvCaVloDBYDIe4PMpyv59SNBsXA=
2456499|8ABF0EvjURZh3nonRt1C77PCoTNTDfyNI1/hTuLNaMk=
sqlite>

EDIT 3:

I have to say I’m having a hard time finding test steps to make that SQL output a repeating pattern, however I don’t really understand either the SQL or C#, so maybe a developer can give an opinion.
Mis-grouping of the query output still seems an attractive explanation, but I don’t know if it’s correct.

Test case:

--blocksize=1KB
Create 32769 zero bytes file
Backup
Edit first byte to 1
Backup
Delete oldest dindex
Repair
  • 2023-11-29 15:43:35 -05 - [Error-Duplicati.Library.Main.Operation.RepairHandler-CleanupMissingFileError]: Failed to perform cleanup for missing file: duplicati-i2185398e90964c16a6d1f2c6a88ea3c8.dindex.zip, message: Internal consistency check failed, generated index block has wrong hash, soneqSylq6Xy4YkaGvEb4nkUxIhU2w/ltLuVwTfg8tY= vs FAbgWIHimTZ3ZtMT4mwFVk7JG/ch0xcmvW5G5gaJU5o=

I was fiddling with a hex editor to get this, but any text editor that can make a long line will probably do.
I’m still not understanding the SQL or C# well though, but maybe this small case plus experts can help.


Do you mean the source (empty) file here or the backup file?

I mean edit the source, but it's not empty: it's 0x00 repeated 32769 times, which is a full blocklist plus one byte.
The math works out as a 1024-byte block being able to hold 32 blocklist hashes of 32 bytes each, with each hash referencing a 1024-byte block, before it hits its limit at 32768 bytes. 32769 demands a second blocklist,
but if you're doing this with ASCII 0 and 1 in a text editor, a line ending (if added) probably won't hurt.
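The same math for the --blocksize=1KB test case, as a quick check:

```python
block_size = 1024     # --blocksize=1KB
hash_size = 32        # bytes per SHA-256 hash

hashes_per_blocklist = block_size // hash_size           # 32 hashes fit
one_blocklist_limit = hashes_per_blocklist * block_size  # 32768 file bytes

file_size = 32769     # one byte past the limit
extra_bytes = file_size - one_blocklist_limit            # forces a second,
print(one_blocklist_limit, extra_bytes)                  # nearly empty blocklist
```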

Apologies, I misused the word ‘empty’ here, I understand that the file is not actually empty.

So, in other words, under certain conditions, without doing anything 'illegal' (except losing one of the dindex files), it's possible to bring Duplicati into a state where it's non-functional (can't back up, can't repair)?

And it sounds like it has to do with individual files that are many times the block size?

For example, here’s the additional pain in the Recreate I just did (which you’ve been holding off on):

Nov 29, 2023 5:26 PM: Backend event: Get - Completed: duplicati-b749a1c706c6443f7984b600c9fffb566.dblock.zip (2.19 KB)
Nov 29, 2023 5:26 PM: Backend event: Get - Started: duplicati-b749a1c706c6443f7984b600c9fffb566.dblock.zip (2.19 KB)
Nov 29, 2023 5:26 PM: Probing 1 candidate blocklist volumes
Nov 29, 2023 5:26 PM: Probing 1 candidate blocklist volumes: duplicati-b749a1c706c6443f7984b600c9fffb566.dblock.zip

makes me think that repair (not recreate, which is slower) with index-file-policy turned down from Full might work too. Perhaps Lookup, or if not that None. I can’t test it without resetting, as I tested recreate.

There are definitely times when Recreate is the path out. It would be nice to not need it, but it happens.

EDIT 1:

What’s unknown is whether the test case behaves exactly like your backup, but after I tried other fixes (like you did), it seemed resistant (I didn’t compare exact output though). Somebody could look at that, however as mentioned, Repair is supposed to upload new dindex files – except when it gets confused. That’s probably the main thing that a willing developer should look at – why is the test case doing that?

EDIT 2:

Lowering the index-file-policy for Repair (if it works, and you can try it on the test case if you like) isn’t a perfect solution, and would probably leave future recreates with the same need to read a dblock. What it does is to keep the repair off the path where it’s getting confused putting blocklist info in dindex.

Whenever an actual solution is coded, if a slightly slower Recreate is a worry (I’m not sure why it’d be), then you could delete the toned-down dindex and have the future fixed Repair upload a perfect dindex.

Sometimes the developers even offer very unofficial test builds to users. I don’t know if they will here…
Before any code can be changed, they probably need to figure out why the test case is being confused.

If you can wait a little more, maybe the devs can make a Canary with a better bug report, plus repair fix.

EDIT 3:

That’s normally handled fine. Block size is tiny – 100 KB. I think it has to do with several files having the same ending, although it’s still not well understood. The C# code that reads the SQL result knows not to overfill the block, so possibly this saves some cases. Having a partially full blocklist at end misses that.

If you want to entertain yourself with the test case, open the dindex .zip file and look in its list folder. You’ll see a combination of 1024 byte blocklists (the full one), and 32 byte blocklists (for the extra byte).

Great finding! I can't imagine how much trial and error it took to find that. I was able to reproduce it with a simple text file filled with 'a' characters, then changed the first letter to 'b'.
I commented out the check to see what goes wrong in the repaired index file:

Original blocklist v106_7c-_S7Gw2rTES3ZM-_tY8Thy__PqI4nWcFE8tg= (hex view):

CA978112CA1BBDCAFAC231B39A23DC4D
A786EFF8147C4E72B9807785AFEE48BB

Recreated (which fails hash check):

CA978112CA1BBDCAFAC231B39A23DC4D
A786EFF8147C4E72B9807785AFEE48BB
CA978112CA1BBDCAFAC231B39A23DC4D
A786EFF8147C4E72B9807785AFEE48BB

So obviously, it seems that the repair duplicated the blocklist.

Technical stuff

It comes down to this SQL query from LocalDatabase.GetBlocklists:

SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM  "BlocklistHash","Block" WHERE  "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?) A,  "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND  "B"."Index" >= ("A"."Index" * 32) AND "B"."Index" < (("A"."Index" + 1) * 32) AND "C"."ID" = "B"."BlockID"  ORDER BY "A"."BlocksetID", "B"."Index";

This already returns the row twice:

Hash	Hash
DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	LtyYaEfiCbQBbhQabchxbTIHNQ9BaWk4LUMVOb8pLko=
[... total 32 of these entries, because the file is filled with identical data, this is expected]
DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	LtyYaEfiCbQBbhQabchxbTIHNQ9BaWk4LUMVOb8pLko=
v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	ypeBEsobvcr6wjGzmiPcTaeG7/gUfE5yuYB3ha/uSLs=
v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	ypeBEsobvcr6wjGzmiPcTaeG7/gUfE5yuYB3ha/uSLs=

It is from the subquery:

SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM  "BlocklistHash","Block" WHERE  "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ?;

BlocksetID	Hash	BlocksetID	Index	Hash	ID	Hash	Size	VolumeID
3	DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	3	0	DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	4	DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	1024	3
3	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	3	1	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	6	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	32	3
8	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	8	1	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	6	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	32	3

The first entry is the changed part of the file, but the second one is unchanged. The blocklist table itself looks like this:

BlocksetID	Index	Hash
3	0	DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=
3	1	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=
8	0	lQw8vVLiZ8fPhU7yZsjGud8Lczxqp8KDrKPQcqzNfGw=
8	1	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=

Because block 8-1 is the same as 3-1, it is stored on the same volume and is not filtered, even though it is for a different version entirely. The code does not realize that these belong to different blocksets (3 vs 8), and combines both into a single file. Maybe it should check the blockset ID instead of the hash to know when to emit the next entry? This also only happens with blocklists that are not full, because otherwise the maximum file size would split the entries automatically.

The circumstances for this failure are:

  • have a file larger than a single blocklist can fit, but also not evenly divisible (the final blocklist must not be full, otherwise it will just be overwritten multiple times)
  • the file changes at the beginning, but the final blocklist stays the same (however many bytes that are)
  • the index corresponding to the unchanged final blocklist goes missing
  • repair will repeat the blocklist while recreating the index (even multiple times for more than 2 file versions) and append everything, then fail due to the incorrect hash
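The duplicate rows can be shown with a toy, cut-down schema (only the columns involved here; this is not Duplicati's full schema): when two blocksets share the same final blocklist hash, which is stored only once as a block, the join emits that hash once per blockset:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE BlocklistHash (BlocksetID INTEGER, "Index" INTEGER, Hash TEXT);
CREATE TABLE Block (ID INTEGER, Hash TEXT, Size INTEGER, VolumeID INTEGER);
-- blocksets 3 and 8 both end with the same (not-full) blocklist
INSERT INTO BlocklistHash VALUES (3, 1, 'shared'), (8, 1, 'shared');
-- the blocklist is stored only once, as a block in volume 3
INSERT INTO Block VALUES (6, 'shared', 32, 3);
""")

rows = con.execute("""
SELECT BlocklistHash.BlocksetID, Block.Hash
FROM BlocklistHash, Block
WHERE BlocklistHash.Hash = Block.Hash AND Block.VolumeID = 3
""").fetchall()
print(len(rows))   # 2: one row per blockset for the single stored blocklist
```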

TL;DR

The SQL query for GetBlocklists needs to be changed to filter blocklists correctly in this edge case.

The bad news

This broken GetBlocklists function is also used when writing indices for Compact (without a consistency check), so it might cause undetected, incorrect index files after compact. I am not sure how much of an impact that might have in existing backups. I tried a recreate with such a broken index and got the error:

Recreated database has missing blocks and 2 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

So, in the future it would be another possibility for people who get this error to delete the list folder from the index file of the broken volume and see if that fixes it.

Edit: If this is deemed severe enough, it would be possible to add the hash consistency check in Recreate, so that future versions can cope with this kind of broken index. It would just have to download an additional block volume.

The good news

There is probably no damage to the database, so a recreate with --index-file-policy=Lookup will be a remedy in the short term. Change it back after the repair is successful, and maybe keep note of the index file that only has partial data.

Any database recreate will work correctly, because it uses a different way to build the blocklists.


Very nice! Much appreciated.

You don’t want to know, but you actually gave some hints. First I was tidy, and tried full blocklists.

but not only at the first byte (where the test case wound up). I also tried at offset 0x400 to see if an additional offset could help prod Duplicati into appending the all-zero-first-block’s hash to previous.

If I had had a longer file, perhaps I could have changed the middle too (unsure), just not the ending.

An early attempt forced a compact into one dblock/dindex, but that worked so I backed down to the original problem report, with the final blocklist being the same as an old one in some earlier volume.

Being able to reproduce and then analyze problems is why making a good test case is worthwhile.

I don’t know (actually I’m still a bit unclear on things), but I wasn’t sure the current SQL had enough information in it for the C# to sort things out. Regardless, if it changes, I guess the C# also changes.

Definitely some great progress, having untangled the knot a bit. BTW the SQL reminds me of a knot having references in various directions. If it gets changed, I wonder if there is a cleaner way to do it?

Thanks again!

EDIT:

Credit of course to @solf who provided the bug report that allowed findings that led to the test case. Without these, it’s frequently hard to get even a start on figuring out how some weird failure happens.
This one is pretty weird. If you wish to search the Internet for such cases, there are only a couple out there.

Seemed a very good fit for the dindex loss test inadvertently done by the HDD issues in current topic,
except it wasn’t reproducible for some reason. It also links to a GitHub issue that may be related and gathered a number of me-too posts, however it’s sometimes unclear if root causes are the same one.

Original post stack trace is running RecreateMissingIndexFiles which then runs CreateIndexVolume which does have the consistency check on a GetBlocklistsAsync which runs the faulty GetBlocklists.

I think I found one way to do it, but it is not cleaner:

SQLite: GROUP BY

Each expression in the result-set is then evaluated once for each group of rows. […] Otherwise, it is evaluated against a single arbitrarily chosen row from within the group.

An arbitrary selection should be fine, because only the hash is of interest (which is the same). The other values are only used to select the blocks in the database, which should lead to the same hashes. I tested this and it does fix the bug:

SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM  "BlocklistHash","Block" WHERE  "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = ? GROUP BY "Block"."Hash";

BlocksetID	Hash	BlocksetID	Index	Hash	ID	Hash	Size	VolumeID
3	DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	3	0	DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	4	DWypxEXA7TXbM2azclwoTH+Y7rEgA9kbDa4/K3ycNdA=	1024	3
3	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	3	1	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	6	v106/7c+/S7Gw2rTES3ZM+/tY8Thy//PqI4nWcFE8tg=	32	3

The external query also works as expected now and repair will generate a correct index volume. Also I saw with some more playing around that it is also possible to have the same blocklist file multiple times in the zip file, but that should not cause any trouble (and is also fixed by this).
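A minimal demonstration of the dedup effect, using the same toy cut-down schema idea (not Duplicati's full tables): GROUP BY on the block hash collapses the duplicate join rows to one arbitrary representative per blocklist:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE BlocklistHash (BlocksetID INTEGER, "Index" INTEGER, Hash TEXT);
CREATE TABLE Block (ID INTEGER, Hash TEXT, Size INTEGER, VolumeID INTEGER);
INSERT INTO BlocklistHash VALUES (3, 0, 'first'), (3, 1, 'shared'), (8, 1, 'shared');
INSERT INTO Block VALUES (4, 'first', 1024, 3), (6, 'shared', 32, 3);
""")

# Without GROUP BY this join returns 'shared' twice (once per blockset);
# grouping on the hash keeps one arbitrary representative row per blocklist.
rows = con.execute("""
SELECT BlocklistHash.BlocksetID, Block.Hash
FROM BlocklistHash, Block
WHERE BlocklistHash.Hash = Block.Hash AND Block.VolumeID = 3
GROUP BY Block.Hash
""").fetchall()
print(len(rows))   # 2 rows: 'first' once, 'shared' once
```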

Obviously, this is a bit of a risky but necessary change. It should definitely be tested whether the results are the same on some bigger databases with more history. If someone has a better idea for this query, I am open to suggestions.

Seemingly, the early SQL standards intended GROUP BY for aggregate functions over the group. Eventually the SQL vendors decided to extend it to non-aggregate use, basically just picking one of the rows.
I’ve got some web pages open on the topic, but it would also be possible to look at this independently.

SQLite changelog isn’t clear on when they began supporting non-aggregate, and we do have several extremely old systems (CentOS 7 for example) that shouldn’t be broken. There’s an open issue about

and IIRC I think this is the reason I’m holding off on suggesting CTEs, e.g. to get more readable SQL.
You could probably study Duplicati code to see how prevalent the non-aggregate GROUP BY form is.

My other initial question is whether it matters which BlocklistHash Index is the recipient of a duplicate.
We're letting Index 1 be the victim. The original case used 5, but if the test case got one at Index 2, would it fail?

Another slight concern is that we're probably watering down whatever sanity check this check is aiming at. Ignoring duplicate blocksets means not checking their BlocksetEntry rows. The advantage is that it saves time.

An easy-ish workaround is to add e.g. MAX(xxx) to every column; that way you'd have a 'random selection' from everything under GROUP BY (unless it matters that the columns are read from one row rather than potentially from different rows).
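A tiny illustration of that workaround (toy table, not Duplicati's schema): wrapping the non-grouped column in MAX() makes the per-group selection explicit and avoids relying on SQLite's bare-column GROUP BY behavior:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (grp TEXT, val INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("shared", 3), ("shared", 8), ("first", 3)])

# MAX(val) yields a well-defined value per group instead of leaving the
# choice of row to the SQL engine.
rows = con.execute(
    "SELECT grp, MAX(val) FROM t GROUP BY grp ORDER BY grp").fetchall()
print(rows)   # [('first', 3), ('shared', 8)]
```

As the parenthetical above notes, applying MAX() to several columns may mix values from different rows of the group, so this only fits when any row's value would do.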
