Row missing from index BlockHashSize

We’ve been evaluating Duplicati for about 6 months and have run into an issue that hopefully a developer can answer. This issue is a duplicate of this question: Database disk image is malformed

After about 20-25 successful backups, Duplicati errors out with a SQLite constraint violation on BlockHashSize.

We removed the index and attempted to recreate it, but SQLite errors out with the same constraint violation. If we add VolumeID to the BlockHashSize index, or we just drop the index entirely, then backups seem to finish with no errors. However, without knowing what the logic behind the index is supposed to be, we are just guessing. We are hoping that a developer with knowledge of this index can chime in and help.

Should the VolumeID in the Block table also be included in the index? It seems to me that if the VolumeID is the same thing as a backup set, then maybe it should be added to the BlockHashSize index for uniqueness. Can anyone confirm this?
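For clarity, this is roughly what we mean. The three-column version is just a sketch of the change we tried, not an official definition; the stock definition is quoted exactly later in this thread:

-- Stock index definition:
CREATE UNIQUE INDEX "BlockHashSize" ON "Block" ("Hash", "Size");

-- The widened variant we experimented with (our sketch of the change):
CREATE UNIQUE INDEX "BlockHashSize" ON "Block" ("Hash", "Size", "VolumeID");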

“Database disk image is malformed” refers to physical corruption: invalid 0s and 1s on the disk. What’s happening now is “logical” corruption: the data is written correctly but is wrong. Did you take a backup before making these changes? Changing the uniqueness constraint allowed logically corrupt data to be written.

If we add VolumeID to the BlockHashSize index, or we just drop the index entirely, then backups seem to finish with no errors.

It finished “with no errors” because the protection against writing erroneous, logically corrupt data was removed.

Here’s why adding the VolumeID allowed invalid, corrupt data to be written.

Example:

CREATE TABLE a ( pk int, ak int, val text );
CREATE UNIQUE INDEX a_indx on a (pk);

INSERT INTO a (pk, ak, val) values (1, 2, 'abcdef');

RESULT:

|pk|ak|val|
|---|---|---|
|1|2|abcdef|

INSERT INTO a (pk, ak, val) values (1, 4, 'ghij');
-- Unique key constraint on the pk column, hence: blocked.

DROP INDEX a_indx;
CREATE UNIQUE INDEX a_indx on a (pk, ak);

INSERT INTO a (pk, ak, val) values (1, 4, 'ghij');

RESULT:

|pk|ak|val|
|---|---|---|
|1|2|abcdef|
|1|4|ghij|

By adding the second column to the uniqueness condition, a compound unique key constraint was created. The constraint is now only violated when BOTH values match an existing row, so the modified index no longer protects against repeatedly writing pk = 1. In the simple demo above we still could not write pk = 1 and ak = 2 a second time, but we can write pk = 1 with ak = 4, which is (for the purposes of argument) logically corrupt.
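Translated to Duplicati’s schema, the same thing happens. This is only a simplified stand-in for the real Block table (the actual table has more columns), but it shows the effect of widening the index:

-- Simplified stand-in for the real Block table, for illustration only.
CREATE TABLE "Block" ( "Hash" TEXT, "Size" INTEGER, "VolumeID" INTEGER );
CREATE UNIQUE INDEX "BlockHashSize" ON "Block" ("Hash", "Size");

INSERT INTO "Block" ("Hash", "Size", "VolumeID") values ('abc123', 1024, 1);
INSERT INTO "Block" ("Hash", "Size", "VolumeID") values ('abc123', 1024, 2);
-- Second insert is blocked: recording the same block twice is exactly the
-- logical corruption the stock index protects against.

DROP INDEX "BlockHashSize";
CREATE UNIQUE INDEX "BlockHashSize" ON "Block" ("Hash", "Size", "VolumeID");

INSERT INTO "Block" ("Hash", "Size", "VolumeID") values ('abc123', 1024, 2);
-- Now accepted: the same block exists twice, pointing at two different volumes.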

Could you provide the exact error messages, the exact output of “pragma integrity_check”, and the exact drop and create statements you used?

Dropping and recreating the unique index will not eliminate the underlying issue of the system trying to insert a row that violates it. However, removing the index and allowing the system to proceed can have dire consequences: changing the uniqueness condition of the index allowed logically corrupt data to be written.

Thank you for the response.

The exact output from “pragma integrity_check” is:
“row 110212 missing from index BlockHashSize”

The exact commands we used to drop and recreate the index are:
drop index BlockHashSize
-and-
CREATE UNIQUE INDEX "BlockHashSize" ON "Block" ("Hash", "Size");

Thank You

My guess is no, but I’m not the original designer. There are many lookups of blocks to see if they’re already backed up, and these need to be fast, hence the index. One can look up a block by hash and size at any time; adding VolumeID (which points to a specific dblock volume in the RemoteVolume table) may hurt that.
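I haven’t verified the exact statement Duplicati runs, but the kind of lookup the index serves is along these lines:

-- Illustrative only; not copied from the Duplicati source.
-- During a backup each candidate block is probed by hash and size,
-- and the unique BlockHashSize index turns this into a single index seek.
SELECT "VolumeID" FROM "Block" WHERE "Hash" = ? AND "Size" = ?;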

Blocks also regularly move from one dblock volume to another, e.g. when a compact is done.
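This is only a conceptual sketch (I haven’t traced the compact code), but the point is that Hash and Size stay the same while VolumeID changes, so VolumeID is not a stable part of a block’s identity:

-- Conceptual sketch only: compacting re-homes blocks into a new dblock volume.
UPDATE "Block" SET "VolumeID" = :newVolumeID WHERE "VolumeID" = :oldVolumeID;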

Processing similar data talks about this in the deduplication section. I haven’t actually checked in code yet, and I’m also not very familiar with databases, so I appreciate the expertise that @Ingmyv is bringing here.

It allows data to be written that would otherwise be rejected, so I assume no too.

In fact, I worry that there may be an issue later with this new unexpected data.

But a good question is: why was it trying to write data that violated a unique key constraint in the first place?