Problem after update to 2.1.0.2 (Beta)

Duplicate hash entries. I posted the 7-Zip image of the duplicate in the dblock file earlier. Its dindex has:

{"hash":"Merv5GbRee6UJ8qz3Kgna79leGOcj9hxGDlKiJ+4U2I=","size":160},{"hash":"sPaznhDsoLc559OlMl3JyVe4Z98eKSTayhRPKq+VW10=","size":102400},
...
{"hash":"sPaznhDsoLc559OlMl3JyVe4Z98eKSTayhRPKq+VW10=","size":102400},{"hash":"FpRzPeoc5lD6UeYTcTbV9b+NqmMT+piMxrBxcPP6ZQg=","size":102400},

taken from two Notepad lines of the entry for duplicati-bd4cd9d9703dc4e2da48e3bef5daeb2ce.dblock.zip.aes
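For anyone who wants to scan for this themselves, here is a minimal sketch of the duplicate-hash check. It assumes the block list (once the dindex vol/ entry is decrypted and extracted) is a JSON array of {"hash", "size"} objects like the excerpt above; the function name is mine, not from checker52.py:

```python
import json
from collections import Counter

def find_duplicate_hashes(block_list_json):
    """Return the hashes that appear more than once in a dindex
    block list: a JSON array of {"hash": ..., "size": ...} objects."""
    blocks = json.loads(block_list_json)
    counts = Counter(b["hash"] for b in blocks)
    return sorted(h for h, n in counts.items() if n > 1)
```

Any output means the dblock's index lists the same block more than once.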

I tested my other Duplicati production backup (2.0.8.1 to OneDrive) and got these findings:

checker52.py finds
Duplicate block yPQ+yYSWbgWkym0AwcFeL7lG3ApFFUyGMzcVYnVIzBo=

Checker DB finds
duplicati-b76ad56f31c6c4631b0e18c05a552ba0a.dblock.zip.aes / duplicati-iacf2b22587fb4c85b62f4c59e7e6aaf2.dindex.zip.aes
duplicati-b8769cc358966495892974d30f087a561.dblock.zip.aes / duplicati-i00ccd5fe773346fd9b19dd1a8de28ae0.dindex.zip.aes

Regular DB finds
duplicati-b76ad56f31c6c4631b0e18c05a552ba0a.dblock.zip.aes / duplicati-iacf2b22587fb4c85b62f4c59e7e6aaf2.dindex.zip.aes

test with --full-remote-verification=True finds

duplicati-b8769cc358966495892974d30f087a561.dblock.zip.aes: 1 errors
	Extra: yPQ+yYSWbgWkym0AwcFeL7lG3ApFFUyGMzcVYnVIzBo=

duplicati-i00ccd5fe773346fd9b19dd1a8de28ae0.dindex.zip.aes: 1 errors
	Extra: yPQ+yYSWbgWkym0AwcFeL7lG3ApFFUyGMzcVYnVIzBo=

One theory is that it starts with duplicate blocks in different volumes, and compact then puts them into one.

The theory will get better once I look more at the B2 (the other) backup, but it's currently tied up in debugging a log error.
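As a toy illustration of the theory (this is not Duplicati's actual compact code, just a model of the suspected behavior): if compact simply concatenates the blocks of its feeder volumes, a hash present in two feeders lands twice in the merged volume.

```python
def toy_compact(feeder_volumes):
    """Toy model only: concatenate the block hashes of several small
    volumes into one new volume, with no check for hashes already
    copied. Cross-volume duplicates therefore survive into the output."""
    merged = []
    for volume in feeder_volumes:
        merged.extend(volume)  # no dedup against earlier feeders
    return merged

# "h2" exists in both feeder volumes, so it appears twice after merging
merged = toy_compact([["h1", "h2"], ["h2", "h3"]])
```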

I also found that my old --full-remote-verification usage broke, courtesy of a breaking change:

  --full-remote-verification (Enumeration): Activate in-depth verification of files
    After a backup is completed, some (dblock, dindex, dlist) files from the remote backend are selected for verification. Use this option to turn on full verification, which will decrypt the files and
    examine the insides of each volume, instead of simply verifying the external hash. If the option --no-backend-verification is set, no remote files are verified. This option is automatically set when
    then verification is performed directly. ListAndIndexes is like True but only dlist and index volumes are handled.
    * values: True, False, ListAndIndexes
    * default value: False

It used to read as below. I'm pretty sure I objected to the change, so I should have updated my own usages…

  --full-remote-verification (Boolean): Activates in-depth verification of
    files
    After a backup is completed, some (dblock, dindex, dlist) files from the
    remote backend are selected for verification. Use this option to turn on
    full verification, which will decrypt the files and examine the insides
    of each volume, instead of simply verifying the external hash, If the
    option --no-backend-verification is set, no remote files are verified.
    This option is automatically set when then verification is performed
    directly.
    * default value: false

EDIT 1:

OneDrive had only one duplicate, so B2 is more interesting. I can test in another Duplicati.

EDIT 2:

Tested with a brand-new 2.1.0.105, an rclone sync of the B2 destination, and the 2.1.0.104 DB.
Surprisingly, test all with --full-remote-verification=True came back clean, if it actually worked…

A Recreate wound up with two rows in the DuplicateBlock table. That's not surprising: I had previously seen the same thing in the 2.1.0.104 post-backup Recreate test database, and had accounted for the other nine complaints from my checker as duplicates that all sat in a single dblock.

sPaznhDsoLc559OlMl3JyVe4Z98eKSTayhRPKq+VW10= 248
FpRzPeoc5lD6UeYTcTbV9b+NqmMT+piMxrBxcPP6ZQg= 248
EnVNI+7tspnQeKOQ6SIJ+QWn1aBEgFhuRMWnKwejjFg= 248
0BJxhj10ydSioTYLA9nnkp5kJ9V9pJnY5KhZzEtWlfY= 248
yn2HXIspg7W1nFb0WH7taNRhGEDUptulV4i8min380M= 248
ZnCqRTwwKH2II1eDmKWiexmf/qDcdIc8hbVeP7pknKQ= 248
aeGC3P769M8Asuz3hLWTDd5fPAweXbmXC2JP5406ths= 248
sV9XOv0yWiw4O2tb+kKdhL4y2bpSyr/UBDsrhdVDtbM= 248
nzyAT8GXhTupLyHnAvR8X6aEj+tlKy0S7XJ3dJ27Rcw= 248

VolumeID 248 is duplicati-bd4cd9d9703dc4e2da48e3bef5daeb2ce.dblock.zip.aes

0PCRrDi7fAYE/9iytkGtdW4uRFConEq5PlkSEmGGWTg= 139,272
FR0wRA4Rdy7oeUBNoK69bymiCu1e2cpVKqNB3p5X1sE= 272,322

VolumeID 139 is duplicati-b2f8a98d827fc4faf847f74db470e0fd6.dblock.zip.aes
VolumeID 272 is duplicati-bf66b5b966c0c40a88a933fc0367784c0.dblock.zip.aes
VolumeID 322 is duplicati-baea7fcd2e269420391ffa6f5e76d9f11.dblock.zip.aes

The above is from the checker DB. Below is the 2.1.0.105 Recreate DuplicateBlock table:

0PCRrDi7fAYE/9iytkGtdW4uRFConEq5PlkSEmGGWTg= 80,238
FR0wRA4Rdy7oeUBNoK69bymiCu1e2cpVKqNB3p5X1sE= 185,238

VolumeID 80 is duplicati-b2f8a98d827fc4faf847f74db470e0fd6.dblock.zip.aes
VolumeID 185 is duplicati-baea7fcd2e269420391ffa6f5e76d9f11.dblock.zip.aes
VolumeID 238 is duplicati-bf66b5b966c0c40a88a933fc0367784c0.dblock.zip.aes
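To pull these mappings in one step instead of looking up each VolumeID by hand, a join like this should work against the local database. The column names (Block.ID/Hash, DuplicateBlock.BlockID/VolumeID, Remotevolume.ID/Name) are my reading of the schema and may differ between versions:

```python
import sqlite3

# Assumed schema: Block holds the canonical copy of each hash,
# DuplicateBlock points extra (BlockID, VolumeID) pairs at it,
# and Remotevolume maps VolumeID to the remote file name.
QUERY = """
SELECT b.Hash, rv.Name
FROM DuplicateBlock db
JOIN Block b ON b.ID = db.BlockID
JOIN Remotevolume rv ON rv.ID = db.VolumeID
ORDER BY b.Hash, rv.Name
"""

def duplicate_block_volumes(conn):
    """Return (hash, remote volume name) for every DuplicateBlock row."""
    return conn.execute(QUERY).fetchall()
```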

I forget (or never knew) the mechanism behind the Extra block problem, but I think runs of more than one Extra in a given dblock or dindex complaint were common. Maybe a run of duplicates in a dblock coming out of compact traces back to a run in the now-gone feeder volumes.

Looking at this another way, is there anything in compact that will weed out incoming duplicates?