Duplicate hash entries. I posted the 7-Zip image of the dblock file duplicate. Its dindex has:
{"hash":"Merv5GbRee6UJ8qz3Kgna79leGOcj9hxGDlKiJ+4U2I=","size":160},{"hash":"sPaznhDsoLc559OlMl3JyVe4Z98eKSTayhRPKq+VW10=","size":102400},
...
{"hash":"sPaznhDsoLc559OlMl3JyVe4Z98eKSTayhRPKq+VW10=","size":102400},{"hash":"FpRzPeoc5lD6UeYTcTbV9b+NqmMT+piMxrBxcPP6ZQg=","size":102400},
from two Notepad lines of the dindex entry for duplicati-bd4cd9d9703dc4e2da48e3bef5daeb2ce.dblock.zip.aes
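For cross-checking, here's a minimal Python sketch of the scan I'm doing by eye in Notepad. It assumes the dindex vol entry (already decrypted and pulled out of the zip, e.g. with 7-Zip) is a JSON file whose "blocks" array holds the {"hash": ..., "size": ...} objects shown above; the "blocks" key name is my assumption about the format.

```python
import json
from collections import Counter

def duplicate_hashes(vol_json_path):
    """Report block hashes listed more than once in a dindex vol entry.

    Assumes the entry is JSON with a "blocks" array of
    {"hash": ..., "size": ...} objects, as in the excerpt above.
    """
    with open(vol_json_path) as f:
        blocks = json.load(f)["blocks"]
    counts = Counter(b["hash"] for b in blocks)
    return {h: n for h, n in counts.items() if n > 1}
```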
I tested my other Duplicati production backup (2.0.8.1 to OneDrive), and got some findings:
checker52.py finds
Duplicate block yPQ+yYSWbgWkym0AwcFeL7lG3ApFFUyGMzcVYnVIzBo=
Checker DB finds
duplicati-b76ad56f31c6c4631b0e18c05a552ba0a.dblock.zip.aes / duplicati-iacf2b22587fb4c85b62f4c59e7e6aaf2.dindex.zip.aes
duplicati-b8769cc358966495892974d30f087a561.dblock.zip.aes / duplicati-i00ccd5fe773346fd9b19dd1a8de28ae0.dindex.zip.aes
Regular DB finds
duplicati-b76ad56f31c6c4631b0e18c05a552ba0a.dblock.zip.aes / duplicati-iacf2b22587fb4c85b62f4c59e7e6aaf2.dindex.zip.aes
test with --full-remote-verification=True finds
duplicati-b8769cc358966495892974d30f087a561.dblock.zip.aes: 1 errors
Extra: yPQ+yYSWbgWkym0AwcFeL7lG3ApFFUyGMzcVYnVIzBo=
duplicati-i00ccd5fe773346fd9b19dd1a8de28ae0.dindex.zip.aes: 1 errors
Extra: yPQ+yYSWbgWkym0AwcFeL7lG3ApFFUyGMzcVYnVIzBo=
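The "finds" above come down to a cross-volume query like the sketch below. The Block table and its Hash/VolumeID columns are my reading of the Duplicati local database schema; note that a regular DB keeps one row per hash in Block, so this only turns up multiples against a checker-style DB that records every (hash, volume) sighting.

```python
import sqlite3

def blocks_in_multiple_volumes(db_path):
    """List block hashes recorded against more than one volume.

    Table/column names (Block.Hash, Block.VolumeID) are assumptions
    about the local database schema; a checker-style DB with one row
    per (hash, volume) sighting is what makes this informative.
    """
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """SELECT Hash, GROUP_CONCAT(DISTINCT VolumeID) AS volumes
           FROM Block
           GROUP BY Hash
           HAVING COUNT(DISTINCT VolumeID) > 1"""
    ).fetchall()
    con.close()
    return rows
```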
One theory is that it starts with duplicate blocks in different volumes, and then compact puts them into one volume.
The theory will be on firmer ground once I look at the B2 (the other) backup more, but that backup is currently in debug for a log error.
I also found that my old --full-remote-verification usage broke courtesy of a breaking change:
--full-remote-verification (Enumeration): Activate in-depth verification of files
After a backup is completed, some (dblock, dindex, dlist) files from the remote backend are selected for verification. Use this option to turn on full verification, which will decrypt the files and
examine the insides of each volume, instead of simply verifying the external hash. If the option --no-backend-verification is set, no remote files are verified. This option is automatically set when
then verification is performed directly. ListAndIndexes is like True but only dlist and index volumes are handled.
* values: True, False, ListAndIndexes
* default value: False
It used to be the Boolean below. I’m pretty sure I objected at the time, so I should have changed my own usages…
--full-remote-verification (Boolean): Activates in-depth verification of
files
After a backup is completed, some (dblock, dindex, dlist) files from the
remote backend are selected for verification. Use this option to turn on
full verification, which will decrypt the files and examine the insides
of each volume, instead of simply verifying the external hash, If the
option --no-backend-verification is set, no remote files are verified.
This option is automatically set when then verification is performed
directly.
* default value: false
EDIT 1:
OneDrive had only one duplicate, so B2 is more interesting. I can test it in another Duplicati install.
EDIT 2:
Tested with a brand-new 2.1.0.105, an rclone sync of the B2 destination, and the 2.1.0.104 DB.
Surprisingly, test all with --full-remote-verification=True was clean, if it worked…
A Recreate wound up with two rows in the DuplicateBlock table. Not surprising: I had previously seen the same thing in the 2.1.0.104 post-backup Recreate test database, and had accounted for the other nine complaints from my checker as duplicates that all sat within a single dblock.
sPaznhDsoLc559OlMl3JyVe4Z98eKSTayhRPKq+VW10= 248
FpRzPeoc5lD6UeYTcTbV9b+NqmMT+piMxrBxcPP6ZQg= 248
EnVNI+7tspnQeKOQ6SIJ+QWn1aBEgFhuRMWnKwejjFg= 248
0BJxhj10ydSioTYLA9nnkp5kJ9V9pJnY5KhZzEtWlfY= 248
yn2HXIspg7W1nFb0WH7taNRhGEDUptulV4i8min380M= 248
ZnCqRTwwKH2II1eDmKWiexmf/qDcdIc8hbVeP7pknKQ= 248
aeGC3P769M8Asuz3hLWTDd5fPAweXbmXC2JP5406ths= 248
sV9XOv0yWiw4O2tb+kKdhL4y2bpSyr/UBDsrhdVDtbM= 248
nzyAT8GXhTupLyHnAvR8X6aEj+tlKy0S7XJ3dJ27Rcw= 248
VolumeID 248 is duplicati-bd4cd9d9703dc4e2da48e3bef5daeb2ce.dblock.zip.aes
0PCRrDi7fAYE/9iytkGtdW4uRFConEq5PlkSEmGGWTg= 139,272
FR0wRA4Rdy7oeUBNoK69bymiCu1e2cpVKqNB3p5X1sE= 272,322
VolumeID 139 is duplicati-b2f8a98d827fc4faf847f74db470e0fd6.dblock.zip.aes
VolumeID 272 is duplicati-bf66b5b966c0c40a88a933fc0367784c0.dblock.zip.aes
VolumeID 322 is duplicati-baea7fcd2e269420391ffa6f5e76d9f11.dblock.zip.aes
Above was the checker DB. Below is the 2.1.0.105 Recreate DuplicateBlock table:
0PCRrDi7fAYE/9iytkGtdW4uRFConEq5PlkSEmGGWTg= 80,238
FR0wRA4Rdy7oeUBNoK69bymiCu1e2cpVKqNB3p5X1sE= 185,238
VolumeID 80 is duplicati-b2f8a98d827fc4faf847f74db470e0fd6.dblock.zip.aes
VolumeID 185 is duplicati-baea7fcd2e269420391ffa6f5e76d9f11.dblock.zip.aes
VolumeID 238 is duplicati-bf66b5b966c0c40a88a933fc0367784c0.dblock.zip.aes
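Since the two databases hand out different VolumeIDs for the same remote files, comparing their DuplicateBlock tables is easier after resolving IDs to names, which makes the listings directly comparable despite the different numbering. A sketch, assuming DuplicateBlock(BlockID, VolumeID), Block(ID, Hash), and Remotevolume(ID, Name) match the actual schema:

```python
import sqlite3

def duplicate_blocks_by_name(db_path):
    """Return a set of (block hash, remote file name) pairs from DuplicateBlock.

    Resolving VolumeID to Remotevolume.Name lets two databases with
    different ID numbering be compared directly. Table and column
    names are my assumptions about the schema.
    """
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """SELECT b.Hash, rv.Name
           FROM DuplicateBlock d
           JOIN Block b ON b.ID = d.BlockID
           JOIN Remotevolume rv ON rv.ID = d.VolumeID"""
    ).fetchall()
    con.close()
    return set(rows)
```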
I forget (or never knew) the mechanism behind the Extra block problem, but I think it was common for a given dblock or dindex complaint to carry a run of more than one Extra. Maybe a run of duplicates in a dblock coming out of compact reflects a similar run in the now-gone feeder volumes.
Looking at this another way, is there anything in compact that will weed out incoming duplicates?
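To make the question concrete, this is the sort of filter I'm imagining, purely hypothetical and not Duplicati code: while compact repackages blocks from feeder volumes into a new dblock, skip any hash already written or already known to live in a surviving volume.

```python
def repack(blocks, already_stored=()):
    """Hypothetical duplicate-weeding pass for compact.

    `blocks` is an iterable of (hash, data) pairs streaming out of the
    feeder volumes; `already_stored` holds hashes known to exist in
    volumes that will survive the compact. Neither name comes from
    Duplicati -- this is just a sketch of the idea.
    """
    seen = set(already_stored)
    kept = []
    for h, data in blocks:
        if h in seen:
            continue  # duplicate from a feeder volume; drop it
        seen.add(h)
        kept.append((h, data))
    return kept
```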