Not here to start a flame war about which is better or worse; I'm just curious about the technical reasons.
What is happening -
Duplicati does not seem to be fully deduplicating files which are theoretically 100% deduplicatable (is that a word?)
I'm doing some benchmarking / comparison work, as documented here (https://forum.duplicati.com/t/big-comparison-borg-vs-restic-vs-arq-5-vs-duplicacy-vs-duplicati)
While researching, I found the following thread on the Restic board, which I thought would be useful to add to the test suite (https://forum.restic.net/t/dedup-only-0-3-efficient-on-100-duplicate-data)
Observation / Steps to Replicate -
As per the thread linked above, the test is a back-to-back backup of component files and then of a combined file formed by concatenating them, using urandom files of the following sizes (see the sketch after this list for how the data can be generated):
- 100 * 1MB
- 16 * 8MB
- 4 * 128MB
The backup size was compared after each run. Since the combined file is simply a concatenation of the components, a perfect dedup algorithm could attain 100% efficiency (i.e. no change in backup size aside from small metadata impacts).
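In case it helps anyone reproduce this, here is a rough sketch of how I understand the test data is generated. The directory name, file names, the 4 * 128MB case, and the pause between the two backup runs are my own assumptions; the linked Restic thread describes the actual procedure.

```python
#!/usr/bin/env python3
"""Generate dedup test data: N random component files, then their concatenation."""
import os
from pathlib import Path

# Assumed layout: adjust COUNT / SIZE_MB for the 100*1MB, 16*8MB or 4*128MB cases.
DATA_DIR = Path("dedup-test")
COUNT = 4         # number of component files
SIZE_MB = 128     # size of each component file in MB

def main() -> None:
    DATA_DIR.mkdir(exist_ok=True)
    parts = []
    for i in range(COUNT):
        part = DATA_DIR / f"part_{i:03d}.bin"
        # Incompressible random bytes, equivalent to reading from /dev/urandom.
        part.write_bytes(os.urandom(SIZE_MB * 1024 * 1024))
        parts.append(part)

    # Backup run #1 happens here, on the component files only.
    input("Back up the dedup-test directory now, then press Enter to build the combined file...")

    # The combined file is a byte-for-byte concatenation of the components,
    # so a perfect dedup algorithm should add almost nothing on backup run #2.
    combined = DATA_DIR / "combined.bin"
    with combined.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    print(f"Wrote {COUNT} parts of {SIZE_MB} MB each plus {combined.name} "
          f"({combined.stat().st_size / 2**20:.0f} MB). Run backup #2 and compare sizes.")

if __name__ == "__main__":
    main()
```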
Duplicati seems to be having real difficulty. For the smaller sizes, all programs struggle, but Duplicati is clearly still behind. For the last case, the file sizes are big enough that dedup kicks in almost fully for the others, but Duplicati still struggles. Given that the last case uses 128MB files, it can't be because the chunk size exceeds the file size, right?
Results are below. I ran Duplicati twice because the result seemed out of line, but it turned out to be consistent.