It is not a bad idea to be able to switch source and destination, but that is usually a design for sync tools (e.g. rclone).
Backup software is usually designed to archive local data somewhere else. A NAS is one exception, but it typically exposes itself as a shared drive, so a local source still works. Still, for performance reasons and to handle open files, I'd run the backup command locally on the NAS itself.
Indeed, one way to do backups is to archive data from local to remote. Yet that would require backup software such as Duplicati to run on all devices, or a different backup system to pull the data off the devices first. Whereas if remote-to-local backups were also supported, that would:
Allow backups to be made from locations that do not allow interactive shell.
Allow backups from low resource devices that do offer SFTP but not much else.
Relieve the need for installing and maintaining software on all devices.
PS We are talking about backup, not synchronization indeed.
Most backup software (and I am talking about traditional ones - Veritas/Symantec, Bacula, etc.) uses a local-agent-plus-central-server design for a reason - it makes it possible to control endpoint state: preventing the machine from sleeping, verifying file hashes, copying open files…
Without a local agent you lose this flexibility. You can't (in most cases) copy open files, the machine can sleep, and to check a hash to verify whether a file has changed, you'll need to send the entire file over the wire.
Those are the reasons why you do not see this architecture often.
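To make the hash-verification point concrete, here is a minimal sketch (not any particular product's implementation): in a pull-style, agentless setup, the digest can only be computed by streaming the file's contents, so the whole file crosses the wire just to decide whether it changed.

```python
import hashlib

def sha256_of_stream(chunks):
    """Hash file contents chunk by chunk. In an agentless/pull setup the
    chunks come over the network (e.g. via SFTP), so the entire file is
    transferred merely to check whether it has changed."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# Simulate reading a remote file in 64 KiB chunks.
remote_chunks = [b"x" * 65536, b"y" * 1024]
digest = sha256_of_stream(remote_chunks)
```

With a local agent, the same digest is computed on the endpoint and only the 32-byte result needs to be sent.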
If you are talking Unix/Linux - you can always mount a remote SSH location as a folder (SSHFS) and use any backup tool as if it were local (all the issues above still apply). If your remote is Windows - just install an SSH server - there are a couple of free ones.
And keep in mind - SFTP performance is not up to SMB, or even HTTP, in many cases.
That sounds like with DT you can’t have two computers backing up to the same account at the same storage provider. Is that correct? Or does it just mean that DT can’t have concurrent backups to the same folder in that account?
From reading the source code for Duplicacy, it appears that it actually builds a table of all known blocks and keeps it in memory. We had that option some time ago, but it was a real memory hog, so I removed it.
I have just tested with a small backup (~2000 blocks), and a block lookup cache had a tiny positive effect on the backup speed. But it is possible that this speedup is much more pronounced with larger backups, so I made a canary build with the option --use-block-cache. If it turns out that it really does improve performance without ridiculous memory overhead, I will convert it to a --disable-block-cache option.
If @dgcom and @jl_678 have a test setup, I would like to know the performance difference from the --use-block-cache option. I am aware of another performance issue related to how a file’s previous metadata (size, timestamp, attributes) is fetched, so it is possible that this is what really makes the difference, but maybe the --use-block-cache can help shed light on that.
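For anyone curious what such a cache amounts to, here is a minimal sketch (illustration only - not Duplicati's actual `--use-block-cache` code): an in-memory set of known block hashes lets the backup skip uploading blocks that are already on the backend.

```python
import hashlib

class BlockCache:
    """Toy in-memory lookup of known block hashes.
    Not Duplicati's implementation - just the general idea."""

    def __init__(self):
        self.known = set()  # SHA256 digests of blocks already stored remotely

    def is_known(self, block: bytes) -> bool:
        return hashlib.sha256(block).digest() in self.known

    def add(self, block: bytes) -> None:
        self.known.add(hashlib.sha256(block).digest())

cache = BlockCache()
uploads = 0
for block in [b"a" * 100, b"b" * 100, b"a" * 100]:  # third block is a duplicate
    if not cache.is_known(block):
        uploads += 1  # the real code would upload the block here
        cache.add(block)
# Only the two unique blocks are uploaded; the duplicate is skipped.
```

The trade-off discussed above is exactly this set: it grows with the number of unique blocks, which is why it can become a memory hog on large backups.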
If that is how it works, it would require that the destination supports some kind of “If-Modified-Since” approach.
Without actually running Duplicacy, it does appear that it lists all remote blocks first, and keeps them in memory:
I think it works by having a copy (maybe without contents) of the remote store, in the local filesystem:
Some of the complexity in Duplicati is there to handle big backups. If you have a file of 1TB, you need 300MB of raw hash data (with 100KB blocks). To avoid a blowup in memory and large file lists, Duplicati keeps it in a database and uses "blocks of blocks".
But if you have a smaller backup, this should not perform worse.
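The "blocks of blocks" indirection can be sketched like this (an illustration of the idea, not Duplicati's actual on-disk format): block hashes are themselves packed into index blocks, so a huge file's block list never has to live as one flat structure.

```python
import hashlib

BLOCK_SIZE = 100 * 1024               # 100 KiB data blocks
HASHES_PER_INDEX = BLOCK_SIZE // 32   # SHA256 digests that fit in one index block

def index_blocks(block_hashes):
    """Group raw block hashes into 'blocks of blocks': each index block
    holds up to HASHES_PER_INDEX digests and is hashed/stored like any
    other block. Illustration only."""
    indexes = []
    for i in range(0, len(block_hashes), HASHES_PER_INDEX):
        payload = b"".join(block_hashes[i:i + HASHES_PER_INDEX])
        indexes.append(hashlib.sha256(payload).digest())
    return indexes

# A 1 TB file at 100 KiB blocks has ~10.7 million blocks (~320 MB of raw
# SHA256 digests); one indirection level shrinks the top-level list ~3200x.
hashes = [bytes([i % 256]) * 32 for i in range(10000)]
top = index_blocks(hashes)
```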
I tested Duplicati 184.108.40.206 Canary and added test results to the spreadsheet.
This time I used a locally attached USB 2.0 drive and a single up/down thread for the CY comparison.
And CY is still noticeably faster, even with a single thread.
I do not see much of a change between --use-block-cache="true" and not setting this option.
On backup, time is actually spent reading source files and there is no CPU bottleneck.
I’ll see if I can dig more into why TI is much slower in comparable configuration.
Some more testing did not reveal any real help from --use-block-cache="true".
The only way I was able to speed up the backup was to enable --synchronous-upload="true".
A restore with --no-local-db="true" is a bit faster compared to two separate operations.
I ditched VSS since it may take a variable amount of time, and I also measured the impact of compression and encryption on the backup.
Here are some of the results:
00:08:49.995 --synchronous-upload="true" --no-backend-verification="true" - COMMON for below
00:07:23.040 --no-encryption="true" --use-block-cache="true"
00:06:08.722 --zip-compression-level=1 --use-block-cache="true"
00:05:04.746 --zip-compression-level=1 --no-encryption="true" --use-block-cache="true"
00:01:52.741 --no-local-db="true" --no-local-blocks="true" --skip-restore-verification="true" - COMMON for below
00:00:18.345 DB repair
00:01:32.149 --dbpath=&lt;repaireddb&gt; --use-block-cache="true" --no-local-blocks="true" --skip-restore-verification="true"
These results are still noticeably worse than CY's.
As suggested before, I'll try a backup of very large files instead of many small ones.
Thanks @dgcom for trying that out, much appreciated.
I am a bit surprised that TI is that much slower; I guessed that maybe the in-memory lookup table was the reason, but it should not matter a lot, since most lookups will fail (the block is new), and the database has log(n) lookup time anyway. Your results show that the database is indeed fast enough (at least on an SSD).
Compared to CY there are not many differences, so I think TI should be able to reach similar speeds.
CY stores all blocks “as-is” on the remote store (in some cases using folders to reduce the number of files in a single folder).
TI stores files inside archives to reduce the number of remote files and requests.
CY keeps a cache of the remote data locally on-disk.
TI keeps a cache/lookup of the remote data in a database.
CY uses a flexible block width (content defined chunking), TI uses a fixed block width and a block-of-blocks.
They both use SHA256 as the hashing algorithm.
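The flexible-vs-fixed block width difference is worth illustrating, since it affects how much data re-uploads after small edits. Below is a toy sketch (not either tool's real algorithm): fixed-width chunking shifts every boundary after an insertion, while content-defined chunking cuts wherever a hash of the trailing bytes matches a pattern, so boundaries follow content rather than position.

```python
def fixed_chunks(data: bytes, size: int = 4096):
    """Fixed-width blocks: inserting one byte near the start shifts every
    later boundary, so most subsequent blocks get new hashes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data: bytes, mask: int = 0x3F, window: int = 16):
    """Toy content-defined chunking: cut where a hash of the trailing
    window has its low bits zero. Real tools use a proper rolling hash
    (Rabin, buzhash); this recomputes the window hash each step purely
    for clarity, which is far too slow for production use."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        h = 0
        for b in data[i - window:i]:
            h = (h * 31 + b) & 0xFFFFFFFF
        if h & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks
```

With `mask = 0x3F` the expected chunk size is roughly 64 bytes in this toy; real systems tune the mask for chunks in the kilobyte-to-megabyte range.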
I see a big jump when you lower the compression level, so maybe the bottleneck is really the zip compression.
The speedup from --synchronous-upload is a bit strange, as it just “pauses” the backup while there is an upload happening.
Yes, there should be no huge difference in performance if the designs are close enough. And again - let me say that this testing is not fully conclusive yet - I need to run similar tests on a larger data source.
This level of compression impact is strange - there was plenty of CPU headroom. Testing is done on an i5-3570 3.40GHz; I haven't seen a single core pegged, and compression should utilize hyper-threading pretty well.
For backups, --zip-compression-level=1 is actually not bad if performance is a concern. And if backing up already compressed files, it may actually be recommended.
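A quick sketch of why level 1 makes sense for already-compressed data (using Python's zlib as a stand-in for the deflate compression Duplicati's zip archives use): on incompressible input, a higher level burns far more CPU for essentially the same output size, while repetitive data still shrinks dramatically even at level 1.

```python
import os
import zlib

# Incompressible input (stands in for already-compressed files):
payload = os.urandom(256 * 1024)
fast = zlib.compress(payload, 1)   # cheap, stays at roughly the original size
best = zlib.compress(payload, 9)   # far more CPU, roughly the same size

# Highly repetitive input still compresses well even at level 1:
text = b"backup test data " * 50000
small = zlib.compress(text, 1)
```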
The behavior of --synchronous-upload="true" was a surprise for me as well, and I remember looking at the disk I/O chart and seeing better utilization when it was set to true. I am pretty sure I haven't screwed up the testing, but I will re-test.
I have some ideas on what profiling I can do on this, but first I want to try similar tests on a much bigger set of very large files - 15GB in total, with a Windows Server 2016 ISO, some .NET app memory dumps, a VirtualBox vmdk file with CentOS, and a couple of rar files… The largest file is 5.5GB, and most files are non-compressible.
You’re doing such a great job so I don’t want to ask too much, but if you don’t mind, could you post your updated results directly in the actual post? It would make it much easier for people to find and read those results and they would be preserved as long as this forum exists.
@kenkendk - I will try pointing the temporary folder to another local drive and see if it changes the behavior… I also have to point out that the source and temp are located on a relatively slow disk… This creates a lot of variables, and it still does not explain why CY is at least twice as fast. We should definitely try to find the bottleneck.
@tophee - I prefer working with real spreadsheet, of course - it allows much better formatting, calculations, notes, etc…
However, I will see if I can post brief summaries here and re-link to the Google Spreadsheet for detailed data.
I do not always record data systematically - quick tests may benefit from embedded tables.