From reading the source code for Duplicacy, it appears that it builds a table of all known blocks and keeps it in memory. We had that option some time ago, but it was a real memory hog, so I removed it.
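The in-memory block table approach can be sketched roughly like this: keep a set of hashes of every block already stored, and skip uploading any block whose hash is found in the set. This is a hypothetical illustration, not Duplicacy's or Duplicati's actual code; the `BlockCache` name and SHA-256 choice are assumptions.

```python
import hashlib

class BlockCache:
    """Illustrative in-memory table of known block hashes."""

    def __init__(self):
        self.known = set()  # hex digests of blocks already at the destination

    def add(self, data: bytes) -> None:
        self.known.add(hashlib.sha256(data).hexdigest())

    def contains(self, data: bytes) -> bool:
        # A hit means the block is already stored and need not be uploaded.
        return hashlib.sha256(data).hexdigest() in self.known

cache = BlockCache()
cache.add(b"block-1")
print(cache.contains(b"block-1"))  # True: block already stored, skip upload
print(cache.contains(b"block-2"))  # False: new block, must upload
```

The memory-hog concern is visible here: each entry is a 64-character hex digest, so tens of millions of blocks means gigabytes of cache just for the lookup table.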
I have just tested with a small backup (~2000 blocks), and a block lookup cache had a tiny positive effect on the backup speed. But it is possible that this speedup is much more pronounced with larger backups, so I made a canary build with the option `--use-block-cache`. If it turns out that it really does improve performance without ridiculous memory overhead, I will convert it to a regular option.
If @dgcom and @jl_678 have a test setup, I would like to know the performance difference with the `--use-block-cache` option. I am aware of another performance issue related to how a file's previous metadata (size, timestamp, attributes) is fetched, so it is possible that this is what really makes the difference, but maybe `--use-block-cache` can help shed light on that.
The version with the `--use-block-cache` switch is here (126.96.36.199): Releases · duplicati/duplicati · GitHub
If that is how it works, it would require that the destination supports some kind of “If-Modified-Since” approach.
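The "If-Modified-Since" idea can be sketched as: the client remembers when it last synchronized and asks the destination to send content only if it has changed since then, otherwise getting a cheap "not modified" answer. The header format below is standard HTTP; the `server_response` stub is purely illustrative of how a destination would have to behave, not any real backend's API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def build_conditional_headers(last_sync: datetime) -> dict:
    """Ask the destination to skip the body if nothing changed since last_sync."""
    return {"If-Modified-Since": format_datetime(last_sync, usegmt=True)}

def server_response(remote_mtime: datetime, headers: dict) -> int:
    """Toy destination: 304 if unchanged since the client's timestamp, else 200."""
    since = parsedate_to_datetime(headers["If-Modified-Since"])
    return 304 if remote_mtime <= since else 200

last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
hdrs = build_conditional_headers(last_sync)
print(server_response(datetime(2023, 12, 31, tzinfo=timezone.utc), hdrs))  # 304
print(server_response(datetime(2024, 6, 1, tzinfo=timezone.utc), hdrs))    # 200
```

The point is that this only works if the destination actually implements the conditional check; plain blob stores that ignore the header would just return everything every time.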
Without actually running Duplicacy, it appears that it lists all remote blocks first and keeps them in memory:
I think it works by having a copy (maybe without contents) of the remote store, in the local filesystem:
Some of the complexity in Duplicati is there to handle big backups. If you have a 1 TB file, you need 300 MB of raw hash data (with 100 KB blocks). To avoid a blowup in memory and large file lists, Duplicati keeps the hashes in a database and uses "blocks of blocks".
But if you have a smaller backup, this should not perform worse.
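The arithmetic behind the 1 TB example above can be checked with a quick back-of-the-envelope sketch. The file size and block size are from the post itself; the 32-byte (SHA-256) hash size is my assumption, as is the exact packing scheme.

```python
import math

FILE_SIZE = 10**12    # 1 TB file
BLOCK_SIZE = 100_000  # 100 KB blocks
HASH_SIZE = 32        # bytes per SHA-256 hash (assumption)

blocks = FILE_SIZE // BLOCK_SIZE     # 10,000,000 data blocks
raw_hash_bytes = blocks * HASH_SIZE  # raw hash data for the whole file

# "Blocks of blocks": pack the hashes themselves into 100 KB blocklist
# blocks, so only a short top-level list of blocklist hashes remains.
hashes_per_block = BLOCK_SIZE // HASH_SIZE  # 3125 hashes fit in one block
blocklist_blocks = math.ceil(blocks / hashes_per_block)
top_level_bytes = blocklist_blocks * HASH_SIZE

print(raw_hash_bytes)    # 320000000 -> ~300 MB, matching the post
print(blocklist_blocks)  # 3200 blocklist blocks
print(top_level_bytes)   # 102400 -> ~100 KB instead of ~300 MB in memory
```

So one level of indirection shrinks what must be held for that file from roughly 300 MB of hashes to about 100 KB, which is why the database-plus-blocklist design pays off only once backups get large.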