Duplicati 2 vs. Duplicacy 2

kenkendk · September 16, 2017, 5:59pm

From reading the source code for Duplicacy, it appears that it actually builds a table of all known blocks and keeps it in memory. We had that option some time ago, but it was a real memory hog, so I removed it.

I have just tested with a small backup (~2000 blocks), and a block lookup cache had a tiny positive effect on the backup speed. But it is possible that this speedup is much more pronounced with larger backups, so I made a canary build with the option --use-block-cache. If it turns out that it really does improve performance without ridiculous memory overhead, I will convert it to a --disable-block-cache option.
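To make the idea concrete, here is a minimal sketch of a block lookup cache, written in Go to match the Duplicacy excerpts in this post (Duplicati itself is C#). Every name in it (`BlockIndex`, `mapIndex`, `CachedIndex`, `AddIfNew`) is hypothetical and not taken from either project, and the `mapIndex` type only simulates the slower database lookup. It also assumes SHA-256 block hashes and fills the cache lazily instead of preloading all known blocks.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// BlockIndex is a hypothetical stand-in for the slower persistent lookup
// (for example a local database query).
type BlockIndex interface {
	Contains(hash [32]byte) bool
	Add(hash [32]byte)
}

// mapIndex simulates the database and counts how often it is queried.
type mapIndex struct {
	known   map[[32]byte]bool
	lookups int
}

func (m *mapIndex) Contains(h [32]byte) bool { m.lookups++; return m.known[h] }
func (m *mapIndex) Add(h [32]byte)           { m.known[h] = true }

// CachedIndex keeps every hash it has seen in memory, in front of the slow index.
type CachedIndex struct {
	cache map[[32]byte]struct{}
	slow  BlockIndex
}

func NewCachedIndex(slow BlockIndex) *CachedIndex {
	return &CachedIndex{cache: make(map[[32]byte]struct{}), slow: slow}
}

// AddIfNew hashes the block and reports whether it was previously unknown,
// i.e. whether it would have to be uploaded.
func (c *CachedIndex) AddIfNew(block []byte) bool {
	h := sha256.Sum256(block)
	if _, ok := c.cache[h]; ok {
		return false // cache hit: no query against the slow index
	}
	if c.slow.Contains(h) {
		c.cache[h] = struct{}{} // remember it for the next lookup
		return false
	}
	c.slow.Add(h)
	c.cache[h] = struct{}{}
	return true
}

func main() {
	db := &mapIndex{known: make(map[[32]byte]bool)}
	idx := NewCachedIndex(db)

	blocks := [][]byte{[]byte("aaa"), []byte("bbb"), []byte("aaa"), []byte("aaa")}
	newBlocks := 0
	for _, b := range blocks {
		if idx.AddIfNew(b) {
			newBlocks++
		}
	}
	fmt.Println("new blocks:", newBlocks, "database lookups:", db.lookups)
}
```

The trade-off is the one described above: repeated blocks skip the slow lookup, but the map holds one 32-byte key (plus map overhead) per known block, so memory grows with the size of the backup.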

If @dgcom and @jl_678 have a test setup, I would like to know what performance difference the --use-block-cache option makes. I am aware of another performance issue related to how a file’s previous metadata (size, timestamp, attributes) is fetched, so it is possible that this is what really makes the difference, but maybe the --use-block-cache option can help shed light on that.

The version with the --use-block-cache switch (2.0.2.6) is available here: Releases · duplicati/duplicati · GitHub

If that is how it works, it would require that the destination supports some kind of “If-Modified-Since” approach.

Without actually running Duplicacy, it does appear that it lists all remote blocks first, and keeps them in memory:

github.com/gilbertchen/duplicacy/blob/master/src/duplicacy_backupmanager.go#L198


      
          	// If the listing operation is fast and this is an initial backup, list all chunks and
          	// put them in the cache.
          	if (manager.storage.IsFastListing() && remoteSnapshot.Revision == 0) {
          		LOG_INFO("BACKUP_LIST", "Listing all chunks")
          		allChunks, _ := manager.SnapshotManager.ListAllFiles(manager.storage, "chunks/")
          
          		for _, chunk := range allChunks {
          			if len(chunk) == 0 || chunk[len(chunk)-1] == '/' {
          				continue
          			}
          
          			if strings.HasSuffix(chunk, ".fsl") {
          				continue
          			}
          
          			chunk = strings.Replace(chunk, "/", "", -1)
          			chunkCache[chunk] = true
          		}
          
          		// Make sure that all chunks in the incomplete snapshot must exist in the storage
          		if incompleteSnapshot != nil && !incompleteSnapshot.CheckChunks(manager.config, chunkCache) {

I think it works by keeping a copy (maybe without contents) of the remote store in the local filesystem:

github.com/gilbertchen/duplicacy/blob/554f63263fe2a84803cb7f83dad2da9f3a55fe70/src/duplicacy_snapshotmanager.go#L172


      
              return len(collection.Fossils) == 0 && len(collection.Temporaries) == 0
          }
          
          // SnapshotManager is mainly responsible for downloading, and deleting snapshots.
          type SnapshotManager struct {
          
              // These are variables shared with the backup manager
              config        *Config
              storage       Storage
              fileChunk     *Chunk
              snapshotCache *FileStorage
          
              chunkDownloader *ChunkDownloader
          
          }
          
          // CreateSnapshotManager creates a snapshot manager
          func CreateSnapshotManager(config *Config, storage Storage) *SnapshotManager {
          
              manager := &SnapshotManager {
                  config: config,

Some of the complexity in Duplicati is there to handle big backups. If you have a 1 TB file, you need about 300 MB of raw hash data (with 100 KB blocks). To avoid a blowup in memory and large file lists, Duplicati keeps the hashes in a database and uses “blocks of blocks”.
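The arithmetic behind that figure can be checked with a short Go sketch. It assumes 32-byte hashes (for example SHA-256) and binary units (1 TB = 2^40 bytes, 100 KB = 102,400 bytes); neither assumption is stated in the post, and the `HashOverhead` helper is purely illustrative. The second call shows roughly why storing the hash list itself as blocks shrinks the per-file list so much.

```go
package main

import "fmt"

// HashOverhead returns how many blocks a payload of the given size splits into
// and how many bytes of raw hashes are needed to reference all of them.
func HashOverhead(size, blockSize, hashSize int64) (blocks, hashBytes int64) {
	blocks = (size + blockSize - 1) / blockSize // round up to whole blocks
	return blocks, blocks * hashSize
}

func main() {
	const (
		fileSize  = int64(1) << 40   // 1 TB (assumed binary units)
		blockSize = int64(100) << 10 // 100 KB blocks
		hashSize  = int64(32)        // assumed hash length in bytes
	)

	// First level: one hash per data block.
	blocks, hashBytes := HashOverhead(fileSize, blockSize, hashSize)
	fmt.Printf("data blocks: %d, raw hash data: %.1f MiB\n",
		blocks, float64(hashBytes)/(1<<20))

	// Second level ("blocks of blocks"): the hash list is itself cut into
	// blocks, and only the hashes of those blocks are kept per file.
	listBlocks, listHashBytes := HashOverhead(hashBytes, blockSize, hashSize)
	fmt.Printf("hash-list blocks: %d, their hashes: %.1f KiB\n",
		listBlocks, float64(listHashBytes)/(1<<10))
}
```

Under these assumptions the first level comes to about 328 MiB, which matches the rough 300 MB figure, while the second level needs only around 105 KiB of hashes for the same file.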

But if you have a smaller backup, this should not perform worse.