Restore Performance

Hi @sgeklor !

I’m the author of the new restore flow that @ts678 mentioned.

That’s about 1.93 MB/s, which I’d say is pretty slow, especially when restoring from one local hard drive to another. My main guess as to why it’s so slow is a problem with volume cache management in the current stable release (2.2.0.3) that leads to volumes being downloaded multiple times (see this discussion). The current default cache size is 100 volumes (dblock files), so with the default volume size of 50 MB the cache is 5 GB. You can tune this using --restore-volume-cache-hint (e.g. --restore-volume-cache-hint=200GB to set the cache hint to 200 GB). When this cap is reached, volumes are evicted from the cache in least-recently-used order. However, since the volumes may still be in use, the actual disk space used by the cache may temporarily exceed the hint until the volumes are no longer in use and can be deleted. Note that each volume is also copied at least twice to the TEMP storage (once for the download and once for the decryption).
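To make the "may temporarily exceed the hint" part concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not Duplicati's actual code; the class and method names (`VolumeCache`, `acquire`, `release`) are made up for illustration.

```python
from collections import OrderedDict


class VolumeCache:
    """Toy LRU cache of volumes, sizes in MB. Hypothetical, for illustration only."""

    def __init__(self, hint_mb):
        self.hint_mb = hint_mb
        self.volumes = OrderedDict()  # name -> size in MB, least recently used first
        self.in_use = set()

    def used_mb(self):
        return sum(self.volumes.values())

    def acquire(self, name, size_mb):
        """Mark a volume as in use, 'downloading' it if it is not cached."""
        if name in self.volumes:
            self.volumes.move_to_end(name)  # now the most recently used
        else:
            self.volumes[name] = size_mb
        self.in_use.add(name)
        self._evict()

    def release(self, name):
        self.in_use.discard(name)
        self._evict()

    def _evict(self):
        # Delete least recently used volumes until we are back under the hint.
        # Volumes that are still in use cannot be deleted, so they are skipped.
        for name in list(self.volumes):
            if self.used_mb() <= self.hint_mb:
                break
            if name not in self.in_use:
                del self.volumes[name]


# Default described above: 100 volumes of 50 MB each, i.e. 5000 MB (5 GB).
default_hint_mb = 100 * 50

cache = VolumeCache(hint_mb=100)
for name in ("a", "b", "c"):
    cache.acquire(name, 50)
print(cache.used_mb())  # 150: over the 100 MB hint, because all three are in use
cache.release("a")
print(cache.used_mb())  # 100: "a" could finally be deleted
```

The hint is a target rather than a hard limit, which is why the TEMP location needs some headroom beyond the hint itself.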

Have you set the --restore-volume-cache-hint option? If not, with your volume size of 1 GB (it seems?) your cache hint would be 100 GB, which is why you’re running out of RAM disk space. 128 GB should be enough to run (100 volumes + 11 GB database = 111 GB) without running out of space (assuming nothing else is consuming that RAM disk space). You might still experience performance issues due to the re-downloading of volumes when files are spread across volumes. The new restore flow is file-oriented, and if a file is spread across multiple volumes, then all of those volumes need to be downloaded and decrypted before the file can be restored. In an unlucky (but seemingly common) case, this leads to a volume being downloaded, decrypted, having a single block extracted, and then being evicted, so the same volumes are downloaded over and over.
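The thrashing case can be reproduced with a small simulation. This is a simplified sketch, assuming a plain LRU cache that holds a fixed number of volumes and files that are restored one after another; `count_downloads` and the access pattern are hypothetical and only meant to show the effect.

```python
from collections import OrderedDict


def count_downloads(files, cache_slots):
    """Count volume downloads for a file-oriented restore with an LRU cache.

    files: list of files, each given as the list of volume ids it needs.
    cache_slots: how many volumes fit in the cache.
    """
    cache = OrderedDict()  # volume id -> True, least recently used first
    downloads = 0
    for volumes in files:
        for vol in volumes:
            if vol in cache:
                cache.move_to_end(vol)  # cache hit
                continue
            downloads += 1  # cache miss: the volume has to be fetched (again)
            cache[vol] = True
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least recently used
    return downloads


# Three files, each spread across the same four volumes.
files = [[0, 1, 2, 3]] * 3

print(count_downloads(files, cache_slots=4))  # 4: every volume fetched once
print(count_downloads(files, cache_slots=3))  # 12: every single access misses
```

With a cache that is just one volume too small, each volume is evicted right before it is needed again, so the number of downloads triples in this example.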

One note on the volume size: larger volumes reduce the amount of parallelism that can be leveraged in the new restore flow by the volume downloaders, decryptors, and decompressors, as there are fewer volumes to work with and because a single volume (as it is now) cannot be processed in parallel. Secondly, larger volume sizes require a larger cache size to be effective. Thirdly, if a volume is prematurely evicted from the cache, downloading it again will have a larger impact than with smaller volumes. The only real benefits that remain for larger volumes are that they may compress better (since compression is only applied within a single volume) and that the overhead of metadata (e.g. zip headers) is reduced.

You can try the legacy restore by using the --restore-legacy=true option. The legacy restore is volume-oriented, holding only a single volume in memory at a time and only touching each volume once (download, decrypt, extract). The downsides are that there’s not as much parallelism (only one volume is processed at a time), that files are written in a scattered fashion (since the blocks are scattered across volumes), and that it has an additional verification step at the end of the restore, as the file hashes can only be verified once we know the entire file has been restored. The new restore flow knows exactly when each file is fully restored and can compute the file hash on the fly while it’s restoring the file.
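As a rough picture of what "volume-oriented" means, the sketch below groups the blocks to restore by the volume they live in, so each volume is visited exactly once. It is an illustration under simplified assumptions, not the legacy restore code; `volume_oriented_plan` and the `block_map` layout are hypothetical.

```python
from collections import defaultdict


def volume_oriented_plan(block_map):
    """Group block writes by volume so each volume is handled exactly once.

    block_map: {file path: [volume holding block 0, volume holding block 1, ...]}
    Returns a list of (volume, [(file path, block index), ...]) in volume order.
    """
    by_volume = defaultdict(list)
    for path, volumes in block_map.items():
        for index, vol in enumerate(volumes):
            by_volume[vol].append((path, index))
    return [(vol, by_volume[vol]) for vol in sorted(by_volume)]


block_map = {
    "a.bin": ["v1", "v2", "v1"],
    "b.bin": ["v2", "v1"],
}

for vol, writes in volume_oriented_plan(block_map):
    print(vol, writes)
# v1 [('a.bin', 0), ('a.bin', 2), ('b.bin', 1)]
# v2 [('a.bin', 1), ('b.bin', 0)]
```

Each volume shows up once, but the writes jump between files and offsets, and neither file is complete until the last volume has been processed, which is why the hashes can only be checked at the end.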

This is expected behavior. The restore process first checks the local file system to determine which files are already present and match the expected size and timestamp. This is done to avoid unnecessary downloads of files that are already available locally. If a file is found to be missing or does not match the expected attributes, it will be marked for restoration, and the process will proceed to download the necessary volumes from the backup source. If you don’t want this behavior, you can set the options --restore-with-local-blocks=false to not use local blocks, and --overwrite=true to overwrite existing files. I can see that the new restore flow still checks the target files, which it could skip if these options are set, so I’ll look into that for a future update.

You mentioned that you have a large system, which could mean that the default concurrency parameters are far too large for a hard drive (which doesn’t like too much concurrent access), as described in this post. The default is number of cores / 2 (same as the backup), so for your 36 cores that would be 18. The options are described in the blog post, but you might want to lower --restore-file-processors (the number of processes that write the target restore files) and --restore-volume-downloaders (the number of processes that download the volumes, or read from the source drive in your case). 1 may not be enough to saturate the drive (as there’s some non-I/O processing that needs to be done), but you can experiment with different values to see what works best for your system. In the blog post I didn’t see much of a performance improvement, but you may see something else.
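If you want to script a few experiments, something like the following could generate the option strings to append to your restore command. The two option names are the ones mentioned above; the helper functions, the floor of 1, and the candidate values are my own assumptions for illustration.

```python
def default_restore_concurrency(cores):
    """Default described above: number of cores / 2 (the floor of 1 is an assumption)."""
    return max(1, cores // 2)


def restore_options(file_processors, volume_downloaders):
    """Build the two concurrency options as command-line arguments."""
    return [
        f"--restore-file-processors={file_processors}",
        f"--restore-volume-downloaders={volume_downloaders}",
    ]


print(default_restore_concurrency(36))  # 18

# Candidate settings to try on a hard drive, from low to the default.
for n in (1, 2, 4, default_restore_concurrency(36)):
    print(" ".join(restore_options(n, n)))
```

The two values do not have to be equal; since reading and writing hit different drives in your setup, it may be worth varying them independently.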

This is a warning system in place to ensure that the network doesn’t stagnate due to a deadlock. It was introduced with the new restore flow as it’s using concurrent processes. So if you’re seeing system activity (be that CPU, network, or disk) related to Duplicati you should ignore the warning. If nothing’s happening, then it could indicate a deadlock, which should be reported. The threshold is set to 2 times the maximum round-trip time for a block request. If your disks are being overloaded by requests, then the round-trip time will increase, which will lead to the warnings being emitted.
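The threshold rule can be sketched like this. It is only a toy model of the idea (warn when a request has been waiting longer than 2 times the largest round-trip time seen so far), not the actual implementation; `StallMonitor` and its methods are hypothetical names.

```python
class StallMonitor:
    """Toy model of the warning threshold: 2 x the maximum observed round-trip time."""

    def __init__(self):
        self.max_rtt = 0.0  # seconds

    def record(self, rtt_seconds):
        """Record the round-trip time of a completed block request."""
        self.max_rtt = max(self.max_rtt, rtt_seconds)

    def threshold(self):
        return 2 * self.max_rtt

    def should_warn(self, waiting_seconds):
        # No warning before the first completed request (assumption).
        return self.max_rtt > 0 and waiting_seconds > self.threshold()


monitor = StallMonitor()
monitor.record(0.5)
monitor.record(2.0)
print(monitor.threshold())       # 4.0
print(monitor.should_warn(3.0))  # False
print(monitor.should_warn(5.0))  # True
```

A request that takes unusually long on an overloaded disk can cross the threshold and trigger a warning even though nothing is deadlocked.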

Right, I can see how that can be confusing. The temp folder is used as a staging area for the volumes as they are downloaded, decrypted, and decompressed. I think that if we’re able to handle the volume cache more effectively (discussion) then this shouldn’t be as much of an issue. In the meantime, I’m not sure how to convey this information better, as most people (or at least I, whenever I use a new tool) don’t want to read the documentation before using a product, and forcing them to read it is not ideal.

Not too much, but there is some if you run with --log-level=Profiling and --internal-profiling=true (at least those provide information that’s parseable by me). I would like to build a more intuitive performance/profiling tool/UI to provide more insight into where the bottlenecks are. But that’s future work. Some cache metrics have also been added in this PR, but they have yet to be included in a build.