For example, I have two different filesets. The 2 TB one has been restoring for almost 3 days straight and is still ongoing, while the other 1 TB one took only half a day at most.
I would like to know what affects the restoration rate so that I can plan my backups better.
Hi, I would like to know about this, since I have effectively ruled out hardware issues (and possibly legacy Duplicati issues) by retesting with a new backup job.
I set up a new dataset of approximately 1 TB of data and created both a new Duplicati backup job and a new Kopia backup job.
Kopia took slightly longer to create the backup, yet took only 7 hours to restore, while the Duplicati restore is still running right now and has already passed the 7-hour mark.
Good question, and the Kopia comparison makes it easier to narrow down. The short answer is that Duplicati’s restoration speed is fundamentally tied to how it reassembles files from deduplicated blocks, which involves a lot of random I/O and SQLite lookups that don’t scale as well as Kopia’s approach.
Here is what affects Duplicati restore performance the most:
Block size (--blocksize). Duplicati defaults to 100 KB blocks, so a 1 TB dataset gets split into roughly 10 million individual blocks. During restore, each file must be reassembled by looking up which blocks it needs, locating those blocks across potentially thousands of dblock archives, then extracting and decompressing each one. If you used the default block size, that is a lot of individual operations (see the sketch after this list). Kopia uses content-defined chunking with larger average chunk sizes and a different lookup strategy that scales better.
SQLite database bottleneck. Duplicati maintains a local SQLite database tracking every block. On a large restore, this database is hit millions of times. If your SQLite database file lives on a spinning disk or a slow network share, this alone can make restore 5-10x slower than it should be. Make sure your Duplicati database is on your fastest local SSD.
CPU and encryption overhead. AES-256 decryption and decompression happen block by block. On a 1 TB restore with 100 KB blocks that is roughly 10 million decrypt/decompress operations. Kopia batches these more efficiently.
Parallelism. Duplicati’s restore process is largely single-threaded in older versions. Newer Duplicati 2.x builds have improved this somewhat, but Kopia still has better multi-threaded restore.
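To make the block-count arithmetic concrete, here is a rough sketch (my own illustration, not Duplicati output) of how the number of blocks scales with block size for a 1 TB dataset:

```python
# Back-of-the-envelope block counts for a restore (illustrative numbers only).
KB, MB, TB = 1000, 1000**2, 1000**4

dataset = 1 * TB
for blocksize in (100 * KB, 1 * MB, 10 * MB):
    blocks = dataset // blocksize
    print(f"block size {blocksize // KB:>6} KB -> ~{blocks:,} blocks to look up and reassemble")
```

Each of those blocks is an individual lookup and reassembly step, which is why the block count matters so much for restore time.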
What I would check: during the active restore, watch CPU usage (it should be pinned), watch disk I/O (it should be heavy on the Duplicati database path), and look at the restore log for the throughput rate. If the throughput drops over time, that often points to the SQLite database becoming fragmented.
For future backups, bumping --dblock-size to 250 MB reduces the number of archive files significantly and typically cuts restore time by 30-50% on large datasets.
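As a rough illustration of why fewer archive files can help (my own arithmetic, not measured Duplicati behavior):

```python
# Number of remote dblock archives at different volume sizes (illustrative).
MB, TB = 1000**2, 1000**4

dataset = 1 * TB
for volume_mb in (50, 250):
    volumes = dataset // (volume_mb * MB)
    print(f"{volume_mb:>4} MB volumes -> ~{volumes:,} dblock files to fetch and open")
```

Fewer, larger archives mean fewer individual downloads and fewer archive headers to process.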
The bigger dblock size helped cut the restoration time (from about 1.5 days to 14 hours), but I have concerns about leaving the Duplicati databases on SSDs, as the restoration process uses far more read/write cycles on them than expected or desired.
For reference, my SSD’s remaining life dropped by 3% over these few days of testing. I have never seen that kind of wear in all the time I have used these SSDs, until I started these restoration tests.
I’m the author of the reworked restore flow. 2 TB over 3 days (~7 MB/s), 1 TB over half a day (~23 MB/s), and 1 TB over 14 hours (~19 MB/s) do, I agree, seem quite slow for a local restore on SSDs.
Which version of Duplicati are you using? The current stable release (2.2.0.3) has some issues related to cache management that lead to volumes being downloaded multiple times (see this discussion). The default cache size is 100 volumes (dblocks), which I think is why you see better performance with larger volume sizes: with the default 50 MB volumes that means a 5 GB cache, while 1 GB volumes mean a default cache of 100 GB. You can tune this using --restore-volume-cache-hint (e.g. --restore-volume-cache-hint=200GB to set the cache hint to 200 GB). When this cap is reached, volumes are evicted from the cache on a least-recently-used basis. However, since evicted volumes may still be in use, the actual disk space used by the cache may temporarily exceed the hint until those volumes are no longer in use and can be deleted. The problem arises when files are scattered across many volumes and the restore order is unlucky, leading to volumes being prematurely evicted and having to be downloaded again.
So my first suggestion would be to increase the --restore-volume-cache-hint value to something larger (e.g. 10 TB if you have the disk space for it).
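To put the cache sizing in one place, here is a quick sketch of the arithmetic above (my own illustration, not Duplicati code):

```python
# Effective default cache = 100 cached volumes * volume size (illustrative).
MB, GB = 1000**2, 1000**3

cached_volumes = 100
for volume_size in (50 * MB, 250 * MB, 1000 * MB):
    cache = cached_volumes * volume_size
    print(f"{volume_size // MB:>5} MB volumes -> default cache ~{cache / GB:g} GB")
```

If the restore touches volumes in an unlucky order, anything beyond this cap gets evicted and re-downloaded, which is the failure mode described above.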
Secondly, I’ll touch on the comments from @RianKellyIT:
This is true for old versions of Duplicati; the current default is 1 MB. Increasing the block size can help reduce database pressure (as rightly pointed out by @RianKellyIT: a larger block size results in fewer blocks, which means fewer database operations) and it may also improve hashing performance (as hash work is batched into larger blocks). The downsides of larger block sizes are that they hurt deduplication (the probability of two blocks being identical shrinks as block sizes grow) and that they add overhead for blocks smaller than the block size (e.g. a 500 KB file will still take up 1 MB of space in the backup if the block size is 1 MB). So there is a tradeoff to consider here.
While the statements hold true on their own, I’d argue that this isn’t the cause of the slow restore performance. The restore operation mostly reads from the database and uses parallel connections for its queries. In this blog post I did some benchmarks on SQLite performance, where even the machine with the slowest single-core performance (a 1950X) could reach >250 KOps for select queries on a 10-million-row table. If we assume 10x worse performance, that’s still 25 KOps, which for 2 TB at a 1 MB block size (2 million blocks) would mean about 80 seconds of database time, negligible compared to the 3-day restore time.
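For clarity, the same estimate as a quick sketch (my own reproduction of the arithmetic, not code from the benchmark post):

```python
# Back-of-the-envelope database-time estimate (illustrative).
MB, TB = 1000**2, 1000**4

blocks = (2 * TB) // (1 * MB)      # ~2 million block lookups
ops_per_second = 25_000            # pessimistic: 10x slower than the ~250 KOps benchmark
db_time = blocks / ops_per_second  # ~80 seconds
restore_time = 3 * 24 * 3600       # the observed ~3-day restore, in seconds
print(f"~{db_time:.0f} s of database time vs ~{restore_time:,} s total restore")
```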
I doubt that encryption is the bottleneck here since most CPUs carry hardware acceleration for AES. However, I may be wrong and it’s worth testing!
As rightly pointed out, the new restore flow uses parallelism for files being restored, blocks being decompressed, volumes being downloaded, and volumes being decrypted. The degree of parallelism each step uses can be tuned, with the default being number_of_processor_cores / 2 for each. You can try the old restore flow using --restore-legacy=true. The legacy flow has the advantage that it only ever downloads and keeps one volume at a time, which is beneficial for low-disk-space / low-memory environments.
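As a rough illustration of those defaults (my own sketch, not the actual Duplicati source; the stage names are paraphrased from the description above):

```python
import os

# Default degree of parallelism per restore stage: processor cores / 2 (illustrative).
cores = os.cpu_count() or 2
workers = max(1, cores // 2)
for stage in ("file restorers", "block decompressors",
              "volume downloaders", "volume decryptors"):
    print(f"{stage}: {workers} workers by default on this {cores}-core machine")
```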
I’d argue that this is no longer a good recommendation for the new restore flow. For one, larger volumes reduce the amount of parallelism the volume downloaders, decryptors, and decompressors can leverage, as there are fewer volumes to work with and a single volume (as it stands now) cannot be processed in parallel. Secondly, larger volume sizes require a larger cache to be effective. Thirdly, if a volume is prematurely evicted from the cache, downloading it again has a larger impact than with smaller volumes. The only real benefits that remain for larger volumes are that they may compress better (since compression is only applied within a single volume) and that the metadata overhead (e.g. zip headers) is reduced.
This is quite concerning, and I can understand why you would want to avoid it. I think it points back to the cache management issue. In the good case, a restore incurs at least two writes of each volume (one for the initial download and one for the decrypted volume), but with the cache management issue some volumes may be downloaded and written multiple times, which can lead to excessive wear on the SSD. So again, I would suggest increasing the --restore-volume-cache-hint value to something larger until the cache management issue is resolved in a future release. It has been fixed and merged in this pull request, but it hasn’t made it into a release yet.
As a final note, if you want more insight into what’s taking time during the restore operation, you can run with --internal-profiling=true and --log-level=profiling; then you’ll see the internal timers of each of the processes within the flow.
Thank you for the detailed response! Let me take a look at these pointers, but just a quick note that my observations so far were on 2.2.0.3; I understand that the current stable is now 2.3.0.0.
Are these points still valid for 2.3.0.0?
Also, for one of my systems, I have switched to a RAID 0 HDD array to mitigate the SSD wear issue, so it remains to be seen how that change helps.