This is for discussion since there might be better ideas or other concerns I’m not aware of.
Naturally, if appropriate, this will end up as an issue on the github.
Background
The new (default) restore process caches downloaded volumes in temp storage, up to the limit imposed by the --restore-volume-cache-hint option, which defaults to volume-size * 100, i.e. 5GB for the default volume size of 50MB.
Once the volume cache size has been exceeded, previously downloaded volumes are evicted based on LRU policy (I think that’s an over-simplification but it’s sufficient for the discussion).
However, if the backup has many files spread across many volumes (or many volumes containing data from many different files; pick your poison), volumes can be evicted and re-downloaded many times, potentially a) significantly slowing down the restore process and b) significantly increasing the amount of data downloaded to complete the restore. A backup containing a relatively small number of large files would require much less cache space than a backup containing a large number of small files.
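To make the thrashing concrete, here's a minimal Python sketch of an LRU volume cache with a re-download counter. All names are hypothetical; this is not Duplicati's actual implementation, just a model of the behaviour described above:

```python
from collections import OrderedDict

class VolumeCache:
    """Toy LRU volume cache that counts how often each volume
    has to be (re-)downloaded. Purely illustrative."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()   # volume_id -> size_bytes, in LRU order
        self.downloads = {}            # volume_id -> download count

    def fetch(self, volume_id, size_bytes):
        if volume_id in self.entries:
            self.entries.move_to_end(volume_id)   # mark as recently used
            return "hit"
        # cache miss: simulate a (re-)download
        self.downloads[volume_id] = self.downloads.get(volume_id, 0) + 1
        # evict least-recently-used volumes until the new one fits
        while self.used + size_bytes > self.capacity and self.entries:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        self.entries[volume_id] = size_bytes
        self.used += size_bytes
        return "redownload" if self.downloads[volume_id] > 1 else "download"
```

With a cache sized for two volumes, the access pattern A, B, C, A forces A to be evicted and fetched again; scaled up to thousands of volumes, that is the re-download blow-up described above.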
Proposed Changes
I think the default for --restore-volume-cache-hint should be removed, and a limit applied only if the option is present, indicating that the user deliberately wants to limit the temp space used for volume caching.
Also, I think a warning should be emitted when the volume cache size has been exceeded and/or volume cache eviction has occurred due to space constraints, giving the user the option to organise larger temp space (see the --tempdir option).
Also, an addition to the final report for restores, showing the number of re-downloaded volumes and the avg/max re-download count, would be useful to show how urgent the issue is. There's no point scrambling for more space if only a few volumes are re-downloaded.
On further reflection, since this report, as proposed, would only be shown for the restore operation, it might be a good idea, if possible, to somehow calculate how much cache space a restore would require and include that information in the final report of a backup.
Also, my backups were created with Duplicati versions much older than today's, so it might be that something about how those older versions operated makes a big difference to the observed behaviour; I can't be sure, but it's possible we're even talking v1.x here. Moreover, I've generally always used canary versions, with the same caveat.
I’ve been doing some further testing on the use of the --restore-volume-cache-hint option and the behaviour of volume caching.
Keep in mind that the volume eviction policy with the highest precedence appears to be: if all the data from this volume has been restored, evict it. This behaviour can be observed when the volume cache is unconstrained, as the file count in the temp folder both decreases and increases as the restore proceeds. Another way to put it: every file in the cache has data pending restore; there are no files in the cache that shouldn't be there.
Also keep in mind that the largest files are processed first, so caching won't really show itself until the larger files, i.e. the files likely to be spread across multiple volumes, have been processed.
I have about 12 backups that I regularly restore test. The backend storage requirements range from ~6GB to ~180GB.
I’ve now tested all my restores with the cache size set to 50GB and, except for the smallest backup, they all broke the 5GB default cache size ceiling (as measured by the size and count of cache files in the temp directory).
Now, for context, my largest backup is my Audio backup, which is essentially my mp3 music collection plus other files like podcasts, lectures, etc. As you can imagine, there are many files generally ~5MB in size and they don't change often, if at all. I was quite surprised to still see the cache requirement blow past the 5GB default ceiling. In fact, I'm very suspicious about just how many live cache files have accumulated; the currently running restore isn't even halfway through (112GB to be restored) and there's already ~40GB of volumes cached. This is what made me wonder about earlier versions of Duplicati and how data might have been spread across volumes.
So, for anyone that does regular restores of any reasonable size, the --restore-volume-cache-hint setting should definitely be on your radar.
One other thing that should be pointed out is that this isn't going to make any difference whatsoever to a single-file restore, or probably even to restoring a small selection of files from a backup. In fact, in those cases, I'd wager there would be very little difference between the legacy and new restore processes.
The problem we’ve been seeing comes as a side effect of the new design, which focuses on increasing parallelism and sequential disk writes. This led to caching strategies in order to keep the individual parts of the pipeline busy as much as possible. This caching further led to an increased amount of temporary disk space being used, which led to the introduction of the cache limiting option, trading consumed temporary storage for performance.
As you’ve pointed out:
this trade-off can, for an unlucky user (and it seems that's not as uncommon as I first thought), severely hurt the performance, leading to a substantial amount of redundant work.
I like all of your suggestions, which I think will provide a better “off-the-shelf” experience, alongside user hints on how to deal with the problem. I do have one idea that I think will alleviate the problem.
Smart-er file orderings
The current implementation orders files to be restored by their size. My initial motivation behind that choice was that working on larger files first would lead to better overlap of the FileProcessor, as larger files simply take longer to restore. A better approach would be to rank the files according to how they’ll impact the volume lifetime.
Since we know the file list, which blocks they need, and in which volumes those blocks reside, we could make an analysis ahead of time, mapping out an order of files that'll minimize volume cache lifetime. This approach will also tie in nicely with the warnings you're proposing, as we'd then know how many volumes we'd need at any one point. If any of the cache options clash with this, we'd be able to throw a warning (or maybe an error that can be ignored / lowered to a warning) that this configuration will require re-downloading, in turn introducing redundant work and hurting performance.
I think that choosing the strategy should be an option, with the “smart” ordering as the default. I still think “large files first” provides value in that the analysis can be skipped, which setups with enough storage space for a large volume cache could benefit from. I'd also add “small files first” (may be better for some setups / backups) and “don't care” (where the file list is just fetched as-is from the database).
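As a rough illustration of the “smart” ordering idea, here's a greedy Python sketch that repeatedly picks the file adding the fewest new volumes to the working set, and releases volumes once no pending file needs them. This is purely illustrative; the real analysis would work from the block/volume tables in the database:

```python
def order_files_by_volume_affinity(file_volumes):
    """Greedy heuristic: restore together files that draw on the same
    volumes, so each volume can be evicted as soon as its last
    dependent file is done.
    file_volumes: dict of file name -> set of volume ids."""
    remaining = dict(file_volumes)
    order = []
    active = set()               # volumes currently held in the cache
    while remaining:
        # pick the file that pulls the fewest *new* volumes into cache
        best = min(remaining, key=lambda f: len(remaining[f] - active))
        order.append(best)
        active |= remaining.pop(best)
        # release volumes no longer needed by any pending file
        still_needed = set().union(*remaining.values()) if remaining else set()
        active &= still_needed
    return order
```

For example, with f1 needing {V1}, f2 needing {V1, V2} and f3 needing {V3}, the heuristic restores f1 and f2 back-to-back so V1 can be evicted before V3 is ever fetched.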
Cache blowing past cache hint
Note that the option is only a hint to the cache: it may be unable to satisfy the hint, which will result in a higher consumption.
The VolumeManager keeps track of cached volumes. Whenever a request for a new block comes in, it checks whether the volume is already being requested or is already in the cache. If it's neither, the cache hint is evaluated: would requesting another volume exceed the cache hint? If so, we wait until we can evict a volume.
When a volume is evicted from the VolumeManager, it may not necessarily be removed from the temp storage, as the VolumeDecompressor might still be in the process of decompressing blocks from the volume, making it unsafe to delete. It won't be deleted until all references to the volume have been disposed. Eviction from the VolumeManager cache just disposes of its reference to the volume. Writing about it now, I guess the VolumeManager could instead spin-lock on the reference count until all references have been disposed, leaving the VolumeManager in charge of the final disposal.
So, to summarize: in its current form the option --restore-channel-buffer-size determines the block request burst rate and the number of requests in flight in between each step. This means that there can be --restore-channel-buffer-size times --dblock-size of 'unaccounted' in-flight volume storage between the VolumeManager and the VolumeDecompressor.
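A quick back-of-envelope for that in-flight storage, using the default 50MB --dblock-size and a hypothetical buffer size of 16 (the actual --restore-channel-buffer-size default may differ):

```python
# Illustrative arithmetic only; substitute your actual option values.
dblock_size = 50 * 1024**2       # --dblock-size, default 50MB
buffer_size = 16                 # hypothetical --restore-channel-buffer-size
in_flight_bytes = buffer_size * dblock_size   # 800 MiB outside the cache hint
```

So even a modest buffer size can hold a large multiple of the volume size in temp storage that the cache hint never sees.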
Single file / small selection restore
In general, I’d agree with you, when looking at the core functionality. When there’s less room for parallelism, a parallel approach won’t benefit, it might actually be worse off.
However, the reworked restore flow has one clear advantage over the legacy flow: by switching to a file-centered approach, we now know exactly when each file has been restored. As such, we can, as we restore the file, keep track of the file’s hash for later validation. The legacy flow doesn’t know when a file has been fully restored (due to the scatter effect of the volume-centric approach) until all of the files have been restored. That’s why the legacy flow has a separate verification pass, where it has to read all of the restored files once more. The reworked flow can perform this extra verification on the fly, giving it an edge.
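A minimal sketch of that on-the-fly verification, assuming blocks arrive in file order (which the file-centric flow guarantees and the volume-centric one doesn't). The function and parameter names here are made up for illustration:

```python
import hashlib

def restore_with_inline_verify(block_stream, out_path, expected_sha256):
    """Write a file's blocks in order while hashing them, so no
    separate verification pass has to re-read the restored file."""
    h = hashlib.sha256()
    with open(out_path, "wb") as out:
        for block in block_stream:
            out.write(block)
            h.update(block)        # verify as we go
    if h.hexdigest() != expected_sha256:
        raise ValueError(f"hash mismatch for {out_path}")
```

The legacy volume-centric flow can't do this, because a file's blocks arrive scattered across volume downloads, so the hash can only be computed by re-reading the file afterwards.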
TODO list
Mostly a list of notes for myself (or for someone else, if they'd like to implement it!). Do comment if there's something I've forgotten or something I've interpreted incorrectly.
--restore-volume-cache-hint should default to off. Only users who experience problems with temporary storage usage should need to enable it.
If a volume is being evicted prematurely (due to the cache max size being hit), a warning should be emitted. It's already being logged as an Explicit log message, so it should be easy to elevate. However, only the first 5 should be emitted, with later ones aggregated to reduce log message flooding. Furthermore, this could be tracked to provide a report towards the end of the restore, enlightening the user as to why the restore performed poorly.
Look into further spin locking in the VolumeManager to keep cache usage more in line with the provided cache size hint. The only problems I see are a decrease in performance (not really a problem, since we've already dived into a performance hindrance) and potential deadlocks (lock holding should be short).
Add a pre-restore analysis step, which reorders the file list to try to minimize volume cache lifetime. This analysis also enables an early warning system, which can inform the user that something either won't work (limited storage space) or will be slow (will trigger many re-downloads).
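For the warning-throttling item above, here's a small sketch of the “first 5 verbatim, then aggregate” idea (class and method names are hypothetical, not Duplicati's logging API):

```python
class EvictionWarnings:
    """Emit the first few premature-eviction warnings verbatim, then
    fold the rest into a single end-of-restore summary line."""
    MAX_VERBOSE = 5

    def __init__(self, log):
        self.log = log          # any callable taking a message string
        self.count = 0

    def premature_eviction(self, volume_id):
        self.count += 1
        if self.count <= self.MAX_VERBOSE:
            self.log(f"Volume {volume_id} evicted before fully restored; "
                     f"consider raising --restore-volume-cache-hint")

    def summary(self):
        if self.count > self.MAX_VERBOSE:
            self.log(f"{self.count - self.MAX_VERBOSE} further premature "
                     f"evictions suppressed ({self.count} total)")
```

The running count also gives the end-of-restore report its re-download statistics for free.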
Thanks for confirming and further analysing the issue. And thanks for the link to that blog post. I had no idea we had that and it does an excellent job of laying it all out.
I really like the idea of doing a pre-run analysis on the file/volume mix. The idea that there could eventually be a set of ‘tuned’ restore strategies depending on the structure of the data is very interesting. These might become quite important, especially when volume requirements exceed cache size.
I’m not sure I understand point #3 in your TODO list … I’ll have to give that blog post another read.
On a side note, and speaking of the blog post, I was very interested in the “Effects of multiple FileProcessors on HDDs” section.
I do my restore tests to HDD and, since I've been analysing performance of late, I thought there might be a fair amount of thrashing going on, so I set the file processor count to 1. Although I haven't done any A/B testing yet, a test on a large restore didn't really yield the improvement I was expecting, which agrees with the testing you reported in your blog post. Logically, keeping the FileProcessor count low for HDDs and higher for SSDs (another opportunity for autotuning) makes sense, yet on both my system and your testing system that theory falls over. My CPU (i7-8700) is far from maxed out during my restores, so I don't see decryption/decompression/etc. as bottlenecks. The disk itself appears to be the bottleneck, so I come back to the FileProcessor count. Uhm….?
Also, I noted that, particularly towards the end of the restore, a not insignificant amount of time is spent where CPU and disk throughput fall to a trickle, yet disk activity is maxed out and the queue length starts growing. At least according to Task Manager, this appears to be $Log and $Mft activity. I introduced the --skip-metadata flag (I'm only doing restore testing, after all), which may have improved things; I definitely need to do more testing to quantify it, but do you have any intuition about what's happening there?
Given a state of the cache, the cached volumes might contain several very small files that could be restored in full. Will they be restored immediately, or will they await their turn in the restore list?
I assume you’re referring to whether files with currently cached volume data will be restored before the volume is evicted from the cache?
With the current default limit on the cache size, it’s quite likely the volume will be evicted from the cache before all file data has been retrieved. This is the cause of the cache ‘thrashing’ we’re seeing.
As @kenkendk mentioned on the other thread related to this subject, I have some pending code changes which modify the cache sizing so that, by default, all but 1GB of your temp storage is available for volume caching. However, I'm also investigating the restore file ordering algorithm, because the original cache limit should be sufficient in most cases, but something else is going on that blows out the volume caching requirements significantly (my current prime suspect is deduplication).
Until that code is either committed for release or replaced with an improved ordering strategy, it’s highly recommended you set the --restore-volume-cache-hint option to as much temp space as you’ve got, minus a little headroom.
Following up on my earlier reply to @carljohnsen, I spent the last few weeks digging into the restore ordering / volume lifetime side of this in more detail.
In short: the general direction he suggested was a good one, and I’ve now done a fair bit of work around pre-analysis, restore ordering, and cache-behaviour simulation.
The main thing I’ve learned is that smarter ordering does help, but it does not appear to be the whole answer.
I tried a few different approaches.
phased file ordering, where phase 1 contained files spread across multiple volumes, phase 2 contained files held within a single volume, and so on. In some cases this helped a little, but in others it actually made the problem worse.
shared-block caching, i.e. deduplication-aware caching. This helped a little, but nowhere near as much as I expected.
At that point I switched focus to a simulation environment, because it allowed much faster and more targeted iterations. That made it easier to analyse things like bridge volumes, family groupings, and other structures that seem to keep volumes pinned in cache much longer than expected.
The conclusion I’ve come to is that restore ordering can help a bit, and may still end up being quite useful once the larger problem has been addressed, but it does not look like the root of the problem.
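For anyone who wants to experiment with ordering effects themselves, a toy replay simulation in the spirit of the above is enough to see the difference: feed it a file order and a file-to-volumes map and it counts re-downloads under a simple LRU model (a deliberate over-simplification of the real cache):

```python
from collections import OrderedDict

def simulate_redownloads(file_order, file_volumes, volume_size, cache_limit):
    """Replay a restore order against a toy LRU volume cache and
    return the total number of re-downloads. Illustrative model only."""
    cache = OrderedDict()                    # volume -> True, in LRU order
    downloads = {}
    max_cached = cache_limit // volume_size
    for f in file_order:
        for vol in file_volumes[f]:
            if vol in cache:
                cache.move_to_end(vol)       # cache hit
                continue
            downloads[vol] = downloads.get(vol, 0) + 1
            if len(cache) >= max_cached:
                cache.popitem(last=False)    # evict LRU volume
            cache[vol] = True
    return sum(n - 1 for n in downloads.values())
```

For example, with files a → {V1, V2}, b → {V3}, c → {V1} and a cache holding two volumes, the order a, b, c re-downloads V1 once, while a, c, b re-downloads nothing; exactly the kind of ordering sensitivity discussed in this thread.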
My current focus is shifting toward volume packing / repacking strategies, because the current packing scheme seems quite hostile to the new restore process. To be fair, it also seems close to ideal for the legacy volume-first restore process, so I want to be clear that this isn't “the old design was wrong” but rather “it was optimised for a different restore model”.
For anyone curious or interested in doing their own analysis, I’ve pushed the code into two repos:
As for the issue discussed earlier in this thread, I now have a PR pending merge to the main Duplicati repo that removes the default cache limit as discussed above. I held that back longer than I expected because I was assuming the excessive cache requirement would turn out to have a relatively simple underlying fix, in which case removing the limit by default might have been unnecessary. As it turns out, I needn’t have worried.
Nice work! I'll look deeper into it when I get around to it. Interesting that the ordering doesn't help much. I do like the phases; phase 2 should reduce the working cache size (assuming the volumes only contain the blocks needed by those files). For phase 1, I guess it all really comes down to how they're spread. The backup in its current form doesn't guarantee any volume locality, but maybe it should? The same applies to the compact operation, as that also has the information and is intrusively touching the volumes.
In the meantime, I’ve checked your PR and it looks fine! I only have two small notes / questions.