Hi @aureliandevel ! I’m the author of the reworked restore flow. First of all, thank you for the feedback, it’s much appreciated!
The problem
The problem we’ve been seeing is a side effect of the new design, which focuses on increasing parallelism and sequential disk writes. This led to caching strategies that keep the individual parts of the pipeline as busy as possible. The caching in turn increased the amount of temporary disk space being used, which led to the introduction of the cache limiting option, trading consumed temporary storage for performance.
As you’ve pointed out:
this trade-off can, for an unlucky setup (and it seems like it’s not as uncommon as I first thought), severely hurt performance, leading to a substantial amount of redundant work.
I like all of your suggestions, which I think will provide a better “off-the-shelf” experience, alongside user hints on how to deal with the problem. I do have one idea that I think will alleviate the problem.
Smart-er file orderings
The current implementation orders files to be restored by their size. My initial motivation behind that choice was that working on larger files first would lead to better overlap of the FileProcessor, as larger files simply take longer to restore. A better approach would be to rank the files according to how they’ll impact the volume lifetime.
Since we know the file list, which blocks each file needs, and in which volumes those blocks reside, we could run an analysis ahead of time, mapping out an order of files that minimizes volume cache lifetime. This approach also ties in nicely with the warnings you’re proposing, since we’d know how many volumes are needed at any one point. If any of the cache options clash with this, we’d be able to throw a warning (or maybe an error that can be ignored / lowered to a warning) that this configuration will require re-downloading volumes, in turn introducing redundant work and hurting performance.
I think the choice of strategy should be an option, with the “smart” ordering being the default. I still think “large files first” provides value in that the analysis can be skipped, which setups with enough storage space for a large volume cache could benefit from. I’d also add “small files first” (may be better for some setups / backups) and “don’t care” (where the file list is just fetched as-is from the database).
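To make the “smart” ordering idea concrete, here is a minimal sketch of one way it could work: a greedy pass that always picks the file pulling the fewest new volumes into the cache, and drops a volume from the working set once no remaining file needs it. All names here are illustrative, not the actual Duplicati implementation, and a real version would weigh file sizes and download cost as well.

```python
# Hypothetical greedy "smart" ordering: minimize how long each volume
# must stay cached. file_volumes maps file name -> set of volume ids.
def order_files(file_volumes: dict[str, set[int]]) -> list[str]:
    remaining = dict(file_volumes)
    active: set[int] = set()   # volumes currently "hot" in the cache
    order: list[str] = []
    while remaining:
        # Cost = number of new volumes this file would pull into the cache.
        name = min(remaining, key=lambda f: len(remaining[f] - active))
        order.append(name)
        active |= remaining.pop(name)
        # A volume can leave the working set once no remaining file needs it.
        still_needed = set().union(*remaining.values()) if remaining else set()
        active &= still_needed
    return order

files = {
    "a.txt": {1, 2},
    "b.txt": {3},
    "c.txt": {1},
    "d.txt": {2, 3},
}
print(order_files(files))  # → ['b.txt', 'c.txt', 'a.txt', 'd.txt']
```

Since the analysis already computes the peak size of the working set, the same pass could feed the early-warning check against the configured cache hint.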
Cache blowing past cache hint
Note that the option is only a hint to the cache: the cache may be unable to satisfy it, which will result in higher consumption than the hint suggests.
The VolumeManager keeps track of cached volumes. Whenever a request for a new block comes in, it checks whether the volume is already being requested or is already in the cache. If it’s neither, the cache hint is evaluated: would requesting another volume exceed the cache hint? If so, we wait until we can evict a volume.
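The admission logic described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual VolumeManager code (which is C# and channel-based); the class and method names are made up for the example.

```python
# Hypothetical sketch of the cache admission check: serve from cache,
# piggyback on an in-flight request, or wait for an eviction slot.
import threading

class VolumeCache:
    def __init__(self, cache_hint: int):
        self.cache_hint = cache_hint           # max volumes (the hint)
        self.cached: dict[int, object] = {}    # volume id -> handle
        self.requested: set[int] = set()       # downloads in flight
        self.cond = threading.Condition()

    def request_block(self, volume_id: int) -> None:
        with self.cond:
            if volume_id in self.cached or volume_id in self.requested:
                return  # already available or on its way
            # Admitting a new volume would exceed the hint: wait for eviction.
            while len(self.cached) + len(self.requested) >= self.cache_hint:
                self.cond.wait()
            self.requested.add(volume_id)

    def downloaded(self, volume_id: int, handle: object) -> None:
        with self.cond:
            self.requested.discard(volume_id)
            self.cached[volume_id] = handle

    def evict(self, volume_id: int) -> None:
        with self.cond:
            self.cached.pop(volume_id, None)
            self.cond.notify_all()  # wake waiters blocked on the hint
```

The in-flight `requested` set is counted against the hint here; as noted below, the real flow has additional in-flight storage between stages that this simple check does not see.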
When a volume is evicted from the VolumeManager, it isn’t necessarily removed from temp storage right away, as the VolumeDecompressor might still be decompressing blocks from the volume, making it unsafe to delete. It won’t be deleted until all references to the volume have been disposed; eviction from the VolumeManager cache just disposes of the manager’s own reference. Writing about it now, I guess this could be changed so that the VolumeManager spin-locks on the reference count until all references have been disposed, leaving the VolumeManager in charge of the final disposal.
So, to summarize: in its current form, the option --restore-channel-buffer-size determines the block request burst rate and the number of requests in flight between each step. This means there can be up to --restore-channel-buffer-size times --dblock-size of ‘unaccounted’ in-flight cache storage between the VolumeManager and the VolumeDecompressor.
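As a back-of-the-envelope example of that worst case (the buffer size below is an arbitrary example value, not a documented default; 50 MiB is the usual dblock default):

```python
# Worst-case 'unaccounted' in-flight storage between VolumeManager
# and VolumeDecompressor, beyond whatever the cache hint allows.
buffer_size = 16   # --restore-channel-buffer-size (example value)
dblock_mib = 50    # --dblock-size in MiB (common default)
worst_case_mib = buffer_size * dblock_mib
print(worst_case_mib, "MiB")  # → 800 MiB
```

So even a modest buffer can add hundreds of MiB of temporary storage that the cache hint never sees.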
Single file / small selection restore
In general, I’d agree with you when looking at the core functionality: when there’s less room for parallelism, a parallel approach won’t benefit; it might actually be worse off.
However, the reworked restore flow has one clear advantage over the legacy flow: by switching to a file-centered approach, we now know exactly when each file has been restored. As such, we can, as we restore the file, keep track of the file’s hash for later validation. The legacy flow doesn’t know when a file has been fully restored (due to the scatter effect of the volume-centric approach) until all of the files have been restored. That’s why the legacy flow has a separate verification pass, where it has to read all of the restored files once more. The reworked flow can perform this extra verification on the fly, giving it an edge.
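The on-the-fly verification idea boils down to hashing each block as it is written, so the file never has to be read back. A minimal sketch, assuming blocks arrive in file order and using made-up names rather than the actual restore-flow API:

```python
# Hypothetical on-the-fly verification in a file-centric restore:
# hash while writing, so no separate verification pass re-reads the file.
import hashlib

def restore_file(path: str, blocks, expected_hash: str) -> bool:
    h = hashlib.sha256()
    with open(path, "wb") as f:
        for block in blocks:   # assumed to arrive in file order
            f.write(block)
            h.update(block)    # verify as we go instead of re-reading later
    return h.hexdigest() == expected_hash
```

The legacy volume-centric flow can’t do this, because it only knows a file is complete once every volume has been processed, which is exactly why it needs the separate read-everything-again pass.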
TODO list
Mostly a list of notes for myself (or for someone else if they’d like to implement it!). Do comment if there’s something I’ve forgotten or something I’ve interpreted incorrectly.
- --restore-volume-cache-hint should be off by default. Only users who experience problems with temporary storage usage should want to enable it.
- If a volume is being evicted prematurely (due to the cache max size being hit), a warning should be emitted. It’s already being logged as an Explicit log message, so it should be easy to elevate. However, only the first 5 should be emitted, with later ones aggregated to reduce log message flooding. Furthermore, this could be tracked to provide a report towards the end of the restore, enlightening the user about why the restore performed poorly.
- Look into further spin locking in the VolumeManager to keep cache usage more in line with the provided cache size hint. The only problem I see is a decrease in performance (not really a problem, since we’ve already accepted a performance hindrance here) and potential deadlocks (lock keeping should be short).
- Add a pre-restore analysis step, which reorders the file list to try to minimize volume cache lifetime. This analysis also enables an early warning system, which can inform the user that something either won’t work (limited storage space) or will be slow (will trigger many re-downloads).