Is there any way we could have a test restore process that doesn’t require a full backup’s worth of disk space + a version of destination files + temp usage?
I’m thinking restoring one file, CRC verifying it, deleting it, then moving on to the next file.
Three scenarios come to mind:
Performance / bandwidth frugal: All destination files for the targeted test version are downloaded, then files are restored, CRC-checked, and deleted individually. This would require the full destination version size + the largest backed-up file of space, but minimal transfers (likely the fastest option).
Space frugal: The destination files needed for a single to-be-restored file are downloaded, the file is restored, CRC-checked, and deleted, then repeat for the next file (see the sketch after this list). Lots of repeated downloads would be needed, but required space would be minimal - at most the largest backed-up file + that size's worth of dblocks.
Hybrid: A combination of the two, with logic to try to group the files to be tested based on their proximity in destination files. Actual performance would vary depending on backup layout (such as history length and frequency of file changes).
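To make the space-frugal idea concrete, here's a rough sketch of the restore-one-file, verify, delete loop. This is not Duplicati code; `restore_single_file` and `expected_hashes` are made-up hooks standing in for the real restore pipeline and the recorded file hashes:

```python
import hashlib
import os
import tempfile

def verify_version(files_in_version, restore_single_file, expected_hashes):
    """Restore, hash-check, and delete one file at a time so peak disk
    usage stays near the size of the largest single file.

    files_in_version    -- iterable of file paths in the backup version
    restore_single_file -- callable(path, target_dir) returning the local
                           restored path (hypothetical hook into restore code)
    expected_hashes     -- dict mapping path -> expected SHA-256 hex digest
    """
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        for path in files_in_version:
            local = restore_single_file(path, scratch)   # download + rebuild
            h = hashlib.sha256()
            with open(local, "rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    h.update(chunk)
            if h.hexdigest() != expected_hashes[path]:
                failures.append(path)
            os.remove(local)    # free the space before the next file
    return failures
```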
You can get a bit of this with a restore using --dry-run so it doesn't actually write the files. It surprises itself when it doesn't find them for verification, though. For that, you could perhaps stream the restored data through the hashing (if that's not already done). One other concern I have is that a huge file could take a huge amount of work to totally verify…
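Streaming the restored bytes straight into a hash, instead of writing them to disk and reading them back, might look roughly like this. Just a sketch; `restored_stream` is an assumed file-like object produced by the restore side:

```python
import hashlib

def hash_restored_stream(restored_stream, chunk_size=1024 * 1024):
    """Consume a restored file as a stream and return its SHA-256 digest,
    so a dry-run style restore could be verified without touching disk."""
    h = hashlib.sha256()
    while True:
        chunk = restored_stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()
```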
I think the current approach of verifying just the dblock, dindex, and dlist raw volumes is weak proof of restore. Something closer would be better, and reusing mostly actual restore code might beat writing testing-only code, both on realism and on required work. Something sooner might be better than something fancy much later…
I think a case can be made for verification of changed files; however, it could get expensive if done as an immediate verify-changed-after-backup instead of occasionally. If, for example, a large file gets appended to all the time, the change will be a small upload, but the verify will be a big download, because most of the download will be the initial old blocks. I'd prefer not to get too specific about the details until we get somebody who can talk about the design and coding aspects, but more people may stop by with good input on priorities.
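As a rough illustration of why immediate verify-after-backup can cost far more than the backup itself (the numbers here are made up):

```python
# Hypothetical: a 10 GiB log file that grows by 10 MiB between backups.
GiB = 1024 ** 3
MiB = 1024 ** 2

file_size = 10 * GiB   # total size of the appended-to file
appended = 10 * MiB    # new data since the last backup

upload_cost = appended    # only the new blocks go up (~10 MiB)
verify_cost = file_size   # verifying the whole file means fetching all its blocks (~10 GiB)

print(f"upload ~{upload_cost / MiB:.0f} MiB, verify download ~{verify_cost / GiB:.0f} GiB")
```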
Full verification may get slow and may be better suited for when concurrent backup and verify is possible; however, some people may be willing to delay backups and spend the time and bandwidth to do a full verify.
Other options are possible, such as sampling, including manual sampling fed to a tweaked --dry-run restore. Personally, I would favor having something simple in less time, rather than attempting the ultimate method.
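A crude form of sampling could even be scripted outside the program: pick a random subset of paths from a file listing and feed them to a --dry-run restore. A minimal sketch, with the restore command line only assumed, not taken from the real CLI docs:

```python
import random
import shlex

def build_sample_restore_command(all_paths, sample_size, storage_url):
    """Pick a random sample of backed-up paths and print a candidate
    --dry-run restore command for them (command syntax is an assumption)."""
    sample = random.sample(list(all_paths), min(sample_size, len(all_paths)))
    files = " ".join(shlex.quote(p) for p in sample)
    return f"duplicati-cli restore {shlex.quote(storage_url)} {files} --dry-run"

# Example: sample 5 paths from a saved file listing.
# with open("file-list.txt") as f:
#     print(build_sample_restore_command(f.read().splitlines(), 5, "s3://bucket/backup"))
```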
The current rewrite of repair/recreate (which I hope is still underway) might have some insights to add too.
Yes - repair/recreate updates are probably a better use of time at this point.
So, to stray from the topic a bit… from an academic point of view, what would be useful in a validation process?
For example, the ability to schedule full (expensive) restore tests in the GUI with email notifications, as can be done with a backup now, might be more useful than manual/command-line-only “minimal resource” restore test features.
Sorry, it’s been a long day and I might be rambling a bit.