I’ve spent more time thinking about this and playing with it. The more I understand how the database is linked into the backup process, the less confident I am that the database holds correct results unless the entire backup completes successfully.
I know that all queries are run on the main thread. I know there are transactions wrapping many of the calls. I am unsure of the size and scope of those transactions, and of how a rollback of one of them via a stopNow() action will impact the output. I have not been able to build a transaction model in my head that creates consistency between the resulting backup and the local database, which have different definitions of complete/committed. The multi-threaded nature of the backup does not lend itself to doing that in a helpful way.
For example: take 10 files, 100 MB each, all with the same content.
The backup will split those files across threads, produce the same blocks from different files, and write that information into the local database. All of that happens before the data is in a volume and that volume is stored on the remote. I can see race conditions in that scenario even before a stop, as sketched below.
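Here is a minimal Python sketch of the check-then-act race I mean. This is not the project's actual code (and the names `known_blocks` and `emitted` are purely illustrative); it just shows how two workers can each decide a shared block is "unknown" and both record and emit it:

```python
import hashlib
import threading

known_blocks = set()   # stands in for the local database's block table
emitted = []           # blocks actually collected into a volume

def process_file(content: bytes) -> None:
    digest = hashlib.sha256(content).hexdigest()
    if digest not in known_blocks:   # check ...
        # ... another thread can pass the same check right here ...
        known_blocks.add(digest)     # ... then act: both threads "know" it
        emitted.append(digest)       # and both emit it into a volume

# 10 files with identical content, processed concurrently
threads = [threading.Thread(target=process_file, args=(b"x" * 1024,))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Correct output is 1 emitted block; an unlucky interleaving yields more.
print(f"known: {len(known_blocks)}, emitted: {len(emitted)}")
```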
When we stop in the middle, the database may commit a record saying we have backed up a block, but I see no guarantee that the block was actually backed up. Alternatively, we may back up the block, but because of the transaction boundaries in the local database and the exit point, the transaction is rolled back, and we now have backed-up blocks and volumes that the local database thinks were never backed up. Both orderings are sketched below.
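To make those two failure orderings concrete, here is a minimal sketch, assuming a simplified model where the database commit and the remote upload are separate steps (again, illustrative Python, not the actual implementation):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blocks(id TEXT PRIMARY KEY, state TEXT)")

def upload(block_id: str) -> None:
    pass  # stand-in for putting the block's volume on the remote

def commit_then_upload(block_id: str) -> None:
    # Ordering A: the database commits first. A stop between the two
    # steps leaves the database claiming a block the remote never got.
    db.execute("INSERT INTO blocks VALUES (?, 'done')", (block_id,))
    db.commit()
    # <-- stopNow() here: database says done, remote has nothing
    upload(block_id)

def upload_then_commit(block_id: str) -> None:
    # Ordering B: the upload happens first. A stop that rolls back the
    # open transaction leaves uploaded data the database knows nothing about.
    upload(block_id)
    db.execute("INSERT INTO blocks VALUES (?, 'done')", (block_id,))
    # <-- stopNow() plus rollback here: remote has the block, database does not
    db.commit()
```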
It is increasingly my opinion that the local database, the local data management model, and the threading process need to be revisited to determine how they can work together to produce a reasonably transactional model.
My preference would be for the messages sent between threads to carry the block information for any backup in progress, and for the database to be updated and checked at the last possible point, just before deciding whether we actually collect the block data into a volume. There are edge cases in that approach as well, and I have not devised a simple plan to resolve them. A rough sketch of the idea follows.
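Something along these lines, assuming a SQLite-backed block table; `claim_block` and `collect` are hypothetical names, and a real multi-threaded version would need per-worker connections or serialized database access:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blocks(id TEXT PRIMARY KEY, state TEXT)")

def claim_block(digest: str) -> bool:
    """Atomically claim a block; only the first claimant gets True."""
    cur = db.execute(
        "INSERT OR IGNORE INTO blocks(id, state) VALUES (?, 'pending')",
        (digest,))
    db.commit()
    return cur.rowcount == 1  # 0 means another worker already claimed it

def collect(messages, volume):
    # Messages carry the block info for the backup in progress; the
    # database check happens at the last possible point, just before
    # the block data goes into a volume.
    for digest, data in messages:
        if claim_block(digest):
            volume.append((digest, data))

volume = []
collect([("aa", b"..."), ("aa", b"..."), ("bb", b"...")], volume)
print(len(volume))  # 2: the duplicate "aa" block is claimed only once
```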
There are possible performance implications to these choices. But for a backup tool, I consistently hear that correctness is the most important property. It is my view that changes should be made in that direction to increase stability and correctness. In the short term that may have some performance impact; however, I also believe the structure of the local database and its use are a significant performance problem, so we may even see an improvement.
I don’t think any of these issues can be resolved in the current beta, as all of them require changes that are too risky.
This supports my opinion that larger changes need to be possible in order to allow longer-term benefits. That is not viewed as the best direction for the project with the resources available, which in turn pushes me back to wondering how I can help if everything I have to offer is not suitable at this time. Maybe I should try again in a year.