Thanks for taking the lead on gaining an understanding. Do you think you’ll be able to share it someday?
Understanding the existing mechanisms (which do extend beyond just the database) will be helpful:
My proposal is that a visit should precede a revisit. No currently active person knows the early plans. Should you prefer to call that a revisit (by a new team), I think it’s still good not to “clean-sheet” code. Proposing an ideal design should be safe enough, but ripping everything out will set things way back. Incremental change starting from the current code should (if it can get us there) be a whole lot better.
Some items that are not strictly database transactions, but are possibly helped by committing at the right time:
The Remotevolume table’s State field gives clues, on a backup restart, about how far each volume got in uploading. Mechanisms must exist for blocks that didn’t make it, or for that matter, made it but whose dblock then vanished.
A vanished dblock gets fixed by code in purge-broken-files, by purging the files that existed before the vanish and so depend on the lost blocks.
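As a concrete starting point, here is a minimal sketch of peeking at those states with Python’s sqlite3, run against a copy of the local database rather than the live one (the Remotevolume table and its Name/Type/State columns match the schema as I understand it, but verify on your version):

```python
import sqlite3

# Inspect what a restart would find: each remote volume and its last
# recorded state (e.g. partway through uploading). Run on a DB copy.
con = sqlite3.connect("backup-copy.sqlite")
for name, vtype, state in con.execute(
        "SELECT Name, Type, State FROM Remotevolume ORDER BY State"):
    print(f"{state:12} {vtype:8} {name}")
```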
A potentially similar mechanism could be applied to the temporary block files that are written in parallel. Unlike the vanished dblock case, this loss is of a proto-dblock, so it probably won’t impact any old files.
There is a synthetic filelist mechanism that may run before a backup, to upload a dlist if the prior backup couldn’t do it. That could happen due to an interrupt, and it is a means to record files to the extent that the upload finished. Instead of running it at the start of the backup after an interrupt, maybe do it at the end, after a fast-stop effort.
Reliable cleanup from a crash or interrupt is needed anyway. Should a “stop now” go down super-hard?
How far off is the current code from stopping all backup progress and doing what the synthetic filelist will do? That’s probably the minimum possible time for a clean stop, i.e. the destination is consistent and has its dlist.
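To make that ordering concrete, a hedged outline (every name below is a hypothetical placeholder, not Duplicati code):

```python
# Hypothetical fast-stop sequence, with the synthetic filelist moved to the
# end of the stopping backup instead of the start of the next one.
def fast_stop(backup):
    backup.stop_scanning()              # no new work enters the pipeline
    backup.drain_inflight_uploads()     # finish or abandon current dblocks
    backup.upload_synthetic_filelist()  # dlist covering what actually made it
    backup.commit_database()            # local database agrees with destination
```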
There are some similarities between SpillCollector, which runs when parallel proto-dblock work is done, and Compact. Both take partially filled block files and emit filled ones, and maybe some leftovers.
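Reduced to its essence, the shared idea looks something like this conceptual sketch (real code works on compressed block files, not in-memory lists):

```python
# Greedily refill volumes from the blocks in partially filled ones,
# emitting full volumes plus at most one leftover partial.
def repack(partial_volumes, volume_size):
    blocks = [b for vol in partial_volumes for b in vol]  # b = (length, data)
    filled, current, used = [], [], 0
    for length, data in blocks:
        if used + length > volume_size and current:
            filled.append(current)      # this volume is as full as it gets
            current, used = [], 0
        current.append((length, data))
        used += length
    return filled, current
```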
To make the whole analysis less of an overwhelmingly large chunk, one can still request synchronous uploads, allowing focus on file preparation. That can then be de-parallelized using options. Looking at logs from such a simplified run may be one way to start. Do we log commits? If not, add that logging.
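Since the commit-logging question comes up, here is a hedged Python illustration (Duplicati itself is C#; this only shows the shape of the idea) of wrapping commits with a log line:

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)

# A connection subclass that logs every commit, so transaction boundaries
# show up in the log alongside the rest of the processing.
class LoggedConnection(sqlite3.Connection):
    def commit(self):
        logging.info("COMMIT issued")
        super().commit()

con = sqlite3.connect(":memory:", factory=LoggedConnection)
con.execute("CREATE TABLE t (x)")
con.commit()  # emits "COMMIT issued"
```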
Can you continue developing an understanding of the current plan and produce a short writeup of it? There have been several attempts at describing the processing; add in the database processing.
You might have thoughts about right-sizing of queries, EXPLAIN QUERY PLAN (avoid full-table scans), and so on…
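As a starting point for that, a minimal sketch of checking a plan with Python’s sqlite3, run against a copy of the database (the sample query is illustrative; substitute whatever query you are right-sizing):

```python
import sqlite3

# Print the query plan and flag full-table scans, which are the
# "don't scan" cases worth an index or a rewrite.
con = sqlite3.connect("backup-copy.sqlite")
query = "SELECT Name FROM Remotevolume WHERE State = ?"
for _, _, _, detail in con.execute("EXPLAIN QUERY PLAN " + query, ("Uploading",)):
    flag = "  <- scans whole table" if detail.startswith("SCAN") else ""
    print(detail + flag)
```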
There’s also the new PRAGMA adder to play with. That was well-received by one database-savvy user.
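For anyone experimenting, a generic SQLite illustration (not the Duplicati option itself) of the kind of pragmas to try on a database copy:

```python
import sqlite3

# Pragmas a database-savvy user might experiment with; measure before
# and after, on a copy, to see whether they help a given workload.
con = sqlite3.connect("backup-copy.sqlite")
con.execute("PRAGMA cache_size = -200000")  # negative = size in KiB, ~200 MB
print(con.execute("PRAGMA journal_mode = WAL").fetchone())  # reports new mode
```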
I’m not sure who’s actually going to go explore, but whoever does can share what they find.
This same user has a great site for hitting transaction problems too. I wish they’d work with us on those.
A reply to you covering database and destination internals testing would probably do a lot to support their situation.
Want some testing scripts to go break Duplicati, so you can study and fix it, then go break it some more?
Or look at some existing backup corruption that has good steps to reproduce. There are several around.
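For the scripts option, a minimal interrupt-injection harness could be as simple as this sketch (the duplicati-cli arguments are assumptions to adapt, not a tested recipe):

```python
import random
import subprocess
import time

# Start a backup, kill it at a random moment to simulate a hard crash,
# then rerun and see whether recovery (synthetic filelist, cleanup) behaves.
cmd = ["duplicati-cli", "backup", "file:///tmp/dest", "/tmp/source",
       "--dbpath=/tmp/test.sqlite", "--passphrase=test"]
for attempt in range(10):
    proc = subprocess.Popen(cmd)
    time.sleep(random.uniform(1, 30))  # let some uploads get in flight
    proc.kill()                        # hard interrupt, no graceful stop
    result = subprocess.run(cmd)       # does the restart recover cleanly?
    print("attempt", attempt, "restart exit code", result.returncode)
```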
There’s an abundance of places to help, IMO. If you care to, describe what you’re good at, and what not. The above suggestions were sort of DB-flavored, but they need some C# reading ability too. What about Python?