StopNow(), Backup and cancellation tokens

This is just a summary of how cancellation currently works, so I won’t forget:

Cancellation process

  • All operations have a result which can control the operation (pause, resume, stop (now / after file), abort)
  • The backup handler is passed a cancellation token
  • Controller.Stop() cancels that token and calls BasicResults.Stop()
  • Controller.Abort() calls BasicResults.Abort() and Thread.Abort() on the task thread (does not cancel the token)

Part A:

  • BasicResults.Stop() sets m_pauseEvent to unpaused and m_controlState to stopped (aborted in the case of Abort()) on the result. Both fields are used by result.TaskControlRendevouz():
    • The method throws an exception if aborted
    • If paused it blocks, otherwise it returns the current state
    • BackendManager uses this to pause and abort transfers between operations and in the progress handler
    • BackupHandler: pause and abort before post backup verification
    • CompactHandler, DeleteHandler, RecreateDatabaseHandler, RepairHandler, RestoreControlFilesHandler, RestoreHandler, TestHandler:
      pause/abort before each file; on stop, complete the backend transfer and finish the transaction before stopping
    • ListChangesHandler: pause/stop/abort at a few predetermined spots
    • ListFilesHandler: pause/stop/abort before each fileset
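
To make the rendezvous behaviour in Part A concrete, here is a minimal sketch in Python (not the actual C# code). All names in it (ResultControl, rendezvous, process_files, OperationAborted) are made up for illustration, and it assumes that abort unpauses in the same way as stop:

```python
import threading
from enum import Enum


class ControlState(Enum):
    RUN = "run"
    STOP = "stop"
    ABORT = "abort"


class OperationAborted(Exception):
    """Raised at a rendezvous point once the operation has been aborted."""


class ResultControl:
    """Hypothetical model of the blocking pause/stop/abort control."""

    def __init__(self):
        self._state = ControlState.RUN
        self._unpaused = threading.Event()  # set means "not paused"
        self._unpaused.set()

    def pause(self):
        self._unpaused.clear()

    def resume(self):
        self._unpaused.set()

    def stop(self):
        # Stop also unpauses, so a paused worker wakes up and sees the state.
        self._state = ControlState.STOP
        self._unpaused.set()

    def abort(self):
        # Assumption: abort unpauses in the same way as stop.
        self._state = ControlState.ABORT
        self._unpaused.set()

    def rendezvous(self):
        """Block while paused, raise if aborted, otherwise return the state."""
        self._unpaused.wait()
        if self._state is ControlState.ABORT:
            raise OperationAborted()
        return self._state


def process_files(control, files):
    """Handle files one by one, checking the control before each file."""
    done = []
    for name in files:
        if control.rendezvous() is ControlState.STOP:
            break  # finish cleanly; remaining files are skipped
        done.append(name)
    return done
```

The important property is that rendezvous() blocks the calling thread while paused, which is why it only fits between units of work such as files or filesets.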

Part B:

  • In addition, BasicResults.Stop() calls stop on the m_taskController, but only for stop now. The task controller is supposed to distinguish between stop now and stop after current file, but that distinction is not used.

  • For BackupHandler this is passed in as result.TaskReader to

    • BackendUploader
    • DataBlockProcessor
    • FileBlockProcessor
    • StreamBlockSplitter
    • FileEnumerationProcess
    • SpillCollectorProcess

    and is checked in a few other places, but it is not passed to:

    • FilePreFilterProcess
    • MetadataPreProcess (gets the cancellation token though, uses it to ignore cancellation exceptions)
    • ProgressHandler
  • In each process that receives the task reader:

    • await taskreader.ProgressAsync is called repeatedly
    • Running: this returns true without blocking
    • Paused: this await blocks until resumed
    • Stopped: this returns false without blocking, allowing cleanup after a file/block is complete
    • Terminated (only called in Dispose()): this cancels the underlying task, throwing a TaskCanceledException once awaited. Some processors rely on this to finish after all other tasks.
  • Otherwise the task reader is only used for FileEnumerationProcess in TestFilterHandler

Part C:

  • The cancellation token is used in
    • FileBlockProcessor
    • FileEnumerationProcess
    • BackupHandler.RunMainOperation() to determine if the backup is partial
  • It seems to be checked at least as often as ITaskReader.ProgressAsync, or more often. I think this is to cancel faster in sections where a pause would not make sense.

Conclusion

  • There seem to be two parallel ways to communicate the end of the operation (three if you count the cancellation token)
  • BasicResults.TaskControlRendevouz():
    • is a blocking operation to pause the progress
    • throws when aborted
    • can return Run or Stop (Pause will always block until resumed and Abort will always throw)
  • ITaskReader.ProgressAsync:
    • is async/await compatible, so pausing does not block a thread (Pause() is not called anywhere; it should be called from BasicResults.Pause())
    • throws when terminated (termination only happens in Dispose(), which does not seem to be called anywhere; it is also not triggered in BasicResults.Abort())
    • returns true to continue, false to stop
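
For comparison, the async variant could be modelled roughly like this, again as a Python sketch and not the real implementation. TaskControl, progress(), worker() and ProgressTerminated are hypothetical names; ProgressTerminated stands in for the cancellation exception, and the return value follows the convention "true to continue, false to stop":

```python
import asyncio


class ProgressTerminated(Exception):
    """Stand-in for the cancellation exception thrown after termination."""


class TaskControl:
    """Hypothetical async control: awaiting progress() never blocks a thread."""

    def __init__(self):
        self._stopped = False
        self._terminated = False
        self._resumed = asyncio.Event()  # set means "not paused"
        self._resumed.set()

    def pause(self):
        self._resumed.clear()

    def resume(self):
        self._resumed.set()

    def stop(self):
        self._stopped = True
        self._resumed.set()

    def terminate(self):
        self._terminated = True
        self._resumed.set()

    async def progress(self):
        """Return True to continue, False to stop; raise if terminated."""
        await self._resumed.wait()  # suspends only this coroutine while paused
        if self._terminated:
            raise ProgressTerminated()
        return not self._stopped


async def worker(control, blocks):
    """Process blocks, checking the control before each one."""
    processed = []
    for block in blocks:
        if not await control.progress():
            break  # stop requested: leave the loop and clean up
        processed.append(block)
        await asyncio.sleep(0)  # stands in for real async I/O
    return processed


async def demo():
    control = TaskControl()
    control.pause()
    task = asyncio.create_task(worker(control, [1, 2, 3]))
    await asyncio.sleep(0)  # let the worker run until it suspends in progress()
    control.resume()
    first = await task
    control.stop()
    second = await worker(control, [4, 5])
    return first, second
```

Running asyncio.run(demo()) exercises pause, resume and stop in order. Unlike the blocking rendezvous, a paused worker here only suspends its own coroutine, so other tasks on the same event loop keep running.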

I think the intention was to move from the blocking TaskControlRendevouz() to the async ProgressAsync, but this transition is very incomplete at the moment. According to the commit history, the async variant was added in a3f2b39d9 in 2016, apparently after the cancellation token was already in use:

Implemented handling of pause/stop/abort in the concurrent code.
Implemented the dry-run feature for backups.

TaskControlRendevouz() was added in bd53090a in 2014:

Implemented the pause/resume/stop/start methods throughout the calls to allow for interactive control over the tasks

Right now this whole logic is very confusing. Maybe @kenkendk knows more about how it was intended to work.