Real-time backup

Daniel_Gehriger · April 19, 2018, 9:21pm

See pull-request: 3184

First implementation of NTFS USN journal optimization.

Folder renames are correctly handled (old name is removed, new one added)
I don’t actually rely on the USN journal to determine if files have changed. Instead, I use it to filter the source list (reduce it) to the set of possibly modified files and folders, which are then scanned recursively using FilterHandler.EnumerateFilesAndFolders(). That way, there is no risk of incorrectly interpreting the journal entries in complex rename / delete / re-create scenarios. Whenever a file / folder is “touched” in the journal, it will be fully scanned.
The state of the USN journal is recorded in a new table ChangeJournalData in the local database. This table also records a hash value for each fileset, representing the active source files and filter set. An initial full scan is re-triggered whenever the backup configuration is modified.
A full scan is also triggered if the USN journal is incomplete (has been overwritten / truncated), and of course in case of errors.
In case of DB loss, recovery, the USN data is not recovered, and a full scan is triggered to establish a sane baseline for subsequent backups.

TODO:

The USN journal records a limited amount of changes, and if backups are spaced too far apart, full scans are required as the data will be incomplete. This has the following implications:

Frequent (realtime) backups avoid this problem. If nothing has changed, the fileset will be compacted. Duplicati may need optimizing if this becomes a common scenario (compacting before uploading).
Frequent backups result in numerous filesets, and this may interfere with retention policy. Maybe for the current day, many more filesets would make sense in the “automatic” retention policy mode, to avoid deleting changing data.
Less frequent backups with USN support could be made possible by scanning the USN journal at regular intervals, and recording the changes, using a process / thread separate from the backup thread. When the backup is run, this data is then used instead of reading the journal at that time. There is no risk that modifications will be missed during reboots / Duplicati not running, as the USN numbers allow us to ensure that the journal is recorded w/o gaps.