[Idea] Bit rot detection in source files

Hello, this is more of an idea for discussion than a feature request.

Those of us who love to back up are probably afraid of bit rot (Data degradation - Wikipedia).
Bit rot means that the content of a file on the HDD can silently change without the HDD, file system, or OS noticing. It may therefore take years to find out that a file is corrupted.

The idea is that Duplicati could warn the user when the hash of a file has changed but its metadata (last modification time…) have not.
That should mean corruption has occurred (with some exceptions, such as the default behavior of a TC container).

If I understand how Duplicati works correctly, the first backup saves the hash of every file to the database, and subsequent backups only check whether the modification time has changed… (this behavior is configurable)

So for a corruption scan we would need a special run (and it would make sense to run it regularly, for example once a month, maybe via the CLI?).
This run would re-hash all the backed-up source files and compare the results with the records in the database.
If a hash differs but the modification metadata are the same, a warning is logged…
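Purely as an illustration (in Python rather than Duplicati's actual code), here is a minimal sketch of such a scan. The `files` table and its `path`, `hash`, `mtime`, and `size` columns are made up for this example; Duplicati's real database schema is different.

```python
import hashlib
import os
import sqlite3

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def bitrot_scan(db_path):
    """Compare current file hashes against the recorded ones.

    A file whose content hash differs while its size and modification time
    are unchanged is a bit-rot suspect: something altered the bytes without
    going through a normal write.
    """
    con = sqlite3.connect(db_path)
    # Hypothetical schema, for illustration only.
    for path, old_hash, old_mtime, old_size in con.execute(
        "SELECT path, hash, mtime, size FROM files"
    ):
        try:
            stat = os.stat(path)
        except FileNotFoundError:
            continue  # a deleted file is not bit rot
        if int(stat.st_mtime) != old_mtime or stat.st_size != old_size:
            continue  # legitimately modified; the next backup will pick it up
        if sha256_of(path) != old_hash:
            print(f"WARNING: possible bit rot in {path}: "
                  f"content changed but metadata did not")
    con.close()
```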

For corruption detection in the backup files themselves, we already have the “upload-verification-file” option and DuplicatiVerify (duplicati/Tools/Verification at master · duplicati/duplicati · GitHub) (thanks to @mnaiman).
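To give a flavor of that check, here is a minimal sketch of the kind of verification DuplicatiVerify performs, assuming the verification file is a JSON list of entries with a name, a size, and a Base64-encoded SHA-256 hash; the real script in the repository is the authoritative version and handles more cases.

```python
import base64
import hashlib
import json
import os

def verify_backend_files(verification_file):
    """Re-hash the backup volumes listed next to the verification file."""
    folder = os.path.dirname(os.path.abspath(verification_file))
    with open(verification_file, "r", encoding="utf-8") as handle:
        entries = json.load(handle)
    for entry in entries:
        # Assumed entry fields: Name, Size, Hash (Base64 SHA-256).
        path = os.path.join(folder, entry["Name"])
        if not os.path.exists(path):
            print(f"MISSING: {entry['Name']}")
            continue
        digest = hashlib.sha256()
        with open(path, "rb") as volume:
            for chunk in iter(lambda: volume.read(1024 * 1024), b""):
                digest.update(chunk)
        if base64.b64encode(digest.digest()).decode("ascii") != entry["Hash"]:
            print(f"CORRUPT: {entry['Name']}")
```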

I really think this could be a feature that adds another layer of data security, and Duplicati's SQLite database is well suited to this task.

What do you think? :question:

I think it’s an interesting idea, though I’m not sure we currently have a file-level hash so much as block-level hashes. Granted, those could be even more useful, as they could identify which block of the file has rotted and even suggest the last time it was backed up prior to the rot…
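To illustrate how block-level hashes could localize the damage, here is a hypothetical sketch (the function names and the 100 KiB block size are only for illustration) that reports which blocks of a file no longer match their recorded hashes:

```python
import hashlib

BLOCK_SIZE = 100 * 1024  # illustrative fixed block size

def block_hashes(path, block_size=BLOCK_SIZE):
    """Return the SHA-256 hash of each fixed-size block of a file."""
    hashes = []
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def find_rotted_blocks(path, recorded_hashes):
    """Compare current block hashes with recorded ones and report mismatches."""
    current = block_hashes(path)
    for index, (old, new) in enumerate(zip(recorded_hashes, current)):
        if old != new:
            offset = index * BLOCK_SIZE
            print(f"{path}: block {index} (offset {offset}) differs")
```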

So I guess it would be kind of a block-level SnapRAID or FlexRAID type of thing…

However, I’m pretty sure there are some instances where files get changed without their timestamps necessarily updating - I’m thinking database files or other in-use-but-backed-up-with-a-snapshot type items (though I could be wrong).

I do like the idea though - I just want to make sure it doesn’t cause more confusion than good if it were to get implemented.

Yes, interesting idea. But will Duplicati scan a file that it doesn’t think has been modified? If the USN journal, metadata, or other means don’t indicate a modification, will it re-read the contents of the file? I seem to think it wouldn’t… so it wouldn’t have the ability to detect bit rot unless that behavior was also changed.

I think you are correct - by default a file is scanned only if its size or date/time stamp has changed, so implementing something like this would essentially require a full scan (and re-hash) of every file on every run.

Even if we added a file-level CRC to the database and did just CRC checks every run, it would still likely take waaaayyyyy too long (and we might even end up trusting an OS-provided CRC in certain cases). :frowning:
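For what it's worth, here is a minimal sketch of what a per-file CRC pass might look like (names are illustrative); even with a cheap checksum, the dominant cost is still reading every byte of every file from disk:

```python
import zlib

def file_crc32(path, chunk_size=1024 * 1024):
    """Compute a CRC-32 of a file incrementally.

    Cheaper per byte than SHA-256, but the scan still has to read the
    whole file, so the total runtime barely improves.
    """
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```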