Repository validation & contents UI module

I’ve been playing around with the idea of a UI module that takes some of the mystery out of:
1 - What a repository contains
2 - Data validation status of files in the repository

As far as #1 is concerned, I would like to be able to browse the files in a repository in the UI. I would also like to be able to expand the file lists of individual backups to view what was backed up (helpful when manually culling old backups). I’d especially love to see data similar to the output of the compare function, so that I can analyze which files/folders were added, removed or changed in a backup run. This would be great for troubleshooting why certain backup sets take longer than expected (imagine a backup job that needlessly includes a web browser’s cache folder). This is a nice-to-have feature, but it could be extended with quite a lot of information and functionality to make backup sets more transparent.
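To make the added/removed/changed idea concrete, here is a minimal sketch of the kind of diff I have in mind. It is not how the existing compare function works internally; the function name and the path-to-hash dictionaries are made up purely for illustration.

```python
def compare_file_lists(older, newer):
    """Diff two backup file lists given as {path: content_hash} dicts.

    Hypothetical helper: the input shape is an assumption, not the
    format the backup database actually uses.
    """
    older_paths = set(older)
    newer_paths = set(newer)
    # Paths present only in the newer backup.
    added = sorted(newer_paths - older_paths)
    # Paths present only in the older backup.
    removed = sorted(older_paths - newer_paths)
    # Paths present in both backups whose content hash differs.
    changed = sorted(
        path for path in older_paths & newer_paths
        if older[path] != newer[path]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

The UI could render the three lists per backup run, which would make something like an unexpectedly included cache folder show up immediately under "added" or "changed".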

Now for the more important one: #2 - data validation. It would be great if one could see a listing of all the files in a repository with a validation status for each. This status (red/green light) would indicate which files have been hash-checked by the random spot check that is done by default on a few files during every backup. The status of each file would be tracked (timestamp of last download and hash check), and the files for future spot checks would be chosen such that those checked longest ago are checked first. This would give a very nice quick overview of the expected data reliability of the repository.
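The selection rule could look roughly like the sketch below. It assumes a simple mapping from remote file name to the time of its last successful hash check (or None if it was never checked); the function name and that mapping are hypothetical, and this version is deterministic, so some randomness among equally old files would still need to be added.

```python
from datetime import datetime


def pick_files_to_verify(last_verified, count):
    """Return up to `count` file names, least recently verified first.

    `last_verified` maps a file name to the datetime of its last
    download and hash check, or None if it has never been checked.
    """
    def sort_key(item):
        name, checked_at = item
        # Never-checked files sort ahead of everything else, then
        # oldest check first; the name breaks ties deterministically.
        return (checked_at is not None, checked_at or datetime.min, name)

    ordered = sorted(last_verified.items(), key=sort_key)
    return [name for name, _ in ordered[:count]]
```

With this ordering every file eventually gets its turn, so the red/green listing converges towards fully verified as backups keep running.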

Furthermore, the above could later be extended with server-side hash checks during upload on providers that support it. For example, Backblaze B2 reports the SHA1 hash of each file when the upload completes, which can be compared to the expected hash and stored in the validation status tracking table mentioned above. Of course, in the case of B2 this would mean computing SHA1 for each upload in addition to SHA256, but the benefits are definitely worth it, as this effectively makes random spot checks redundant.
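The extra SHA1 does not have to mean reading each file twice. As a rough sketch (the function names are hypothetical, and how the provider's reported hash is obtained is left out), both digests can be fed from the same read loop and the SHA1 then compared with whatever the provider reports:

```python
import hashlib


def hash_for_upload(path, chunk_size=1024 * 1024):
    """Compute SHA-1 and SHA-256 hex digests of a file in one read pass."""
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return sha1.hexdigest(), sha256.hexdigest()


def upload_is_verified(local_sha1_hex, reported_sha1_hex):
    """Compare the local SHA-1 with the one reported after upload.

    Hex digests are compared case-insensitively, since the casing a
    provider uses is not something this sketch assumes.
    """
    return local_sha1_hex.lower() == reported_sha1_hex.lower()
```

A matching result could then be written to the validation table with the upload time as the "last verified" timestamp.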

I’m happy to tackle development of this module if nobody has started anything similar yet. Please let me know if such work has been started or discussed, and also please discuss any enhancements/criticisms/opinions anyone might have…

I think both features are essential. The only reason for them not being present is lack of developer time.

There are quite a few suggestions on GitHub concerning how the UI could show versions of each file. The compare command would also be helpful to have as a UI option.

For #2, the information is in the database, so it would be a matter of reporting this information to the UI and slapping on some icons. I can build the backend support if you are more comfortable with developing the frontend stuff.

I think the backend would report something like:

{
  "filename": {
    "size": 123,
    "validations": 2,
    "created": "2017-01-02T13:00:01",
    "status": "verified"
  },
  "another file": ...
}
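Just to illustrate the shape, a report keyed by filename could be assembled along these lines. This is only a sketch: the function name and the input tuples are made up, and "unverified" is an assumed label for files with no validations, since only "verified" appears in the example above.

```python
import json


def build_validation_report(records):
    """Build a JSON report keyed by filename.

    `records` is an iterable of (name, size, validations, created)
    tuples; this input shape is hypothetical.
    """
    report = {}
    for name, size, validations, created in records:
        report[name] = {
            "size": size,
            "validations": validations,
            "created": created,
            # Assumed rule: at least one validation means "verified".
            "status": "verified" if validations > 0 else "unverified",
        }
    return json.dumps(report, indent=2)
```

The frontend would then only need to map the status field to an icon.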

Cool, I’m happy with both front- and backend work. It will be a great opportunity to familiarize myself with the codebase and hopefully make more contributions in future…

I like the idea of including more transparency of the file versioning as well as the compare output. It would also be great to add a few other features to help manual backup culling and housekeeping, such as showing which backups exclude missing drives (i.e. external/virtual drives that weren’t present during the backup), allowing manual deletion of dlist files with subsequent cleanup and compacting, or manually selecting specific file sets for verification. The latter would be good for confirming that a particular critical snapshot is 100% restorable, while the others still rely on best-effort verification.

Yes, I think showing what files are where is also a gateway into allowing “purge files” in the UI, where the user can see exactly which versions are affected by a purge.

If you have questions for getting started, drop me a PM or write in the #developer channel.