Python script to generate stats about backend storage

jfparis · April 19, 2021, 5:51pm

Hi there

I just pushed a short Python script that generates some stats about:

- wasted space in the repository (data in blocks that relate to deleted backups)
- files that have never been verified (Duplicati verifies one block after each backup job, but since a job usually generates many blocks, a lot of blocks in storage have never been verified)

The script takes one parameter (the path to the database)
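
For readers who want a feel for this kind of query, here is a minimal sketch. It is not the script itself. It assumes the local database is a SQLite file containing a table named Remotevolume with a VerificationCount column; those two names are assumptions not confirmed in this thread, so adjust them to the real schema. The function name count_unverified is made up for illustration.

```python
import pathlib
import sqlite3
import sys


def count_unverified(conn):
    """Return (never_verified, total) counts of remote volumes.

    ASSUMPTION: the table name "Remotevolume" and the column name
    "VerificationCount" are illustrative; check them against the real schema.
    """
    total = conn.execute("SELECT COUNT(*) FROM Remotevolume").fetchone()[0]
    never = conn.execute(
        "SELECT COUNT(*) FROM Remotevolume WHERE VerificationCount = 0"
    ).fetchone()[0]
    return never, total


if __name__ == "__main__" and len(sys.argv) == 2:
    # Single parameter: the path to the database.
    # Open it read-only so the sketch can never modify the database.
    db_uri = pathlib.Path(sys.argv[1]).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(db_uri, uri=True)
    never, total = count_unverified(conn)
    print(f"{never} of {total} remote volumes have never been verified")
    conn.close()
```

Opening the file read-only matches the intent of only reading the database, not maintaining it.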

Hope this can be useful to the curious minds

jf

drwtsn32 · April 19, 2021, 9:20pm

Pretty interesting, thanks for sharing!

I’m guessing the knowledge of which blocks have been tested is lost any time you do a database recreation. (This is something I do every so often just to validate that recreation is fast and doesn’t need any dblocks.)

ts678 · April 19, 2021, 10:08pm

Thanks! That must have taken some learning of DB tables. Feel free to keep on learning…

backup-test-samples and a new option below can adjust that to leave fewer gaps, if you like:

  --backup-test-percentage (Integer): The percentage of samples to test after a backup
    After a backup is completed, some (dblock, dindex, dlist) files from the remote backend are selected for
    verification. Use this option to specify the percentage (between 0 and 100) of files to test. If the
    backup-test-samples option is also provided, the number of samples tested is the maximum implied by the two
    options. If the no-backend-verification option is provided, no remote files are verified.
    * default value: 0
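
To make the "maximum implied by the two options" rule concrete, here is a rough sketch of the arithmetic. The function name samples_to_test is hypothetical, rounding the percentage up is an assumption (the help text does not say how fractions are handled), and any per-file-type handling is ignored.

```python
def samples_to_test(total_files, test_samples=1, test_percentage=0):
    """Estimate how many remote files get tested after a backup.

    ASSUMPTIONS: the percentage is rounded up to a whole file, and the
    result is capped at the number of files that exist.
    """
    # Integer ceiling of total_files * test_percentage / 100.
    from_percentage = (total_files * test_percentage + 99) // 100
    # The help text says the larger of the two options wins.
    return min(total_files, max(test_samples, from_percentage))


# With 1000 remote files, the percentage option dominates a single sample.
print(samples_to_test(1000, test_samples=1, test_percentage=5))
```

With 1000 files this gives 50 tested files per backup instead of 1, which is why the percentage option leaves fewer gaps on large backends.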

Add backup-test-percentage in addition to backup-test-samples [$25] #3296 inspired the above, noting:

Checking only 1 sample per backup when there are thousands of files doesn’t make much of a dent

Backup Test block selection logic describes the sampling algorithm. I hope that its approach makes sense.

upload-verification-file and the DuplicatiVerify utility-scripts can do a much more thorough verification if you have direct file access to the backend.

The COMPACT command describes some tuning, if the default options for compact are not as desired.
Using verbose log level or capturing with log-file-log-filter are other ways of getting stats about compact.

jfparis · April 20, 2021, 5:56am

I suspect that is the case, although I cannot be certain. I am just reading the database, not maintaining it.

jfparis · April 20, 2021, 7:33am

ts678:

backup-test-samples and a new option below can adjust that to leave fewer gaps, if you like:

  --backup-test-percentage (Integer): The percentage of samples to test after a backup
    After a backup is completed, some (dblock, dindex, dlist) files from the remote backend are selected for
    verification. Use this option to specify the percentage (between 0 and 100) of files to test. If the
    backup-test-samples option is also provided, the number of samples tested is the maximum implied by the two
    options. If the no-backend-verification option is provided, no remote files are verified.
    * default value: 0

Thank you, I was not aware of that option. My desire to know how many blocks had never been verified before (so I could ultimately verify them with the verify command) is what drove me to write this short script.