Schema/DB question

My backups have been bigger than I expected, so I wanted to answer the question “what files were added/changed in the last backup?” I started poking around my sqlite file, and I was able to answer the question, but I also ran into some surprises in the data I found. I’m wondering if this is expected, or if it’s indicative of some bug or corruption. I’m doing everything on a copy of the file, and duplicati had been idle for hours, with no journal on disk when I made the copy, so I would expect everything to be consistent, but it’s certainly possible sqlite doesn’t behave the way I expect somehow.

My most recent backup included only one Remotevolume of type Blocks, with ID = 4914. I discovered that there are blocks in this volume which do not appear to be associated with any blockset:

sqlite> select * from block where = 15206372;
sqlite> select * from blocksetentry where blockid = 15206372;

This is not the only such block; there are a lot of them:

sqlite> select count(*) from block where block.volumeid = 4914 and not in (select blockid from blocksetentry);

In fact, there are similar orphaned block ids in nearly every volume I checked.

Is this normal? Is it possible my backup is missing the files which these blocks were part of?

That is probably “indirection blocks”, which are “blocks that describe other blocks”. I have written about the need for them (it saves space) in a number of places, but it is perhaps easiest to follow from an example:

Look for the section “Processing a large file” to see the explanation and example.

You should be able to find the block in question with a query like:

select * from blocklisthash where hash = (select hash from block where id = 15206372;

Looking back at the database schema, this would probably have been much easier to understand if the blocklisthash had a BlockID field instead of just the Hash field.

It’s easier to use the COMPARE command to view the differences between any 2 backup versions.

Thanks, I didn’t know about that. But it seems to require a lot more security context, such as the application key and encryption password. Everything I want to know is stored locally in the sqlite database, so this seems unnecessary. Does this command use the local data?

It seems to, at least in my test which was commandline with --console-log-level=Information which would show backend events such as Get. I also gave it the wrong encryption password and it didn’t care. YMMV.

It does the compare locally, if the database is present. If there is no local database, it needs the credentials to connect remotely. You can either omit the credentials or put bogus ones in, if you know the --dbpath parameter points to a valid database.