Database rebuild

OK, I get it now. Thank you for the additional clarifications. What I had gathered from the material was that the information in index files was based on zip directory listings. What I was missing is that those entries have actual content: at one point I had tried to open the entry in vol/ of a dindex file, but unzip complained that the entry was corrupted. Probably a random event, which cost me a couple of hours of research.

So, mainly for my future reference, let me summarize all that information:

  • There are volumes, which are compressed and optionally encrypted archives containing either backup lists (dlist), any kind of block (dblock), or indices and blocklist blocks (dindex)
    • These are simply files in the remote
    • dlists have a list of all the source files contained inside a backup, with their hash, metadata hash, and if necessary the hashes of the blocklist(s) that must be used to put them back together
      • They are represented as Filesets, where each file is associated with the set by a FilesetEntry
      • Filesets are also linked back to their source dlist file
    • dblocks have blocks named after their hash
    • dindexes have blocklist blocks named after their hash in list/ and an index of their dblock named after the dblock in vol/
    • All volumes are represented as Remotevolumes in the database, where dblocks and dindexes can also be linked together through IndexBlockLink
  • There are blocks, which are binary blobs containing either source files, metadata, or blocklists (i.e. lists of blocks which compose a source file)
    • Blocks exist in the remote in volumes, either in a dblock (all types of blocks), or in the list/ directory of a dindex (only blocklists)
    • They are represented as Blocks in the local database, each belonging to a Remotevolume. Blocklists, which can belong to more than one volume, are (as far as I can see) always recorded as part of a dblock, even when they are also part of a dindex
  • Source files are… well, we know what they are :)
    • They are listed for each backup in the backup’s dlist file
    • In the same file, they are linked to their metadata
    • In the same file, they are linked either to a single block through its hash (which is also the file’s hash), or to a list of blocklists, through the hashes of the blocklists
    • A file is represented as a FileLookup, which belongs to a Fileset and has a Blockset
  • Metadata exists only as part of a source file, in the form described earlier
    • It is represented as Metadatasets
  • Blocklists are binary files as described earlier
    • They are represented by Blocksets, where each block is linked to it by a BlocksetEntry (which obviously points to a Block)
    • Note that this system departs from the remote representation: when there is more than one blocklist for a file, that’s represented as a single Blockset in the database, which thus has the hash of the file; and when there is no blocklist for a file, the database holds a Blockset containing a single block, with the same hash as the block and thus the file
    • The above is also true for metadata, represented as Metadataset (which however is not a set in the same sense as a Blockset or a Fileset), which points to a Blockset with a single Block, having the same hash as the Blockset itself
    • The blocklists themselves are not represented as Blocksets with a single entry (i.e. there are no Blocksets with the same hash as the hash of a list), but as BlocklistHashes, each associated with a Blockset
      • This means that not every Blockset has a BlocklistHash, because as I mentioned earlier some Blocksets are made up for the sake of normalization and don’t have a corresponding representation in the remote - or a corresponding blocklist
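
To make the Blockset / BlocklistHash relationship above concrete for my future self, here is a rough Python sketch. It is only a toy model of my understanding, not the actual implementation: the `Blockset` class, `build_blockset`, and the tiny `BLOCK_SIZE` are made up for illustration, and I am assuming SHA-256 hashes and that a blocklist is simply the raw block hashes concatenated in order.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 64   # toy value so the example stays small (assumption, not the real default)
HASH_SIZE = 32    # length of a raw SHA-256 digest in bytes


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Blockset:
    """Hypothetical stand-in for a Blockset row plus its child rows."""
    full_hash: bytes                                       # hash of the whole file
    entries: list = field(default_factory=list)            # BlocksetEntry: ordered block hashes
    blocklist_hashes: list = field(default_factory=list)   # BlocklistHash: one per blocklist


def build_blockset(content: bytes) -> Blockset:
    # Split the file into fixed-size blocks and hash each one.
    blocks = [content[i:i + BLOCK_SIZE] for i in range(0, len(content), BLOCK_SIZE)]
    block_hashes = [sha256(b) for b in blocks]

    # One Blockset per file, whatever the number of blocklists.
    blockset = Blockset(full_hash=sha256(content), entries=block_hashes)

    # Only multi-block files get blocklists. Each blocklist is itself a block,
    # so it can hold at most BLOCK_SIZE // HASH_SIZE hashes.
    if len(blocks) > 1:
        per_list = BLOCK_SIZE // HASH_SIZE
        for i in range(0, len(block_hashes), per_list):
            blocklist = b"".join(block_hashes[i:i + per_list])
            blockset.blocklist_hashes.append(sha256(blocklist))
    return blockset


small = build_blockset(b"x" * 10)          # fits in one block
big = build_blockset(bytes(range(200)))    # four blocks, two blocklists
```

In this model `small` ends up with a single entry whose hash equals the Blockset hash and no BlocklistHash at all, while `big` gets one Blockset with four entries and two BlocklistHashes, which matches the "not every Blockset has a BlocklistHash" point above.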