Database recreate performance

Why would Duplicati download all dblocks? You have to understand what it is doing. From the dlist files Duplicati gets a description of the existing files, each one with either its first block or a pointer to a list of blocks. The problem is then to find where these blocks are in the dblock files. This is solved by the dindex files, which provide the map of blocks (the ~100 KB elemental units) to dblocks (the 50 MB files). Duplicati needs to download dblocks when it can’t find the available data for some files. If all dindex files are healthy, there is no need to download dblocks, and Duplicati doesn’t download any in this case.
So downloading dblocks happens when the backend is damaged. The backend can be damaged for many reasons: bad hardware, network glitches, client computer crashes, bugs, and (maybe the worst of all problems) operator errors such as misconfigurations, mixing databases, or manual cleaning of the backend.

Basically, the only thing that can be done at the Duplicati level - fixing bugs - is extremely difficult when working from these forum postings, which are most often half-baked. So the really useful thing to do is remediation, that is, making the time to recover less painful.

If the crash happens because the application crashes ‘cleanly’, that is, ends abnormally but exits according to the operating system rules, yes. If the task is killed by something like kill -9 under Linux, or the operating system itself abends, it’s very doubtful that SQLite can always roll back the transaction. Not to mention the case where the transactions are not programmed correctly, of course.

Thank you all for listening. I’ll try to address your questions…

re: error message, here are the specific error messages I’ve seen, drawn from the logs; each is followed by paragraphs of code traces:

“Attempt to write a read-only database”

this one I’m pretty sure is because Time Machine is trying to back up the database and locks it. Suspending use of Time Machine while running Duplicati has made this one go away.

“The database was attempted repaired, but the repair did not complete. This database may be incomplete and the backup process cannot continue. You may delete the local database and attempt to repair it again.”

this one appears when I try to back up to, or rebuild, a database whose previous rebuild was interrupted, usually because I had to reboot my machine or kill Duplicati.

“Some kind of disk I/O error occurred”

this one happens when I lose contact with the source (external) drive because of a USB error (all of the external USB disks disconnecting simultaneously) or the Thunderbolt 2 cable being disturbed by me or the cat :frowning: . This should happen less often with the newer USB-C connectors, which are less easily disturbed. However, I cannot keep my (two) cats out of my home office.

re: blocksize

years ago I decided to use 1GB blocksize because I have a lot of video files, and because I determined from testing that Duplicati target file (…aes) processing was not so much dependent on blocksize as on the number of files (blocks), so I opted for larger blocks. I now have gigabit fiber-optic internet from my ISP, and am wired directly into the router. I am backing up source folders that are in the 1-6 TB range.

re: hardware, I’m on a brand new (2023) M2 Max MacBook Pro, running macOS Ventura, with 96GB of RAM and 4TB SSD.

this is considerably faster than my previous (2014) Intel i7 MacBook Pro with 16GB RAM and a 1TB SSD, but it still suffers from overall slowdown on rebuilds lasting more than a day.

re: pause ANY operation

this means that I rebooted, crashed, or killed the mono-sgen64 process in Activity Monitor. My experience with the GUI “Stop Now” or “Stop after current file” features has been that they do not stop Duplicati, at least not in the timeframe I’m willing to wait, particularly if my whole system is creeping along. Once Duplicati is killed, performance returns to normal.

re: waiting weeks or months for recreate to hopefully finish

I’ve waited as long as six weeks for a recreate to complete, but usually the impact (slowdown) on my system causes something else to break, requiring a reboot, or at least killing the recreate. For example, after my post yesterday, the macOS Finder app stopped responding and could not be relaunched. I let it run overnight, but the next morning the machine was frozen, so I was forced to reboot.

re: the database is so darn delicate

by “corrupted”, I mean that I cannot continue the previous operation, and Duplicati recommends that I rebuild the database. The rebuild then fails and advises me to recreate the database. Or I get the infamous count-mismatch error (sorry, I don’t have one to paste, so this is from memory), “…in version ##, found xxxxxxx blocks, expected yyyyyy blocks”, for which the best solution I found (in the fora) is to delete that version and try again, which usually turns up a count mismatch in another version, so I keep deleting versions until I get a clean run.

re: “Gets … were taking 15 seconds”

looking at the log with the “explicit” filter, I see a message that says (from memory), “the remote GET operation took 00:00:0x.xx.xx”

re: SSD

this is a brand new MacBook Pro with a 4TB internal SSD. I keep the Duplicati databases on the internal SSD.

re: Speedtest

Speedtest tells me if there is an issue with my ISP. I’ve recently switched to a fiber-optic provider, so upload and download have similar speeds in the 900 Mbps range. My previous ISP was cable, with 800 Mbps download speeds but only 10-30 Mbps upload, and it was subject to local network congestion, which lowered the transfer speeds.

re: Activity Monitor

CPU, Memory, Disk, and Network loading do not seem to indicate any bottlenecks, and are consistent with normal use of the system.

re: Exclude Duplicati files from backup

This is an option, of course, but I’d rather have the Duplicati DBs backed up to Time Machine, as a rule. I could use it as a temporary workaround when doing long-running rebuilds, I guess.

re: kill the DB and start over

I have been using Duplicati for years, and am trying to retain my backup history, rather than starting over, but that seems to be unavoidable now.

re: some background on what’s going on here:

Last Spring, I was advised that my GDrive (provided by my University) was going away, and that I should migrate to a personal GDrive. I had to migrate about 20TB of data, most of which is Duplicati blocks. Over the last year, through many iterations and issues with Google Drive support, I’ve managed to get most of my Duplicati target folders moved. One of my workarounds was to download the target folders to a local drive, then upload them to the new GDrive. Various errors occurred, and the local DBs have had to be rebuilt multiple times, using either the local or the remote copy of the target blocks. Usually a simple rebuild failed for the reasons discussed above, and I’ve had to recreate the DB. Of the half-dozen Duplicati backup sets that I’ve migrated, I’m down to the largest target folder (4TB), which has been uploaded, but getting the DB to work without throwing the aforementioned errors has been problematic. When my machine froze this morning, it had been rebuilding since Feb 11th. Once (last Fall) I was able to rebuild this DB using the local copy of the target blocks in “only” a single six-week stretch without interruptions (on a slower laptop). My ISP customer service told me that I hold the all-time record for data volume in several months.

Again, thank you for your patience and perseverance as I struggle with (and vent about) this experience.

are you talking about this parameter:

--blocksize = 100kb
The block size determines how files are fragmented.

or this one:

--dblock-size = 50mb
This option can change the maximum size of dblock files.

if the former, I am afraid that it could make your block larger than the dblock… I don’t think that this is a normal case, or that it could even work, and if it works by some miracle it could cause a bigger problem than anything else. A block size of 1 GB seems, well, extremely dubious to me.

If the latter, it will not fix anything about the database queries. Database query performance depends almost entirely on the block size, not so much on the dblock size.

In addition to those, damage can be caused by software handling files wrongly. Those causes are more controllable.
Damage is also in the eye of the software. In one open issue, an interrupted compact got confused.
It deletes a dindex file (as it should), then gets interrupted before commit, so the next run sees a “missing” file.
This is a fairly harmless nuisance though. I’d like to know whether the software can currently lose a dindex file.

Another approach, in addition to speeding up the worst case, might be detecting pending recreate risk.
Yesterday I was working on a Python script heading towards sanity-checking dblocks versus dindex files.

No argument with any of that. Just wanting to say this may be difficult, just as fixing bugs is difficult…

How To Corrupt An SQLite Database File

If an application crash, or an operating-system crash, or even a power failure occurs in the middle of a transaction, the partially written transaction should be automatically rolled back the next time the database file is accessed.

SQLite is Transactional

The claim of the previous paragraph is extensively checked in the SQLite regression test suite using a special test harness that simulates the effects on a database file of operating system crashes and power failures.

is the SQLite claim anyway. I know some systems must have something in them that goes wrong, as there are sometimes cases where SQLite flat out can’t open its database – a step worse than bad data inside.

It’s not that it’s unmaintainable. It’s that it’s a lot, and apparently too much. It’s been many years on some things. For a backup application where stability is vital, it’s too slow at fixing things. The only two ways around that are more time spent fixing things and slimming it down.

If you’re fine with various things being broken for years, then it’s fine. I’m also fine with it, as I’m not experiencing any issues at the moment, and the issues I know about I know how to avoid.

But, I’d still instantly axe a bunch of things to make it easier to maintain. I will do that with my own projects where I need to.

Of course, recreate might be worth keeping here, assuming it receives enough improvements. It’s a valid viewpoint to say it’s necessary. Personally, I’d never wait more than a day on it. I’d find another way of doing things, or focus on its performance until it can be made fast enough to be happy with.

I would not send PRs to fix them if I were fine with it.
It’s not right, at this point in time, to have dog-slow performance while recreating a DB with a data size of 500 GB. In my tests I have seen performance begin to degenerate at about 1.2 M blocks (equivalent to 120 GB of data with the standard block size). Having to change the block size with 10 TB of data or more could be expected: many users would ask themselves whether backing up 10 TB needs special consideration before starting to configure a backup. Not many will do so with 500 GB.

Try not backing up the database. APFS local snapshots are probably invisibly low-level.
Windows NTFS ones are that way at least, but they do cause a brief suspension of I/O.

So the situation is just what the message says. It’s not really corruption, just unfinished work.
The way to avoid this is to avoid intentional interruption. That may be hard to do with it being so slow…

Is the database on that drive? If so, don’t do that if the drive is prone to being unplugged.
If the database is on the permanent drive, can a source disconnect reliably break a test DB?
How exactly is it messaged? Source drive errors are usually caught and just produce a warning.

Agree with @gpatel-fr’s question. Maybe you’re thinking of the Options screen’s Remote volume size.

(screenshot: the Remote volume size setting on the Options screen)

There’s a link on the GUI screen there to click, or direct link is Choosing sizes in Duplicati.
Remote volume size is a simpler-but-still-quite-confusing term for the dblock-size option.

It’s got to be from something. I don’t know macOS much, so I can’t give step-by-step advice.
Doing a Google search for troubleshooting macOS performance claims 43 million hits though.

This is interesting. For me, it reliably stops after finishing what it’s doing and uploading data.
I’m on Windows. Any one of the Duplicati folks want to see if they can reproduce this issue?

It’s pretty clear some resources are being exhausted, so keep looking for what that could be.
Do you know how to measure the size of processes, such as the mono ones running Duplicati?
One Linux user was observing memory growth, although we don’t know exactly which operation.

A little too vague, though I have some guesses of similar ones.

looks like (on mine) the same as the Started to Completed time I posted. It still leaves open the
question of whether there’s some other work such as decryption that might be pushing time up.

2022-10-17 16:52:39 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationGet]: RemoteOperationGet took 0:00:00:06.617
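
If it helps to summarize those, a rough Python sketch like this could pull the GET durations out of a profiling log (the parsing is based on the line format above, where the duration appears to be days:hours:minutes:seconds):

```python
# Rough sketch: pull RemoteOperationGet durations from a profiling log and
# summarize them. Parsing is based on the line format above, where the
# duration appears to be days:hours:minutes:seconds.
import re
import sys

pattern = re.compile(r"RemoteOperationGet took (\d+):(\d+):(\d+):([\d.]+)")
durations = []
with open(sys.argv[1], errors="replace") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            days, hours, minutes, seconds = match.groups()
            durations.append(int(days) * 86400 + int(hours) * 3600
                             + int(minutes) * 60 + float(seconds))

if durations:
    durations.sort()
    print(f"{len(durations)} GETs, median {durations[len(durations) // 2]:.1f}s, "
          f"max {durations[-1]:.1f}s")
```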

If that’s true of the external drive, then how a source error damages a DB gets more mysterious,
although the definition of “corrupted” DB being used here isn’t the usual one that one would use.

as does disk congestion, which is why I’ve been asking. There’s a CLI-based test I described too.

Make sure you understand my comment on CPU cores, but if normal use means not straining it with a
database recreate or something, are you saying the recreate slows things down but all monitors look fine?

If you somehow back it up during backup, it goes instantly obsolete, as it’s changing during backup.
If you back it up while idle and restore a stale copy later, it mismatches and Repair hurts destination.
Database and destination must always match, e.g. you can copy it with Duplicati script after backup.
Configuration data in Duplicati-server.sqlite is less active but usually you Export and save the config.
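
As a minimal sketch of that idea (the paths are placeholders, and Duplicati’s --run-script-after option could invoke something like this once the backup has finished, so the copy always matches the destination):

```python
# Hypothetical post-backup copy of the job database, meant to be run via
# Duplicati's --run-script-after so the copy always matches the destination.
# Both paths below are placeholders - point them at your own job database
# and at a folder that Time Machine is welcome to pick up.
import shutil
import time

DB_PATH = "/Users/me/.config/Duplicati/XXXXXXXXXX.sqlite"  # placeholder
DEST_DIR = "/Users/me/DuplicatiDBCopies"                   # placeholder

stamp = time.strftime("%Y%m%d-%H%M%S")
shutil.copy2(DB_PATH, f"{DEST_DIR}/job-{stamp}.sqlite")
```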

To soften the impact a little, if space allows and the old one is still intact, save it in case you need older files.
A newer one, with a better blocksize and who knows what other old problems removed, may work better.
Sometimes hidden damage is possible and may reveal itself in a few ways, e.g. dblock downloading.

I can think of an ISP action that would be a lot worse than that. It’s nice that yours wasn’t very upset…

if the block size is at the default value, that is, 100 KB, a total data size of 6 TB would mean 60 million blocks to manage. The database size could be, well, maybe 30 GB? In the abstract, such a database size could work all right with simple, optimized queries. But with the current queries it’s way too much. Recreating the backup with a block size of 5 MB would reduce the block count to about 1.2 M, and that would be much more manageable.
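
To make the arithmetic concrete, a quick back-of-the-envelope sketch (treating KB/MB loosely as powers of ten; the exact multiplier doesn’t change the order of magnitude):

```python
# Back-of-the-envelope block counts for ~6 TB of source data at a few
# blocksize choices (KB/MB treated loosely as powers of ten; the exact
# multiplier doesn't change the order of magnitude).
SOURCE_BYTES = 6 * 10**12

for label, blocksize in (("100 KB", 100_000), ("1 MB", 1_000_000), ("5 MB", 5_000_000)):
    print(f"{label:>6}: ~{SOURCE_BYTES / blocksize / 1e6:.1f} million blocks")
# 100 KB: ~60.0 million blocks
#   1 MB: ~6.0 million blocks
#   5 MB: ~1.2 million blocks
```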

@StevenKSanford : open an SQLite DB browser, open the job database and enter
select count(*) from block
if it’s over 2 million, you have to consider raising the block size.

For example DB Browser for SQLite or any one you like.
While overloading Duplicati is known but hard to fix, how
exactly that leads to whole system slowdown is not clear.
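
On the DB browser point: if installing one is a hassle, the same count(*) check can be done with Python’s built-in sqlite3 module, pointed at the job database (ideally a copy, while Duplicati is idle):

```python
# Same count(*) check, using Python's built-in sqlite3 module.
# Point it at the job database (ideally a copy, while Duplicati is idle).
import sqlite3
import sys

db = sqlite3.connect(sys.argv[1])  # e.g. ~/.config/Duplicati/XXXXXXXX.sqlite
(count,) = db.execute("select count(*) from block").fetchone()
print(f"{count:,} blocks")
if count > 2_000_000:
    print("consider raising the block size and recreating the backup")
```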

Another approach might be to measure the I/O done at the mono level, which maybe doesn’t actually hit the SSD.
OS caching can do that, but keeping macOS busy handling I/O requests might slow other programs.

It’s nice (in an odd way) to see that your idea of the rough limit is the same one I got. I think I tested with increasingly small blocksizes, and at some point it spent all its energy endlessly reading etilqs files, making me wonder if making some cache larger might avoid that – same idea as you’ve been trying.
Watching file activity on Windows would be done with Process Monitor. I suppose Linux could run strace.

On another topic, I might test disconnecting a USB-attached SD while backing up to see how it goes.

that’s exactly what I did. I did not want to mess with hundreds of gigabytes to do tests, so I set the block size and dblock size to one tenth of their defaults.


The test with the USB SD card was to do a first backup with one file, to get at least one dlist up.
Next I added a folder, began a backup, pulled the USB, and got a yellow popup at the bottom that said:

(screenshot of the yellow warning popup)

The log truncates that list and only shows the first line (I opened an issue asking for more), e.g.

2023-02-18 18:20:57 -05 - [Warning-Duplicati.Library.Main.Operation.Backup.FileBlockProcessor.FileEntry-PathProcessingFailed]: Failed to process path: I: …

The profiling log that I run gives the important second line with an exception summary, e.g.

System.IO.DirectoryNotFoundException: Could not find a part of the path '\?\I: …

The live log could have found that too. One can click for details, but I’d prefer a better regular log.

The Verify files button has no complaints, so at first glance I’m not sure I can reproduce the bug.
The next backup starts just fine and gets 86 warnings when I kill it prematurely by disconnecting.
If macOS is different, I can’t run it, but I’d still like test-backup steps from @StevenKSanford

@gpatel-fr

“blocksize” in my reply is “--dblock-size”, which I set to 1 GB using the GUI config screen #5.

My local sqlite DB in the ~/.config/Duplicati folder, for the recreate that just failed (froze my system), is currently 13GB. This is the largest DB in that directory, although I have several more that are nearly as large.

so your block size is still at the default value of 100 K. That’s way too low for good performance currently. You should raise it to 5 MB and recreate your backups.

Seems to me, raising the default -blocksize to a larger value could be a good thing moving forward. I’m thinking 100K just isn’t quite enough for the amounts of data people are now backing up.

@gpatel-fr Thanks for the effort on a fix.

This is in the developer section, so I’ll toss a bit more onto the fire…

1- As mentioned above, changing the default -blocksize to a larger value (YTBD) is likely a simple change that could be implemented in the next release (I know, also YTBD). Short of re-compiling with the new default, there shouldn’t be anything else required; no re-writing of code or conflicts, it’s just a default change that would likely prevent a lot of user frustration moving forward, as this sort of thing seems to be coming up more and more often.

If that’s not in the books for the foreseeable future, could we create a “hot fix” of sorts, something users can run post-install to adjust the -blocksize setting to a larger default value? This leads to item 3, but stop by item 2 first.

2- Change the wording in the Duplicati documentation around -blocksize to emphasize how it should be scaled up based on data size. Maybe add a note such as: if you’re backing up more than 0.5 TB, you absolutely should increase the value to “xyzKB/xyzMB” or more. I started re-writing the manual page on the topic but lost the edit in a stray reboot; I’ve been trying to get back to it, but things have been busy as of late.

3- Create a routine that can change the -blocksize for an existing backup set. Now, I haven’t really looked into this all that hard, but I know there are many threads around here on the topic, and at the end of the day it can’t currently be done, but holy moly would it be nice.

I get that the whole DB and backup set would probably have to be recreated, and that storage use would likely increase following the process. I also expect that the process would require at least 2x the storage space in the first place to make it even close to safe, but I don’t think that’s out of reach for all that many users. I think this should be a local process rather than processing directly against a remote destination.

Storage is cheap and time is money… An external 12TB USB3 drive is only a few hundred bucks and getting cheaper by the day. If the process has to download the existing backup set to a sufficiently large local drive, convert/verify it, remove the old one (with an option to keep either version locally), purge the old backup set, and then upload the new set, that doesn’t seem like the worst thing. You can at least reuse the external drive afterwards to create an additional local backup.

I’m by no means saying that’s a simple set of tasks to complete, but there has to be a way it could be done, and if so it would probably really help with user retention.

If the user’s backup set is too large to be converted locally, then they would need to create a new backup with a “better” -blocksize value. That brings up another thing: in a changing data set, the “best” -blocksize could change next week, or most certainly after a few years. Chances are your data is going to grow, not shrink, so being able to move up to a new -blocksize at some point seems like a really valuable feature.

Sure, if you’d rather just make a new backup set then by all means, but I really think that option only appeals to a very small percentage of users when faced with that reality. One of Duplicati’s best features, in my mind, is its versioning, and to lose years of versions would be a huge hit for many, if not most.

I’ll do some more reading on the subject…

4- I’ll try to get some tests set up to see if I can catch TM and Duplicati clashing.


re: --blocksize

@gpatel-fr

Thanks for the clarifications on blocksize vs dblock-size. The backup I’m currently repairing only had two versions, so I’ve deleted it and am recreating it with a 5MB blocksize and a 1GB dblock-size. Let’s see how this works.

@JimboJones

Thanks for your thoughts on hash blocksize. Being stuck with an initial blocksize for years of backups is, as you rightly pointed out, undesirable. I really like your idea for a utility to re-blocksize the backups without losing all of one’s versions.

I’d love to contribute to the coding efforts, but my programming skills were built in the 1970s-1980s. COBOL anyone? :smiley:

Maybe here:

I’d agree, although there might still be improvements possible for some areas such as recreate.
If, for example, it’s doing anything (e.g. SQL) per-block, that will really add up with many blocks.
@gpatel-fr has likely looked at that more than I have, in addition to being better at C# and SQL.

That would simplify it, if it winds up being a standalone. A less complex rewriter is described at:

Re encrypt remote back end files (referencing a Python script from forum user @Wim_Jansen)

and it also gets into more of the little details of the files. First thing to do might be to open a few.
Format doesn’t seem super complicated. A file is either a block or a blocklist that lists its blocks.
Both cases identify a block by its SHA-256 hash, sometimes directly, sometimes after a Base64.
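
For example, computing a block identifier the way it seems to appear in those files would look roughly like this (my understanding of the format; verify it against real files):

```python
# How a block seems to be identified: the Base64 of its SHA-256 digest.
import base64
import hashlib

def block_id(block_bytes: bytes) -> str:
    return base64.b64encode(hashlib.sha256(block_bytes).digest()).decode("ascii")

# A blocklist is itself stored as a block whose content is the concatenation
# of the raw 32-byte hashes of the blocks it lists, in order.
print(block_id(b"example block contents"))
```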

The challenge is that a hash is one-way, so the hash of (say) 10 blocks in sequence can’t be known
unless the blocks are around. Unfortunately they’re packed into dblock files, but that can be undone;
however, it takes more storage space temporarily while repacking the data into larger blocks per the spec.

Following the idea of opening a few files, unzipping a dblock can just drop all its blocks right there.
The dindex file has a vol folder with a file of JSON data describing the contents of its dblock file.
There might also be a list folder holding redundant copies of blocklists that are also in its big dblock file.
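
To look at that yourself on a dindex that is unencrypted (or already decrypted with AES Crypt), a quick peek could be something like this sketch; the JSON field names are from memory, so treat them as assumptions:

```python
# Peek inside a (decrypted) dindex file: list its vol/ entries and how many
# blocks each says its dblock contains. Field names ("blocks") are from
# memory - check them against a real file.
import json
import sys
import zipfile

with zipfile.ZipFile(sys.argv[1]) as zf:  # e.g. duplicati-i....dindex.zip
    for name in zf.namelist():
        if name.startswith("vol/"):
            data = json.loads(zf.read(name))
            print(f"{name}: {len(data.get('blocks', []))} blocks described")
        elif name.startswith("list/"):
            print(f"{name}: redundant blocklist copy")
```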

EDIT:

Duplicati.CommandLine.RecoveryTool.exe is another rewriter, in C#, working on recompression.
Duplicati.CommandLine.BackendTool.exe might possibly be the basis for a file uploader where it
matters that the uploaded files wind up looking like Duplicati made them. Google Drive likes that.

Not knowing the code at all, this may be impossible:

Could “blocksize” be made an attribute of the backup version? That way, one could change the blocksize going forward, like “dblock-size” can be changed. I imagine the local database might have to keep separate tables for each version, though, unless it’s already doing that now.

That would open up an opportunity to re-blocksize a single version at a time, which would use local temp space to download the files, but not all the versions at once?

Infeasible would be a better word. Given unlimited resources, lots of things would become possible.

It’s not. Fixed block size is quite firmly embedded. Short blocks (e.g. at end of file) are of course OK.

Files of a backup version are mostly older blocks (only changes are uploaded), so concept doesn’t fit.
There might be some similar way to do this, but developer resources are too scarce. Any volunteers?