I am currently testing a restore of a backup from Box and Google Drive, on the same host as the one with the Duplicati backup database, and I’ve been watching the restore folder, where all ~2 million files have been restored.
Duplicati has now switched to a stage called “Downloading files…” and, watching the log, I’m seeing lots of these messages for all the files:
Recording metadata from remote data:
However, I am pretty sure that I also saw dblock downloads in between all these.
I did check the box to restore permissions for this backup, so my question is this: does Duplicati download dblock files to restore and then separately redownload them to assign metadata information, such as user:group and file permissions? If so, this seems like an operation that could use some optimization.
The above used a backup from a bug chase with special settings: a 4 MiB incompressible file, 1 MiB remote volume size, and 256 KiB block size. So 3 blocks per remote volume, plus maybe a bit more. Specifically, small things like metadata can fit in a dblock even if it already has three 256 KiB blocks in it. The math for blocks says 16 are required (4 MiB / 256 KiB), so that means 5 dblocks of 3 blocks each, and 1 dblock with a single block.
Each of the 6 dblocks is opened, and whatever is needed is taken from it. File patching happens 6 times, and one of the dblocks carries the bonus of a metadata block, which is recorded and then applied at the very end.
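To double-check the block arithmetic above, here is a small Python sketch. The function name is made up for illustration, and the 3-blocks-per-volume figure is taken from the post as an input rather than derived, since the real capacity of a volume depends on container overhead that this sketch does not model.

```python
KIB = 1024
MIB = 1024 * KIB


def dblock_counts(file_size, block_size, blocks_per_volume):
    """Return (total_blocks, full_volumes, leftover_blocks, total_volumes).

    blocks_per_volume is an input taken from the post, not computed from
    the remote volume size.
    """
    # Ceiling division: a partial trailing block still needs a block.
    total_blocks = -(-file_size // block_size)
    full_volumes, leftover_blocks = divmod(total_blocks, blocks_per_volume)
    total_volumes = full_volumes + (1 if leftover_blocks else 0)
    return total_blocks, full_volumes, leftover_blocks, total_volumes


# Settings from the post: 4 MiB file, 256 KiB blocks, 3 blocks per dblock.
print(dblock_counts(4 * MIB, 256 * KIB, 3))
```

With the settings from the post this gives 16 blocks, 5 full dblocks, 1 leftover block, and 6 dblocks in total, which matches the 6 patching passes described above.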
You can test this on your restore using the live log at Information level (and a sharp eye), or more casually with --log-file and --log-file-log-level=information, then run Linux egrep or similar on the log using an expression of:
I was observing “Patching file with remote data” while the (pretty quick) first portion of the restore was running and creating the files on the disk. Once this initial, quick stage was finished (1-2 hours for the ~400 GB across ~2 million files), the 2nd stage consisted only of “Recording metadata from remote data” and took ages. I checked this morning and the total restore took: Last successful restore: Yesterday at 11:54 PM (took 13:02:01).
During this time, all restored files were owned by root. This morning, once the restore was fully finished, I checked again and found the ownership/permissions to be correct.
I wasn’t using a log file, so unfortunately I can’t go back and look, but I can try to restore again with a log and see if I can spot the issue again. I’ll update to the latest experimental first though, due to “Recreating database logic/understanding/issue/slow”.
OK, here we go. After patching data, it moved to the “Recording metadata from remote data” stage and started downloading all the dblock files. So the metadata is stored together with the raw files inside dblocks and not separately somewhere (perhaps in the database or a separate set of files that would be much smaller to download)? Seems like there could be some optimizations to make here.
That would be my assumption, but I haven’t researched or tested it. In a situation where the restored file isn’t exactly the current file (e.g. the current one changed since the backup, or you’re restoring an old version), I’d expect some downloads; otherwise it could probably get all the same file content from the unchanged local file…
Quite possibly so, but there are plenty of other issues and feature requests (pushing 1000 probably) to compete with it for limited (far from ideal…) amounts of time that the development volunteers can give.
Thinking through use cases more than we can right here would be useful. Full disaster recovery would probably download a lot. Recovery of a few files due to a loss or “oops” wouldn’t typically take that long.
Please also note that this is not just an initial question of how blocks are packed into dblocks. Compact operations would also have to treat blocks as different types when the time comes to repack new dblocks.
A test restore done to the original system at a different location probably shows this idiosyncrasy best, but using --no-local-blocks (as one should to get a better test) will probably look pretty balanced in its work.
The transition point between FilePatchedWithLocal messages phase and remote dblock phase is here:
Or at least I think so. You can test your restores if you’re curious enough, to see if this pattern matches.
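If you want to check the pattern without eyeballing the whole log, a rough Python sketch like the one below could tally the phases. The message texts are the ones quoted in this thread. The function name, the log path in the usage note, and the idea that “FilePatchedWithLocal” shows up literally in each matching log line are my assumptions, so adjust them to whatever your log actually contains.

```python
from collections import Counter

# Message texts quoted in this thread; the exact log line layout is an assumption.
MARKERS = {
    "local": "FilePatchedWithLocal",
    "patch": "Patching file with remote data",
    "metadata": "Recording metadata from remote data",
}


def summarize_restore_log(lines):
    """Count each marker and note the first and last line number where it appears."""
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for number, line in enumerate(lines, start=1):
        for name, text in MARKERS.items():
            if text in line:
                counts[name] += 1
                first_seen.setdefault(name, number)
                last_seen[name] = number
    return {
        "counts": dict(counts),
        "first_seen": first_seen,
        "last_seen": last_seen,
    }
```

To use it, open the file written by --log-file (for example `with open("restore.log") as f: print(summarize_restore_log(f))`, where the file name is hypothetical). If the last “patch” line number is lower than the first “metadata” line number, the two remote phases did not overlap in that run.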