Because 90% gets mentioned (sometimes phrased as the “last 10%” here), I’ll point back to my post above: I think the 70% point is where the dblock fetching begins. It runs in three passes, with the final pass covering 90%-100%, according to the code here. All of this tries to save as much of your data as possible, but it definitely has its costs. The progress bar is hard to get right, since one never knows how much of the last 30% will actually be needed.
What’s specifically wasteful is that, in some cases, I think it goes on a fruitless chase, looking for something that will never be found. If that’s really so, I wish it would recognize the situation and give up on the search. That might be hard in the general case, but if the only missing piece is an empty file (or a -1 VolumeID), it could be special-cased.
I’m not a Duplicati developer, and not an expert in the core design, and my SQL is also not up to the task. The latter two items might be true for most of the small number of active developers. It’s a resource limit.
A while ago, the lead developer set out to rewrite recreate and/or repair, but I have no idea where that stands…
Repair is downloading dblock files was my stab at an analysis, plus a question about how empty files get handled.
In the current topic, this was another report after more testing, with pictures to help with the visualization. Some of the people on this thread might want to look in their own databases to see whether they’re seeing such oddities (a sketch of one way to look is just below). Preventing them might be ideal, but dealing with them when they do happen might help existing backups more…
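In case anyone wants to try it, here’s a minimal sketch of the sort of check I mean, in Python against a copy of the local SQLite database. The Block table and its Hash/Size/VolumeID columns match what I see in my own database, but schema details could differ between versions, so treat this as illustrative only, and run it against a copy rather than the live file:

```python
import sqlite3

# Path to a COPY of the Duplicati local database -- never point this at the live file.
DB_PATH = "backup-copy.sqlite"

con = sqlite3.connect(DB_PATH)
cur = con.cursor()

# Blocks that don't point at any real remote volume (the -1 VolumeID oddity above).
cur.execute("SELECT Hash, Size, VolumeID FROM Block WHERE VolumeID < 0")
rows = cur.fetchall()

print(f"Blocks with no valid VolumeID: {len(rows)}")
for block_hash, size, volume_id in rows[:20]:  # show at most 20
    print(f"  hash={block_hash}  size={size}  volume_id={volume_id}")

con.close()
```

If that turns up rows, it would be interesting to see whether they trace back to an empty file, as in the reports above.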
While I’m offering free advice, I’ll also point to this, where I suggest a `--blocksize` increase for big backups, intended to reduce the overhead of tracking lots of 100KB blocks (rough numbers below). Maybe the default should increase, assuming benchmarks confirm it helps. There’s no performance test team or well-equipped lab, though…
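To put rough numbers on that overhead, here’s a back-of-the-envelope sketch; the 1 TiB source size is purely an assumed example, and the point is just how many block records the database has to track at each setting:

```python
# Back-of-the-envelope block counts for a hypothetical 1 TiB backup source.
source_bytes = 1024**4  # 1 TiB -- an assumed example size, not anyone's real backup

for label, blocksize in [("100KB (default)", 100 * 1024), ("1MB", 1024**2)]:
    blocks = source_bytes // blocksize
    print(f"{label:>16}: roughly {blocks:,} blocks to track")

# Roughly 10.7 million blocks at 100KB versus roughly 1 million at 1MB,
# so the block-level bookkeeping shrinks by about a factor of ten.
```

Note that, as far as I know, blocksize can only be set when a backup is first created; it can’t be changed on an existing backup.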
Basically, speed is a problem in at least two parts. One is scaling, and the tuning measures to counter its slowdown. The other is a possible bug that sends Duplicati off downloading dblocks when every download will be in vain.
There might be other cases that cause that, and some volunteer could move their database aside to see whether they can find a recreate that runs the third pass (probably a bad sign). From the code, it looks like a log at `--log-file-log-level=verbose` (verbose possibly being needed because of a bug that keeps the lines from showing at information level) should say:
ProcessingAllBlocklistVolumes
(twice, on consecutive lines, with the first one possibly listing ALL the possible dblocks it will scan; a quick way to check a log file for these is sketched below)
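If someone does run a recreate with a `--log-file`, here’s a tiny sketch for scanning that file for the lines in question; the log path is just whatever was passed to `--log-file`, and the match is on the message text quoted above:

```python
# Scan a Duplicati --log-file output for the ProcessingAllBlocklistVolumes marker.
LOG_PATH = "duplicati.log"  # whatever path was given to --log-file

hits = []
with open(LOG_PATH, "r", errors="replace") as f:
    for lineno, line in enumerate(f, start=1):
        if "ProcessingAllBlocklistVolumes" in line:
            hits.append((lineno, line.rstrip()))

if hits:
    print(f"Found {len(hits)} ProcessingAllBlocklistVolumes line(s):")
    for lineno, line in hits:
        print(f"  line {lineno}: {line}")
else:
    print("No ProcessingAllBlocklistVolumes lines; the recreate never reached that pass.")
```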
If anyone can find a test backup that produces those lines without involving an empty file, it would be good to study.