Database repair - taking days

What Duplicati version is this? Speaking only to the Recreate in your first paragraph, the slowness may be:

Empty source file can make Recreate download all dblock files fruitlessly with huge delay #3747
This was fixed in v2.0.4.18-2.0.4.18_canary_2019-05-12

With that bug fixed, database creation for Recreate (or for the restore you tried) should be possible using only the relatively small dlist and dindex files, unless information is missing from those. Viewing the Duplicati server logs via About → Show log → Live → Information shows the downloads.
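If you want a persistent record instead of watching the live log, the same information can go to a file. Below is a minimal sketch of a CLI-run Recreate (repair rebuilds the database when the file at --dbpath is absent); the backend URL and all paths are placeholders, and duplicati-cli is the Linux wrapper (Duplicati.CommandLine.exe on Windows):

```
# Recreate the database while logging its remote operations to a file.
# URL, database path, and log path are hypothetical examples.
duplicati-cli repair "b2://my-bucket/my-backup" \
  --dbpath=/home/me/.config/Duplicati/backup.sqlite \
  --log-file=/tmp/recreate.log \
  --log-file-log-level=Information

# Lots of dblock downloads in the log would suggest the slow path from
# issue #3747 rather than the normal dlist/dindex-only recreate.
grep -i dblock /tmp/recreate.log
```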

“Unexpected difference in fileset” test case and code clue #3800 is a case where compact had a bug that could strike whenever compact ran, and it runs somewhat invisibly (it does appear in the logs) as needed during backups.
This was fixed in v2.0.4.22-2.0.4.22_canary_2019-06-30, but there are probably other ways to end up in this state.

The current manual recovery, which only sometimes fixes it, is to delete that version, e.g. with the backup job Commandline, as sketched below. Sometimes the issue then moves to a different version, so sometimes it takes a fresh start or a Recreate, which used to sometimes be slower than starting over (with loss of old versions), but the first fix above will help with that.
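From the CLI it would look something like the sketch below; the GUI route is the job’s Commandline page with the delete command and the same --version option. URL and paths are placeholders:

```
# Delete backup version 5 (version 0 is the most recent).
duplicati-cli delete "b2://my-bucket/my-backup" \
  --version=5 \
  --dbpath=/home/me/.config/Duplicati/backup.sqlite
```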

I don’t think I saw resuming a repair mentioned. I did see a repair being shut down. I don’t think it resumes.

Running Canary is too adventurous for most people. It’s basically the first outside look at new fixes and new bugs. Beta is more of a known quantity (but not perfect – that’s why it’s Beta), and Experimental is the Beta lead-in (following current practice, which might change). So v2.0.4.21-2.0.4.21_experimental_2019-06-28 would possibly be reasonable to run, even though it hasn’t gone to Beta just yet. Note that going back to 2.0.4.5 (which is possibly what you were running?) isn’t possible due to the newer database format, but testing a direct restore of a small part of your 2.4TB backup (#1) might go faster, if it was suffering from the now-fixed issue.
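A direct-restore test could be sketched like this, using --no-local-db so Duplicati builds a temporary database from the remote dlist and dindex files instead of using the job database. The backend URL, file name, and restore target are all hypothetical:

```
# Restore one small file without touching the job's local database.
duplicati-cli restore "b2://my-bucket/my-backup" "Documents/test.txt" \
  --no-local-db \
  --restore-path=/tmp/restore-test \
  --passphrase="my-secret"
```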

Do you ever look at your backup logs to see if you get RetryAttempts in BackendStatistics? Retries can be invisible until their ability to hide network and server errors runs out… Sometimes --number-of-retries and --retry-delay can be increased to add more tolerance, especially if your logs show some history of retries; see the sketch below.
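As a sketch, with values chosen only for illustration (the same two options can be added as Advanced options in the GUI job editor):

```
# More retries, with a longer pause between them, to ride out flaky
# networks or busy servers; tune the numbers to your own logs.
duplicati-cli backup "b2://my-bucket/my-backup" /home/me/data \
  --number-of-retries=10 \
  --retry-delay=30s
```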

Repair speed (and backup speed, and other speeds) can be improved for new large backups by scaling some settings up to match the larger backup. Choosing sizes in Duplicati discusses some of the tradeoffs. The default --blocksize of 100KB means the 2.4TB backup must track about 23 million blocks, even if there are no extra blocks from past versions. You could consider a 1MB blocksize, but you can’t change the blocksize of an existing backup. You can change --dblock-size whenever you like, though; that only controls how blocks are packaged into volumes. A sketch for a new backup follows.
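Here is what starting such a new backup with scaled-up sizes might look like; remember that --blocksize only takes effect on a brand-new backup, while --dblock-size can be changed later. URL and paths are placeholders:

```
# 1MB blocks: roughly 2.4 million blocks to track for 2.4TB, about a
# tenth of the ~23 million that the 100KB default would need.
duplicati-cli backup "b2://my-bucket/my-backup-v2" /home/me/data \
  --blocksize=1MB \
  --dblock-size=200MB
```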

Another way to speed things up is to make things smaller. Rather than one 2.4TB backup, run several smaller ones. Although I don’t think good tests have been done, some report that things slow down more than linearly with size.

The backup itself should avoid the maybe-faster-now Recreate, but the “Unexpected difference” sanity test runs at the start of the backup, meaning it complains about whatever the previous run left behind – which you then made a backup of…

Possibly restoring the database from the run before the seemingly-successful-but-latently-damaged backup would avoid the “Unexpected difference” error, but it would be surprised by the backend files it didn’t make.
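Mechanically, the swap is just putting the older database copy where the job expects it, then doing a read-only check. A sketch, assuming you kept dated copies of the job database somewhere (every path here is hypothetical):

```
# Put the older database back, then list backup versions as a sanity
# check that it opens; list does not modify anything.
cp /backups/duplicati-dbs/backup.sqlite.2019-06-01 \
   /home/me/.config/Duplicati/backup.sqlite
duplicati-cli list "b2://my-bucket/my-backup" \
  --dbpath=/home/me/.config/Duplicati/backup.sqlite
```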

BUT

Repair command deletes remote files for new backups #3416 describes what can happen when an older database is slid into a newer backup. I’m not sure how the backup command handles that situation; I think it’ll at least complain about the newer files. This is mostly a heads-up if you try the restore-old-DB method; a cautious check is sketched after this paragraph.
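If you do go that way, I believe the general --dry-run option is honored by repair, so it should report what it would delete without actually doing it, but please confirm that on a throwaway backup before relying on it:

```
# Ask repair what it WOULD do; --dry-run should prevent remote changes,
# but verify that behavior on a test backup first.
duplicati-cli repair "b2://my-bucket/my-backup" \
  --dbpath=/home/me/.config/Duplicati/backup.sqlite \
  --dry-run
```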

As a side note, repair/recreate are being rewritten, but it’s been slow going, and I have no completion date.

Here is some past advice from the main author, who suggests Recreate. My much-less-informed idea is that the same problem mentioned earlier, where repair damages remote files when the database is too old, might also damage remote files when a database repair was interrupted and left in an incomplete state.

Some people have reported that starting again is faster than Recreate, but that might have been before the Recreate bug cited earlier got fixed. It also depends on your network: some connections download fast but upload slowly.