Database repair - taking days

First I tried to restore a file from a 2.4TB backup (#1) to a different machine. Everything went fine until the point where Duplicati started to recreate the database - that ran for several days, until I needed to shut down the machine, so I aborted the restore.

A few days ago, Duplicati started complaining that it’s missing a single file it expects from a different backup (#2, 500GB of data), suggesting a database repair - “Unexpected difference in fileset version 1: 19.07.2019 15:01:45 (database id: 4), found 987 entries, but expected 988”.
This happened during normal operation of the computer and its backups, and I haven’t noticed anything that might have disrupted the backup or removed said file from the backup storage (the filename isn’t mentioned, so I cannot check for it). Storage is Jottacloud in both cases, and tons of Duplicati files seem to be present at the storage.

So I commanded it to do so. That took two days, until I needed to shut down the computer. After starting it again later today I got the error “The database was attempted repaired, but the repair did not complete. This database may be incomplete and the backup process cannot continue. You may delete the local database and attempt to repair it again.” So both the repair and the resumed repair failed, and the database is still incomplete.

In both cases it seems Duplicati is downloading the full backup and importing it at an extremely slow pace.

This makes Duplicati a highly unreliable backup solution… While it boasts high resistance to disruption of pretty much any kind, my experience shows otherwise. Should I need to restore files from either of those backups, I would fail. I was able to restore only from a different (#3, much smaller) intact backup.

What can I do to get rid of these “I expect one more file, world’s gonna end, repair db” and “Repair failed, do it again - you’ve got the time” errors, and how can I increase the repair tempo? And preferably eliminate the need for repairs altogether…?

The files that the backup is created from are in perfect condition, if that helps. I also set up a separate backup task that backs up the sqlite files after the other backups are done, but I don’t know how that helps with the “missing one file” case or with new files at the storage site. My faith in Duplicati handling that gracefully isn’t strong…

A new one - “The database was attempted repaired, but the repair did not complete. This database may be incomplete and the repair process is not allowed to alter remote files as that could result in data loss”.

So rather than fixing the remote database, I should delete all local and remote files and then upload the whole (500GB) backup again as a completely new backup (which might fail again at any time)?

What Duplicati version is this? Speaking only to the recreate in your first paragraph, the slowness may be:

Empty source file can make Recreate download all dblock files fruitlessly with huge delay #3747
This was fixed in v2.0.4.18-2.0.4.18_canary_2019-05-12

With this bug removed, database creation for Recreate (or the restore you tried) is supposed to be possible using only the relatively small dlist and dindex files, unless information is still missing after those are processed. Viewing the Duplicati server logs via About → Show log → Live → Information shows the downloads.

“Unexpected difference in fileset” test case and code clue #3800 is a case where compact had a bug that could strike whenever compact ran, and compact runs somewhat invisibly (it’s in the logs) as needed during backups.
This was fixed in v2.0.4.22-2.0.4.22_canary_2019-06-30, but there are probably other ways to get this.

The current manual sometimes-a-fix recovery is to delete the offending version, e.g. with the backup job’s Commandline page. Sometimes the issue then moves to a different version, so occasionally it takes a fresh start or a Recreate, which used to sometimes be slower than starting anew (with loss of old versions), but the first fix above will help with that.
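
For example, a minimal sketch of that delete from a terminal rather than the Commandline page; the destination URL, database path, and passphrase are placeholders for the job’s real values:

```
# List the filesets first to confirm which version number is the bad one:
Duplicati.CommandLine.exe list "jottacloud://backups/job2" --dbpath="C:\path\to\job2.sqlite" --passphrase="..."

# Then delete only the version the error names (version 1 here):
Duplicati.CommandLine.exe delete "jottacloud://backups/job2" --version=1 --dbpath="C:\path\to\job2.sqlite" --passphrase="..."
```

The job’s Commandline page pre-fills the same URL and options, which is why it’s the easier route.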

I don’t think I saw a resume of repair mentioned. I did see a shutdown of a repair. I don’t think it resumes.

Running canary is too adventurous for most people. It’s basically the first outside look at new fixes and new bugs. Beta is more of a known quantity (but not perfect - that’s why it’s beta), and Experimental is the Beta lead-in (following current practice, which might change). So v2.0.4.21-2.0.4.21_experimental_2019-06-28 would possibly be reasonable to run, even though it hasn’t gone to beta just yet. Note that going back to 2.0.4.5 (which is possibly what you were running?) isn’t possible due to the newer database format, but testing a direct restore of some of your 2.4TB backup (#1) might go faster, if it was suffering from the now-fixed issue.

Do you ever look at your backup logs to see if you get RetryAttempts in BackendStatistics? Retries can be invisible until their ability to hide network and server errors runs out… Sometimes --number-of-retries and --retry-delay can be increased to add more tolerance, especially if your logs show some history of retries.
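
As a sketch (the values are examples, not recommendations), the extra tolerance plus a log file that should capture retry events could go in the job’s advanced options like this:

```
# More retries, with a longer pause between attempts:
--number-of-retries=10
--retry-delay=30s

# A log whose level records retry events without the per-file Verbose noise
# (the path is a placeholder):
--log-file=C:\logs\duplicati-job2.log
--log-file-log-level=Retry
```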

Repair tempo (and backup tempo, and other tempos) can be increased for new large backups by scaling some things up to match the larger backup. Choosing sizes in Duplicati discusses some of the tradeoffs. The default --blocksize of 100KB means the 2.4TB backup requires tracking about 23 million blocks, even if there are no extra blocks due to past versions. You could consider a 1MB blocksize, but you can’t change it on an existing backup. You can change --dblock-size whenever you like, though; that only affects the volumes the blocks are packaged into.
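
To make that concrete: at a 1MB blocksize the same 2.4TB is roughly 2.3 million blocks, about a tenth of the bookkeeping. A sketch of starting a fresh backup with larger sizes (destination, source, and passphrase are placeholders):

```
# --blocksize is fixed at creation time, so set it on the very first run;
# --dblock-size can still be changed later:
Duplicati.CommandLine.exe backup "jottacloud://backups/job2" "D:\data\" --blocksize=1MB --dblock-size=128MB --passphrase="..."
```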

Another way to speed things up is to make them smaller: rather than one 2.4TB backup, have several smaller ones. Although I don’t think good tests have been done, some say things slow down more than linearly with size.

Backing up the database should avoid the maybe-faster-now Recreate, but the “Unexpected difference” sanity test runs at the start of a backup, meaning it complains about whatever the previous run left behind - which is then exactly what you made a database backup of…

Possibly restoring the database from the run before the seemingly-successful-but-latently-damaged backup would avoid the “Unexpected difference” error, but that database would be surprised at the backend files it didn’t make.

BUT

Repair command deletes remote files for new backups #3416 describes what can happen when an older database is slid into a newer backup. I’m not sure how the backup command handles that situation; I think it’ll at least complain about the newer files. This is mostly a heads-up if you try the restore-an-old-DB method.

As a side note, repair/recreate are being rewritten, but it’s been slow going, and I have no completion date.

Here is some past advice from the main author, who suggests Recreate. My much-less-informed idea is that the same problem mentioned earlier - repair damaging remote files when the database is too old - might also strike when a database repair was interrupted, leaving the database in an incomplete state.

Some people have reported that starting again is faster than Recreate, but that might have been before the recreate bug cited earlier was fixed. It depends on your network as well; some connections download fast but upload slowly.

Thanks for the extensive reply, I’ll read through all the links to find the issue.

The quick reply I can give now is that my version is 2.0.4.22_canary_2019-06-30 from the default channel; I haven’t changed that setting since install. There was an update from a previous version, but I don’t remember if it was before or after I created that 500GB backup set (I have about a dozen backup sets, split both by data and by backup / retention period).

Also, recreate didn’t resume; I just kinda expected it would resume like the rest of the operations :frowning: If recreations taking days can’t be avoided, they should at least resume…

There is no universal default channel; however, when you pick a particular version for the initial install (and you have to go down to “Other versions – older or testing” at https://www.duplicati.com/download to see non-beta releases), that initial install does set the default channel for later updates. This can be changed in Settings.

I’m not sure exactly which operations resume. I know backup does (because it’s common and sometimes interrupted, so it gets special treatment), but a quick search of the source turned up no other “interrupted” handlers.

Regardless, 2.0.4.22_canary_2019-06-30 has as bug-free a Recreate as there is right now. Sometimes data does get lost on the backend (e.g. an index file disappears or never gets uploaded), so a big search may still be mounted. Time will tell how many slow Recreate operations are fixed by the fix for the false-positive empty-file bug, and how many continue to be slow. I don’t recall whether the “Direct restore” progress bar works the same way, but on the Recreate button’s Recreate, the dblock downloads and searches start at the 70% mark, and 90%-100% is the last-ditch effort.

It’s a good suggestion, and another might be to consult the user (but what if it’s a command-line run?) before going into a big scavenging operation that looks like it might take days. I have no idea how the new recreate/repair design will handle things. Regardless, you can create a Features category topic at any time to try to gauge interest, and maybe gather technical comments. There’s an enormous work backlog, though.


Checking the download tempo (once per 5 minutes) and the number of dblocks to download (4054), it would take me 16.5 days to recreate this database. That makes dropping the whole backup and re-uploading it several times faster, so I’ll do that - the data itself is intact.

  1. RetryAttempts - I found none in the current log, but since the logs grow quite fast, I rotate them after 1GB, so any entries might be gone. What verbosity level should I use to see retries without seeing every individual file that was considered for backup (the majority of the logfile)? I haven’t found the level values described in the docs.

  2. dblock size - As you mention, I cannot change --blocksize for the existing 2.4TB backup (media; splitting it has no content-related meaning), but I can try it with the 0.5TB one that I will re-create anew anyway. I already have the volume size increased to 128MB (--dblock-size), so this one will also get a larger blocksize (thinking of 5MB / 512MB, as the backup is a few big files, and the machine has 32GB RAM if that matters).

  3. Versions - I downloaded the default and only version mentioned on the download page; betas are at the bottom of that page (which currently gives a 502 error, but that was the place I got the original v.18 from).

The currently installed version contains canary in its name (and doesn’t show v.23 from above), and I apparently do have canary set as the default update channel.

This station with the failed backup / recreate is running the Windows 10 version of Duplicati; the first installed version was v.18. The other five machines, not mentioned before, run Arch Linux, which builds Duplicati from git via the AUR (en) - duplicati-latest package, and I haven’t had a similar issue on any of them; they have all run v.22 since installation, and four of them even back up to AWS S3 instead of Jottacloud.

Not sure if it matters, but the update from v.18 to v.22 might have an impact. What I wonder is why the Settings page shows the canary version as the same as the base install - v.18 - while the update section in About shows v.22 installed.

  1. Restoring a former database - I’ve set only 1 version to keep, so I cannot try that. I’ll raise that number (see the sketch below).
    Deleting the newer files won’t be nice, but in this case it’s safe, since all those source files still exist; the next backup should just upload them as new. Thanks for the warning though :slight_smile:
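
Presumably something like this as a job option, going by the manual’s retention settings (a sketch, not my exact config):

```
# Keep the last 5 backup versions instead of 1, so an older fileset
# remains available as a fallback:
--keep-versions=5
```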

In the Windows 10 “restart whenever MS likes, even if the computer holds the cure for cancer in memory” world, expecting computers with uptimes measured in days is no longer realistic :frowning:

  1. Unrelated - on Linux I set the service to run as root, since I need it to access different users’ data; I had some failed backups because of ACL differences when it ran as the duplicati user (already fixed).
    I noticed in the log on Windows that, when run as my user (after I closed and reopened Duplicati from the tray), it fails to create a snapshot - [Verbose-Duplicati.Library.Snapshots.WindowsSnapshot-WindowsSnapshotCreation]: Failed to initialize windows snapshot instance
    System.UnauthorizedAccessException: Attempted to perform an unauthorized operation. - isn’t it possible that this corruption occurred because of a similar ACL issue?
    Is there some way to run the service always at the administrative level… ? (A sketch of one option is below.)
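
One route that might work, assuming Duplicati’s bundled Windows service wrapper and the default install path: run this from an elevated prompt, and the service then runs as Local System, which can create VSS snapshots:

```
# Install the bundled service, then use Duplicati through its web UI
# instead of the tray-icon instance:
"C:\Program Files\Duplicati 2\Duplicati.WindowsService.exe" install
```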

I hope I haven’t missed any of the suggestions.