Backup unusable: Duplicati tells me to repair, but repairing doesn't work

After upgrading to Canary 2.0.3.8 or 2.0.3.9, running a backup job results in corrupted backend files that don’t seem to be repairable.

After running a backup, the job fails with

Duplicati.Library.Interface.UserInformationException: Found 8 files that are missing from the remote storage, please run repair
   at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList(BackendManager backend, Options options, LocalDatabase database, IBackendWriter log, String protectedfile)
   at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify(BackendManager backend, String protectedfile)
   at Duplicati.Library.Main.Operation.BackupHandler.Run(String[] sources, IFilter filter)
   at Duplicati.Library.Main.Controller.<>c__DisplayClass15_0.<Backup>b__0(BackupResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Backup(String[] inputsources, IFilter filter)
   at Duplicati.Server.Runner.Run(IRunnerData data, Boolean fromQueue)

On 1 of the 2 systems, I was able to fix this using these steps:

  • Stop the Duplicati service
  • Downgrade by downloading the Zip version of 2.0.3.5 and copying the files over the current version
  • Start the Duplicati service
  • Manually delete the local DB file
  • Repair the database. The database was recreated and ended with an error indicating that some remote files were missing.
  • Run list-broken-files and purge-broken-files from the commandline menu item in the Web UI. Some files were found and purged successfully
  • Backups now run fine on the first system that had these symptoms.

On the second system I can’t get the backup working again. Running a backup ends with the error message above, complaining that some backup files are missing.
When running list-broken-files or purge-broken-files from the commandline in the Web UI, no files are listed or purged.

Repairing the database ends with the error “Found 8 files that are missing from the remote storage, please run repair” after one or two seconds.

Running the repair command from the “real” command line returns this:

C:\>"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" repair "webdavs://<storage-url>?auth-username=username&auth-password=password" --dbpath="C:\Program Files\Duplicati 2\data\DBNAME.sqlite" --passphrase=passphrase
  Listing remote folder ...
Failed to perform cleanup for missing file: duplicati-b013ebd640a63445493919358fcd3c745.dblock.zip.aes, message: Unexpected empty block volume: duplicati-b013ebd640a63445493919358fcd3c745.dblock.zip.aes => Unexpected empty block volume: duplicati-b013ebd640a63445493919358fcd3c745.dblock.zip.aes
Failed to perform cleanup for missing file: duplicati-b897659fd532e4d1fb04186e820f01b57.dblock.zip.aes, message: Unexpected empty block volume: duplicati-b897659fd532e4d1fb04186e820f01b57.dblock.zip.aes => Unexpected empty block volume: duplicati-b897659fd532e4d1fb04186e820f01b57.dblock.zip.aes
Failed to perform cleanup for missing file: duplicati-b845b23e7cc754202bc6a7706a08d4cc4.dblock.zip.aes, message: Unexpected empty block volume: duplicati-b845b23e7cc754202bc6a7706a08d4cc4.dblock.zip.aes => Unexpected empty block volume: duplicati-b845b23e7cc754202bc6a7706a08d4cc4.dblock.zip.aes
Failed to perform cleanup for missing file: duplicati-bf71db3a541f14f358c45ed293e2f3e33.dblock.zip.aes, message: Unexpected empty block volume: duplicati-bf71db3a541f14f358c45ed293e2f3e33.dblock.zip.aes => Unexpected empty block volume: duplicati-bf71db3a541f14f358c45ed293e2f3e33.dblock.zip.aes
Failed to perform cleanup for missing file: duplicati-bc5b9e6a4db564c04ab4878b3872aa2b1.dblock.zip.aes, message: Unexpected empty block volume: duplicati-bc5b9e6a4db564c04ab4878b3872aa2b1.dblock.zip.aes => Unexpected empty block volume: duplicati-bc5b9e6a4db564c04ab4878b3872aa2b1.dblock.zip.aes
Failed to perform cleanup for missing file: duplicati-b63f807ac1e9d4840ae05d65cd7ff0182.dblock.zip.aes, message: Repair not possible, missing 1 blocks.
If you want to continue working with the database, you can use the "list-broken-files" and "purge-broken-files" commands to purge the missing data from the database and the remote storage. => Repair not possible, missing 1 blocks.
If you want to continue working with the database, you can use the "list-broken-files" and "purge-broken-files" commands to purge the missing data from the database and the remote storage.
Failed to perform cleanup for missing file: duplicati-b1da3693a2fa94bd5b638a9a64bc7947c.dblock.zip.aes, message: Unexpected empty block volume: duplicati-b1da3693a2fa94bd5b638a9a64bc7947c.dblock.zip.aes => Unexpected empty block volume: duplicati-b1da3693a2fa94bd5b638a9a64bc7947c.dblock.zip.aes
Failed to perform cleanup for missing file: duplicati-b669dc61755f44ea48921f76a60fdedd3.dblock.zip.aes, message: Unexpected empty block volume: duplicati-b669dc61755f44ea48921f76a60fdedd3.dblock.zip.aes => Unexpected empty block volume: duplicati-b669dc61755f44ea48921f76a60fdedd3.dblock.zip.aes
Update "2.0.3.9_canary_2018-06-30" detected

list-broken-files doesn’t give any results:

C:\>"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" list-broken-files "webdavs://<storage-url>?auth-username=username&auth-password=password" --dbpath="C:\Program Files\Duplicati 2\data\DBNAME.sqlite" --passphrase=passphrase
  Listing remote folder …

So purge-broken-files doesn’t fix anything:

C:\>"C:\Program Files\Duplicati 2\Duplicati.CommandLine.exe" purge-broken-files "webdavs://<storage-url>?auth-username=username&auth-password=password" --dbpath="C:\Program Files\Duplicati 2\data\DBNAME.sqlite" --passphrase=passphrase
  Listing remote folder ...

Found no broken filesets, but 0 missing remote files
Update "2.0.3.9_canary_2018-06-30" detected

In summary, backup and repair tasks complain about missing files at the backend, but list-broken-files and purge-broken-files don’t find any files that are referenced in these missing files, making it unrepairable.
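
To keep track of exactly which volumes repair complains about, the output can be parsed instead of copied by hand. This is only a minimal sketch: it assumes the repair output was saved as text and that every relevant line follows the "Failed to perform cleanup for missing file: <name>, message: <text>" pattern seen above. The function name `missing_volumes` is made up for illustration.

```python
import re

# Matches lines such as:
#   Failed to perform cleanup for missing file: duplicati-b....dblock.zip.aes, message: ...
MISSING_RE = re.compile(
    r"^Failed to perform cleanup for missing file: (\S+?), message: (.*)$"
)


def missing_volumes(log_text):
    """Return {volume_name: first_message} for each missing file in the log."""
    found = {}
    for line in log_text.splitlines():
        match = MISSING_RE.match(line.strip())
        if match:
            name, message = match.groups()
            # Keep only the first message seen for a given volume.
            found.setdefault(name, message)
    return found
```

The resulting names can then be compared against a listing of the destination to confirm that the files are really absent.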

This sounds like what happened over here:

Did you check that the 8 missing files for backup 2 are REALLY missing from the destination?

My guess is that there are some dindex files that reference missing (or empty) dblock files, and that moving those dindex files elsewhere (or deleting them, if you want to play it loose & wild) will let the list-broken-files and purge-broken-files commands work as expected.

That sounds reasonable.

But how can I find out which .DINDEX files reference one or more of the missing .DBLOCK files? And which of them should be deleted/moved?

The backup has 169 versions, so downloading the .DINDEX files one by one, decrypting them and finding the .DBLOCK filenames using a text editor would take ages.
I’ve moved the .DINDEX file with the most recent time stamp to another location. Backups still fail, recreating/repairing the local DB ends with the same error messages and listing broken files still doesn’t work, so I moved it back to the original location.
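
If the .DINDEX files were downloaded and decrypted in bulk first, the search itself could be scripted rather than done in a text editor. The sketch below rests on assumptions I have not verified: that each decrypted file is a plain zip archive named `*.dindex.zip`, and that the .DBLOCK volumes it describes show up as entries named `vol/<dblock filename>`. The function name and directory layout are hypothetical.

```python
import zipfile
from pathlib import Path


def dindex_references(dindex_dir, missing_dblocks):
    """Map each dindex archive name to the missing dblock names it mentions.

    Assumes every *.dindex.zip file in dindex_dir is an already decrypted
    zip archive, and that the dblock volumes an index describes appear as
    entries named "vol/<dblock filename>" (an assumption about the dindex
    layout, not something confirmed in this thread).
    """
    missing = set(missing_dblocks)
    hits = {}
    for archive in sorted(Path(dindex_dir).glob("*.dindex.zip")):
        with zipfile.ZipFile(archive) as zf:
            referenced = {
                name.split("/", 1)[1]
                for name in zf.namelist()
                if name.startswith("vol/") and len(name) > len("vol/")
            }
        overlap = sorted(referenced & missing)
        if overlap:
            hits[archive.name] = overlap
    return hits
```

Only the archives returned by such a scan would be candidates for moving, which is a much smaller set than all index files.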

All 8 remote files indeed don’t exist at the destination.

The problems started after upgrading to a version that supports concurrent processing. A small number of backups worked fine, but after a few days backups started failing.
To me it looks like something goes wrong with the interaction between processes running at the same time. For example, uploading of a remote volume starts before the creation of the archive is completed, or an archive is deleted before the upload is completed.

Any thoughts about what to do next?

That’s odd - running a Repair on a backup with missing dlist or dindex files should regenerate those files.

I’m trying to replicate your failure scenario so I can test if moving ALL the dindex files will let list-broken-files work, but I haven’t yet gotten the same “Unexpected empty block volume” or “missing 1 blocks” messages you have.

I’ve finally managed to fix my backup!
Your suggestions have put me on the right track, so I’ve marked your post as solution. Thanks for the help!

As mentioned earlier, deleting the most recent .DINDEX file didn’t help. I ended up deleting all .DINDEX files manually from the destination, deleting the local database, rebuilding the database and starting a backup manually.

It looks like no data is lost and backup jobs run without problems now.
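
For anyone repeating this, moving the .DINDEX files aside is safer than deleting them, since they can be put back if the rebuild goes wrong. A minimal sketch, assuming the destination is reachable as a local folder (a WebDAV destination like mine would first have to be mounted or synced locally) and that index files can be recognised by ".dindex." in their name. `quarantine_dindex` and both paths are hypothetical.

```python
import shutil
from pathlib import Path


def quarantine_dindex(destination, quarantine):
    """Move every *.dindex.* file from destination into quarantine.

    Both arguments are local directory paths (hypothetical; a remote
    destination would have to be mounted or synced locally first).
    Returns the sorted names of the files that were moved.
    """
    src = Path(destination)
    dst = Path(quarantine)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the listing before any file is moved.
    for path in sorted(src.iterdir()):
        if path.is_file() and ".dindex." in path.name:
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved
```

After the move, the local database still has to be deleted and recreated as described above; the script only covers the file handling at the destination.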

However, IMHO 2 things should be fixed:

  • To recover from this issue, I had to manually delete files from the destination. I guess it should never be necessary for the end user to access files on the destination.
  • The REPAIR command should detect what was wrong and should have fixed the problems without manual intervention from the end user.

@kenkendk: Do you have any clue what could have happened to my backup jobs? The problems started on multiple systems after upgrading to one of the latest Canary versions. I guess there’s some problem with handling concurrent processing. It looks like one or more .DINDEX (or .DBLOCK) files are/were corrupted or uploaded incompletely.
If there is corruption in some DBLOCK files, the corruption must be found before they are needed during a RESTORE operation.

/EDIT:
Just ran a TEST command for all samples with --full-remote-verification. Everything seems to be ok with the remote backup files:

[image: TEST command results]

I’m glad things are working for you again, and I agree - the process is definitely NOT user friendly.

The issue is that the repair commands are based on the assumption that something is missing, not corrupted - so they don’t trigger in situations like yours. Ideally, I would expect that when a corrupted file is found it should be recorded somewhere in the database so that subsequent purge or repair commands can use that as a starting point rather than just checking for missing files.

Of course that’s probably quite a bit of coding, so something that might be a good intermediate thing could be to provide a method of knowing what dindex or dlist files are associated with a problematic dblock file. It would still involve manual tweaking at the destination, but it would be a “surgically guided” process rather than the shotgun approach you needed to use.

Oh - and yes, from at least 2.0.3.6 (when multi-threading was introduced) to 2.0.3.9 (current newest version) a number of odd issues have cropped up. Many seem hard to replicate so might be dependent on thread settings.

While even canary versions of Duplicati are normally very stable, I’d recommend AGAINST using 2.0.3.6 - 2.0.3.9 unless you are interested in helping us debug these threading issues (potentially losing backup data in the process).

There was a race condition in one of the versions that created incorrect dindex files. I suspect that you had some dindex files that were incorrect, and that trips the repair as it relies (initially) on the dindex files.

Completely agree. Users should never manually touch files on the remote destination.

Yes, 2.0.3.6 was a big rewrite and caused some unexpected problems. Unfortunately, the unit tests do not cover all configurations and scenarios.

I think we can add that feature easily: simply query the database to figure out which dindex files reference the missing dblock file.
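
Roughly, such a lookup might look like the sketch below. Treat it as an illustration only: the table and column names (`Remotevolume`, `IndexBlockLink`, `IndexVolumeID`, `BlockVolumeID`) are assumptions about the local database schema and should be checked against a copy of the real .sqlite file before relying on them. The helper name `dindex_for_dblock` is hypothetical.

```python
import sqlite3

# Table and column names below are assumptions about the local database
# schema; verify them against a copy of the real .sqlite file first.
QUERY = """
SELECT idx.Name
FROM Remotevolume AS blk
JOIN IndexBlockLink AS link ON link.BlockVolumeID = blk.ID
JOIN Remotevolume AS idx ON idx.ID = link.IndexVolumeID
WHERE blk.Name = ?
ORDER BY idx.Name
"""


def dindex_for_dblock(connection, dblock_name):
    """Return the names of index volumes linked to the given dblock volume."""
    rows = connection.execute(QUERY, (dblock_name,)).fetchall()
    return [row[0] for row in rows]
```

Running this against a copy of the database, rather than the live file, avoids interfering with a running job.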

One thing I can do to help the repair command is to “prune” the list of required blocks before attempting to download them. Currently, after we rebuild the database using the dlist and dindex files, it checks to see which blocks are missing, and then proceeds to download these. But in your case, the dindex files reference something that is not needed (because it works if you wipe the dindex files).
If we remove all blocks from the database that are not needed, it should quickly be able to determine that the dblocks are not required anyway and then proceed.
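
As a toy illustration of that pruning idea (not the actual repair code), the check boils down to asking whether any block that a fileset needs lives in a missing volume. The data structures and the function name here are invented for the example.

```python
def dblocks_still_required(needed_blocks, block_to_volume, missing_volumes):
    """Return the missing volumes that hold at least one block a fileset needs.

    needed_blocks:   hashes of blocks referenced by the dlist (fileset) data
    block_to_volume: {block_hash: dblock_name} as claimed by the dindex files
    missing_volumes: dblock names that are absent from the remote storage
    """
    missing = set(missing_volumes)
    required = set()
    for block_hash in needed_blocks:
        volume = block_to_volume.get(block_hash)
        if volume in missing:
            required.add(volume)
    return required
```

An empty result means every missing dblock only held blocks that nothing references, so there would be nothing to download and repair could proceed.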

I have seen some reports where I suspect that the “compact” feature sometimes creates a similar scenario (pre 2.0.3.6) and this would help recover from that problem until we can fix the root cause. (the root cause for the OP has been fixed AFAIK).

Good evening, any updates on this issue?

Working on it… :sun_behind_small_cloud:

I’m here to (unfortunately) report that in the newest canary 2.0.4.10 this problem still exists. It actually happened to me three times. I had to revert to 2.0.3.5 :frowning: Uploads also won’t respond to the stop after uploading command. They just keep running. The problem seems to pop up (I’m not 100% sure on this) when I forcefully stop the job after that, or when Duplicati unexpectedly shuts down or stops for no reason.
EDIT: I just want to say that this happened on the first backup and that I’m running Docker. Repair did not fix it, and rebuild wouldn’t work because there was no dlist file yet. If there already were dlist files, I don’t know if it would be different.

Is this still your first backup?

If so, I know it’s not a great workaround, but it might make your life easier to try a reduced Source folder initially just to get that dlist file created, then expand what’s included in your backup.

@kenkendk, how “bad” would it be to upload a “partial” dlist file with each dblock during the initial (or any?) backup?

For non-first backups do you think it could cut down on repair issues for interrupted backups?

I’m not sure the problem would be fixable even with an existing backup. I just remembered the first time I upgraded to 2.0.4.9 (it may have been 2.0.4.5, but still very recent). I believe I had an error like this with an existing fileset; however, I had rebuilt the database and this was my first backup using the new DB. I believe the error happened when Duplicati stopped suddenly. Repair itself gave an error (I don’t remember which), so I could not fix it that way. I believe it was a problem with the fileset, plus probable corruption, both likely caused by Duplicati. This is just speculation, however, as I was in more of a panic state than a diagnostic one and never closely inspected the errors. I abandoned the data and deleted everything because none of the commandline options fixed it.

I then tried twice more with a new fileset, once with the beta and once with the newest canary (I believe 2.0.4.10). All three attempts gave the same error when I tried to run backups, and all three gave repair errors and commandline errors. The last two were from a lack of dlist files and probable DB corruption. I forgot the first error. It may have been the same, but I can’t remember.