Backup valid, but still unrestorable?

This is likely related somehow to compaction, and to this data set getting “broken”, whatever that turns out to mean.

The initial situation was caused by a terminated compaction (generated by earlier versions, which did have the journal issue). I’ve updated since then, before running repair, compact, and the rest of the operations. My activity was triggered by the index error below.

Missing file: duplicati-i12764fec0f664e6185a416acf7b1850e.dindex.zip.aes
Found 1 files that are missing from the remote storage, please run repair
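
For reference, the repair I ran here is roughly the following from the CLI; the storage URL, passphrase and database path are placeholders, not the real ones from this set:

  # Re-check the remote listing and recreate whatever the local database says is missing
  duplicati-cli repair "ftp://backup.example.com/target-folder" \
      --passphrase="..." \
      --dbpath=/path/to/backup.sqlite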

And repair uploaded a 541-byte file; as said, I’ve earlier connected this to the restore failure.

  Listing remote folder ...
  Uploading file (541 bytes) ...

As we all know, this is basically an empty placeholder, which shouldn’t hurt.

After this, compaction finished without problems.

And then we’re in the situation I described a bit earlier.

I also decrypted the “broken” b0366 … file; the zip verify passes, and extracting it also completes without any problems.
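
To be concrete, the check was along these lines (the filename stays truncated on purpose; aescrypt here stands in for any AES Crypt compatible tool, such as the SharpAESCrypt utility shipped with Duplicati):

  # Decrypt the AES Crypt container, then test the zip without extracting it
  aescrypt -d -p "backup-passphrase" duplicati-b0366....dblock.zip.aes
  unzip -t duplicati-b0366....dblock.zip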

If necessary, I can privately share the dlist and the b0366 index (not data), if anyone thinks that would be helpful. Those must be kept private and deleted as soon as they are no longer strictly needed for debugging / analysis, but technically they do not contain any particularly sensitive information.

… Then to the testing and fixing …

I took a copy of the backup set and tried to restore it, just as before. → Fail

Then I ran repair to rebuild the database locally, then ran list-broken-files, purge-broken-files, and repair again.

Then I decrypted the dlist files, so now I can diff them against the originals from the “broken” set. There are some differences, because data was deleted in the process.
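
Roughly like this, assuming the usual AES Crypt tooling and the filelist.json entry inside the dlist archive; the dlist names and directories below are placeholders:

  # Decrypt a dlist from each copy, pull out the file lists and compare them
  aescrypt -d -p "backup-passphrase" broken/duplicati-20240101T000000Z.dlist.zip.aes
  aescrypt -d -p "backup-passphrase" repaired/duplicati-20240101T000000Z.dlist.zip.aes
  unzip -p broken/duplicati-20240101T000000Z.dlist.zip filelist.json > broken.json
  unzip -p repaired/duplicati-20240101T000000Z.dlist.zip filelist.json > repaired.json
  diff broken.json repaired.json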

Restore now “works”: by running repair, removing some data from the backup, and then running restore. But now at least some data (i.e. one large file) is missing. That’s still not a good result at all; restore should work without deleting data from the backup.

Technically the content of the data file does not matter; what matters is its zip directory. If you can share that (it’s basically a list of hashes…), that’s enough. Like this:

unzip duplicati-b5dd653ae13bd4a1f837f0c9daa8cc1f7.dblock.zip
Archive:  duplicati-b5dd653ae13bd4a1f837f0c9daa8cc1f7.dblock.zip
  inflating: manifest                
  inflating: Ma3-QBwsRK0_BJSfeLtsnwAIZZHt6jIOzfu5o3otqJY=  
  inflating: AAFRcFERQnUtY1EC6zDh1lvNRwfzxKXYAmG5gqogqwI=  
  inflating: SRLfjQeJoK24g05Jww_vTuReIOpxyx9np8Hfum9pFKo=  
  inflating: GREhNzIj0EAgYjatcoSvR3J0s5-vy7lctkcSvUyFTIE=  
  inflating: CUJRLy8eWJ9R1lPZpk5Wi0esDqB1daiFMZXArhC4LPM=  
  inflating: z8LlKVvckpKACdgauIB-_oDcRK1lPe3OQYoa45wJaxc=  
  inflating: LJK1DgZH1lnz0vMq8Q_U6wbTe0WrJU026JBl56jpHqo=  
  inflating: KYdoJQxT2oIDxM7jZDlD0ocEhQogIFW1mRvVW37ffuQ=  
  inflating: u8vkG0O4L01uhzeJQyEZqEz_2Z_GFLBzXy0m6Lvn8Yg=  
  inflating: ZkkgPs4esd_qam2Syb3w9j5R2nGw2Hy7N_Y1SCyrXVk=  
  inflating: msTy9q-kd_8E0SSmS7W-ad-e7s8cjIls0ISdLuwDIbI=  
  inflating: -ZuLHRZdRosror_hwXHU4PpVvQ8sxQXl997JlxU_q2o=  
  inflating: YNTn65Qjs_bsx_eMp9nYR_a0S4ZSl6Ow1dfUIPwobcM=  
  inflating: m5f-9yMcej4oI_ZGMQf6quyqkvPqpUWuirSPwey-4YI=  
  inflating: F7_5qJbDY9qehN8SgatEiZWUvtscVAzVKL3_hq_vAUs=  
  inflating: 50yObb4sRUnzefC_-GQgJvuYDiK1AOHF0m2G9LSqH0A=  

(etc… the full list should be available, with the dindex file)
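
If extraction isn’t wanted, listing the central directory alone is enough, for example:

  # List the block hashes (zip directory) without inflating anything
  unzip -l duplicati-b5dd653ae13bd4a1f837f0c9daa8cc1f7.dblock.zip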

In this report of what works and what doesn’t, it would be helpful to distinguish whether the action happens before or after the failed restore. Yes, it should detect problems beforehand, but fixing up an unexpected failure is good too.

Please see what (if anything) seems more effective after it’s actually broken. This might depend on the process. I forget whether the database from a Direct restore can be inspected, but the one from a normal recreate certainly can.

I’ll need to read the rest of this again, since it’s been over a week now. I might edit in some more notes.

Well, whenever I run restore, I run it with the no-local-db option, because restoring with the local db is kind of pointless, because … yeah, no need to repeat the obvious.
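
For reference, my restore test runs look roughly like this; the URL, passphrase and paths are placeholders, the important part is no-local-db:

  # Disaster-recovery style restore: ignore the local database and rebuild state from the remote
  duplicati-cli restore "ftp://backup.example.com/target-folder" "*" \
      --passphrase="..." \
      --no-local-db=true \
      --restore-path=/tmp/restore-test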

Here is the build (valid for 3 days):

But I do have great news.

2024-01-15 08:31:31 +02 - [Warning-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-WrongBlocklistHashes]: duplicati-i2f4f78e1f1e44044aadd3b82a9c6b315.dindex.zip.aes had invalid blocklists which could not be used. Consider deleting this index file and run repair to recreate it.

It detected the offending dindex file and successfully restored the backup.

Just to be 100% sure, I retested with exactly the same command line on the canary version, and it fails saying:

2024-01-15 08.44.53 +02 - [Warning-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-UpdatingTables]: Unexpected changes caused by block duplicati-b0366966e93db4129b6ce9f5798fbe9d2.dblock.zip.aes

This is good, really good.

It just makes me smile how good my earlier educated guesses about the causes of the issue were. :wink: Yet as I said, I’ve been there and done that, the broken-transaction stuff I mean.

That is good to hear (and lucky timing that we just found this). That unfortunately confirms my theory that this bug will happen without any warning in existing backups, but luckily it can be fixed in retrospect.

Thanks for reporting, I hope to get a Canary out during the next month with this fix.

This one was deficient SQL, but as an SQL dev you’ve probably seen that too.

I guess this means at DB recreate time? As a hidden defect in a dindex, they’re hard to see otherwise. Does anyone want to actually look at the dindex to see if it fits the predicted corruption, or is it certain?

Does that mean after the current one goes Beta, or is it a second Canary (meaning a change of plan)?

EDIT 1:

An earlier “in retrospect” chance would be in a test operation using --full-remote-verification, maybe even taking advantage of the latest value (whose name I forgot, and which isn’t in the manual yet), which doesn’t require the heavy load of downloading all dblocks. Maybe this works now – I didn’t test.

EDIT 2:

The Canary option value is ListAndIndexes, according to the command line help. Also mentioned above.

EDIT 3:

The test operation would also want “all” samples if this were done, otherwise it’d be hit-or-miss whether it saw the problem. The repair command seems like it wouldn’t be able to see it, as it wouldn’t be looking inside dindex files…
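
Roughly the invocation I mean, with a placeholder URL and passphrase, and the ListAndIndexes value taken from the Canary help mentioned in EDIT 2:

  # Verify every remote volume, reading dlist and dindex contents but skipping dblock downloads
  duplicati-cli test "ftp://backup.example.com/target-folder" all \
      --passphrase="..." \
      --full-remote-verification=ListAndIndexes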

EDIT 4:

Or the other view is that one should occasionally recreate to see what happens. If it says fix the dindex, then do it. The original database should have the right info. Has anyone tested the recreated database?

Today I ran many tests, and it still seems that the repair process is broken. It says that everything is good, doesn’t take the necessary actions, and restore fails.

For many backup sets, repair fixed something, but still left these errors:

Unexpected changes caused by block duplicati-b670043e609644e659c4eb3ec31021a20.dblock.zip.aes

And for some backup sets, the situation got worse:
Recreated database has missing blocks and 1 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

Repair didn’t fix the broken backups, and neither did re-backing-up the latest version.

So I think the logic is still slightly broken. Which is of course not good → it’s broken.

The only way to recover was the classic way: delete everything and restart. - Not great at all.

Yet I hope the new version doesn’t create corruption like this anymore. But it’s still unable to recover from existing corruption. A few percent of the backup sets were corrupt but repairable, and a few were unrepairable, just as mentioned above. - That’s still a totally unacceptable number for a backup tool.

Repair steps (a CLI sketch follows after this list):
  1. Update Duplicati
  2. Run repair
  3. Run list-broken-files
  4. Run purge-broken-files
  5. Run repair
  6. Run backup
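
A minimal CLI sketch of those steps; the URL, passphrase, database path and source path are placeholders, and each backup set was of course handled with its own values (step 1, updating Duplicati itself, was done separately through the installer):

  URL="ftp://backup.example.com/target-folder"
  OPTS=(--passphrase="..." --dbpath=/path/to/db.sqlite)

  duplicati-cli repair "$URL" "${OPTS[@]}"
  duplicati-cli list-broken-files "$URL" "${OPTS[@]}"
  duplicati-cli purge-broken-files "$URL" "${OPTS[@]}"
  duplicati-cli repair "$URL" "${OPTS[@]}"
  duplicati-cli backup "$URL" /data/to/back/up "${OPTS[@]}"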

And even after this, everything seems to be ok, EXCEPT restore.

Repair isn’t a backup, and it’s also not:

I was wondering if you did that, then I saw that you listed a bunch of things (including those), then the:

What happens? I only find the word “restore” in the first and last lines here. Need some details about it.

is going to need a longer look. There are a few mentions of it in the forum. Most of them are in this topic.

EDIT 1:

I’m losing the chronology, but some history here is:

That reaction puzzles me, but I don’t have all the context. Regardless, the available developers have changed. I’ll probably ask if the current ones can dig in a little further. I might try to as well, but it likely won’t be fast…

EDIT 2:

was probably the change (this developer has left, unfortunately), and the discussion behind it might be at

I’m hoping some of the devs can help reconstruct what was going on here, and what the current situation is.

I hope the latest versions no longer create the mess I’ve suffered from so much, in several threads.

The latest post was about the situation where Duplicati gets its state messed up and there is no sane method to recover other than deleting the backup and restarting from clean.

As discussed several times earlier, it was most likely caused by multiple overlapping flaws in the program. But to be honest, after updating all the clients to the latest version, I haven’t, knock on wood, seen those problems for quite a while. That doesn’t mean the problems don’t exist, but at least it’s likely they won’t occur as often anymore. Earlier it was highly likely that an aborted compaction led to a trashed database (and backup data).

I promise that if I end up in that mess again, I’ll get back to it. Let’s hope it’s all in the past now.

I reminded myself about that and ran some more extensive restore testing. I still found a few older clients, so I’m not 100% sure that all of the clients causing the problems were on the latest beta version.

Problem again.

  1. Backed up
  2. Test OK
  3. I also checked that the number of files reported by backup and by test both match the server
  4. Restore from backup - Failed - I still don’t understand the logic behind this trap.

Note! The restore was attempted with the latest canary version, hoping that it would work better.

And this isn’t a one-off case; I found five cases, which ALL have exactly the same problem. The backup can’t be restored even though it’s claimed to be 100% good. - Not amused at all.

As stated earlier, restoring the backup would likely be possible using different recovery methods. The main question is why the normal restore simply doesn’t work, as anyone would assume it should.

At least I’m not 100% shocked about this result anymore, because I believe the data isn’t lost; it’s just harder to recover. But for 99.95% of normal non-technical users, this is the same as data loss.

1st failed set:
Found inconsistency in the following files while validating database: 

2nd failed set:
Recreated database has missing blocks and 12 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database. => Recreated database has missing blocks and 12 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

3rd failed set:
Recreated database has missing blocks and 4 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database. => Recreated database has missing blocks and 4 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

4th failed set:
Recreated database has missing blocks and 12 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database. => Recreated database has missing blocks and 12 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

5th failed set:
Recreated database has missing blocks and 11 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database. => Recreated database has missing blocks and 11 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

The few older clients I encountered, I upgraded to the latest beta version. And again, as stated, the restore test was run using the latest canary, because I hoped it would finally be able to restore the backup(s).

Somehow I still can’t believe this is going on. Oh no! Anyway…

There still has to be a systemic logic failure somewhere.

Edit: I do have a debug log, if anyone is interested, but it doesn’t contain the detailed SQL statements.

The errors you are seeing could be a symptom of this bug I fixed in a PR, but it has not been merged yet:

The part from the bug discussion that might apply in your case:

I am quite surprised that the fix for this corruption bug is still not merged, but I haven’t followed the development closely for a while. @kenkendk, maybe you could look at the PR for the next canary? It only lacks some testing with bigger databases; the root cause is well understood. Most of the changes are there to detect corrupted blocklists in retrospect, to fix existing corruption.


It is on my list of tasks to handle before a stable release:

I have been pinned down by lots of details and bugfixes from the .NET8 + Kestrel upgrade, but it looks like it is getting in shape:


I’m pretty sure it’s in the new 2.0.9.107. Maybe @Sami_Lehtinen can test it on troubled restores. Someone more familiar with the code may have to give assistance if it’s harder than “automatic”.

Re-tested. And sure enough, restore now works as expected. I got some messages while rebuilding the database for the disaster recovery restore test:

duplicati-i174b98bbbd2f4cfcb449a4f1285818cd.dindex.zip.aes had invalid blocklists which could not be used. Consider deleting this index file and run repair to recreate it.
Unexpected changes caused by block duplicati-b278fddf65249422b85a22bf294342231.dblock.zip.aes

But all five backup sets which failed to restore earlier are now restored successfully, without any other repair actions being taken in the meantime, or the backup destination being modified in any way. I also used some of the older sets that I had stored for debugging reasons back when they were failing to restore.

I also ran the restore with and without the repair-force-block-use parameter. That also worked as expected, and in practice it seemed unnecessary now. On top of that, I did a few partial restores where most of the data was already in the restore destination; I just deleted / changed the content of a few files and checked whether the state is correctly restored by the restore process, and it is.

On top of that, many of the issues that initially caused these problems, with block and index files being left behind or missing due to bad transactionality and recovery, have already been fixed earlier.

:sun_with_face: Approved :sun_with_face: Finally :sun_with_face:
:white_check_mark: Issue ticket closed :white_check_mark: