Backup valid, but still unrestorable?


#1

Again, this is the worst of Duplicati.
list-broken-files -> ok
purge-broken-files -> ok
repair -> ok

Running full restore -> FAIL

Running a full restore reports that it is registering a missing remote file, then it tries to recover using blocks from existing volumes, and after that it fails.

Error ID on restore: DatabaseIsBrokenConsiderPurge. I just did that (again); nothing changed.
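
For reference, the sequence above corresponds roughly to these CLI calls (the storage URL and target directory are placeholders, and encryption options are omitted):

    Duplicati.CommandLine.exe list-broken-files <storage-url>
    Duplicati.CommandLine.exe purge-broken-files <storage-url>
    Duplicati.CommandLine.exe repair <storage-url>
    Duplicati.CommandLine.exe restore <storage-url> "*" --restore-path=<empty-target-dir>

The first three report no problems; the last one fails.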

There's something inherently and logically broken here. How can the backup be broken and valid at the same time?

Please fix issues like this, because these are absolute trust and reliability killers. It's very bad that everything seems to be OK until you really need it, and then boom, you're let down. This is the worst kind of trap you can build into a backup system.

Version 2.0.4.15


Fatal error System.Exception: Detected non-empty blocksets with no associated blocks!


#2

If you're talking about the names of the commands used, I think the short names can make them sound more universal than they are. In some cases they're rather specific (but I haven't seen a good writeup of their capabilities and limitations; almost certainly not all of the details could fit in the names).

At least the long repair description says it "Tries to repair" (and a rewrite of recreate has begun that may also help repair, but I haven't heard any news recently, so I can't offer you any details).

--list-broken-files sounds like it's mainly a preview of --purge-broken-files, and I think both work mostly from easily obtained data such as the local database and the remote file listing. Testing the remote files completely would require downloading everything, and rarely will anyone tolerate the time and transfer.
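
If it helps, a rough sketch of using that pair as preview-then-apply (storage URL is a placeholder; add your usual encryption options):

    Duplicati.CommandLine.exe list-broken-files <storage-url>
    Duplicati.CommandLine.exe purge-broken-files <storage-url> --dry-run
    Duplicati.CommandLine.exe purge-broken-files <storage-url>

The --dry-run option should make the purge only report what it would do, which is another way to preview before modifying the backup.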

The TEST command over all files with --full-remote-verification is quite thorough, and maybe very slow. Longer term, the hash capabilities that some storage provides may be a shortcut to better (though still limited) tests.
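
A full verification pass would look roughly like this (storage URL is a placeholder; note that this downloads every remote volume, so expect it to take a while):

    Duplicati.CommandLine.exe test <storage-url> all --full-remote-verification=true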

The topic "How often does duplicati test your backup archives?" gets into that future idea and other sorts of testing, with different tradeoffs. Unfortunately, I don't think there's any test that actually tries a source file restore. That could help users who don't try their own test restore, but it could also fall victim to a huge source file.

It sounds like you might have lost a dblock file (hence the scramble for blocks), but do you have the actual error message? Exact messages can sometimes help find the code in the source that raised the issue. Restore is sometimes harder to get logs out of than backup, but you can try pasting screenshots.

What’s the current system status? You were trying a full restore. Is there system damage in the picture?

Sorry things went badly, but with help from you we can see about getting the restore going again. How much of a hurry is this? Are you aiming for a recent version, or trying to get some historical version? Duplicati has lots of different ways to do restores (difficulty varies), and your needs can guide the choice.
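
For example, to see which versions exist and then restore a specific one, something like this should work (placeholders for the URL and paths; version 0 is the most recent backup):

    Duplicati.CommandLine.exe list <storage-url>
    Duplicati.CommandLine.exe restore <storage-url> "*" --version=0 --restore-path=<empty-target-dir>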


#3

One more warning: testing is still done wrong. It's possible that full tests pass but a full restore still fails. This is the ultimate trap I was talking about. It undermines confidence and creates very dangerous false trust for users until the very end. I've seen this situation too; that's why I always do a full restore, not a full test. The test is unfortunately dangerously misleading.

But back to the specific case I'm now talking about. I guess the problem might be that, for some reason, a leftover dindex (i) file is present. When the database is available the file is ignored, but while rebuilding the database it probably causes the badly written logic to fail.

Remote file referenced as duplicati-b745030a03fa640d29f2daa1849cf0f2e.dblock.zip.aes by duplicati-i59a736b5d13d47d0b91cdfdfa1ddf8cb.dindex.zip.aes, but not found in list, registering a missing remote file.

This file should get deleted, because it's probably not even necessary; the restore process should ignore it, and it shouldn't cause the restore to fail.

Also, about the testing process: testing the backup while the local database is present is also pointless, because the database isn't available when you're doing disaster recovery. These are very bad, inherent flaws in the current "solution". The poor reliability seems to be the absolute worst part, but let's get the problems fixed. Yet these are obvious, logical problems which should get fixed without reports. If the backup worked transactionally and correctly, these problems should never exist, and if they did, automatic recovery should occur. I think I wrote exactly the same thing around a year ago or so.

Luckily this backup set isn't in the terabytes, so it's relatively viable to run all the tests over a 10-gigabit network.

Found 1 missing volumes; attempting to replace blocks from existing volumes

Here are the fresh, retested and confirmed results:
Full Restore: FAIL - Code 100, unable to recover
Full Test: OK - 0
List-Broken-Files: OK - 0
Purge-Broken-Files: OK - 0
Repair: OK - 0
Full Restore (again after all checks and tests): FAIL - Code 100, unable to recover

This is a perfect example of a 'trap' product, which wastes your time and resources and doesn't provide any value when actually needed. That's why this kind of critical flaw shouldn't exist by design.

About the version I'm restoring in this case: always the latest version, which makes sense for a DR restoration. That's also the reason why, if such a problem is detected, the potentially missing blocks should be automatically and instantly replaced (when running the next backup), because the required data is still available on the source system.

As mentioned, I'm just TESTING; I've got all the data I need and no actual need to restore, so this isn't actually a major problem for me. But as a programmer, data administrator, IT manager and a guy who cares greatly about data integrity and system reliability, I find these problems absolutely devastating.

  • I remember a compression program which gave great compression results in all tests. It worked really well, it was fast, and it gave incredible compression ratios. Under the cover it just referenced the source files. As soon as the user deleted the source files, extracting the archive failed, leading to data loss. Funny stuff.

Edit and continued: the test was run with "all" and --full-remote-verification=true. Even adding --no-local-db=true doesn't change anything; it still passes. With --no-local-blocks=true it still passes too.
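
For clarity, the full shape of that test run was roughly (storage URL is a placeholder):

    Duplicati.CommandLine.exe test <storage-url> all --full-remote-verification=true --no-local-db=true --no-local-blocks=true

It still reports everything as OK, while the restore from the same destination fails.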

Which option removes excess files from the destination folder? I think I've seen discussion about this, but I couldn't find the parameter quickly. I'm also strongly against software which goes and "poops" around, leaving junk everywhere. If something isn't necessary, it should get deleted.


#4

You might be thinking of --auto-cleanup, but I'm not totally convinced you have excess files when it says you have a missing file. You could try feeding both the missing dblock name and the index file referencing it into the AFFECTED command, for example by opening the job's Commandline in the web UI: change the command, and put the remote file names where the source file list would have been when the command was the default backup.
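
From a plain command line that would be roughly the following (storage URL is a placeholder; the two file names are taken from your log line above):

    Duplicati.CommandLine.exe affected <storage-url> duplicati-b745030a03fa640d29f2daa1849cf0f2e.dblock.zip.aes duplicati-i59a736b5d13d47d0b91cdfdfa1ddf8cb.dindex.zip.aes

It should report which source files and backup versions reference those remote volumes.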

The topic "Inventory of files that are going to be corrupted" gives an example where a damage forecast is done before intentionally removing a dblock file and then recovering. Your --list-broken-files having no complaint does suggest you may have an extra dindex. You could also test moving it elsewhere to see whether it helps or hurts.

If you really like, you could make a database bug report, put it somewhere, and then link it; I can attempt to study it.

Your failure, if I follow correctly, was on a direct restore, so the following might be relevant to getting your log:

View logs for direct restore from backup files

Whether even --log-file-log-level=profiling would help me (there's SQL in it, but it's hard to read) isn't clear.
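
If you go the command-line route instead, the logging options would be added roughly like this (URL, restore path and log path are placeholders):

    Duplicati.CommandLine.exe restore <storage-url> "*" --restore-path=<empty-target-dir> --log-file=<path-to>/restore-profiling.log --log-file-log-level=profiling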

It would be wonderful if Duplicati someday got better at its repair capabilities, but writing code takes time…


#5

That's what I thought before, but it's nice that you thought the same way. Now I've had time to recheck it and confirm. The restore completed in 30 seconds; I just had to manually remove the "offending file" (rough sketch at the end of this post). Which clearly proved three points:

  1. File deletion is broken; it doesn't work transactionally.
  2. File cleanup is broken, after step 1 failed.
  3. Restore/recovery logic is broken, after steps 1 and 2 failed.

That's three overlapping bugs causing one problem. A correctly working program would have had three chances to fix the situation before I posted to this forum.

It’s confirmed and there are three bugs to fix.
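
For reference, the manual workaround amounted to roughly this (the dindex name is from the log line earlier in the thread; the storage and restore paths are placeholders, and the move command assumes a Windows-style shell):

    move duplicati-i59a736b5d13d47d0b91cdfdfa1ddf8cb.dindex.zip.aes <quarantine-folder-outside-the-backup>
    Duplicati.CommandLine.exe restore <storage-url> "*" --restore-path=<empty-target-dir>

After the extra dindex was out of the destination folder, the restore ran to completion.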