I’ve got more than a few systems using Duplicati. From practical experience I’ve found a few issues which aren’t necessarily bugs. Maybe I’ve just misunderstood something, or I’m not running Duplicati with exactly the right parameters.
The problems are a combination of multiple features and options being used together. Either way, the end result is that there are very serious problems with operation and reliability.
My conclusion is that there should be a reliable way to fix the backup set (as much as possible) and continue backing up data, without needing a full reset of the backup sets.
- Purge-Broken-Files - No, it doesn’t purge broken files, it actually removes “missing files”.
- Repair - Doesn’t actually do a repair, it just updates the local database and remote sets to match on some level, but doesn’t do the full job.
- Backup - After these steps, it might still report broken files where the hash doesn’t match.
- Restore - Restore claims that the backup is broken even after all the steps mentioned above.
I did some tests and the end result was that 50% of the backups were broken, which is a devastating result. And this is not the situation I started with; this is the situation AFTER all the repair steps.
At least two issues remained which were probably causing problems:
5a) Hash mismatch - deleting the file and running repair seemed to help on some level.
5b) Possibly extra i-files in storage, which mess up the database rebuild - I haven’t tried manual deletion yet. Anyway, these steps are an absolutely huge waste of time.
What’s the straightforward method of fixing these issues?
I’ve already added full automation for repair and purge-broken-files, but I don’t know how to get the job finished.
I’ll try deleting the broken files manually and then fixing things further, but it’s an extremely slow and painful way to do it.
Not sure I have any thoughts on your specific questions, but I have tried to mitigate local database issues by backing it up with a secondary backup job. Basically when the main backup job completes, it automatically kicks off the secondary backup job. I do this especially on machines where the primary backup is pretty large.
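One way to wire up that chaining is the main job’s --run-script-after option. Here’s a rough sketch of such a script - the destination URL, paths, and passphrase are just placeholders, the DUPLICATI__OPERATIONNAME check is from memory, and duplicati-cli is the Linux wrapper name (on Windows it would be a batch file calling Duplicati.CommandLine.exe):

```sh
#!/bin/sh
# Attached to the MAIN job via --run-script-after=/path/to/this-script.sh
# When the main backup operation finishes, kick off a secondary backup of
# the folder holding Duplicati's local databases.
if [ "$DUPLICATI__OPERATIONNAME" = "Backup" ]; then
    duplicati-cli backup \
        "file:///mnt/secondary/duplicati-db-backup" \
        "$HOME/.config/Duplicati/" \
        --dbpath="$HOME/.config/Duplicati/db-backup-job.sqlite" \
        --passphrase="placeholder-passphrase"
fi
```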
Hello, did you try the test command?
https://duplicati.readthedocs.io/en/latest/04-using-duplicati-from-the-command-line/#the-test-command
You can specify “all” as the sample count to test all of the backup data. But it is just a backup diagnostic, not a step that fixes your issue.
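For example, something like this (the storage URL and database path are just placeholders):

```sh
# Verify every remote volume in the backup instead of a small sample
Duplicati.CommandLine.exe test "s3://my-bucket/backup-folder?auth-username=ID&auth-password=KEY" all \
    --dbpath="/path/to/local-backup.sqlite" \
    --full-remote-verification
```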
Did you try restoring files from backup without local DB?
That is indeed a pretty high number, and I would certainly think that if that were a typical user experience Duplicati wouldn’t still be around, so I have to wonder if it’s something in your environment or (as you suggested) parameter usage.
What OS and destination are you using, and have you tried running the Duplicati.CommandLine.BackendTester.exe tool (Other Command Line Utilities - Duplicati 2 User's Manual) to see if there might be some issues with your backend? Ummm… that might sound ‘wrong’ when you read it, so just to clarify, there’s nothing personal implied there.
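If you do try it, note that as far as I recall it wants to be pointed at an empty folder on the backend (it creates and deletes its own test files), so something like this with a placeholder URL:

```sh
# Point this at an EMPTY folder on the same backend - NOT at the folder
# holding your real backup files, since the tester writes and deletes
# its own test data there.
Duplicati.CommandLine.BackendTester.exe "s3://my-bucket/backend-test?auth-username=ID&auth-password=KEY"
```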
Of course all that being said, when issues like yours DO happen (no matter how infrequently) they should be as easy as possible to safely resolve so there’s definitely some room for improvement there.
After thinking about the problems for a while, I think there are a few situations where recovery could be handled better:

- Missing files -> Fix the references and deal with it (if it’s a b-file, yes, there’s data loss), which could leave future versions unrecoverable. This also means the references should be updated so that the next backup fills in the missing data, if it’s missing from the “latest version”.
- Extra files -> Get rid of those, but only after checking whether they contain valuable and usable information. One example would be a compact run that was aborted because the client or the remote end “dropped dead” during the process.
- Corrupted files / files not matching the local DB -> See what the content is. If it’s usable, use it and update the local DB. If it’s unusable, delete it and mark the set for further repair.
- Database out of sync for some reason, or remote storage out of sync for some reason -> Recover as efficiently as possible, using the steps I mentioned before.
Now it seems that in some situations the SQL database test and repair say there’s nothing wrong, but restore still fails and claims there are files missing. How is this possible?
Having a single command or automatic recovery would help a lot and save huge amounts of time. What if you’ve got, let’s say, 1 to 1k backup sets, and you have to run multiple commands on each of those manually to fix the situation? It’s a total disaster that systems break down and require manual intervention. It’s an absolute no-no from an administration point of view, and it easily costs several thousands of € / $ / £ etc.
Now I haven’t seen any errors for a few days. Maybe I’ll run the restore tests again tomorrow. And let’s see what the restore percentage is.
Thanks for the thoughts! Off the top of my head, here are some ideas I had as well:
- missing files: a database recreate should essentially do what you describe, namely rebuild the database from the actual remote files. This will cause any blocks now missing from existing source files to be re-uploaded.
- extra files: if you mean extra files on the destination, we have to be VERY sure they’re Duplicati files before we delete them (so much so that we err on the safe side and just report them and let the user decide whether they need to be deleted, moved, etc.). If you mean extra local files, then that should already be covered in most cases (though I know there was a recent bug - now fixed - causing extra usage log files to sit around).
Note that as far as I know a database compact doesn’t touch the destination at all, so there shouldn’t be any extra files created there.
- corrupted files: this is essentially what the purge command is for, but because it’s actually removing backup contents we currently require it to be run manually. I could see a semi-automatic approach such as the following (see the sketch after this list), but I doubt a fully automated one is in the cards:
  a. corrupt files are found
  b. --list-broken-files is run automatically
  c. a message (alert, email, etc.) is sent about where to go to find the list of broken files and how to MANUALLY run --purge-broken-files
- out of sync scenarios: part of the potential complexity of out-of-sync situations is knowing which part is still good.
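For reference, the manual version of steps b & c boils down to something like this (the storage URL and database path are placeholders):

```sh
# See which backed up files reference missing or corrupt remote volumes
Duplicati.CommandLine.exe list-broken-files "s3://my-bucket/backup-folder" \
    --dbpath="/path/to/local-backup.sqlite"

# After reviewing that list, remove those entries from the affected versions
Duplicati.CommandLine.exe purge-broken-files "s3://my-bucket/backup-folder" \
    --dbpath="/path/to/local-backup.sqlite"
```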
As you’ve probably noticed, Duplicati is still under active development - in functionality, stability, and usability. Hopefully we’ll be able to improve the 2nd item enough that the 3rd won’t be as important.
2 ) Compact - I mean the Duplicati remote storage compact, for when it contains excess “stale” data blocks; the local database compact is called vacuum in SQLite. As for extra remote files, the filename prefix should be a good enough indication. Of course it’s possible that users accidentally mess up paths or such. For that there could be an option which allows a “remote cleanup” of any data which isn’t related to the current “backup set”.
3 ) Sure, corrupted files should (hopefully!) be a very rare occurrence. On our side I added a “fix” command which runs purge-broken-files and repair, and an auto_fix configuration option which runs those whenever a normal backup reports issues (return codes 50 / 100, etc.); see the sketch after this list.
4 ) Out of sync is basically any combination of the other problems: extra files, missing files, and probably wrong versions of files and so on. In this case using the “remote blocks” as much as possible and repairing the rest would be the most efficient method. Yet, as a programmer myself, I can see why such logic isn’t implemented, because it’s tedious and hopefully isn’t required often. Just deleting stuff which isn’t up to date is one way to deal with this. Not optimal, but it works “well enough”, and that’s what Duplicati seems to be currently doing in such situations.
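To make the auto_fix idea concrete, our wrapper is roughly along these lines (a simplified sketch only - the storage URL, paths, and passphrase are placeholders, and 50 / 100 are simply the return codes I key on):

```sh
#!/bin/sh
# Simplified sketch of our backup wrapper with the "auto_fix" behaviour.
URL="s3://my-bucket/backup-folder?auth-username=ID&auth-password=KEY"
DBPATH="/path/to/local-backup.sqlite"

duplicati-cli backup "$URL" /data/to/backup --dbpath="$DBPATH" --passphrase="placeholder"
RC=$?

# If the backup reported problems (return codes 50 / 100 in our setup),
# try to clean up automatically so the next scheduled run can continue.
if [ "$RC" -eq 50 ] || [ "$RC" -eq 100 ]; then
    duplicati-cli purge-broken-files "$URL" --dbpath="$DBPATH" --passphrase="placeholder"
    duplicati-cli repair "$URL" --dbpath="$DBPATH" --passphrase="placeholder"
fi
```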
Btw. you said --list-broken-files / --purge-broken-files, yet those aren’t options, those are commands.
Thanks for the corrections - I made the same “compact vs. vacuum” goof in another post recently as well. And the majority of stuff I post about involves parameters, so it’s kind of a habit to include the leading double dashes and call it a parameter.
Sometimes the brain just doesn’t want to work right no matter what I try.
Technical question: are the i-files totally “redundant”? I mean, can the i-file content be fully rebuilt from the b-files alone? If not, then I’m again very worried, because that corruption happens way too often. And I’m not talking about the 2.0.3.6 version which basically destroyed every backup.
Even if the i-files are redundant, deleting and repairing them manually is still a huge pain. Just today I again deleted three corrupt i-files. And if those would be necessary for a successful restore - deep sigh - then the backup sets are … again …
Just to summarize for those that may not know:

- *.dblock.* files contain the actual backed up data (if these get deleted, you lose backup data)
- *.dindex.* files contain metadata ABOUT the backed up files (meaning they can be used to recreate the local DB without having to download the ACTUAL data in the dblock files; if these get deleted they can be rebuilt from the local database)
- *.dlist.* files basically hold a simple JSON file with a list of the files included in that run (mostly just used to populate the “Restore” file listings; if these get deleted they can be rebuilt from the local database)
So technically, yes - you can fully rebuild the database from the dblock files alone, but you’re essentially downloading your whole backup. But you can also fully rebuild the database from the dindex files alone.
When Duplicati does a database rebuild, it will use as many of the dindex files as possible. If any are missing or corrupted, it will then start downloading the bigger dblock files to figure out what was in the missing dindex files.
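So if you ever want to force that full database recreate manually, it’s basically just moving the old database out of the way and running repair - roughly like this (URL, paths, and passphrase are placeholders, and I’d rename the old database rather than delete it, just in case):

```sh
# Keep the old database around in case something goes wrong
mv "/path/to/local-backup.sqlite" "/path/to/local-backup.sqlite.old"

# With nothing at --dbpath, repair rebuilds the database from the remote
# dlist/dindex files, only downloading dblock files if some are missing
Duplicati.CommandLine.exe repair "s3://my-bucket/backup-folder" \
    --dbpath="/path/to/local-backup.sqlite" \
    --passphrase="placeholder"
```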
Suppose I use an S3-compatible storage solution, am I correct in saying that only the following origins of archive corruption (assuming the source is OK) are in play?
1. Duplicati code bugs
2. S3-compatible storage solution bugs
3. Hardware failure at the S3 end
4. Transport errors
About the latter, I was wondering if 1 or 2 may have some sort of robustness against 3 or 4.
@JonMikelV Thanks for that detailed summary.
But are you sure that dlist files are not needed for repair? I may be mistaken, but I think I was in a situation where I was not able to repair/recreate the database when dlist files were missing.
Yep. I did a failure test a while ago deleting various file types and the only thing that blocked the test file recovery was missing / bad dblock files.
That being said, I believe there is a scenario where, if you try to restore when a dlist file is missing and do not explicitly tell Duplicati to NOT check backend file status, it gets stuck complaining about the missing dlist rather than going ahead and restoring via the dblock files.
Assuming I’m remembering that correctly, I don’t recall if an official bug ticket was ever put in for that scenario or not.
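If anyone wants to try reproducing that, the combination I have in mind is roughly this (the URL and paths are placeholders, and the option names are from memory, so double-check them against the manual):

```sh
# Restore everything straight from the remote files, skipping the local
# database and the up-front check that all expected remote files exist
Duplicati.CommandLine.exe restore "s3://my-bucket/backup-folder" "*" \
    --restore-path="/tmp/restore-test" \
    --no-local-db \
    --no-backend-verification \
    --passphrase="placeholder"
```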
Pretty much. There is of course the usual “never underestimate the power of human stupidity” mantra that covers things like “I deleted everything from my destination and now I can’t restore my backups”.
Yes - Duplicati does what it can to recover from issues 2, 3, & 4. Unfortunately, the robustness of that code hasn’t been fleshed out as much as some would like, so in many cases Duplicati will identify the issue but not actually do anything about it automatically, leaving it up to the user to manually resolve things.
I feel this is part of where we run into issues - when everything works, it works great, but when something goes wrong (even because of transport or destination issues) it can be a bit of a hassle to resolve the issues.
Some people see this as a failure of Duplicati (again, even if the issue is due to transport or destination) when really it’s Duplicati protecting the integrity of a backup as best it can. Luckily, that leaves plenty of opportunities for developers to make improvements.
I’ve already automated that. Yet I would prefer to have just one or a few commands to run. Currently I run purge-broken-files and repair, but I think that’s not enough in all situations. Having a single command to “fix the situation” would be preferable. And yes, I of course know this might mean losing some (history) backup data, but it’s already lost at that stage anyway if the destination is corrupted / lacking some key files. It would still mean that future backups are taken and are solid. Having to do manual intervention, running multiple commands, and wondering whether things are now OK or not is quite annoying.
I don’t disagree. “Unfortunately”, at the moment development is being focused on performance and avoiding the issues in the first place rather than improving the experience when issues do arise. Hopefully, as developer resources become available (such as through finishing current tasks or more people helping out) these sorts of user experience features can be improved.
I know it may not feel like much, but putting a ticket on Duplicati’s Github page and adding a bounty to it (even a small one) might draw developers to work on the feature.
I saw “Become a backer” or “Become a Sponsor” on that page, but not a ticket or bounty.
I’m sorry, I mis-remembered the process for adding a bounty. It’s described here, but to summarize:
- Make sure the item has been added as an issue on GitHub
- Go to BountySource, find the issue (either on the list or through search), and pledge the bounty amount you want