Backups suddenly failing - 4 machines now

Thanks everyone, a lot has been discussed and I’m not sure I can answer all of it.

I can say that they did not all fail at the same time per se, just on the same day, as they are all scheduled at staggered times overnight.

All the machines affected except one have a secondary Duplicati backup to a local server, so Wasabi is just for long-term storage and real disaster recovery. If you think it might be better to start new backups for them, then so be it; it wouldn't be the first time, e.g. when moving from CrashPlan I made the same decision to simply lose the old backups.

Also, I’m still struggling with the syntax for the various command-line tools and what to put, for a Wasabi bucket, in place of this <protocol>://<username>:<password>@<path>

Hi Folks - thanks for the details above, and sorry you are seeing some problems. We will try to investigate based on the above info, but if the affected Duplicati users would not mind sending in a ticket request to support at wasabi.com, we can get back to you with a personalized reply.

Thanks,
Jim (part of the Wasabi team)

So my prior note didn't help? I tested it on Google Drive and Backblaze B2. I tried options afterwards without luck, so I think it all has to be on the URL, and Export → As Command-line seemed to be a way to get it. I haven't actually looked at the source code, but I had assumed it takes the URL apart in the standard way.
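For what it's worth, for a Wasabi bucket the exported command line should end up with an S3-style URL rather than the <protocol>://<username>:<password>@<path> form. A rough sketch of what that tends to look like on Linux (bucket, folder, keys, passphrase and paths below are all placeholders, and the exact query parameters may vary a bit by version, so treat this as an assumption rather than gospel):

```
mono Duplicati.CommandLine.exe backup \
  "s3://my-bucket/my-folder?s3-server-name=s3.wasabisys.com&auth-username=WASABI_ACCESS_KEY&auth-password=WASABI_SECRET_KEY&use-ssl=true" \
  /path/to/source \
  --passphrase="my-passphrase" \
  --dbpath=/path/to/backup-database.sqlite
```

The same URL (without the source path and the backup-specific options) is what I'd expect the other command-line utilities, such as the BackendTester, to accept.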

Ah ok, so I got that now. I tried the BackendTester and it wanted to delete everything, and I then ran just the test command - it asked me for my passphrase and then nothing seemed to happen. I can enter more characters and returns, and it just sits there. This is on my Fedora box btw.
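Two notes on that, from memory (so double-check against the --help output): the BackendTester creates and deletes its own test files, so it should be pointed at an empty scratch folder rather than the real backup, and the test command will prompt for the passphrase unless it's given on the command line, which might be what looked like a hang. Something roughly like:

```
# BackendTester uploads, downloads and deletes its own test files,
# so use an empty scratch folder, never the real backup folder:
mono Duplicati.CommandLine.BackendTester.exe \
  "s3://my-bucket/scratch-folder?s3-server-name=s3.wasabisys.com&auth-username=KEY&auth-password=SECRET"

# The test command checks a sample of the real backup files; passing the
# passphrase and database path avoids the interactive prompt:
mono Duplicati.CommandLine.exe test \
  "s3://my-bucket/my-folder?s3-server-name=s3.wasabisys.com&auth-username=KEY&auth-password=SECRET" 5 \
  --passphrase="my-passphrase" \
  --dbpath=/path/to/backup-database.sqlite
```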

I ended up deleting the three dlist files that were reporting a file size mismatch. Then I deleted and recreated the database. I was able to run a successful backup after that. I think Wasabi was having some kind of issue because I was having problems deleting objects. Seems to be okay for now at least for me.

I don't know what to say about that. I just tested a b2:// URL from Windows and Linux, and mine worked; however it said `*** Remote folder is not empty, aborting` when I tried to reuse a bucket and folder, and it could not create a new bucket but could create a new folder (I just changed the name from an old URL). I got a few upload failures in both cases (regular Duplicati would probably just have retried those up to 5 times).

That’s what I would have suggested (maybe after retrying a plain recreate again). You need that database.

Thinking about an earlier statement I made, I think that test of decrypting with SharpAESCrypt and then opening with WinRAR was appropriate proof of a corrupt file, because you decrypted (as much as could be done) before trying to unzip. You might want to at least run one --log-file log to see if you get any warnings about bad lengths, like I did. Running with a higher --backup-test-samples than the rather low default of 1 for a while might also be good.
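If it helps, here's a rough sketch of how those two options might be added on the command line (they can also go under the job's advanced options in the GUI). The URL, paths and values are placeholders, and the log-level option name has shifted a bit between versions, so adjust as needed:

```
mono Duplicati.CommandLine.exe backup "s3://my-bucket/my-folder?..." /path/to/source \
  --passphrase="my-passphrase" \
  --dbpath=/path/to/backup-database.sqlite \
  --log-file=/var/log/duplicati-backup.log \
  --log-level=Warning \
  --backup-test-samples=10
```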

One thing I forgot to ask is whether anybody was on a recent Duplicati version. There was an update of the AWSSDK DLL in 2.0.3.24 (I think), and I’m hoping it still works fine with all of the S3-compatible providers…

So do you think I should simply delete the corrupt file as well or can it be recovered somehow and re-uploaded?

Sorry, I mixed up my users. I had seen you had already done this series ultimately winding up with a recreate:

which means it's probably not extremely slow (like it is for some people). Did you save the old database? There's enough info in it that (I think) repair can rebuild a missing remote dlist, but if the database is gone and the remote dlist is gone, you probably can't get the dlist back unless somehow it's now working right on the backend. How the backup process works gets into the details, but the dlist is basically a file listing plus references to data, which can be of varying age (old or new). If everything comes back together by deleting 20181122T130004Z then you lose the view of what all your files were like at that time. Did you survey enough to feel the other files are OK?

Before deleting the file on the backend, please make sure it still looks bad. Did you ever ask Wasabi support for a personalized reply, as @wasabi-jim suggested? Maybe they can think of a way to get the file back, or they might want to study other files for similar problems. I'm not sure how much direct visibility they have, even to file lengths, to look for other files whose sizes are suspiciously round numbers in hex. Or you could offer them the info.
--upload-verification-file holds file names, lengths and SHA-256 hashes for your files, but I'm not sure it runs while the backup is currently broken. Even if it won't run now, it might still be a useful thing to have for any future problems.
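As a sketch (placeholders again), turning that on is just an extra option on the backup:

```
mono Duplicati.CommandLine.exe backup "s3://my-bucket/my-folder?..." /path/to/source \
  --passphrase="my-passphrase" \
  --dbpath=/path/to/backup-database.sqlite \
  --upload-verification-file=true
```

If I remember right, that writes a duplicati-verification.json file to the destination, and there's a verification script shipped in Duplicati's utility-scripts folder that can check downloaded files against it, but double-check the name and location on your install.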

If it turns out you have no old database, the problem still exists, Wasabi can't help or explain, and you've got some confidence the damage is limited, then save off a just-in-case copy of the dlist, delete it, then recreate, unless someone reading this has a better idea. I don't know if there's a way to preserve all of the backend in case this takes a few tries (a similar idea to saving copies of the database, but it takes space and provider tools). You can, if you like, avoid actually changing things (until you see the plan) by adding --dry-run to repair. It's sometimes a little surprising in its plan, but with no local database it should just try to recreate that…
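For reference, a dry-run repair might look roughly like this (URL, passphrase and paths are placeholders):

```
mono Duplicati.CommandLine.exe repair "s3://my-bucket/my-folder?..." \
  --passphrase="my-passphrase" \
  --dbpath=/path/to/backup-database.sqlite \
  --dry-run
```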

For anyone who hasn't noticed, there's another weird issue on Wasabi in the thread "Backup won't complete with 2.0.4.4", resolved (perhaps dangerously) by seemingly using unencrypted HTTP. The actual cause of that is even less clear, especially until additional reports come in pointing to some common factor to get suspicious about…

This is what Wasabi told me:

Thanks for the note and reaching out to us. We are sorry for the problem. We are still investigating the root cause on our end, but can you tell us whether or not it would be possible for you to re-upload the objects in question?

From our observations in our logs, the 2 objects in question are:

duplicati-20181122T130004Z.dlist.zip.aes in bucket 59489 dp-maggie

and

duplicati-20181122T070000Z.dlist.zip.aes in bucket 54866 dp-marge

Thanks,
Jim

And when I asked where I could get those files from:

You are correct in terms of Duplicati generating those files and storing them in Wasabi. Let me reach out to the Duplicati folks and see what they recommend. Sorry for the inconvenience.

Jim

To me that points to an issue at their end, as it seems I wasn't the only one affected.

I do have some local backups from each night, though I'd need to check whether they include the profile folder holding the database at the time - I run Duplicati as a service so it's stored under the Windows folder somewhere. If I get the database files back, how could we rebuild the file from them?

Let me see if I can get @JonMikelV in on this, given his past experience (and tests) with rebuilding dlist (and I think dindex) files on the backend. However, going this route would make me even more nervous about what might happen (see my point about copying off the destination and using --dry-run). There have been cases of restoring an old copy of the database from an image backup, running repair, and having the result be that the destination is made to look like the database (basically deleting newer versions), whereas one would like the reverse.

Feel free to start checking whether you have a suitably recent local backup of the database for that remote backup. Maybe it can also help in some other way. Meanwhile, maybe I can find some of the other forum posts that I'm recalling.

For the Fedora server I don't have any other backups, something I will look at for the future, but for the Windows server I do have backups of the databases - it's the first one:

I don't think I did anything to the database until the 23rd, so that one should be as it was after that particular backup. Can it be used in some way to just rebuild that one file without putting it back onto the live system?

Going back to the Fedora backup, I think I'll simply start again with it, so it should be good going forward. What's the best way to "reset" it? I'd like to keep all the settings/config as they are now - just being lazy.

"Repair doesnt work" was the test to see whether missing dlist and dindex files could be regenerated from the database.

"Automatic backups stopped AND rolled back when restoring drive image of Ubuntu-partition" was the test of setting a database back to an old copy, then trying repair. In addition to deleting newer backups, it had various other issues that I'd forgotten and didn't look at closely at the time, so I'm liking this idea less than before…

You can also look at the post just above mine for an example of a full-scale disaster due to database mixup.

This would be safer, but I can't think of a simple way. That's a pretty unusual (but perhaps useful) capability. There are awkward solutions, such as duplicating the backup (perhaps even to local disk, using rclone) and then setting up a clone of the original backup on another system (don't mix OS types, e.g. Windows and Linux) to get the dlist back and move it to the live system, but that seems pretty extreme given your redundant backups.
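In case it's useful, a rough sketch of the duplicate-to-local-disk idea with rclone (the wasabi: remote name, bucket and folder are assumptions; the remote gets defined interactively first):

```
# One-time setup: define an S3-compatible remote pointing at Wasabi
# (type "s3", endpoint s3.wasabisys.com, then enter the access/secret keys)
rclone config

# Copy the whole backup folder to local disk as a safety copy
rclone copy wasabi:my-bucket/my-folder /mnt/backup-copy/my-folder --progress
```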

So reading the "Repair doesn't work" thread, I should probably just try renaming the affected file, as it should get rebuilt when I run repair. I could at least try this on the Fedora backup before I reset it, and it might do the trick? Unless I misunderstood it.

It wasn't my test, but reading through the rest of the article, another person had issues which might have been unrelated (feel free to form your own opinion). Renaming to anything starting with the usual prefix (duplicati by default) could cause Duplicati to consider it its own unknown file and delete it. Safest might be to download and delete; however, a rename to a different --prefix seems like it might get it far enough away. I haven't tested.
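As a sketch of the download-and-delete route using a generic S3 tool (the AWS CLI here is just one choice; the bucket name is taken from the Wasabi reply above, and any folder prefix inside the bucket would need to be added):

```
# Save a local just-in-case copy of the suspect file first...
aws s3 cp s3://dp-maggie/duplicati-20181122T130004Z.dlist.zip.aes . \
  --endpoint-url https://s3.wasabisys.com

# ...then remove it from the bucket so repair can regenerate it
aws s3 rm s3://dp-maggie/duplicati-20181122T130004Z.dlist.zip.aes \
  --endpoint-url https://s3.wasabisys.com
```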

Note that nothing will get rebuilt unless you have a database current enough to know it, and don’t forget all the other problems I mentioned. It might do the trick, but there are plenty of bumps in those articles you might hit…

EDIT: However, if the Fedora backup still has the original database and maybe just a bad dlist, success seems more likely. Going completely back to an old backup of the database is the case that makes me more nervous…

Ok. I just realised that the Fedora server is no longer showing the error about the file being incorrect, I suppose because of the rebuild. I think for this one I'll simply start again.

I'll give downloading and then deleting the file on the Windows machine a go and see what happens. I just need to know what to run after I do the repair to check everything is ok - even if it takes a few hours. Otherwise that one too will simply be reset to start again.

That's been covered somewhat already, primarily for damage assessment, but the same checks can also be used to test the repair.
Recently there's even more authoritative confirmation that you have a couple of ways to test file integrity:

Question about TEST command vs. Backup with --full-remote-verification and --backup-test-samples

So basically either the test command or a backup with a high --backup-test-samples should show whether things are present as Duplicati thinks they should be. Maybe also a repair with --dry-run to make sure it finds nothing to do.
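To make that concrete, a rough sketch of the fuller check (placeholder URL and paths again; "all" tests every remote volume rather than a sample, so it downloads everything and can take hours):

```
mono Duplicati.CommandLine.exe test "s3://my-bucket/my-folder?..." all \
  --passphrase="my-passphrase" \
  --dbpath=/path/to/backup-database.sqlite \
  --full-remote-verification=true
```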

Ultimately, self-tests are no substitute for an actual test restore from the backup, to make sure it seems OK.

As an unrelated note, make sure you don't have any full disks, in case that might have truncated some file; however, with multiple reports from different people, I suspect something went wrong at Wasabi at that time.

Running a log at Information level or higher might have been handy to see whether the uploads showed any issues. This is what Backblaze B2 gave me yesterday on a backend test. Maybe B2, maybe the Internet. Hard to know.

Thank you so much for all the help and insights - much appreciated.

I've "fixed" the Windows backup - I deleted the file being reported, ran repair, and the file came back; I ran a backup and it reported a similar file for the next day, so I did the same for that one, and a further backup was ok. I will now run a test as you mention and see what it can find.

I reset the Fedora server. I was a bit confused because the delete backup job option says it will delete the content of the backup, but it didn't - it even left the old database behind. Maybe I didn't do it right, but I am clearing the files out myself before I run the initial backup.

Next I suppose is the other machine with the “channel is retired” message - that’s my workstation at home so it will have to wait for further checks.

Damn, I really don’t do myself any favours - I forgot I had scheduled a reboot of my Fedora server at 16:30 and it did it right in the middle of the new backup - d’oh!!! :roll_eyes:

Wow - that's quite the set of issues, sorry I didn't make it over here sooner!

But it sounds like you’ve got it under control - and everything I read about restoring old databases and the like looked correct to me.

I don’t think I tested this yet, so I’m going to try it now - but I’m curious if Duplicati can “recover” from having ONLY dblock files (no dindex / dlist / database). :scream:

As for the “channel is retired” issue we may need to review more detailed logging of what happened just before the error.

I didn’t know you were going to delete backup jobs. I hope you exported a copy, to import back in.

There’s an unchecked checkbox in the “Delete remote files” section. Possibly you didn’t check it?
Duplicati’s probably trying to make it hard to accidentally delete your actual destination backup…

The database being left behind is a known bug that wastes space and leaves confusing leftovers.

Good luck with cleanups. I guess you’ll be testing how well the interrupted-backup code works. :wink: