Backups suddenly failing - 4 machines now

ts678 · November 26, 2018, 12:27pm

Sorry, I mixed up my users. I had seen you had already done this series ultimately winding up with a recreate:

which means it’s probably not too extremely slow (like it is for some people). Did you save the old database? There’s enough info in it that (I think) repair can rebuild a missing remote dlist, but if database is gone and remote dlist is gone, you probably can’t get the dlist back unless somehow it’s now working right on backend. How the backup process works gets into details, but the dlist is basically a file listing plus references to data, which can be of varying age (old or new). If everything comes back together by deleting 20181122T130004Z then you lose the view of what all your files were like then. Did you survey enough to feel other files are OK?

Before deleting the file on the backend, please make sure it still looks bad. Did you ever ask Wasabi support for a personalized reply, as @wasabi-jim suggested? Maybe they can think of a way to get the file back, or possibly would want to study other files for similar problems. I’m not sure how much direct visibility they have, even into file lengths, to look for other files whose sizes are suspiciously round in hex. Or you could offer them that info.
--upload-verification-file holds file names, lengths and SHA-256 hashes for your files, but I’m not sure it runs when the backup is currently broken. If it won’t, it might still be a useful thing to have for any future problems.
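For anyone who wants to check a downloaded copy by hand, here is a minimal Python sketch of the idea: it reports a file's length, that length in hex, how many zero hex digits the length ends in, and the SHA-256 of the contents. The function name `describe_file` and the returned keys are made up for illustration, and the hash is given as a plain hex digest, which may not be the encoding the verification file uses, so treat any comparison as something to check rather than assume.

```python
import hashlib


def describe_file(path):
    """Return the length, hex length, trailing zero hex digits and SHA-256 of a file."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large backup volumes do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
            size += len(chunk)
    hex_size = format(size, "x")
    # A long run of trailing zero hex digits means the length is a multiple of a
    # large power of 16, which can hint at (but does not prove) truncation.
    trailing_zeros = len(hex_size) - len(hex_size.rstrip("0")) if size else 0
    return {
        "size": size,
        "hex_size": hex_size,
        "trailing_zero_hex_digits": trailing_zeros,
        "sha256_hex": digest.hexdigest(),
    }
```

A file that has a suspicious length and also fails the hash comparison is a much stronger sign of damage than a suspicious length alone, since healthy files can land on round sizes by chance.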

If it turns out you have no old database, the problem still exists, Wasabi can’t help or explain, and you’ve got some confidence the damage is limited, then save off a just-in-case copy of the dlist, delete it, then recreate,
unless someone reading this has a better idea. I don’t know if there’s a way to preserve all of the backend in case this takes a few tries (similar idea to saving copies of the database, but takes space and provider tools) but you can, if you like, avoid actually changing things (until you see the plan) by adding a --dry-run to repair which is sometimes a little surprising in its plan, but with no local database it should just try to recreate that…

For anyone who hasn’t noticed, there’s another weird issue on Wasabi at “Backup won’t complete with 2.0.4.4”, resolved (perhaps dangerously) by seemingly using unencrypted HTTP. The actual cause of that is even less clear, especially until additional reports come in pointing to some common factor to get suspicious about…

Taomyn · November 26, 2018, 12:46pm

This is what Wasabi told me:

Thanks for the note and reaching out to us. We are sorry for the problem. We are still investigating the root cause on our end but can you tell us whether or not it would be possible for you to re-upload the objects in question?

From our observations in our logs, the 2 objects in question are:

duplicati-20181122T130004Z.dlist.zip.aes in bucket 59489 dp-maggie

and

duplicati-20181122T070000Z.dlist.zip.aes in bucket 54866 dp-marge

Thanks,
Jim

And when I asked where I can get those files from?

 you are correct in terms Duplicati generating those files and storing them in Wasabi. Let me reach out to the Duplicati folks and see what they recommend. Sorry for the inconvenience

Jim

To me that points at an issue at their end as I wasn’t the only one it seems.

I do have some local backups from each night though I’d need to look if they include the profile folder holding the database at the time - I use a service so it’s stored in the Windows folder somewhere. If I got the database files back, how could we rebuild the file from it?

ts678 · November 26, 2018, 12:56pm

Let me see if I can get @JonMikelV in on this due to past experience (and tests) with rebuilding dlist (and I think dindex) files on the backend, however going this route would make me even more nervous about what might happen (see my point on copying off the destination and using --dry-run). There have been cases of restoring an old copy of the database from an image backup, doing repair, and having the result be making the destination look like the database (basically deleting new versions), whereas one would like the reverse.

Feel free to start seeing if you have a suitably recent local backup of the database for remote. Maybe it can also help in some other way. Meanwhile, maybe I can find some of these other forum posts that I’m recalling.

Taomyn · November 26, 2018, 1:13pm

For the Fedora server I don’t have any other backups, something I will look at for the future, but for the Windows server I do have backups of the databases, it’s the first one:

I don’t think I did anything to the database until the 23rd so that one should be as it was after that particular backup. Can it be used in some way to just rebuild that one file without putting it back onto the live system?

Going back to the Fedora backup I think I’ll simply start again with it therefore it should be good going forward. What’s the best way to “reset it” as I’d like to keep all the settings/config as they are now - just being lazy.

ts678 · November 26, 2018, 1:20pm

“Repair doesnt work” was the test to see if missing dlist and dindex files could be regenerated from database.

“Automatic backups stopped AND rolled back when restoring drive image of Ubuntu-partition” was the test of setting a database to an old copy, then trying repair. In addition to deleting newer backups, it had various other issues that I’d forgotten and didn’t look at closely at that time, so I’m liking this idea less than before…

You can also look at the post just above mine for an example of a full-scale disaster due to database mixup.

This would be safer, but I can’t think of a simple way. That’s a pretty unusual (but perhaps useful) capability. There are awkward solutions such as duplicating the backup (perhaps even to local disk, using rclone), and then setting up a clone of the original backup on another system (don’t mix types, e.g. Windows to Linux) to get the dlist back, to move to the live system, but that seems pretty extreme, given your redundant backups.

Taomyn · November 26, 2018, 1:25pm

So reading the “Repair doesn’t work” thread, I should probably just try renaming the affected file as it should get rebuilt when I run the repair, at least try this on the Fedora backup before I reset it, and it might do the trick? Unless I misunderstood it.

ts678 · November 26, 2018, 1:39pm

It wasn’t my test, but reading through the rest of the article, another person had issues which might have been unrelated (feel free to form your own opinion). Renaming to anything starting with the usual prefix (duplicati by default) could cause Duplicati to consider it its own unknown file, and delete it. Safest might be to download and delete, however a rename to a different --prefix seems like it might get it far enough away. I haven’t tested.
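To make the naming concern concrete, here is a small Python sketch that splits a dlist name into its parts and checks whether a proposed new name would still start with the backup prefix. The pattern is only inferred from the two file names quoted earlier in this thread, and `parse_dlist_name` and `starts_with_prefix` are hypothetical helpers, not anything Duplicati provides. How Duplicati actually decides which remote files are its own may differ.

```python
import re

# Assumed layout, inferred from names such as
# duplicati-20181122T130004Z.dlist.zip.aes quoted in this thread.
DLIST_NAME = re.compile(
    r"^(?P<prefix>.+?)-(?P<timestamp>\d{8}T\d{6}Z)\.dlist\.(?P<extension>.+)$"
)


def parse_dlist_name(name):
    """Split a dlist file name into prefix, timestamp and extension, or return None."""
    match = DLIST_NAME.match(name)
    return match.groupdict() if match else None


def starts_with_prefix(name, prefix="duplicati"):
    """True if a name begins with the backup prefix, so it may still look like a backup file."""
    return name.startswith(prefix)


original = "duplicati-20181122T130004Z.dlist.zip.aes"
print(parse_dlist_name(original))
# Appending a suffix keeps the prefix, so the file may still be treated as Duplicati's.
print(starts_with_prefix(original + ".bad"))
# Swapping the prefix moves the name away from the default one.
print(starts_with_prefix("saved-20181122T130004Z.dlist.zip.aes"))
```

By this check a rename that only appends a suffix stays risky, while a rename that replaces the prefix does not, which is the reasoning behind preferring a different prefix or a download and delete.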

Note that nothing will get rebuilt unless you have a database current enough to know it, and don’t forget all the other problems I mentioned. It might do the trick, but there are plenty of bumps in those articles you might hit…

EDIT: However if the Fedora backup still has the original database and maybe just a bad dlist, success seems more likely. Going completely back to old backup of the database is the case that makes me more nervous…

Taomyn · November 26, 2018, 1:44pm

Ok. I just realised that the Fedora server is no longer showing the error about file being incorrect, I suppose because of the rebuild. I think for this one I’ll simply start again.

I’ll give downloading then deleting the file on the Windows machine a go and see what happens. I just need to know what to perform after I do the repair to check everything is ok - even if it takes a few hours. Otherwise that too will simply reset to start again.

ts678 · November 26, 2018, 2:29pm

That’s been covered some already, primarily for damage assessment, but it can also be used to test repair.
There’s recently even a more authoritative confirmation that you have a couple of ways to test file integrity:

Question about TEST command vs. Backup with --full-remote-verification and backup-test-samples

So basically either the test command or a backup with high --backup-test-samples should see if things are present as Duplicati thinks they should be. Maybe a repair with --dry-run to make sure it picks nothing up.

Ultimately, self-tests are no substitute for an actual test restore from the backup, to make sure it seems OK.

As an unrelated note, make sure you don’t have any full disks, in case that might have truncated some file, however with multiple reports from different people, I suspect something went wrong at Wasabi at that time.
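If you want a quick way to rule out the full-disk possibility, here is a rough Python sketch using the standard library’s `shutil.disk_usage`. The helper names and the 5% threshold are arbitrary choices for illustration, and which paths matter (temp folder, database folder) depends on your own setup.

```python
import shutil


def free_fraction(path):
    """Return the fraction of free space on the volume that holds path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual file systems report no capacity; treat that as no free space.
        return 0.0
    return usage.free / usage.total


def is_nearly_full(path, min_free_fraction=0.05):
    """True if the volume holding path has less free space than the given fraction."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # "." is a placeholder; point this at the folders your backup actually writes to.
    print(f"free: {free_fraction('.'):.1%}, nearly full: {is_nearly_full('.')}")
```

This only shows the state right now, so it cannot prove the disk was not full at the time of the failed upload.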

Running a log at Information level or higher might have been handy to see if the uploads showed any issues. This is what Backblaze B2 gave me yesterday on a backend test. Maybe B2, maybe Internet. Hard to know.

Taomyn · November 26, 2018, 2:48pm

Thank you so much for all the help and insights - much appreciated.

I’ve “fixed” the Windows backup - I deleted the file being reported, ran repair, the file came back, ran a backup, it reported a similar file for the next day, did the same for that and a further backup was ok. Will now run a test as you mention and see if it finds anything.

For the Fedora server I reset it, I was a bit confused because the delete backup job option says it will delete the content of the backup, but it didn’t. It even left the old database behind. Maybe I didn’t do it right, but I am clearing the files out myself before I run the initial backup.

Next I suppose is the other machine with the “channel is retired” message - that’s my workstation at home so it will have to wait for further checks.

Taomyn · November 26, 2018, 3:32pm

Damn, I really don’t do myself any favours - I forgot I had scheduled a reboot of my Fedora server at 16:30 and it did it right in the middle of the new backup - d’oh!!!

JonMikelV · November 26, 2018, 9:01pm

Wow - that’s quite the set of issues, sorry I didn’t make it over here sooner!

But it sounds like you’ve got it under control - and everything I read about restoring old databases and the like looked correct to me.

I don’t think I tested this yet, so I’m going to try it now - but I’m curious if Duplicati can “recover” from having ONLY dblock files (no dindex / dlist / database).

As for the “channel is retired” issue we may need to review more detailed logging of what happened just before the error.

ts678 · November 26, 2018, 9:32pm

I didn’t know you were going to delete backup jobs. I hope you exported a copy, to import back in.

There’s an unchecked checkbox in the “Delete remote files” section. Possibly you didn’t check it?
Duplicati’s probably trying to make it hard to accidentally delete your actual destination backup…

The database being left behind is a known bug that wastes space and leaves confusing leftovers.

Good luck with cleanups. I guess you’ll be testing how well the interrupted-backup code works.

Taomyn · November 27, 2018, 7:21am

Yes I exported the job first - I remembered doing that a few times when I first started using Duplicati as it was an easy way to use the same job across some different machines.

I don’t recall if I checked the box though I thought I did - let’s hope I don’t have to use it again anyway.

So far the Windows backups have been ok, although my workstation one did pop up a warning about finding duplicate folders again - I did a quick repair last night and will see what happens later today. The Fedora server’s new backup has just started, I reset it again rather than risk it doing a restart in light of the other problems, so that should be nice and fresh in a few hours.

Taomyn · November 30, 2018, 12:21pm

Just to conclude this thread, all is well again and all backups working as expected - well except where I tried to test 2.0.4.5_beta which seems to have a few things missing so I went back to 2.0.4.4_canary

Thank you all for the help.