Clearing out remnants of a partially-uploaded, extremely large file?


#1

Yesterday I accidentally included a ~200GB file (an uncompressed movie) in my B2 backup set, and didn’t notice until today. It had uploaded maybe 60-80% of it by the time I caught it. I had to do a force stop, which worked (though Duplicati complained about a locked DB until I restarted it). However, my B2 bucket still has the orphaned files in it, and B2 doesn’t exactly let you sort your bucket by most recent or anything, so I really have no way of hunting down the extra pieces even if I did think it would work.

I’ve run a “verify files” and then re-run the backup job after excluding the unwanted file, and it finished successfully (and seemed to delete a few things), but my B2 bucket is still clocking in at about 160GB larger than it should be. What else can I do to get Duplicati to find and delete the unnecessary dblock files?
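For anyone wanting to at least see which remote volumes arrived during the bad upload, here is a minimal Python sketch. It assumes you have already exported a bucket listing (name, size, upload time) by some means; the `RemoteFile` type, the sample file names, and the dates are all made up for illustration. It only filters and sorts a listing. It does not talk to B2, and it is not a suggestion to delete anything by hand, since Duplicati's local database tracks which remote files exist.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RemoteFile(NamedTuple):
    """One row of a bucket listing (hypothetical shape, for illustration)."""
    name: str
    size_bytes: int
    uploaded: datetime


def recent_dblocks(files, since):
    """Return dblock volumes uploaded at or after `since`, newest first."""
    hits = [f for f in files if ".dblock." in f.name and f.uploaded >= since]
    return sorted(hits, key=lambda f: f.uploaded, reverse=True)


def total_gb(files):
    """Sum the sizes of the given files in decimal gigabytes."""
    return sum(f.size_bytes for f in files) / 1e9


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Made-up sample listing; real names and timestamps will differ.
listing = [
    RemoteFile("duplicati-b0001.dblock.zip.aes", 250_000_000, utc(2018, 12, 1, 3, 0)),
    RemoteFile("duplicati-i0001.dindex.zip.aes", 40_000, utc(2018, 12, 5, 22, 1)),
    RemoteFile("duplicati-b0002.dblock.zip.aes", 250_000_000, utc(2018, 12, 5, 22, 0)),
    RemoteFile("duplicati-b0003.dblock.zip.aes", 250_000_000, utc(2018, 12, 5, 23, 0)),
]

cutoff = utc(2018, 12, 5)  # assumed start of the accidental upload
suspects = recent_dblocks(listing, cutoff)
for f in suspects:
    print(f.uploaded.isoformat(), f.size_bytes, f.name)
print(f"{total_gb(suspects):.2f} GB in {len(suspects)} dblock files since cutoff")
```

Comparing that total against the unexplained growth of the bucket gives a rough sanity check on whether the leftovers really come from the interrupted upload.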


#2

Currently running “compact now” (manually, since I have the no-auto-compact advanced option enabled in my backup job)… It initially sped through and cleared out over 100GB of unneeded files, but during that whole time, and for at least 2 hours now, the status bar has been sitting on “Starting backup …”, and it is no longer doing anything even according to the profiling logs. Any hints here would be helpful. I’m hesitant to force-interrupt it and cause further issues.


#3

Compact Now seems to have done the trick with respect to the file remnants - I was hesitant to run it because for this particular backup job I was several hundred gigs into my initial backup before changing the Volume size from 50MB to 250MB, so I was apprehensive that it would spike my download bandwidth for B2 before dealing with the completely extra pieces. Luckily it seems to have handled that part first.

I allowed the operation to run overnight and upon checking back in this morning I found it had eventually continued on to grabbing old small filesets and replacing them on the destination with the large version. I let it go until a few minutes ago and went ahead and soft-quit the operation (using the “stop after current upload” option). Now of course it seems that several gigabytes worth of local temp files have been left behind. All of these would have been from the Compact Now operation as it was the only thing running during that timespan.


(All but a small handful of items listed here are from the last day; some might be from the original backup job that got interrupted, and most are presumably from the Compact operation. As far as I can tell, none of these should have been left behind.)


#4

Sigh… so after I cancelled the Compact Now operation, last night’s scheduled backup ran automatically and completed successfully. But then (after running a different backup set, and being forced to force cancel and restart Duplicati to get it to unlock the database), when I run the original B2 backup job again, I’m now getting the dreaded “Unexpected difference in fileset” error:

Duplicati.Library.Interface.UserInformationException: Unexpected difference in fileset version 11: 2018-09-21 2:05:00 (database id: 397), found 163435 entries, but expected 163440
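When comparing this error across several versions, it can help to pull the numbers out of the message. The following Python sketch parses the exact wording quoted above and reports how many entries are missing; the regular expression is written against that one message and is an assumption about the format, not something taken from Duplicati's source.

```python
import re

# Pattern written against the message quoted in this post (an assumption
# about the wording, which may differ between Duplicati versions).
PATTERN = re.compile(
    r"Unexpected difference in fileset version (?P<version>\d+): "
    r"(?P<timestamp>.+?) \(database id: (?P<db_id>\d+)\), "
    r"found (?P<found>\d+) entries, but expected (?P<expected>\d+)"
)


def parse_fileset_error(message):
    """Return the fields of a fileset-difference error, or None if no match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    info = {key: int(m.group(key)) for key in ("version", "db_id", "found", "expected")}
    info["timestamp"] = m.group("timestamp")
    # Positive when the database holds fewer entries than the fileset expects.
    info["missing"] = info["expected"] - info["found"]
    return info


message = (
    "Duplicati.Library.Interface.UserInformationException: "
    "Unexpected difference in fileset version 11: 2018-09-21 2:05:00 "
    "(database id: 397), found 163435 entries, but expected 163440"
)
info = parse_fileset_error(message)
print(info)
```

For the message above the gap is 5 entries out of roughly 163,000, so the mismatch is small relative to the fileset.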

(I’m on the 2.0.4.12 canary on this machine - the auto-updater doesn’t seem to be working at the moment).

Edit: urghh… I’ve tried running Repair on the database, only to be told:

Destination and database are synchronized, not making any changes

Meanwhile, running the job fails instantly with the same error message. I don’t want to have to do a Recreate on this database, as the total backup set is over 800GB. Any suggestions? :cold_sweat:

Edit 2: following the directions from an old thread about this error, I ran the “delete” command line tool to delete Version 11. Now when I try to run backup, I get the same error but with different mismatch numbers and pointing to Version 10. I’m running delete on that one too, now. How many versions will I need to delete??

Edit 3: Deleted version 10, now it’s reporting an error with version 9 :sob: I give up for now.

Edit 4: I decided to go for broke and run a recreate. I had reasonable luck with this in my smaller backup job when I got this error previously. crossing fingers

Edit 5: Ugh… the recreate is still running (over 7 hours later) and seemingly downloading every dblock file from B2… this is gonna cost me $10 in extra bandwidth fees :-/

Edit 6 (and final): The recreate finished overnight; luckily it only needed to download 17 or so more gigs, so nowhere near the whole 800 GB fileset. It had me worried there for a bit.
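A quick way to put numbers on the bandwidth worry is a small Python estimate like the one below. The per-GB rate is an assumption chosen for illustration, not a quoted B2 price, so check your provider's current pricing before relying on it.

```python
def download_cost(gigabytes, rate_per_gb):
    """Estimated download fee for the given volume at a flat per-GB rate."""
    return gigabytes * rate_per_gb


ASSUMED_RATE = 0.01  # dollars per GB; an assumption, not an official figure

worst_case = download_cost(800, ASSUMED_RATE)  # whole ~800 GB backup set
actual = download_cost(17, ASSUMED_RATE)       # the ~17 GB actually fetched
print(f"worst case: ${worst_case:.2f}, actual: ${actual:.2f}")
```

At a flat rate the cost scales linearly with the volume downloaded, so fetching about 17 GB costs roughly 2% of what re-downloading the whole set would.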


Is there any way to fix an "unexpected difference in fileset" error other than deleting?
#5

To cap it all off, I just noticed that the very next automated backup got the “unexpected difference in fileset” error and no backups have been run since then. (I’ve been away from the PC for a few days.)

Repair does nothing. ListBrokenFiles does nothing. I don’t want to undergo the bandwidth expense of doing the Recreate function again until I have some sort of reassurance that this error isn’t just going to keep popping up every other day. Ugh.

I could use some feedback right about now.


#6

I see in your post in Is there any way to fix an “unexpected difference in fileset” error other than deleting? that the error was on version 10 from 2018-11-28. I was sort of hoping that the error would have been in version 0 because in that other topic I was speculating that maybe compact can cause this sort of multi-version mess. Until somebody can look over their logs or supply a database bug report, I still don’t know. Do you have either? If you saved your database before that recreate, it has your old logs for that backup, and possibly might even be coaxable into doing a database bug report, if reinstalled just for that purpose.

If on the other hand you have logs from your post-recreate effort, they might give some clues about what happened. One possibility is that broken filesets can be built directly from recreate (given issues in files). That theory could be tested by doing a limited recreate such as direct restore would do, but let’s hold off while strategizing. What would you like to get back of your old backup, and how fast do you want a new?


#7

In fact it seems to me that it’s in the oldest version, regardless of which version the oldest one is, even just after I’ve deleted the formerly oldest version. I’m currently attempting a delete on version 10 just to see what happens.

Where do hardcopy logs save to? I’ve never really seen any in my duplicati directories.

I did save a backup copy of the old database before my Recreate last week - what would I do with that?

FWIW, this all seems to have stemmed from interrupting a backup job in the middle of a partially-completed upload of 1 huge (200GB) file, after which I ran Compact Now to attempt to clean up orphaned pieces (which worked to some extent apparently). Something mismatched gets left behind somewhere in that process. I apologize that I muddled it so far that it’s no longer possible to remember easily where exactly what happened. But it seems like it might be easy for devs to recreate these conditions - just interrupt a several-GB file in its upload to a storage host and watch the hilarity ensue.

I would also note that when Compact Now is cancelled, it leaves tons of temp files behind in the temp directory, just sitting there forever.

I’m not particularly panicked about my current backup job - what I’m panicked about is the feeling that something so seemingly simple can throw a monkeywrench into the whole works, and that “repair” is so seemingly useless at detecting the issues that the backup operation just failed for. But as far as my data is concerned, I have time. I’m currently planning to migrate a few of my more irreplaceable things (picture collection etc) to a new B2 bucket in a secondary Duplicati backup job, and maybe gradually remove everything from the “big” backup set other than the truly bulky stuff, assuming I can eventually repair it by whatever means.


#8

They have to be set up manually with --log-file. Adjust wordiness with --log-file-log-level. My own practice is to keep one going with as much detail as it can do because if something breaks it’s helpful both personally and to maybe get code changed for me and everybody. That’s more than I’d ask of the typical user though. Some lower level like Information or Retry might be worth keeping up (while not making too huge a log file).

Agreed on that, however there are smaller test cases in line for problems that are impacting lots of people. Maybe support could set this up, and at least look into reasonable methods to recover as well as possible. Not saying there are enough support volunteers either, but developers up to debugging this may be rare…

@warwickmm has been adding some code fixes for Duplicati temp dup-xxxx files not being deleted, but I don’t know if they cover your case. Some of the prior leftovers didn’t involve a cancel, but just happened…
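To see how much space such leftovers occupy, a small Python sketch like the following can list them. It assumes the leftovers sit in the system temp directory and have names starting with `dup-` (taken from the "dup-xxxx" description above); Duplicati may be configured to use a different temp folder. The script only lists files and deletes nothing, and it skips anything modified within the last 24 hours (an arbitrary threshold) so that files belonging to a running operation are not flagged.

```python
import tempfile
import time
from pathlib import Path


def find_leftovers(temp_dir, prefix="dup-", min_age_hours=24.0, now=None):
    """List (path, size) for old files whose names start with `prefix`.

    Only files last modified at least `min_age_hours` ago are returned,
    largest first. Nothing is deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_hours * 3600
    leftovers = []
    for path in Path(temp_dir).glob(prefix + "*"):
        try:
            if not path.is_file():
                continue
            stat = path.stat()
        except OSError:
            continue  # file vanished or is not readable; skip it
        if stat.st_mtime <= cutoff:
            leftovers.append((path, stat.st_size))
    return sorted(leftovers, key=lambda item: item[1], reverse=True)


# Assumes the default system temp directory is the one Duplicati used.
found = find_leftovers(tempfile.gettempdir())
for path, size in found:
    print(f"{size / 1e6:10.1f} MB  {path}")
print(f"{len(found)} files, {sum(size for _, size in found) / 1e9:.2f} GB total")
```

It is safest to review the list while no Duplicati job is running before removing anything manually.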

A rewrite of repair/recreate began in December, I believe with focus on making repair succeed more often. Time will tell how successful it is, and I have no details on its progress or its ambitions. There will probably always be non-repairable situations, but there’s clearly a lot of room for improvement on what it now does.

Duplicati’s rather complex algorithms are more than humans can manually perform or clean up when they run into problems, so I share your concerns about the available tools sometimes not being sufficient either. This is why I advise people to treat Duplicati as what it is – beta software which frequently does a great job but occasionally breaks and is hard to fix. Sometimes restarting a new backup is the way out if acceptable. My own practice is to not put Duplicati into situations where a breakage would be a tremendous loss to me. I also take image backups around Windows’ twice-a-year updates, adding additional protection for old files.

Do you by any chance have another Windows machine that’s not running Duplicati that could work on the failure situation (possibly including the old saved database) while you get your main machine going again? There are also ways to get multiple Duplicati databases on one machine, e.g. by using it in different users.


#9

All my PCs are currently running Duplicati to some extent. I might turn off automatic backups on the affected job though and redo the backup in multiple/smaller chunks before making further attempts to repair it.

FWIW I ran a delete on Version 10, which took about 18 hours (and used around 80GB of download bandwidth from B2), and now I get:
image
It seems almost as if the original issue infected all versions or something. I have no idea why I have to delete version 10 for it to complain about version 9, though, unless the deleting of versions is causing mismatches in the newer versions…