Backup failing due to unexpected difference in fileset

adhawkins · October 5, 2018, 3:54pm

Hi,

One of my backups has started failing every time it runs. It’s one I had to recreate due to being unable to rebuild the database (it took well over a month so I gave up).

I’m running Duplicati - 2.0.3.11_canary_2018-09-05.

Failed: Unexpected difference in fileset 50, found 4013 entries, but expected 4014 Details: System.Exception: Unexpected difference in fileset 50, found 4013 entries, but expected 4014 at Duplicati.Library.Main.Database.LocalDatabase.VerifyConsistency (System.Int64 blocksize, System.Int64 hashsize, System.Boolean verifyfilelists, System.Data.IDbTransaction transaction) [0x002bc] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Operation.Backup.BackupDatabase+<>c__DisplayClass32_0.<VerifyConsistencyAsync>b__0 () [0x00000] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Operation.Common.SingleRunner+<>c__DisplayClass3_0.<RunOnMain>b__0 () [0x00000] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Operation.Common.SingleRunner+<DoRunOnMain>d__21[T].MoveNext () [0x000b0] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 --- End of stack trace from previous location where exception was thrown --- at Duplicati.Library.Main.Operation.BackupHandler+<RunAsync>d__19.MoveNext () [0x01037] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 --- End of stack trace from previous location where exception was thrown --- at CoCoL.ChannelExtensions.WaitForTaskOrThrow (System.Threading.Tasks.Task task) [0x00050] in <6973ce2780de4b28aaa2c5ffc59993b1>:0 at Duplicati.Library.Main.Operation.BackupHandler.Run (System.String[] sources, Duplicati.Library.Utility.IFilter filter) [0x00008] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Controller+<>c__DisplayClass13_0.<Backup>b__0 (Duplicati.Library.Main.BackupResults result) [0x00035] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action1[T] method) [0x0011d] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 Log data: 2018-10-05 14:00:38 +01 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error System.Exception: Unexpected difference in fileset 50, found 4013 entries, but expected 4014 at Duplicati.Library.Main.Database.LocalDatabase.VerifyConsistency (System.Int64 blocksize, System.Int64 hashsize, System.Boolean verifyfilelists, System.Data.IDbTransaction transaction) [0x002bc] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Operation.Backup.BackupDatabase+<>c__DisplayClass32_0.<VerifyConsistencyAsync>b__0 () [0x00000] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Operation.Common.SingleRunner+<>c__DisplayClass3_0.<RunOnMain>b__0 () [0x00000] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 at Duplicati.Library.Main.Operation.Common.SingleRunner+<DoRunOnMain>d__2`1[T].MoveNext () [0x000b0] in <a152cea42b4d4c1f8c59ea4a34e06889>:0 — End of stack trace from previous location where exception was thrown — at Duplicati.Library.Main.Operation.BackupHandler+<RunAsync>d__19.MoveNext () [0x003d6] in <a152cea42b4d4c1f8c59ea4a34e06889>:0

Any suggestions?

Thanks

Andy

adhawkins · October 8, 2018, 7:46pm

I’ve tried a ‘Database repair’ and also ‘Verify files’, neither seems to have helped.

Any suggestions?

Andy

ts678 · October 9, 2018, 11:54pm

After having tried other options as you have, sometimes you have to use The DELETE command to delete the troubled backup version. 0 is the most recent, and I don’t know how long ago 50 is, but the Restore dropdown should give you an idea. For maximum safety you could use the Database menu on the job to see the path of the SQLite database for that backup, and make a copy (just in case something gets worse when trying the fix). Deleting versions is normally very common, e.g. retention rules do it all the time, but all is not normal right now. EDIT: Some easy ways to do the delete are either to export the job as a command line and adjust, or to use the web UI job Commandline option and adjust (no Commandline arguments, --version=50, Command now delete).

Sami_Lehtinen · October 10, 2018, 5:03am

That happens with Duplicati more or less randomly. I’m just now recreating about 2 terabyte backup, because this error hit me again: “Fatal error => Detected non-empty blocksets with no associated blocks!”.

But that server was running old version of duplicati, so probably and hopefully that issue have been fixed already.

Also the slow database rebuild is strange issue. Sometimes it happens quickly, and sometimes it’s extremely slow. I guess it depends if it’s trying to recover missing blocks or not. Normal restore without local database happens quite quickly, in case there’s nothing wrong with the backup.

But as you can see, the forum is full of similar reports. That’s why I’m always using at least two backup products in parallel.

One case yesterday:

adhawkins · October 11, 2018, 9:42am

Ok, tried a delete command to delete version 50, but the issue is still present.

Looking at the ‘restore’ dialog, it says there are only 9 versions available. The dropdown then only lists 5 ‘restore from’ options.

Any further suggestions?

Andy

ts678 · October 12, 2018, 6:30pm

EDIT: The error message was changed in 2.0.4.5 beta (and earlier in 2.0.3.12 canary) so you probably don’t need this translation any more, and the version that you delete is the version the message said, for example:

Unexpected difference in fileset version 0: 12/5/2018 6:39:53 PM (database id: 3), found 2 entries, but expected 3

Users of versions without this fix still need to translate, but it’s also possible the version to delete is usually 0.

(prior text)
From looking into the code awhile, I think the number on this message is the internal one, and doesn’t match numbers shown on the dropdown, which based on your 9 versions probably ran from 0 to 8, so asking for 50 probably simply deleted nothing. Is that message still the same (fileset 50 and all)? I couldn’t locate any code which I might persuade to map 50 to one of 0-to-8, but I think you can find it from the database with a viewer. There are 43 rows in the Fileset table on one of my backups, Dropdown shows 0-42, internal ID are 100-142 and mapping is possibly finding the ID from your message, readng a column left, and subtracting 1 from that. EDIT: This is incorrect in the screenshot because the larger timestamp should be at the top and be version 0 in the GUI dropdown. Clicking on the timestamp column can probably sort everything for you, then you need to adjust to 0-based numbers if your particular SQLite browser starts numbers at 1 (as the one shown does).

I’ll stop by #4286 where a similar possible mapping problem showed up, to see if anybody’s still set up to test.

Please let me know if you’d feel comfortable with this. Looking at other cases like this, many were resolved by deleting and recreating the database, however your opening remark was that didn’t go well last time you tried.

adhawkins · October 13, 2018, 2:20pm

Hi

The error message is still the same, referencing fileset 50.

This is the contents of my ‘fileset’ table. What ‘version’ should I specify to ‘delete’ then? 4?

Thanks

Andy

ts678 · October 13, 2018, 4:19pm

Good thing you asked, and posted a picture. I corrected my previous post. There’s an extra step to sort the filesets so that the newest one winds up with the lowest record number. Our viewers both number from 1, so reading through the timestamp values (because I can’t sort your picture by clicking) says fileset 50 would be backup number 0 (i.e. the most recent one) in pretty much the entire rest of the UI except for the message…

Because this technique is kind of an experiment, you can also sanity-check it by putting the Timestamp into a converter from Linux time to normal clock time. In GMT, one converter states it’s October 4, 2018 4:00:00 AM, and you should see that as 0 at top of Restore dropdown, with whatever that time would be in your time zone.

0 is also attractive because it could be from new damage. I hope there’s not old damage behind this though. Although something falling apart in an old backup is probably possible, damage while creating a new backup seems more likely to me. When the dust settles (or maybe before), you might say if anything odd happened.

I’m not sure if anyone will want it, but if you use the job menu Reporting --> Create bug report you can save a sanitized version of your database to a local file, in case someone wants to see the details behind your error. Ultimately the wish would be to understand the issue well enough to reduce frequency and improve recovery.

Maybe the safest and most visible way to do the delete is in the job’s Commandline menu item after changing the command to delete, and adjusting the other fields to match the delete command syntax which (compared to backup) looks like you’d clear out the box with the sources files and paths. Also add --version=0, and to do this as carefully as possible, you can add --dry-run, check it, and give that a try first to see what would occur were it not for --dry-run. For good verbosity, add --console-log-level=Information. If it’s easier, you could add these to the job temporarily instead of doing manual entry at least twice (once for trial, and once for a repair).

I’m hoping this gets backups back in business without any further issues. If there’s anything absolutely critical from the current backup that you want to preserve, maybe restore (from a sick backup) is possible before… Best case would be if the backup is more for short-term needs, and we’re doing this in the hope of recovery without having to start completely over or try the painful database recreate. If you’re up to try this, please do.

adhawkins · October 16, 2018, 9:25am

Ok, I’ve initiated the ‘delete’ command for version 0. Sadly I didn’t add any extra logging, but I’ve gone into About → System Logs and viewed the live log. This is all I get at the moment:

15 Oct 2018 21:12: Deleting 1 remote fileset(s) …

15 Oct 2018 21:12: All backups to delete:

15 Oct 2018 21:12: Backups outside of all time frames and thus getting deleted:

21:12 yesterday is probably about the time I kicked it off. Will see how long this takes to complete, but it’s been running for over 12 hours now…

Andy

JonMikelV · October 16, 2018, 12:54pm

Thanks for trying this out!

My guess is the delete went pretty quick and now you’re waiting for compacting to finish. If there anything new on the live log?

adhawkins · October 16, 2018, 1:06pm

It didn’t go well. Duplicati seems to have crashed during the delete operation.

I’ve just restart it, and it’s now going through the pending backups that it missed. Will see what state it’s in after it’s finished.

I’m starting to wonder whether running a ‘Canary’ build is the right thing for me. How would I drop back to a ‘Beta’ release?

Andy

ts678 · October 17, 2018, 8:19pm

Downgrading / reverting to a lower version gives general procedures, however it can require manual work to downgrade databases, depending on how much change has happened since the old beta version. It’s much simpler if one is willing to export backup jobs then start over, but sometimes people value their old versions. Your case is different because you had a broken backup to start with, and possibly the crash made it worse. Beyond the documented server database downgrade, it looks like the job database format has changed too.

I’m going to look into downgrading procedures sometime, meanwhile you can consider the direction you like. Maybe you’ll have a better idea where things landed after all the action finishes. Concerning the right thing, canary is sometimes better (due to fixes) than beta, sometimes worse (new bugs), and always less “known”.

adhawkins · October 22, 2018, 8:18am

Hi,

Well, after the delete of version zero, my backup is now running successfully again.

Is there any way to check that all the files have been backed up successfully? I wonder if deleting the version would mean that some of the files aren’t backed up again?

Thanks

Andy

ts678 · October 22, 2018, 11:13am

Deleting version 0 should just makes things look as if that version was never there. The checks for changes should be done against the database records for the new most-recent version, and might upload a bit more.

For checking backups, some may look at the Restore display to see if it looks as expected, some may rely on the default sampled testing done after a backup and settable (as high as they can stand) with –backup-test-samples, some may favor The TEST command periodically (the same as the automatic testing, but it’s run on demand instead of all the time), some may do sampled restores (recommended regardless of anything else).

Testing ALL files directly could take lots of space and download use, but I’ve done sanity checking on smaller backups just by restoring everything then checking Windows File Explorer folder information against original. Better tools exist if one prefers to compare folders more closely, e.g. to test that all file contents are identical.

kees-z · November 21, 2018, 10:05am

I had the same error (version 2.0.4.2 Experimental).

Deleting whatever backup version didn’t help (version 0, the version it was complaining about, a range of versions around the version it complained about, any version in the SQLITE fileset table that contained the number in the error message anywhere).

A database repair din’t help either, the check files operation gave the same error.

At the end, delete and recreate database solved it for me.

ts678 · November 22, 2018, 8:42pm

If the delete testing used the version complained about, or an ID from the fileset tables, that may be incorrect assuming my analysis above is correct. It hasn’t been well-proven yet, but I’m not sure your deletes disprove. Right now, my fileset table has 49 entries with ID from 151 to 199. My theory is that the error messages would give the 151-199 number, whereas what the UI requires for input is 0 to 48 (zero-based version of 49 entries).

was hard to follow. You need to find the row based on ID, then compute a 0-based row number with 0 newest (which is opposite the way the database browser probably numbered rows, even ignoring the off-by-1 issue).

Did the fileset number in the complaint change? Did the number of entries in the Restore dropdown change? Seeing no changes might mean you have a backup like mine that’s run long enough to drop off old versions.

kees-z · November 23, 2018, 10:04am

I don’t know the exact numbers anymore, but the record numbers matched the ID’s quite nicely, probably because no backup versions were deleted for this backup job.

Assuming that backup version 266 was the version it was complaining about:

I first deleted version 266. From that moment, the error messages reported a problem with version 265.
So I deleted version 260 to 270. These versions included all references to 265 (record number and backup version).
From this moment, errors were reported for (I guess) version 259.
I deleted the local database for this backup job an created a new one. DB creation completed successfully, I haven’t seen any error messages since then.

In summary, from what I remember, deleting any version from the backup set completed successfully, but resulted in an error message for one version newer than the one(s) I deleted. Repairing the database completed successfully, but didn’t change anything to the error messages. After a full recreate of the local DB, the backup worked without issues.

ts678 · November 24, 2018, 1:06am

I found my workaround for the unsuitable-for-UI-use fileset numbers may be obsolete as of 2.0.3.12 canary:

https://github.com/duplicati/duplicati/compare/v2.0.3.11-2.0.3.11_canary_2018-09-05…v2.0.3.12-2.0.3.12_canary_2018-10-23

This fix is in 2.0.4.2 experimental too, and it might explain how the error message’s fileset number changes, for example if something newer than the bad one was deleted, the UI number of the bad fileset grows lower, however my claim remains that database ID numbers from the Fileset table can’t be put directly in --version because low numbers are the most recent in the UI numbering, but the oldest when looking at the ID field…

@kees-z was talking about complaint numbers, and also database records, so I’m not sure of exact steps, however if the message looked different than the old one (see source), and its number was used for delete but the fileset error remained (see first bullet), then I hope there’s not an issue with the new fileset mapper.

Gijs · September 14, 2019, 10:27am

I know this is a fairly old thread but I experienced the same: difference in fileset. Tried a lot of solutions recommended here, but nothing worked. Then I looked in the folder containing the .sqlite databases and I found a database backup predating the fileset timestamp. I stopped the duplicati service, I moved the current backup database and renamed the backuped db to the name of the current db. Then I ran a backup. Everything smooth sailing from there.

ts678 · September 14, 2019, 11:39pm

Welcome to the forum @Gijs

f the backup DB is pretty recent, this might be a reasonable approach, though a future repair might delete some extra files that the stale DB didn’t know about. If the DB is very old, it might take a substantial amount of new backing up to get the DB and the backup files updated from whenever the DB was made. The main reason for the backup files is for a way to get back onto an older database format if a new Duplicati (e.g. a Beta) doesn’t run satisfactorily. Databases are upgraded automatically, but if you go back to older Duplicati, it won’t handle DB formats that didn’t exist when it was written. That’s the main use for the backup file – it’s in an older DB format.

Speaking of new Duplicati, at least one way to get this was fixed in v2.0.4.22-2.0.4.22_canary_2019-06-30:

Fixed data corruption caused by compacting, thanks @ts678 and @warwickmm

but the only way to get it (because 2.0.4.23 beta is basically 2.0.4.5 plus a warning) is a recent canary, e.g. v2.0.4.28-2.0.4.28_canary_2019-09-05 and then you can change Settings back to Beta channel, if you like. This error was happening to me way too much, so I was on 2.0.4.22 (which worked very well). 2.0.4.28 is not yet very well proven, but seems to be doing OK. At some future date it should lead to new feature beta.