Is there any way to fix an "unexpected difference in fileset" error other than deleting?

ajkessel · February 28, 2019, 10:08pm

I’ve never been able to sustain Duplicati backups very long because I always get an “unexpected difference in fileset” error after a few days that blocks further backups. It appears that the error is always (or almost always) just a difference of one file. Is there any way to determine why this keeps happening, what the missing file is, how to prevent/fix it?

The only suggestion I’ve seen in this forum is to delete the bad fileset. I can do this, although I takes several hours. I recently had this error for fileset version 3, so deleted that, but before I could do a successful backup I got the error for fileset version 1. If I delete 1 and then have to delete 0, don’t I then lose all my base image so my next incremental backup is basically everything?

Also, is the “ListBrokenFiles” command related to this issue? I thought I would try that on the broken version (after a repair had no effect), and got this output/error:

Input command: list-broken-files
Input arguments:
        file:///[REDACTED]
          
Input options:
backup-name: [REDACTED]
dbpath: [REDACTED]
encryption-module: aes
compression-module: zip
dblock-size: 50mb
passphrase: [REDACTED]
retention-policy: 1W:1D,4W:1W,12M:1M
skip-files-larger-than: 15GB
console-log-level: profiling
debug-output: true
version: 1
          
The operation ListBrokenFiles has started
Starting - Running ListBrokenFiles
Starting - ExecuteScalarInt64: INSERT INTO "Operation" ("Description", "Timestamp") VALUES ("ListBrokenFiles", 1551389697); SELECT last_insert_rowid();
ExecuteScalarInt64: INSERT INTO "Operation" ("Description", "Timestamp") VALUES ("ListBrokenFiles", 1551389697); SELECT last_insert_rowid(); took 0:00:00:00.083
Starting - ExecuteReader: SELECT "Key", "Value" FROM "Configuration"
ExecuteReader: SELECT "Key", "Value" FROM "Configuration"  took 0:00:00:00.000
Starting - ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC
ExecuteReader: SELECT "ID", "Timestamp" FROM "Fileset" ORDER BY "Timestamp" DESC took 0:00:00:00.000
Starting - ExecuteReader: SELECT DISTINCT "B"."Timestamp", "A"."FilesetID", COUNT("A"."FileID") AS "FileCount" FROM "FilesetEntry" A, "Fileset" B WHERE "A"."FilesetID" = "B"."ID" AND "A"."FileID" IN (
SELECT DISTINCT "ID" FROM (
  SELECT "ID" AS "ID", "BlocksetID" AS "BlocksetID" FROM "FileLookup" WHERE "BlocksetID" != -100 AND "BlocksetID" != -200
UNION
  SELECT "A"."ID" AS "ID", "B"."BlocksetID" AS "BlocksetID" FROM "FileLookup" A LEFT JOIN "Metadataset" B ON "A"."MetadataID" = "B"."ID"
)
WHERE "BlocksetID" IS NULL OR "BlocksetID" IN
  ( 
    SELECT DISTINCT "BlocksetID" FROM
    ( 
      SELECT "BlocksetID" FROM "BlocksetEntry" WHERE "BlockID" NOT IN
        (SELECT "ID" FROM "Block" WHERE "VolumeID" IN
          (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks"))
        UNION
      SELECT "BlocksetID" FROM "BlocklistHash" WHERE "Hash" NOT IN
        (SELECT "Hash" FROM "Block" WHERE "VolumeID" IN
          (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks"))
        UNION
      SELECT "A"."ID" AS "BlocksetID" FROM "Blockset" A LEFT JOIN "BlocksetEntry" B ON "A"."ID" = "B"."BlocksetID" WHERE "A"."Length" > 0 AND "B"."BlocksetID" IS NULL
    )
    WHERE "BlocksetID" NOT IN (SELECT "ID" FROM "Blockset" WHERE "Length" == 0)
  )
) AND "A"."FilesetID" IN (SELECT "ID" FROM "Fileset"  WHERE  "ID" IN (58)) GROUP BY "A"."FilesetID"
ExecuteReader: SELECT DISTINCT "B"."Timestamp", "A"."FilesetID", COUNT("A"."FileID") AS "FileCount" FROM "FilesetEntry" A, "Fileset" B WHERE "A"."FilesetID" = "B"."ID" AND "A"."FileID" IN (
SELECT DISTINCT "ID" FROM (
  SELECT "ID" AS "ID", "BlocksetID" AS "BlocksetID" FROM "FileLookup" WHERE "BlocksetID" != -100 AND "BlocksetID" != -200
UNION
  SELECT "A"."ID" AS "ID", "B"."BlocksetID" AS "BlocksetID" FROM "FileLookup" A LEFT JOIN "Metadataset" B ON "A"."MetadataID" = "B"."ID"
)
WHERE "BlocksetID" IS NULL OR "BlocksetID" IN
  ( 
    SELECT DISTINCT "BlocksetID" FROM
    ( 
      SELECT "BlocksetID" FROM "BlocksetEntry" WHERE "BlockID" NOT IN
        (SELECT "ID" FROM "Block" WHERE "VolumeID" IN
          (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks"))
        UNION
      SELECT "BlocksetID" FROM "BlocklistHash" WHERE "Hash" NOT IN
        (SELECT "Hash" FROM "Block" WHERE "VolumeID" IN
          (SELECT "ID" FROM "RemoteVolume" WHERE "Type" = "Blocks"))
        UNION
      SELECT "A"."ID" AS "BlocksetID" FROM "Blockset" A LEFT JOIN "BlocksetEntry" B ON "A"."ID" = "B"."BlocksetID" WHERE "A"."Length" > 0 AND "B"."BlocksetID" IS NULL
    )     
    WHERE "BlocksetID" NOT IN (SELECT "ID" FROM "Blockset" WHERE "Length" == 0)
  )       
) AND "A"."FilesetID" IN (SELECT "ID" FROM "Fileset"  WHERE  "ID" IN (58)) GROUP BY "A"."FilesetID" took 0:00:18:44.361
Running ListBrokenFiles took 0:00:18:45.347
          
Mono.Data.Sqlite.SqliteException (0x80004005): SQLite error
cannot rollback - no transaction is active
  at Mono.Data.Sqlite.SQLite3.Reset (Mono.Data.Sqlite.SqliteStatement stmt) [0x00096] in <fe9fd999cd9f407db94500dce293e66f>:0
  at Mono.Data.Sqlite.SQLite3.Step (Mono.Data.Sqlite.SqliteStatement stmt) [0x00046] in <fe9fd999cd9f407db94500dce293e66f>:0
  at Mono.Data.Sqlite.SqliteDataReader.NextResult () [0x00129] in <fe9fd999cd9f407db94500dce293e66f>:0
  at Mono.Data.Sqlite.SqliteDataReader..ctor (Mono.Data.Sqlite.SqliteCommand cmd, System.Data.CommandBehavior behave) [0x00051] in <fe9fd999cd9f407db94500dce293e66f>:0
  at (wrapper remoting-invoke-with-check) Mono.Data.Sqlite.SqliteDataReader:.ctor (Mono.Data.Sqlite.SqliteCommand,System.Data.CommandBehavior)
  at Mono.Data.Sqlite.SqliteCommand.ExecuteReader (System.Data.CommandBehavior behavior) [0x00006] in <fe9fd999cd9f407db94500dce293e66f>:0
  at Mono.Data.Sqlite.SqliteCommand.ExecuteNonQuery () [0x00000] in <fe9fd999cd9f407db94500dce293e66f>:0
  at Mono.Data.Sqlite.SqliteTransaction.IssueRollback (Mono.Data.Sqlite.SqliteConnection cnn) [0x00012] in <fe9fd999cd9f407db94500dce293e66f>:0
  at Mono.Data.Sqlite.SqliteTransaction.Rollback () [0x00008] in <fe9fd999cd9f407db94500dce293e66f>:0
  at Mono.Data.Sqlite.SqliteTransaction.Dispose (System.Boolean disposing) [0x0001e] in <fe9fd999cd9f407db94500dce293e66f>:0
  at System.Data.Common.DbTransaction.Dispose () [0x00000] in <a566c189d19d4271ab58420d6f875b13>:0
  at Duplicati.Library.Main.Operation.ListBrokenFilesHandler.Run (Duplicati.Library.Utility.IFilter filter, System.Func`6[T1,T2,T3,T4,T5,TResult] callbackhandler) [0x00059] in <fbbeda0cad134e648d781c1357bdba9c>:0
  at Duplicati.Library.Main.Controller+<>c__DisplayClass33_0.<ListBrokenFiles>b__0 (Duplicati.Library.Main.ListBrokenFilesResults result) [0x0001c] in <fbbeda0cad134e648d781c1357bdba9c>:0
  at Duplicati.Library.Main.Controller.RunAction[T] (T result, System.String[]& paths, Duplicati.Library.Utility.IFilter& filter, System.Action`1[T] method) [0x0011d] in <fbbeda0cad134e648d781c1357bdba9c>:0

JonMikelV · February 28, 2019, 11:26pm

I think the “no transaction is active” error is relatively new and unrelated you the fileset difference error as it happened on one or two of my backups one then never again (so far).

Can you post your actual fileset error?

If you disable your custom retention policy (make note of the string then set to “keep all” does the error go away?

Alternatively, if you use “restore from destination” do you see backups available for restore that were reported as failed?

Basically, I’m trying to confirm a hunch that the issue is in the retention policy code that runs AFTER the backup is completed (meaning it’s restorable from the destination) causing the job to be flagged as “failed” (meaning it doesn’t get recorded as restorable in the local database).

ts678 · February 28, 2019, 11:46pm

It’s suspected to be related to interrupted backups, including interruption caused by intermittent networks. There are recently test cases, and bugs filed, so this is probably waiting for a developer to look at what’s going on. Reproducible problems are a fine first step though. There’s much less hope of fix without them.

“Unexpected difference in fileset” errors [2.0.3.9]

Error / DB corruption: Unexpected difference in fileset version X: found Y entries, but expected Z #3644

Failed put operations do not result in error #3673

It’s not really a missing file, but an internal difference between ways to count files that should be the same. With enough SQL queries, one could probably derive better information, but I don’t have the query offhand. Knowing the file is just the first step. After that would come getting things in agreement (or try the deletion).

It’s unclear what all the causes might be, but avoiding interruptions by things like sleep or reboot may help, and network issues can sometimes be avoided with a larger –number-of-retries. You can see whether it’s been doing retries recently by checking Job → Show log → Remote to check for put → rename → put sequences which imply retries. Click on the lines to see Size and Hash info to be more sure it was a retry. For longer-term unattended logging, you can add –log-file and –log-file-log-level=retry to your job or server.

ajkessel · March 1, 2019, 12:03am

Thanks all—I will follow up on the items above. I should add, though, that this particular installation is a local backup just to an external drive, so there should not be any network issues. It’s a Linux box that’s been up for months, no sleep or reboot.

ajkessel · March 1, 2019, 6:56pm

Can you post your actual fileset error?

2019-02-27 03:36:00 -05 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
Duplicati.Library.Interface.UserInformationException: Unexpected difference in fileset version 1: 2/13/2019 1:00:01 AM (database id: 58), found 149520 entries, but expected 149521
  at Duplicati.Library.Main.Database.LocalDatabase.VerifyConsistency (System.Int64 blocksize, System.Int64 hashsize, System.Boolean verifyfilelists, System.Data.IDbTransaction transaction) [0x00370] in <fbbeda0cad134e648d781c1357bdba9c>:0 
  at Duplicati.Library.Main.Operation.Backup.BackupDatabase+<>c__DisplayClass33_0.<VerifyConsistencyAsync>b__0 () [0x00000] in <fbbeda0cad134e648d781c1357bdba9c>:0 
  at Duplicati.Library.Main.Operation.Common.SingleRunner+<>c__DisplayClass3_0.<RunOnMain>b__0 () [0x00000] in <fbbeda0cad134e648d781c1357bdba9c>:0 
  at Duplicati.Library.Main.Operation.Common.SingleRunner+<DoRunOnMain>d__2`1[T].MoveNext () [0x000b0] in <fbbeda0cad134e648d781c1357bdba9c>:0 
--- End of stack trace from previous location where exception was thrown ---
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw () [0x0000c] in <8f2c484307284b51944a1a13a14c0266>:0 
  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Threading.Tasks.Task task) [0x0004e] in <8f2c484307284b51944a1a13a14c0266>:0 
  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Threading.Tasks.Task task) [0x0002e] in <8f2c484307284b51944a1a13a14c0266>:0 
  at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Threading.Tasks.Task task) [0x0000b] in <8f2c484307284b51944a1a13a14c0266>:0 
  at System.Runtime.CompilerServices.TaskAwaiter.GetResult () [0x00000] in <8f2c484307284b51944a1a13a14c0266>:0 
  at Duplicati.Library.Main.Operation.BackupHandler+<RunAsync>d__19.MoveNext () [0x003cd] in <fbbeda0cad134e648d781c1357bdba9c>:0

If you disable your custom retention policy (make note of the string then set to “keep all” does the error go away?

I had “smart retention” (rather than anything more custom/specific). I just ran backup again with “keep all” set instead, and it failed in the same way. I can try deleting the problematic version 1 and then running the backup once more and see if that works any better.

Alternatively, if you use “restore from destination” do you see backups available for restore that were reported as failed?

Yes, “restore from destination” shows versions 0-5 as all available, which includes version 1 which triggered the exception.

ts678 · March 1, 2019, 8:49pm

When I got this error, it was version 0, the latest. Your original post was version 3. I think Duplicati checks all the versions, and if so, I wonder if errors in old versions might be introduced by operations that change them, such as compact? It’s easy to tell if compact has run by looking in the job logs. Here’s an example:

CompactResults:
    DeletedFileCount: 14
    DownloadedFileCount: 5
    UploadedFileCount: 4
    DeletedFileSize: 195118534
    DownloadedFileSize: 183452545
    UploadedFileSize: 96832127
    Dryrun: False
    MainOperation: Compact
    ParsedResult: Success

Sometimes reporting options such as email also give log details you wouldn’t ordinarily see without setting up a --log-file (which would be a reasonable idea because it may give some clue as to what’s going on…).

Things like logs and log analysis are a great start. From a debug point of view, a system without network issues, sleeps, or reboots should be a good candidate for looking at what other sorts of things make this.

If your external drive has space to spare (at least you’re not paying a cloud provider by amount stored…), –no-auto-compact would remove that from the picture (to see if it helps). You can also try letting versions build up instead of trimming with retention policy. Basically, experiment to see what things avoid the issue.

If you add a log file to the job or server, --log-file-log-level=information would have compact and delete info.

ajkessel · March 1, 2019, 8:56pm

I do see a number of CompactResults entries in my log file, and have a running log file at level “Information” set up on the local drive. Right now I’m trying to delete the most recent bad fileset (which takes several hours), disabling any retention policy, and see if that helps at all. If not, I can try disabling auto-compact next. This is all going to an 8TB external USB drive, so plenty of room to experiment for a while without running out of space.

drwtsn32 · March 2, 2019, 4:41pm

I am getting this error a LOT on one of my backup sets. I dutifully “delete” the offending backup but the problem returns at least once a week, just on this one backup set on one machine.

I don’t believe it is triggered by network communication errors.

ajkessel · March 2, 2019, 5:00pm

Having deleted first version 3, then turned off retention policy and deleted version 1, the latest backup failed with the same 1-file off error for version 0. So now I’m deleting version 0 (which takes about 10 hours) and will try again. Given how long deleting and backing up take, it’s very slow going for troubleshooting.

warwickmm · March 2, 2019, 5:24pm

Are you able to reproduce the problem with a local drive destination (not an external drive)?

ajkessel · March 2, 2019, 6:50pm

Are you able to reproduce the problem with a local drive destination (not an external drive)?

Not easily, because there is not enough local space to make a copy of the data being backed up. Is the error generated based on the combination of what’s in the sqlite file as well as the actual backup chunks on the external drive?

JonMikelV · March 4, 2019, 1:58pm

Yes - the database thinks there should be X files but the destination has Y, so Duplicati errs on the side of caution and let’s you know there’s something unexpected going on.

ajkessel · March 4, 2019, 2:57pm

One possible wrinkle is all my external drives are formatted EXFAT with the idea that in a disaster recovery situation, that will be the easiest to mount on any platform (Windows/Linux/OSX). I know Duplicati creates a large number of files; perhaps there is a Linux performance issue with that filesystem. I think read/write is no slower with EXFAT but maybe the particular filesystem operations Duplicati is doing are slowing things down. If someone thinks this is a likely cause, I can try to set up a new backup drive formatted with, e.g., ext3 or ext4.

ts678 · March 4, 2019, 9:09pm

Comparison is internal. Queries done below:

github.com

duplicati/duplicati/blob/356ca81d8bfa2d16a353400ce50432ca681a407d/Duplicati/Library/Main/Database/LocalDatabase.cs#L822-L825


      
          var expandedCmd = string.Format(@"SELECT COUNT(*) FROM (SELECT DISTINCT ""Path"" FROM ({0}) UNION SELECT DISTINCT ""Path"" FROM ({1}))", LocalDatabase.LIST_FILESETS, LocalDatabase.LIST_FOLDERS_AND_SYMLINKS);

          var expandedlist = cmd2.ExecuteScalarInt64(expandedCmd, 0, filesetid, FOLDER_BLOCKSET_ID, SYMLINK_BLOCKSET_ID, filesetid);

          //var storedfilelist = cmd2.ExecuteScalarInt64(string.Format(@"SELECT COUNT(*) FROM ""FilesetEntry"", ""FileLookup"" WHERE ""FilesetEntry"".""FilesetID"" = ? AND ""FileLookup"".""ID"" = ""FilesetEntry"".""FileID"" AND ""FileLookup"".""BlocksetID"" != ? AND ""FileLookup"".""BlocksetID"" != ?"), 0, filesetid, FOLDER_BLOCKSET_ID, SYMLINK_BLOCKSET_ID);

          var storedlist = cmd2.ExecuteScalarInt64(@"SELECT COUNT(*) FROM ""FilesetEntry"" WHERE ""FilesetEntry"".""FilesetID"" = ?", 0, filesetid);

The storedlist is simple, and expandedlist is complex, but the counts are supposed to be the same. Something went inconsistent, so you got told. Duplicati has many self-checks. This one is doing its job… What needs to happen (IMO) is to track down ways these arise, and find a suitable developer to fix them. There may be time and expertise challenges. Not everybody has the skillsets to work on core algorithms.

JonMikelV · March 4, 2019, 11:24pm

My mistake - I may have been thinking of a different error.

I agree that messages like this are Duplicati doing it’s job - it just needs to go the extra mile(s) to avoid the problem and/or walk the user through the solution.

I’ll get on that as soon as I win the lottery.

drakar2007 · March 5, 2019, 7:34pm

I just got this error on my largest backup set, even after the litany of issues I had with it last week that i’d thought i’d solved on my own. This happened to me on my very next automatically-scheduled backup (despite it having run successfully several times when kicked-off manually earlier that day).

Repair does nothing. I’m not interested in doing another Recreate now without evidence that it’ll actually work long-term. I’m about to just give up.

Repair finishes almost instantly and claims this, even just moments after the above is seen when trying to run a backup:

ajkessel · March 5, 2019, 8:21pm

After deleting version 0 (the most recent one that Duplicati was complaining about), I got this error:
Duplicati.Library.Interface.UserInformationException: Found 158834 remote files that are not recorded in local storage, please run repair
Running repair manually didn’t work and it seems to have automatically triggered database recreate. Looks like that is going to take about two weeks to complete.

JonMikelV · March 6, 2019, 12:22pm

I don’t know of anything that would automatically trigger a database recreate (@kenkendk, please correct me if I’m wrong) so it sounds like your local database went away somehow.

I’m not sure how a version delete would do that… An I correct inI assuming you’ve got 158834 files at the destination?

ajkessel · March 6, 2019, 2:15pm

I’m not sure how the recreate got started – the last thing I did, after getting the “please run repair” error, was to run a repair. Then, overnight – I believe in connection with the timed nightly backup – Duplicati self-initiated a rebuild. Here’s the log:

2019-03-03 01:57:03 -05 - [Error-Duplicati.Library.Main.Operation.FilelistProcessor-ExtraRemoteFiles]: Found 158834 remote files that are not recorded in local storage, please run repair
2019-03-03 01:57:04 -05 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
Duplicati.Library.Interface.UserInformationException: Found 158834 remote files that are not recorded in local storage, please run repair
  at Duplicati.Library.Main.Operation.FilelistProcessor.VerifyRemoteList (Duplicati.Library.Main.BackendManager backend, Duplicati.Library.Main.Options options, Duplicati.Library.Main.Database.LocalDatabase database, Duplicati.Library.Main.IBac
kendWriter log, System.String protectedfile) [0x00104] in <fbbeda0cad134e648d781c1357bdba9c>:0 
  at Duplicati.Library.Main.Operation.BackupHandler.PreBackupVerify (Duplicati.Library.Main.BackendManager backend, System.String protectedfile) [0x0005f] in <fbbeda0cad134e648d781c1357bdba9c>:0 
2019-03-03 01:57:09 -05 - [Information-Duplicati.Library.Modules.Builtin.SendMail-SendMailComplete]: Email sent successfully using serverD$smtps://smtp.gmail.com
2019-03-03 09:11:13 -05 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Repair has started
2019-03-03 09:11:14 -05 - [Information-Duplicati.Library.Main.Operation.RepairHandler-RenamingDatabase]: Renaming existing db from [REDACTED].sqlite to [REDACTED].backup
2019-03-03 09:11:17 -05 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2019-03-03 10:07:33 -05 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (155.11 KB)
2019-03-03 14:29:45 -05 - [Information-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-RebuildStarted]: Rebuild database started, downloading 4 filelists

Yes, find reports exactly 158834 files in the (local) backup destination directory.

I should also note that in the few months I’ve been trying to use Duplicati, problems like this have arisen a few times. In the past, I’ve just given up, reformatted the backup destination, and started over. But an initial backup takes at least two weeks, maybe longer. I’m not sure if rebuilding is even any faster than just doing a fresh backup.

ajkessel · March 6, 2019, 3:55pm

If there are 158834 files in my backup destination directory, and 158834 files missing, does that mean that rebuilding the database will require Duplicati to register a “Backend Event - Completed” in the log 158834 times? At the current rate, I’m only getting about one such entry per minute, so recreating the database will take three months. These are pretty tiny files (20-100KB) so the slow speed seems crazy. Any suggestions for why this is?