Found inconsistency in the following files while validating

morse · December 31, 2018, 7:32pm

Hello,

I’ve got a serious problem with Duplicati that is currently preventing me from running any backup operations, and that so far I have been unable to either fix or work around.

My backups worked fine (2.0.3.3 beta on Debian 9 x64, to B2 backend, 1.2TB backup size, 1.4TB in the data store) until 28/12, when I got the following error message:

System.IO.InvalidDataException: Found inconsistency in the following files while validating database:
/home/shares/abc/computers/Applications/backup.ISO, actual size 6039240704, dbsize 6039138304, blocksetid: 254312
. Run repair to fix it.

Attempting to repair the database does nothing:

Duplicati.Library.Main.Operation.RepairHandler-DatabaseIsSynchronized]: Destination and database are synchronized, not making any changes

I attempted to rebuild the database, but it was going to take so long (1MB completed after about an hour, for a database 6GB in size) that I cancelled the operation and restored a copy of the previous database.

I upgraded to the latest beta (2.0.4.5_beta_2018-11-28) but this has not helped at all.

I successfully attempted to use the CLI tool to list-broken-files, it lists the above file (and only the above file) in all backup sessions.

However, when I attempted to use the CLI tool to purge-broken-files, it fails with the error message:-

root@moriarty:~# duplicati-cli purge-broken-files “b2://bucketname-backup-shares/moriarty/shares?auth-username=censored&auth-password=censored” --dbpath=/root/.config/Duplicati/SSUBISBTXT.sqlite

Enter encryption passphrase:

ErrorID: PassphraseChangeNotSupported
You have attempted to change a passphrase to an existing backup, which is not supported. Please configure a new clean backup if you want to change the passphrase.

This error is completely nonsensical, in no way I am attempting to change any password, the password is definitely correct (cut & paste, not typed, and in any case it works with the list-broken-files command).

Essentially, I’m out of options to fix this thing and get my backup running again. Aside from the fact that Duplicati has somehow lost data on the data store, and thus destroyed the backup of a file, it is now failing to fix the resulting mess and preventing me from backing up at all.

Any help that anyone can offer at this stage would be gratefully received.

Wim_Jansen · January 1, 2019, 8:07am

Any special characters in your pwd? Try running the command in the gui.

ts678 · January 4, 2019, 4:29pm

Speaking only to the password issue for now, list-broken-files command code doesn’t appear to check for a password change. Actually top-level purge-broken-files code doesn’t either but it runs something that does.

When you cut and pasted the password, was it from your record, from the original database, or elsewhere?

Do things proceed well if you restore a sample file using direct restore, with the password you think is right?

If it comes down to password-testing individual files from the destination, that’s also possible, but more work, using mono /usr/lib/duplicati/SharpAESCrypt.exe or AES Crypt, which possibly is also in your repo.

morse · January 4, 2019, 9:25pm

Hi, thanks for replying. Yes there are special characters, but I’ve tried giving the password both (escaped) via the command line, and also via stdin when prompted, and it fails with the same error both ways.

However, I don’t think that special characters are the issue, because the list-broken-files command works perfectly - when given the password either on the command line, or via stdin

morse · January 4, 2019, 9:33pm

The password is definitely correct, and has remained unchanged for the whole existence of the backup.

It was copied & pasted from my password manager, where it was generated in the first place. The password there is identical to the password in the GUI.

I’ve not tried doing a direct restore - my understand is that, first, this command must reconstruct the database. I tried to reconstruct the database, at the pace that I saw, this process would take (literally) months.

At the end of the day, until 1 week ago this backup system was working perfectly, then it broke, for reasons I don’t understand, and now won’t back up at all, won’t repair the database, won’t purge the “broken” files because the tools fail with irrelevant error messages about passwords, and reconstructing the database (which might not even solve the issue) will take months.

I know the software is marked as beta, but seriously, this is really deficient performance given the usage case. Not the failure itself, which might be expected, but the complete failure of any repair tool to operate.

It now looks like I’ve going to have to switch to an alternative backup system, which will requires weeks of uploads in a new format. Highly frustrating.

drwtsn32 · July 28, 2019, 4:23pm

I hit this problem on one of my machines yesterday. Have never seen it before.

Error message:

Failed: Found inconsistency in the following files while validating database:
C:\Virtual Machines\Delphi XE6\Delphi XE6.vmdk, actual size 32042713088, dbsize 32042680320, blocksetid: 167359 C:\Virtual Machines\Delphi XE6\Delphi XE6.vmdk, actual size 32202293248, dbsize 32202260480, blocksetid: 169268 C:\Virtual Machines\Delphi XE6\Delphi XE6.vmdk, actual size 32203866112, dbsize 32203833344, blocksetid: 170897 . Run repair to fix it.

This is a large-ish VMDK but hasn’t been modified in over a month.

I tried running repair - it completed almost instantly but made no difference. The same error appeared next time I tried to run a backup.

I then restored the sqlite database for this backup set that was created at the end of the last successful backup job. Unsurprisingly this made no difference, but I wanted to check just to be sure.

I restored the next older sqlite backup version and then kicked off a backup:

Failed: Found 3 remote files that are not recorded in local storage, please run repair

Totally expected since the slightly older database backup wasn’t aware of the final backup before the problem occurred. Ran a repair and got this weird error:

[Error-Duplicati.Library.Main.Operation.RepairHandler-FailedNewIndexFile]: Failed to accept new index file: duplicati-i2e03ff426cd540ae97cdabf9be893713.dindex.zip.aes, message: Volume duplicati-bb5adb84c128b4a21973a939db00fdc4e.dblock.zip.aes has local state Deleting

Ran the repair a second time and it succeeded. (??)

Ran a backup and it worked fine - original error is gone.

If I didn’t have database backups, I suppose the solution would have been to delete/recreate.

Questions:

What may cause this type of issue in the first place?
Does repair ever actually do anything for this type of problem?

ts678 · July 29, 2019, 8:04pm

What Duplicati version is this (possibly also including recent history if it’s recently changed)?

Do you have a copy of something close to the original database that was showing the issue?

What backend is the backup using?

Do you have any additional log information beyond the quite brief summaries kept by default?

Looking over brief summaries, were there any compact operations done leading up to issue?

If you have similar files, are they always a multiple of 64KiB? This one was, all three backups.

This one also came up short 32KiB three times. Are you running a non-standard --blocksize?

EDIT: If your DB backups are kept on a rolling basis, don’t let them roll and lose history of this.

drwtsn32 · July 29, 2019, 9:16pm

Version 2.0.4.22, running on this machine since July 8. Prior to that was running 2.0.4.21 (for a short while) and prior to that 2.0.4.5 since its release.

I have a copy of the database with the problem and a copy of the database I restored just before the problem occurred. If it can help with troubleshooting, I can provide copies.

Backend is WebDAV on a remote NAS.

I have no extra logging besides what Duplicati does.

I have auto compact enabled. The backup job that seems to have resulted in a messed up local database - no compaction actually happened (0 files deleted, 0 files downloaded, 0 files uploaded). Same with the backup job prior to that one. But the next prior backup job did do a bit of a compaction.

On this particular machine this is the only VMDK I have. But I do have another machine at home with probably 20 VMs. Not sure if VMDKs are always multiples of 64KiB.

My block size and volume size are set to default.

ts678 · July 29, 2019, 9:44pm

Because of 2.0.4.13 DB bug report code keeps original FileLookup Path intact (privacy regression) #3790 I’m more reluctant than usual to ask for bug reports. You can delete the table with DB Browser for SQLite and do VACUUM if you like. Or while you have it open, just look into the DBs yourself. That message is at:

github.com

duplicati/duplicati/blob/cd27a175000932209d6af415000493d9c4af68cb/Duplicati/Library/Main/Database/LocalDatabase.cs#L783


      
              ""A"".""ID"" = ""B"".""BlocksetID""

          

          ";

                          // For each blockset with wrong lengths, fetch the file path

                          var reportDetails = @"SELECT ""CalcLen"", ""Length"", ""A"".""BlocksetID"", ""File"".""Path"" FROM (" + combinedLengths + @") A, ""File"" WHERE ""A"".""BlocksetID"" = ""File"".""BlocksetID"" AND ""A"".""CalcLen"" != ""A"".""Length"" ";

                          

                          using(var rd = cmd.ExecuteReader(reportDetails))

                              if (rd.Read())

                              {

                                  var sb = new StringBuilder();

                                  sb.AppendLine("Found inconsistency in the following files while validating database: ");

                                  var c = 0;

                                  do

                                  {

                                      if (c < 5)

                                          sb.AppendFormat("{0}, actual size {1}, dbsize {2}, blocksetid: {3}{4}", rd.GetValue(3), rd.GetValue(1), rd.GetValue(0), rd.GetValue(2), Environment.NewLine);

                                      c++;

                                  } while(rd.Read());

                                  

                                  c -= 5;

                                  if (c > 0)

and right above that is a query you can run in DB Browser to see when/how those blocksets went broken.

SELECT 
    "A"."ID" AS "BlocksetID", 
    IFNULL("B"."CalcLen", 0) AS "CalcLen", 
    "A"."Length"
FROM
    "Blockset" A
LEFT OUTER JOIN
    (
        SELECT 
            "BlocksetEntry"."BlocksetID",
            SUM("Block"."Size") AS "CalcLen"
        FROM
            "BlocksetEntry"
        LEFT OUTER JOIN
            "Block"
        ON
            "Block"."ID" = "BlocksetEntry"."BlockID"
        GROUP BY "BlocksetEntry"."BlocksetID"
    ) B
ON
    "A"."ID" = "B"."BlocksetID"

Your output and run time will be longer, but here’s what I got with that on small random DB I tried query on:

The key thing that makes analysis more possible is your belief the VMDK didn’t change yet its backup did. What I was hoping for was 32KiB block size, and we lost a block. I’m not sure what else is done at 32KiB.

BlocksetEntry table has all the Block entries for a given BlocksetID. Did bad DB have any changes there? Unfortunately this is a big file, but you could use regular table highlighting (e.g. control-left-click) then Copy for comparison in your favorite text line comparison program, or fc if you don’t have anything better to use.

There’s an sqldiff.exe program available if you want to get fancier than the more manual compare above…

EDIT:

looks like it’s from the below. I’m not familiar with repair code (what are the chances anyone available is?), however given that index files index block files, I’d guess that the index file is skipped because its dblock is being deleted. In the DB, looking for signs of that in RemoteOperationTable might be good. Check its time, which might give a clue as to whether or not it may be part of whatever chain of events led to your issue…

github.com

duplicati/duplicati/blob/cd27a175000932209d6af415000493d9c4af68cb/Duplicati/Library/Main/Operation/RepairHandler.cs#L246


      
                  }

                  

                  // All checks fine, we accept the new index file

                  Logging.Log.WriteInformationMessage(LOGTAG, "AcceptNewIndexFile", "Accepting new index file {0}", n.File.Name);

                  db.RegisterRemoteVolume(n.File.Name, RemoteVolumeType.Index, size, RemoteVolumeState.Uploading);

                  db.UpdateRemoteVolume(n.File.Name, RemoteVolumeState.Verified, size, hash);

                  continue;

              }

              catch (Exception rex)

              {

                  Logging.Log.WriteErrorMessage(LOGTAG, "FailedNewIndexFile", rex, "Failed to accept new index file: {0}, message: {1}", n.File.Name, rex.Message);

              }

          }

                                  

          if (!m_options.Dryrun)

          {

              db.RegisterRemoteVolume(n.File.Name, n.FileType, n.File.Size, RemoteVolumeState.Deleting);

              backend.Delete(n.File.Name, n.File.Size);

          }

          else

              Logging.Log.WriteDryrunMessage(LOGTAG, "WouldDeleteFile", "would delete file {0}", n.File.Name);

If somehow you find a block that was in VMDK blockset went missing, deletion of something might be why. Comparing Fileset tables would be one way to see if a version of the backup was deleted e.g. by retention.

drwtsn32 · July 29, 2019, 10:59pm

The VMDK last modified timestamp is 6/28/2019 - so it definitely wasn’t recently changed. Maybe Duplicati decided to reprocess it as if the file was touched?

Actually I don’t think so… I checked the backup log for the three closest backups up to when the problem started, and the “size of examined files” was a lot smaller than the VMDK size, so I don’t think Duplicati reprocessed it. Puzzling.

I’ll try your idea of running that SQL query and looking thru the tables… will let you know if I find anything.

Appreciate your efforts!

ts678 · July 29, 2019, 11:08pm

I think one clue of that would be if the File table for its path jumps to a new BlocksetID and/or MetadataID. Logging would also give clues as to what Duplicati was thinking if you had a log at Verbose or better, e.g.:

2019-07-26 16:10:46 -04 - [Verbose-Duplicati.Library.Main.Operation.Backup.FilePreFilterProcess.FileEntry-CheckFileForChanges]: Checking file for changes C:\PortableApps\Notepad++Portable\App\Notepad++\session.xml, new: False, timestamp changed: True, size changed: True, metadatachanged: True, 7/24/2019 5:56:06 PM vs 7/23/2019 9:13:32 PM

What’s most helpful for analysis is a Profiling log with --profile-all-database-queries, but its lots of lines…

drwtsn32 · July 30, 2019, 4:12am

I ran it with a WHERE clause at the end to make it easy to spot the anomalies, and yeah that query confirms the error message:

I checked those BlocksetIDs in the “good” database and was able to confirm that it is the calculated length that changed:

I was not able to copy/paste the entire list of blocks for a careful examination (I’ll try more stuff in a bit), but the number of records returned by this simple query is the same in both the good and bad databases:

Bad db on left, good db on right. The other two BlocksetIDs also show the same number of block ID records between good/bad.

EDIT:

Was able to save BlocksetEntry to CSV by prepending “CREATE TABLE xxx AS” to the SQL query, then using DB Browser to export that new table to CSV.

Ran a compare using Notepad++ Compare plugin… no differences. So the differences must be in the sizes in the Blocks table, I guess…

EDIT2:

Yep I checked the Blocks table and there are differences. Hmmm

ts678 · July 30, 2019, 1:20pm

Great work! (keep going on the “Hmmm” )

I’m not super-expert in DB, but I didn’t think either blocksets or blocks were supposed to “just change size”. As you note, a change in a block will (via math) throw off the calculated size of the blockset. Because large file’s got lots of blocks, I wonder if one could filter blocks that are at default 102400 from a search for shrink. Someone really good at custom-made SQL queries could probably search easily, but that person’s not me.

drwtsn32 · July 30, 2019, 2:17pm

Yeah I don’t know if I’ll look into this more right now… this is above my pay grade.

But it seems to point to a bug somewhere.

ts678 · July 30, 2019, 2:53pm

Take a break, breathe deeply

More seriously, I’m not sure how well-stocked the project is with the right blend of skills and available time. Your materials are the best hope of tracking this rare problem down. Few people have time-series DBs… Posting a series of DB bug reports (to your level of comfort with privacy) can at least preserve something.

drwtsn32 · July 30, 2019, 3:29pm

lol, good advice!

How do I generate a DB bug report? I will keep the databases and other info I collected … can post it if a dev is willing to dig deeper.

ts678 · July 30, 2019, 5:46pm

Creating a bug report however the claim about removing original names doesn’t fully apply to your version. 2.0.4.13 DB bug report code keeps original FileLookup Path intact (privacy regression) #3790 has an idea from me that maybe fixing the regression could just mean adding a table drop. There are also other ideas. You’re very fast at doing pull requests (I’m still figuring it out). Care to do a drop, build, use it, then request?

EDIT: For the series of DBs, I haven’t tried it, but I suspect you could just rename the current one out of the way, copy an old DB where current used to be, and do a bug report from that. Then put the latest DB back.

One other note is that some people say the bug report takes a long time. I don’t know if it’s normal or a bug.

drwtsn32 · July 30, 2019, 8:12pm

Sure, I’ll give it a shot and let you know!

drwtsn32 · July 31, 2019, 3:54pm

Bummer, had this problem happen again today. I will keep the bad database and prior good database (again) for analysis.

I checked compact operations again - there was no compacting on the prior two jobs.

I did notice that there was a fileset deletion due to retention policy. (As there was when I had this issue a few days ago.) Perhaps that is the culprit?

drwtsn32 · August 1, 2019, 1:44am

This time I did a full database recreation (instead of restoring an older backup of the db). Recreate took a while but afterwards everything looks good again. It performed a backup without error.

Going to do some comparisons between the “bad” database and the recreated one when I have more time.