Unexpected difference in fileset version(s)

Hi,
My backups suddenly began failing with the following error:

Failed: Unexpected difference in fileset version 18: 01/12/2022 12:20:00 (database id: 91), found 114104 entries, but expected 114112

So, following the guidance given in other topics, I deleted version 18 and tried again, but it still failed:

Failed: Unexpected difference in fileset version 14: 01/01/2023 12:20:00 (database id: 120), found 111433 entries, but expected 111441

So I gave it the benefit of the doubt and re-ran the delete command, this time on version 14.
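(For anyone searching later: deleting a single version can be run from the Commandline tool, roughly like this; the storage URL and database path are placeholders, and the usual options such as the passphrase are omitted.)

Duplicati.CommandLine.exe delete <storage-URL> --version=14 --dbpath="C:\path\to\job-database.sqlite"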

Backups still fail, with an error reporting a shortage of 8 entries, but this time on version 13.

I’m not sure that deleting yet more backup versions is going to solve this - has anyone seen this before?

Hello

Speaking of versions, do you use a recent version of Duplicati? There was a fix for that back in, well, 2020.

Beware of old guidance aimed at a now-fixed bug. Infrequent remaining cases may work differently.

Is backup scheduled? Do you have logs for all backups expected? Was failure at end of a backup or beginning of the next one? Also look in About → Show log → Stored for odd things around that time. Finding a maybe-hard-to-spot potential cause would be helpful, but it won’t directly fix things for you.

I’m undecided whether it would be better for Duplicati to study all versions and report all problems, or have you find them one at a time. Full reporting could be a lot if somebody has hundreds of versions.

The good news is the latest version is 0, so there aren’t that many to pick through if it comes to that…

If you like, you can create a database bug report, put it somewhere, and share a link to it in the forum. Perhaps someone (preferably someone whose SQL is better than mine) can find clues or a forecast…

Sorry @gpatel-fr, I should know better. I’m running v2.0.6.3.

@ts678 - The report is here - database bug report
In answer to your questions:
Is backup scheduled? YES
Do you have logs for all backups expected? YES
Was failure at end of a backup or beginning of the next one? BEGINNING

look in About → Show log → Stored for odd things around that time:

What I did do was recover some files from another backup set, which was running on a different machine a while ago and no longer exists.

It could well be that this is what has corrupted things. Is a local database rebuild a wise thing to do?

Yes and no. A good test that rebuild works (e.g. for disaster recovery) is to get the database path, then use OS tools (e.g. a file explorer) to rename it just in case, then see if Repair can rebuild it (although it loses logs, because they’re not on the Destination). If all goes well and fast enough, you can continue with either DB. What I’m unclear on is what else you did, and why you mention it with that rebuild question.
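(If the GUI Repair button is inconvenient, the command-line equivalent is roughly the following; the storage URL and path are placeholders. Repair recreates the database when it finds none at --dbpath.)

Duplicati.CommandLine.exe repair <storage-URL> --dbpath="C:\path\to\XXXXXXXX.sqlite"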

I guess what no longer exists is the system, as a nonexistent backup set wouldn’t restore. Is a backup set the destination files without a database, thus forcing a database rebuild? If the rebuild was done in a different job, then it wouldn’t have any impact on the job that went bad, whose database isn’t a rebuild.

I looked at your database bug report and found some other discrepancies. The main slowdown was an SQL query taking three hours, but I trimmed it down manually (with a lot of trial and error). Now it’s fast.

Warning to SQLite users – don’t count on the query planner optimizing out extra columns in subqueries. This might be fixed in 2023-05-16 (3.42.0), with an optimization regression fixed in 2023-09-11 (3.43.1).

What’s your retention and compact setup? I see what might be a manual compact at October 15, 2023 11:01:44 AM UTC, but no job log and no record of what destination changes it made. Database rollback is possibly responsible for the missing info (that’s why log files are safer), but did this fail? Again check in About → Show log → Stored, which is where errors from failed operations tend to go. I see a whole set:

[screenshot: stored server log entries]

There’s a gap from 384 through 390. 391 at October 18, 2023 2:24:48 PM looks like a manual delete, but it’s doing more versions than I would have expected from the information you posted on deletions.

While waiting for some comment on the history of things (see above post):

To make up for Duplicati’s one-at-a-time reporting, here’s a manual study of the versions.
You have several recent ones that look consistent, and the oldest one also.

DB ID   Found    Expected   Missing
352     138168   138168
351     138646   138646
349     138646   138646
348     138871   138881     10
347     137637   137647     10
341     138452   138460     8
336     139076   139084     8
313     136269   136277     8
282     133777   133785     8
253     127572   127580     8
235     125995   126003     8
205     122759   122767     8
178     120695   120703     8
156     119102   119110     8
68      113762   113762

Database ID 68 is October 27, 2022 12:20:00 PM UTC. Is history valuable?
Is daily backup important? Do you have space? You could start a new one.
This one is probably going to take awhile, with some effort on further study.
Duplicati’s tools should be tried, e.g. The LIST-BROKEN-FILES command (sketch below).
I’m not sure what that does inside, but some of your files look broken to me.
More manual SQL probing is (slowly) possible, or we can check destination.
Sometimes a rebuild of the database will help if destination files are still OK.
If you want to test that, save a copy of current database first, as said above.
There is also a Python script that can check destination and guess if it’s OK.
Technically it works on unencrypted local copies of the dlist and dindex files.
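For the LIST-BROKEN-FILES idea above, a rough command-line sketch; the storage URL and database path are placeholders (the GUI Commandline tool can run the same thing):

Duplicati.CommandLine.exe list-broken-files <storage-URL> --dbpath="C:\path\to\XXXXXXXX.sqlite"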

I asked if Compact got run manually, and now I’ll ask about Test, which looks
like it runs VerifyConsistency, which is what produces the complaint being seen.
I know Compact has been a source of pains, but the exact sequence is hard
to see from the bug report because logs were deleted to protect user privacy.

I’m guessing Duplicati won’t be able to name which files were broken, as the
highest level problem is FilesetEntry referencing nonexistent File table rows.

SELECT * FROM "FilesetEntry" LEFT JOIN "FixedFile" ON "FilesetEntry"."FileID" = "FixedFile"."ID" WHERE "FilesetEntry"."FilesetID" = 205 AND "FixedFile"."ID" IS NULL

[screenshot: query results]

205 is special because the SQL wouldn’t complete, unlike with prior ID values.
It turns out the query planner (that again) chose to scan rather than search:

[screenshot: query plan showing a table scan]
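(For anyone replaying this, the plan can be inspected like so; the exact wording of the output varies with the SQLite version.)

EXPLAIN QUERY PLAN
SELECT * FROM "FilesetEntry"
LEFT JOIN "FixedFile" ON "FilesetEntry"."FileID" = "FixedFile"."ID"
WHERE "FilesetEntry"."FilesetID" = 205 AND "FixedFile"."ID" IS NULL;
-- SCAN FixedFile means a full-table scan per joined row (slow);
-- SEARCH FixedFile USING INDEX ... means the lookup is indexed (fast)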

Problem was fixed by adding an index to FixedFile (originally the File view):

[screenshot: the added index]
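(The statement was along these lines; the index name here is made up, and FixedFile is the bug report’s materialized copy of the File view.)

CREATE INDEX "FixedFileIDIndex" ON "FixedFile" ("ID");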

One symptom of a scan is an enormous number of database reads, shown
by tools such as Sysinternals Process Monitor or I/O Reads in Task Manager.
This might not actually show up at the drive level because the OS soaks it up.

I don’t understand what you mean by this. Surely it can’t be that you Frankensteined a backend with files from 2 different backups?

[screenshot]

looks a little earlier than the question about Compact, then the Test. The log situation looks like this:

[screenshot: job log list]

so I’m wondering (and asked about) whether something didn’t complete. Check Operation UTC time:

380 October 12, 2023 12:20:00 PM
381 October 13, 2023 12:20:01 PM
382 October 14, 2023 1:55:24 PM
383 October 15, 2023 10:53:30 AM

Notice that 382 made no log, which makes me wonder what happened in there, but we may not know.
The 383 log looks like it’s cleaning up from 382 not finishing well, and the need for clues on the history increases:

"Messages":[
"2023-10-15 11:53:30 +01 - [Information-Duplicati.Library.Main.Controller-StartingOperation]: The operation Backup has started"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  (3.91 KB)"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-KeepIncompleteFile]: keeping protected incomplete remote file listed as Uploading: duplicati-20231014T135524Z.dlist.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-SchedulingMissingFileForDelete]: scheduling missing file for deletion, currently listed as Uploading: duplicati-bd2722d010ec748b58425f8a98c0a07a7.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-SchedulingMissingFileForDelete]: scheduling missing file for deletion, currently listed as Uploading: duplicati-ba6332017268d4c9dadc0eea95a05a083.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-SchedulingMissingFileForDelete]: scheduling missing file for deletion, currently listed as Uploading: duplicati-b035645e1ba5b44019bbb8e4b9e3a4ada.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Temporary: duplicati-b668e45677e9b4a63a34ae349df445f05.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Temporary: duplicati-b86ba3f6b522943cfbca7a126e153d8c9.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-b178bc563fec146e29150c34b5881af4c.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-bad55f0e957ea46f383ea80d0f6631efa.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-b976063604c42429595ec7fc2d7e81119.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-bac69f4529b2642c29173566138f7356e.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-be71cdb6bec414d95bc097820f75d0112.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-ba2f8108c05df4b0fa3addf23ea4b72b3.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-bd82c252457964df68af57268b9e28967.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-bb1a85b7fa0a14fc4a24caf258311004c.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-b2e1e98f014da4c8ea553c83b4ed7c5ce.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-b98ce13ac37e44b0187b31a57ac41523b.dblock.zip.aes"
"2023-10-15 11:54:13 +01 - [Information-Duplicati.Library.Main.Operation.FilelistProcessor-RemoteUnwantedMissingFile]: removing file listed as Deleting: duplicati-b55568858a2d64f4aac7286036c44deda.dblock.zip.aes"
],"Warnings":[],"Errors":[]

EDIT 1:

RemoteOperation table for 382 looks odd. Too few dindex and no dlist at the end (did it error early)?

EDIT 2:

Although database records can’t always be trusted due to potential transaction rollback, check the files
from ID 382 against the list at the start of 383. The Data value can be put in a spreadsheet and sorted.

{"Name":"duplicati-bd7d2cc7a46e24b9b8031c777f6231842.dblock.zip.aes"
{"Name":"duplicati-ib65bc38682b844e598c8a6c4e3fc87cd.dindex.zip.aes"

are what’s on the Destination dated Oct 14. What kind of storage type is the destination?
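If the SQLite JSON1 functions are available, the same listing can be pulled straight out of the bug report database instead of a spreadsheet; this is a sketch, and the JSON field names are an assumption based on what the list entries look like:

SELECT json_extract(j.value, '$.Name') AS "Name",
       json_extract(j.value, '$.Size') AS "Size"
FROM "RemoteOperation" r, json_each(r."Data") j
WHERE r."Operation" = 'list' AND r."OperationID" = 383
ORDER BY "Name";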

EDIT 3:

It looks like some network-connected (probably Internet) service went down and killed the backup.

This is visible in the full-width view of the RemoteOperation table where one can see the put retries:

Retries are done using a different name, so you have to look at the Size and Hash to see the retries.
Default number-of-retries is 5, and when the last one fails, you’re done, which should cause a popup.
About → Show log → Stored should also show it. Again asking for some information on runs history.
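As a hedged way to spot such retries in SQL (assuming each put attempt records the same Size/Hash JSON in the Data column, which is what the full-width view suggests):

-- group put attempts by their Size/Hash payload; more than one name per payload means retries
SELECT "Data", COUNT(*) AS "Attempts", GROUP_CONCAT("Path", ', ') AS "Names"
FROM "RemoteOperation"
WHERE "Operation" = 'put'
GROUP BY "Data"
HAVING COUNT(*) > 1;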

The network failure allowed two files up on Oct 14, as previously named – one dblock and its dindex.
The dindex put was at October 14, 2023 1:57:31 PM UTC, and it looks like local time is an hour later.
The last attempt at a put was October 14, 2023 2:15:47 PM UTC, so check the server log a little beyond that.

The Restore dropdown of versions should show a partial backup on Oct 14 despite likely early failure:

Backup 383, while trying to clean up the mess from 382, uploads a synthetic filelist of the work done so far: basically what backup 381 left for 382, plus whatever 382 managed to pick up before its early death.

The list at the start of 383 showed no dlist file from 382. The dlist is uploaded late in a backup, and the network had already failed.
383 cleanups from 382 mentioned “listed as Uploading: duplicati-20231014T135524Z.dlist.zip.aes”.
Interestingly, 383 on Oct 15 uploads the dlist for 382 late at the same second that it uploads its own:

[screenshot: the two dlist uploads]

The hhmmss went from 135524 to 135525. This is typical of a replacement dlist (dblock retries get new random names instead).

@fixitmjc

Your second sentence needs clarifying; nobody has been able to understand the exact steps behind that.

I still need the answer to the first sentence, ideally extended up to the point of the bug report’s preparation. Without that, combing through the bug report (many actions from wherever it first failed) is guesswork.

I mentioned that backup 382 looks like it had the network go down on it (what is the destination type?), and then 383 tried to clean up. After that, there is a huge gap in the logs while you did things with unknown results.

I added a column with dates and times to try to stir up some recall of what was done, why, and results.

For things that did not make a regular job log entry (perhaps due to an error), try the server log instead.

You can open About → Show log → Stored and copy lines back to time of failure as a starter technique. Sometimes lines can be expanded, but that makes them large, so starting with a summary may be best.

The two delete operations (the first was seemingly multiple-version, with possible compact at the end) actually did make logs, but they got deleted for privacy reasons. Maybe some source paths were seen?
Maybe you recall how they went, or could look at their job logs? If abnormal, I might ask for details later.

First question is when problem was first seen. After that comes whatever can be said about steps since.

Thank you for any hints. I’m working hard on this but there are too many gaps that only you can help on.

Wow, I’m very grateful @ts678 for all the time you have devoted to this.
I must apologise for the gap in communication, but I was working away from home for the last week.

I will try to answer you as fully as I can. Your suppositions are mostly spot-on.

My storage is an online S3 storage solution. Backup ID 382 on 14th October suffered from a local internet outage of about an hour while it was running, causing that failure.

The next morning, I noticed an alert for this and manually started the normal daily backup early (1053 UTC) whilst watching for errors. It seemed to complete, and all seemed reasonable after what had happened the previous day. I then selected Compact, as it seemed like a good idea at the time to tidy up. As I recall, it didn’t say that it failed, but it didn’t say it made any changes either!

ID 385 (1120 UTC) is the scheduled daily backup, which started at its pre-set time. It didn’t take long as there wasn’t much to do, not surprisingly.

Here are the relevant About → Show log → Stored entries:

log_data_from_server_duplicati.zip (3.0 KB)

They go back to 14 October, but no entries before that are still available. Since 16 October, the errors are all of the “unexpected difference in fileset” type that I originally wrote about, beginning with ID 386. On 18 October I deleted the two fileset versions (14 & 18). The other entries are repair attempts that didn’t resolve the issue.

@gpatel-fr I shall now try to explain what I was doing regarding “recovery from another backup”. I can see now it was about the worst possible day I could have chosen, as Duplicati was only just recovering from the error it had on the previous day!

A few months previously, I had a Raspberry Pi on my network working as an SMB fileserver, with hard drive storage attached over USB. This was backed up using a local install of Duplicati with an online S3 target. The disk began to fail, so most of the wanted data was copied off, and the Pi and the disk were declared redundant.

Recently it was realised that a few files had not been copied over before it was decommissioned. Since the S3 backup target still existed, I wanted to restore these files, and so I used the Windows Duplicati instance which now has the problem.

To do it, I simply selected Restore / direct restore from backup files and entered all the relevant details. After a few attempts to get all the parameters correct, I managed to get a list of files, select the wanted ones and restore them onto the local PC. That’s it.

Some other points from the contributions above:

I have tried the LIST-BROKEN-FILES command, but it simply returns
Listing remote folder …
Return code: 0
So I guess there are no broken files?

The versions study is interesting with the three recent ones seeming consistent, and the oldest one also.

DB ID   Found    Expected   Missing
352     138168   138168
351     138646   138646
349     138646   138646
348     138871   138881     10
347     137637   137647     10
341     138452   138460     8
336     139076   139084     8
313     136269   136277     8
282     133777   133785     8
253     127572   127580     8
235     125995   126003     8
205     122759   122767     8
178     120695   120703     8
156     119102   119110     8
68      113762   113762

I’m not terribly worried about losing intermediate versions (I use smart backup retention), so I propose to delete all those with missing entries as the best course of action. What I would like to know is how to produce this table, so that I can keep an eye out to see if discrepancies start again. Am I looking at using an SQL database tool?

Thanks again to both of you.

Right, I don’t think this has anything to do with your problem. So it must have come from your network problem. Normally a network hang should result in an aborted job, and the database structure should be mostly right (maybe not fully good, but recoverable). If the abort was hard, such as a reboot or a loss of power while the database was active (that is, not resulting from a timeout in normal Duplicati code), it can be more difficult. For a database such as SQLite, transactions can only protect you when the application is closed in a regular way.
If such a hard crash happened, it may be worth checking the database at the SQLite level: download the SQLite client from sqlite.org, close Duplicati, open the database (look up the path in Duplicati first), and run

pragma integrity_check ;
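For example (the database path is a placeholder; a healthy database prints ok):

sqlite3 "C:\path\to\XXXXXXXX.sqlite"
sqlite> pragma integrity_check;
ok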

pragma integrity_check ;

returns ok, so the database probably isn’t damaged. Thanks @gpatel-fr.

From some quick prior testing, it seemed like the lack of a log is normal if compact isn’t necessary then; however, the server log (what a mess) shows things going wrong after the 384 compact at 10/15 12:01 local. Unexpected difference showed up on 10/16 at its beginning. Suspects include the compact, or backup 383, which had quite a mess to tidy up. I posted 20 quantity-limited log messages, but 64 more got tossed.

One thing I want to look at more is whether compact runs the check for “Unexpected difference”; if not, things might have been wrong going into it, and so compact may or may not have made things worse.

There was a compact on 10/10 as part of a backup, so possibly the manual compact didn’t find work to do.
Comparison of the destination list at the start and end of the 383 backup didn’t find any not-yet-seen delete; however, as a general concept, I’d think damage to old backup versions suggests some sort of improperly coordinated (or maybe transaction-rollback-damaged) removal or mistracking of a dblock…

There are still a lot of gaps, so I’m not sure the exact sequence of events here will ever be identifiable. This (and several similar incidents) does suggest that maybe certain network failures can cause these. Simple failures don’t seem to be enough, so it might be some combination of things (and hard to find).

It was hand made. I hope it’s right. Watching it in advance isn’t too helpful, as the first error stops Duplicati. Perhaps an enhancement would be to do a more comprehensive report before finally bailing out of the test.
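If you want to reproduce the table yourself in an SQLite tool, something like this should approximate it. It assumes (as turned out to be the case here) that the discrepancy is FilesetEntry rows that don’t resolve to a File row; on a live job database, query the File view instead of the bug report’s FixedFile table:

SELECT fe."FilesetID"           AS "DB ID",
       COUNT(f."ID")            AS "Found",     -- rows that resolve to a File
       COUNT(*)                 AS "Expected",  -- raw FilesetEntry rows
       COUNT(*) - COUNT(f."ID") AS "Missing"
FROM "FilesetEntry" fe
LEFT JOIN "FixedFile" f ON fe."FileID" = f."ID"
GROUP BY fe."FilesetID"
ORDER BY fe."FilesetID" DESC;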

There’s always a chance (with a damaged backup) things could go wrong. The worst result, I guess, is loss of the versions that seem good at the moment. Although this has been quite a deep dive so far, more diving would be possible if you want to either have the Python script look at your dlist and dindex files, or simply save a copy of your current database (a rename is probably good) and see how Repair does at giving you a clean one. If it gets past about 70% on the progress bar (especially after 90%), there’s data missing, so it will be slow as it eventually downloads all dblocks to try to find data that isn’t in a dindex (as it normally would be).

“Error: The socket has been shut down” was another temporary communication error, but there a DB recreate painfully built a database. What made it painful was the need to get dblock files. A special build helped.

We’ve rather come to the end of this now, due to my oversight.

I initiated a commandline delete for the known bad versions, but forgot to check the dates of the ones that I wanted to remain against the current date. The Smart Backup retention settings meant that it also removed all but one of those that I was intending to keep, as they were now outside of its keep filter!

There was one good backup remaining, from the day after the internet failure that precipitated all this.

I have now done an incremental backup (the first for 2 weeks), which proceeded normally. I think I will configure a completely new backup though, just in case, when I have some time.

I’m grateful for all the top quality support. The errors during this have all been mine, but I’ve learned some useful lessons on how it works. Thank you.


What was the final solution to this? I have the same thing happening ever since I upgraded to Duplicati 2.0.7.1_beta_2023-05-25.

If you’re asking the original poster, we’ll have to see if anything more can be said.

If you’re asking if whatever the mystery was got solved – no it’s still a mystery.

So instantly? I haven’t heard that before. Regardless, feel free to open a topic.

The original post was about a previous version.