Unexpected difference in fileset (why must the topic be unique)

Since my last update (2.0.7.1_beta), which I was very late to install, I have had an issue with “Unexpected difference in fileset”. Whenever I get this, my usual process is to just delete the offending version. That does not work anymore. When I delete a version, the error just jumps to the next one in line, says that one has the problem, and continues no matter how many versions I delete… So far my only solution has been to completely delete the backup and start over. Now it has happened to a backup that is a little over 700 GB in size and is far away, reached through SFTP. Rebuilding the database is not an option, as that will take weeks to months to complete (based on previous experience). I tried all the tricks I know (not very many). list-broken-files and purge-broken-files return no results. Finally I got desperate and installed “duplicati-2.0.7.101_canary”. When I run the backup it errors out with this…

2024-03-31 22:31:58 -07 - [Error-Duplicati.Library.Main.Operation.BackupHandler-FatalError]: Fatal error
UserInformationException: Unexpected difference in fileset version 43: 1/9/2024 11:07:40 AM (database id: 768), found 201032 entries, but expected 201036
Unexpected difference in fileset version 42: 1/10/2024 10:03:11 AM (database id: 769), found 200397 entries, but expected 200401
Unexpected difference in fileset version 41: 1/11/2024 10:30:02 AM (database id: 770), found 200634 entries, but expected 200638
Unexpected difference in fileset version 40: 1/12/2024 10:06:13 AM (database id: 771), found 200674 entries, but expected 200678
Unexpected difference in fileset version 39: 1/13/2024 7:35:32 AM (database id: 772), found 200545 entries, but expected 200549
Unexpected difference in fileset version 37: 1/15/2024 9:00:00 PM (database id: 774), found 200993 entries, but expected 200997
Unexpected difference in fileset version 36: 1/16/2024 9:00:00 PM (database id: 775), found 201062 entries, but expected 201066
Unexpected difference in fileset version 35: 1/18/2024 9:40:22 AM (database id: 776), found 200714 entries, but expected 200718
Unexpected difference in fileset version 34: 1/19/2024 7:56:36 AM (database id: 777), found 201975 entries, but expected 201979
Unexpected difference in fileset version 33: 1/20/2024 9:00:00 PM (database id: 778), found 201421 entries, but expected 201425
Unexpected difference in fileset version 32: 1/22/2024 9:00:00 PM (database id: 779), found 201883 entries, but expected 201887
Unexpected difference in fileset version 31: 1/23/2024 9:00:00 PM (database id: 780), found 201569 entries, but expected 201573
Unexpected difference in fileset version 30: 1/24/2024 9:00:00 PM (database id: 781), found 201623 entries, but expected 201627
Unexpected difference in fileset version 29: 1/25/2024 9:00:00 PM (database id: 782), found 202436 entries, but expected 202440
Unexpected difference in fileset version 28: 1/26/2024 9:18:16 PM (database id: 783), found 202099 entries, but expected 202103
Unexpected difference in fileset version 27: 1/27/2024 9:00:00 PM (database id: 784), found 202149 entries, but expected 202153
Unexpected difference in fileset version 26: 1/29/2024 9:00:01 PM (database id: 785), found 202061 entries, but expected 202065
Unexpected difference in fileset version 25: 1/30/2024 9:00:00 PM (database id: 786), found 202572 entries, but expected 202576
Unexpected difference in fileset version 24: 1/31/2024 9:00:00 PM (database id: 787), found 202279 entries, but expected 202283
Unexpected difference in fileset version 23: 2/2/2024 7:48:22 AM (database id: 788), found 202312 entries, but expected 202316
Unexpected difference in fileset version 22: 2/5/2024 10:13:03 AM (database id: 789), found 202402 entries, but expected 202406
Unexpected difference in fileset version 21: 2/5/2024 9:00:00 PM (database id: 790), found 201479 entries, but expected 201483
Unexpected difference in fileset version 20: 2/6/2024 9:00:00 PM (database id: 791), found 201627 entries, but expected 201631
Unexpected difference in fileset version 19: 2/7/2024 9:00:00 PM (database id: 792), found 201605 entries, but expected 201609
Unexpected difference in fileset version 18: 2/9/2024 7:36:34 AM (database id: 793), found 201683 entries, but expected 201687
Unexpected difference in fileset version 17: 2/9/2024 9:11:17 PM (database id: 794), found 201649 entries, but expected 201653
Unexpected difference in fileset version 16: 2/10/2024 9:00:00 PM (database id: 795), found 201726 entries, but expected 201730
Unexpected difference in fileset version 15: 2/12/2024 9:00:01 PM (database id: 796), found 201898 entries, but expected 201902
Unexpected difference in fileset version 14: 2/14/2024 7:54:14 AM (database id: 797), found 201946 entries, but expected 201950
Unexpected difference in fileset version 13: 2/15/2024 8:01:40 AM (database id: 798), found 232464 entries, but expected 232468
Unexpected difference in fileset version 12: 2/15/2024 9:00:00 PM (database id: 799), found 234106 entries, but expected 234110
Unexpected difference in fileset version 11: 2/16/2024 9:00:00 PM (database id: 800), found 235199 entries, but expected 235203
Unexpected difference in fileset version 10: 2/17/2024 9:00:00 PM (database id: 801), found 235254 entries, but expected 235258
Unexpected difference in fileset version 9: 2/20/2024 12:21:36 PM (database id: 802), found 235265 entries, but expected 235269
Unexpected difference in fileset version 8: 2/20/2024 9:00:00 PM (database id: 803), found 236450 entries, but expected 236454
Unexpected difference in fileset version 7: 2/22/2024 8:51:39 AM (database id: 804), found 236543 entries, but expected 236547
Unexpected difference in fileset version 6: 2/22/2024 9:00:00 PM (database id: 805), found 236879 entries, but expected 236883
Unexpected difference in fileset version 5: 2/23/2024 9:00:49 PM (database id: 806), found 236891 entries, but expected 236895
Unexpected difference in fileset version 4: 2/26/2024 10:53:15 AM (database id: 807), found 236896 entries, but expected 236900
Unexpected difference in fileset version 3: 2/26/2024 9:00:00 PM (database id: 808), found 231873 entries, but expected 231877
Unexpected difference in fileset version 2: 2/28/2024 8:54:13 AM (database id: 809), found 232111 entries, but expected 232115
Unexpected difference in fileset version 1: 2/28/2024 9:00:00 PM (database id: 810), found 236205 entries, but expected 236209
Unexpected difference in fileset version 0: 3/1/2024 9:28:29 AM (database id: 811), found 237161 entries, but expected 237165

I went back in the logs to the first day this happened and found this…

System.AggregateException: One or more errors occurred. ---> System.AggregateException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at CoCoL.AutomationExtensions.<RunTask>d__10`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<FlushBackend>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
   --- End of inner exception stack trace ---
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
   --- End of inner exception stack trace ---
   at CoCoL.ChannelExtensions.WaitForTaskOrThrow(Task task)
   at Duplicati.Library.Main.Controller.<>c__DisplayClass14_0.<Backup>b__0(BackupResults result)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.Backup(String[] inputsources, IFilter filter)
   at Duplicati.Server.Runner.Run(IRunnerData data, Boolean fromQueue)
---> (Inner Exception #0) System.AggregateException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at CoCoL.AutomationExtensions.<RunTask>d__10`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<FlushBackend>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
   --- End of inner exception stack trace ---
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
---> (Inner Exception #0) System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at CoCoL.AutomationExtensions.<RunTask>d__10`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<FlushBackend>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()<---

---> (Inner Exception #1) System.AggregateException: One or more errors occurred. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at CoCoL.AutomationExtensions.<RunTask>d__10`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<FlushBackend>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()
   --- End of inner exception stack trace ---
---> (Inner Exception #0) System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at CoCoL.AutomationExtensions.<RunTask>d__10`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<FlushBackend>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.BackupHandler.<RunAsync>d__20.MoveNext()<---
<---
<---

What steps can I take to repair this? I am a tech but not experienced with Duplicati beyond doing basic commands through the GUI.

→ Merlin

What were you on before? If something got worse since the previous Beta, we can review Canary releases for possible causes. Is this regular enough that you would be willing to run an older Canary (between the Betas) while attempting to see where the behavior changed? We also really need to learn what sets these off, which means I’ll be asking for a lot of situation info, logs, etc.

This is not a new problem (despite your worrisome report). Deleting the offending version has never worked for everybody, and there have been cases where people kept on deleting, so it took a lot of time to run through.

The error always existed in the next one in line, but the reporting stopped at the first hit. It now lists all the bad versions at once, so you know how easy (or difficult) it will be to work through them one at a time.

Certainly not a good solution. This also sounds alarmingly common for you, so there is more history to ask about.

Choosing sizes in Duplicati covers things to tune. The default blocksize of 100 KB is too small for you and probably adds to the slowness, although a badly damaged backup does terribly on Recreate regardless.

Canary helps this a bit, if it gets to 70% on the progress bar, or if About → Show log → Live shows dblock downloads (a search for a block that some source file needs but that went missing somehow).

Do you recall any details such as described above, run any performance tools, have other clues?

For example, a very busy hard drive suggests one situation, while a core worth of CPU busy says another. Viewing About → Show log → Live → Profiling for slow SQL would be a direct approach.

Performance problems generally result from lack of some resources, and tools help identify what.

What OS is this? I’m very familiar with what Windows has, but Linux tools vary too much to name.

Hard drive? SSD? RAM disk? Good CPU and RAM? Runnable awhile if it’s needed for recovery?

Link speed and reliability? Does it drop out much during Duplicati use? That can cause problems. Sometimes one can retry a lot to ride through. Defaults might be low, if a link gets in trouble a lot.

As mentioned, that’s the full report on your situation, where previous Duplicati left you blind. How many versions do you have? You have more bad versions than I have versions in total…

If that’s nearly your entire backup, there might be little point in version deletes. If there’s history in older versions that you want to keep, there might be a reason to try to rescue what’s here. At this point it gets into your personal preferences. There is probably no risk-free approach, but the value of the historical backup is unknown to me, as is the need to keep routine backups current going forward.

Is there enough space at some destination to keep old backup, in case an old version is needed?

Duplicati.CommandLine.RecoveryTool.exe

This tool can be used in very specific situations, where you have to restore data from a corrupted backup.

I mention it because it’s forgiving of damage, while regular Duplicati self-checks a lot, so it may give complaints.

The complaints suggest a somewhat regular schedule. Please check the logs for the first failure and the backup before it. Beta tends not to log failures in the usual job log (fixed in Canary), so look in the server log at About → Show log → Stored for things that might have led to the error seen later.

Got any extra systems, USB storage, database skills, fast SFTP tools, willingness to run Python?

EDIT:

I’m fishing for preferences and resource surpluses or shortages to try to figure out the best plan which can then be tried. Even if you don’t try for recovery, I’d like to understand why you’ve had problems before, and particularly what might have led you to get in this problem unusually often.

Greetings!
Thank you for taking the time to respond to my post, and thank you for going through it in detail. I naturally have angry bitch voice… so my responses below may not translate well into text. I would like to make it clear that your effort and time are appreciated. If I come off sounding bad, it is not my intention or actual intended tone.

What Canary releases? I said this is happening in the CURRENT BETA.

I am aware of that. Which is why I am here… asking… for a… solution… (:

Respectfully, I do not know what this means… I do not know how this helps me. I do not know what to do with this information.

I am not sure what you are trying to find out or accomplish here. A rebuild takes weeks to months to complete through SFTP over the Internet on every single computer over a few hundred gigs in size that I have ever run the rebuild process on. I was under the impression that this was just an understood, unavoidable thing. All computers are idle and have 16 GB+ of RAM with high-speed Samsung SSDs, and actually so does the remote backup drive. All Duplicati clients are on Windows 10 Pro or 11 Pro.

The block size is currently WAY larger than 100 KB… it is at the default of 50 MB. I see nothing on that page about how to smoothly change from 50 MB to something larger. I do not feel that changing to something larger than 50 MB would have any sway on the current bug/situation I am dealing with.

My connection, where the SFTP server is located, is VERY stable. I am on a gigabit fiber connection. I cannot speak to the remote location where Duplicati is. When I remote into the machines, their connections feel solid. Speed tests return acceptable speeds. I can say that I have seen this happen on about 5 computers now… and they were all in different locations using different connections and providers. And only after my last upgrade to 2.0.7.1_beta.

You do not think it is odd that it would suddenly find a problem (missing block or files or whatever it is) with every single version?

42 versions… it is all right there in the OP.
There is something wrong with Duplicati.
As stated… I have more computers that have had this problem in the past, and I have another computer that CURRENTLY has this issue. Its backup is very small (27 GB) and I can easily just delete it and recreate it. That computer also has a secondary local backup, so I can take my time fooling with it as long as I can get them to turn on the computer for me. At the moment I am going to export their backup data and copy it to a machine here so I can work on it locally and figure out a process that I can run on the larger backup… (I am talking about the 27 GB backup that is broken.)

Please see the Original Post. It is there. You have everything that I have.

  • I know how to run an SQL query, but not on Duplicati, and I am not super experienced with building queries.
  • I do not have extra systems on the remote site. And I cannot lock down the server and any computers for months while a rebuild runs.
  • I do not know what a “fast SFTP tool” is. I was under the impression that all SFTP runs at the speed of the internet connection it passes through. I typically use WinSCP.
  • I am not afraid of installing python and running commands. I just do not know python.

I am going to ask the client to buy a USB drive so that I can set up a local backup while I sort this out.
The SFTP server is a Docker container running on an Unraid Linux box.

If you, or anyone, could suggest some steps I could take to figure this out and resolve this… I would appreciate it.

UPDATE!

On the two computers that are currently having this issue… which we will refer to as “KC-02” (758 GB) and “LJB-01” (27 GB), I am zipping up their AppData folders and transferring them here, where I will set them up on a local computer. This will give me the ability to work on them at will, as well as rebuild the database on the larger backup in a timely manner. However, I would very much like to figure out clear steps I can take to resolve this without a full database rebuild (if that even solves this).

but to find where it got in, one would need to figure out which Canary between the Betas added it.
The current Beta is identical to the Canary just before it. Every release is cumulative. Not every Canary becomes a Beta.
If you’re sure the Beta got worse, that’s how to track it down. But maybe something else changed.

Look over release history if it’s still unclear how Canary releases occur between Beta releases.

If, for whatever reason, historical Canary cannot be run, that’s how it is, so we’re blind on when the problem got in (if you’re going to keep reporting it – FWIW I’m not seeing it, and reports are few).

What part? I’m sure you’ve seen progress bars before. Duplicati puts them on the home screen; however, this one appears at the top of the screen during database recreate, moving left to right.
The 70% number is how far along the total distance it got. That’s where dblock reading starts. If you look at your destination, these are the big files containing the source backup data, which is why you don’t want them all downloaded. Each dblock has an index called a dindex that says what’s in the dblock, so no slow search (the last 10% is exhaustive) of dblock files is required.

If you were watching the progress bar during your slow DB recreate, and it was the same backup (without any fresh start), and it got absurdly slow at the end, it probably will if you Recreate again.

It helps figure out which is the best path, and which path might best be avoided.

Performance bottlenecks that can be removed in order to make things run faster.

Maybe, maybe not. The blocksize does make a difference, as do the file counts.

You’re likely confusing Remote volume size (a.k.a. dblock-size) with blocksize.
The GUI unfortunately highlights the former and doesn’t yet highlight the latter…

Blocksize is an individual data block, e.g. source file data. A dblock is a package
containing many blocks, because you wouldn’t want to do SFTPs of 100 KB files.

The one that slows large backups is blocksize, because having too many blocks is slow.
SQL processing scales up badly with big tables. Fewer blocks keeps the table sizes down.
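
A quick bit of arithmetic shows why. Here is a rough Python sketch (the 700 GB figure is your backup from this thread; the counts are order-of-magnitude estimates, since deduplication changes the real numbers):

    # Rough block-count arithmetic; real counts vary with deduplication.
    source_bytes = 700 * 1024**3          # the ~700 GB backup in this thread

    for blocksize in (100 * 1024, 1 * 1024**2, 5 * 1024**2):
        blocks = source_bytes // blocksize
        print(f"blocksize {blocksize // 1024:>5} KB -> ~{blocks:,} block rows to track")

At 100 KB that is roughly 7 million block rows for the database to track; at 1 MB it is roughly 700 thousand.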

They would not likely have been idle during recreate. It’s too late if you didn’t look.
Performance issues need to be looked at when they’re happening, or it’s a guess.

Do you have them report backup errors anywhere, short of big errors like this one?
Duplicati retries transfers 5 times by default. If it still can’t get through, it’s an error.
Errors in the current Beta tend not to make job logs (fixed in the latest Canary), but popups
potentially would alert the user (assuming they look at web interface occasionally).

Asking because network failures at the wrong times have caused problems before.

Assuming you mean in the version report, versions share a great deal of their data.

Features

Incremental backups
Duplicati performs a full backup initially. Afterwards, Duplicati updates the initial backup by adding the changed data only. That means, if only tiny parts of a huge file have changed, only those tiny parts are added to the backup. This saves time and space and the backup size usually grows slowly.

Deduplication
Duplicati analyzes the content of the files and stores data blocks. Due to that, Duplicati will find duplicate files and similar content and store them only once in the backup. As Duplicati analyzes the content of files it can handle situations very well if files and folders are moved or renamed. As the content does not change, the next backup will be tiny.

What this means is that newer versions build on older ones, so data loss in an old version can affect a new version as well. Of course, such data loss happening at all is the question here…
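
If it helps to picture the deduplication described above, here is a tiny Python sketch of the idea only (not Duplicati’s actual code; the names are made up for illustration):

    # Concept sketch of block-level deduplication, not Duplicati's implementation.
    import hashlib

    BLOCKSIZE = 100 * 1024      # default blocksize
    block_store = {}            # hash -> block bytes (stands in for dblock files)

    def backup_version(path):
        """Return the list of block hashes describing this version of a file."""
        hashes = []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCKSIZE)
                if not block:
                    break
                h = hashlib.sha256(block).hexdigest()
                block_store.setdefault(h, block)   # an already-seen block is not stored again
                hashes.append(h)
        return hashes

Each version is just a list of block references, so losing one stored block can break every version that references it, old or new.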

You posted versions with errors. I want to know the total versions, or do you mean all are bad?

To turn the tables a bit, how does that help me? I’m looking for your help to track down an error.

That doesn’t do anything to help understand the problem. Logs, as mentioned, give some clue. Smaller backups can also do database bug reports in a reasonable time. Big ones get too slow. Unfortunately, only Canary has the improved sanitization, so a bug report made with Beta would maybe best be shared by PM.

Turning computers off during Duplicati work is another way to possibly hit error handling issues. Ideally it should handle all of that. In reality, one sometimes must find the sensitive places to fix.

This is why I was poking for history leading up to the problem, but remote third parties are hard. Sounds like you can remote in and look around, if computer is up, but still harder than local PC.

If you mean a job export, you probably also need their database, as seen on Database screen. The usual job logs are in there, but About → Show log → Stored lives in Duplicati-server.sqlite.

So maybe we don’t know the total backup versions yet. So it goes until you grab a look or a DB.

I would try to invent something for you, but it’s uncharted territory, as you want to avoid Recreate.
The general idea is to fix the inconsistency in the database with SQL, delete remote files, and have Repair replace them. The latter works fine on an intact database. Yours are a bit of an unknown.
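
Not a worked-out recipe yet, but as a read-only starting point (Python, since you offered), something like this run against a copy of the job database would list each fileset with its entry count, to line up against the “found X entries, but expected Y” messages. The Fileset and FilesetEntry names are what recent local databases use, and I’m assuming Timestamp is Unix seconds, so verify against your own copy first:

    # Read-only look at a COPY of the job database: list each fileset and its
    # FilesetEntry row count. Table/column names are assumptions; check them first.
    import sqlite3, datetime, sys

    db_path = sys.argv[1]                       # path to a COPY of the job .sqlite file
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)

    rows = con.execute("""
        SELECT f.ID, f.Timestamp, COUNT(fe.FileID)
        FROM Fileset f
        LEFT JOIN FilesetEntry fe ON fe.FilesetID = f.ID
        GROUP BY f.ID
        ORDER BY f.Timestamp
    """)
    for fileset_id, ts, entries in rows:
        when = datetime.datetime.fromtimestamp(ts)   # assuming Unix-seconds timestamps
        print(f"fileset {fileset_id}: {when}  {entries} entries")
    con.close()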

Is this the SFTP server at your local site, containing encrypted backups which you can decrypt?

It’s a question of parallelism. The Internet runs at a certain speed for a given transfer, but speed tests and other throughput-maximizing things do parallel transfers. I don’t know if WinSCP does; however, it doesn’t matter if the destination for Duplicati is local to you (still not sure if that’s so).

I have a script that tries to forecast slow recreate by looking at dindex files (explained earlier) to see if they provide all the blocks needed by the source files (information on source is in its dlist). Recreate that gets past 70% on progress bar is reading dblock files, and that slows things down.

Expert advice is very much welcome to help me out, but recently has been in rather short supply.

I’ve given a lot of specific steps involving logs. Databases and bug reports of them can help a lot, although large backups get really big and slow because their databases are big and slow as well.

Databases are pretty portable, so if you can get one, you can just drop it into a dummy job for DB bug report creation. Some other info on the host system will be wrong, but that’s not the key part.

So you likely got the server database and all the individual job databases, and might have all logs to examine. That’d be a good place to start, but possibly bug reports will eventually be necessary.

Installing the latest Canary purely to make better-sanitized bug reports might also be possible, and it is a lot simpler than asking clients to run something not recommended for production until it is well tested.

The usual tools such as Repair and purge-broken-files don’t seem to fit well, so a fix may have to be invented and tried as described above. Expert help in Duplicati and SQL would certainly help out.

It has been a long time since I even attempted to recreate a database on a remote backup, but if memory serves it got stuck somewhere between 75% and 90%… Slow is slow, and so remote rebuilds are out of the question. I am in the process of rebuilding the smaller database locally now to see if that resolves the problem. I am also transferring the 700 GB+ backup to a local SSD to rebuild locally; then I will move the AppData folder back to the remote machine. But I cannot keep doing this.

At this point no paths have been recommended and all paths I have tried have been fruitless… Only path I have left is rebuilding or starting from scratch. I will know shortly if rebuilding is even an option.

I have yet to find directions on how to easily change the blocksize on an existing backup.

Yup, you were correct, I did not notice there are 2 separate settings… And I know nothing about setting that up, what math to use, or how to convert an existing backup. I read those pages and did not find that info in there.

I have central reporting set up and it does retain error information.
The source of the problem could be anything from internet issues to the client turning off the computer during a backup. The problem is… that the problem is weaving itself throughout every version.

All bad every time it happens. That is what is new in this beta.

It is something new between the beta releases. That is what I am saying.

No… I meant I turned off the backup, copied the entire Duplicati data folder, then transferred it to a computer in my office where I can toy with it directly. I have backed up the backup folder as well. This will allow me to do anything I want and then return it to its original broken, or fixed, state and put it back on the machine. I have done this now for both of the currently affected machines.

Yes… And as of this moment I have copied those backup folders off the SFTP server to a local SSD, as well as the Duplicati config folder from the remote machines. So I now have everything here on a spare workbench machine, and I can decrypt or do whatever is needed. I have also now set up both remote machines with local backups to hold them over.

I will report back shortly with the results of the recreate database on the smaller backup.

I should note I have these settings (and more) on all machines…
--rebuild-missing-dblock-files=true
--retry-delay=60s

UPDATE!

Delete and Recreate did resolve the problem on the smaller backup. So… to me… this means it is a local problem with the local database. I also did a test restore of a file and that succeeded as well.

Then you can watch the progress bar which is a little hard to guess sometimes, so there’s also About → Show log → Live → Verbose to see if it’s downloading dblock files, which is not ideal.

Does this mean the logs are now deleted, or were logs or database preserved to get their info?

You’re ignoring the idea I gave of fixing the database with SQL and fixing the destination with the database. Granted, it’s experimental. There are also variations on Recreate, if you see performance issues. SQLite memory caches can be increased, for example. This has tested helpful on big recreates.

Generally you start fresh. That’s why I’m lobbying for an increase in default, and greater visibility.

Anything other than perfect backups coming through?

If you mean backup versions, I explained that in detail already, although you might not like it.
This is the downside of the benefit you gain by not having each backup be a full copy of files.

I don’t know if that’s been reported before. Your own comment was a little vague on history:

I guessed you meant this latest case. Can you say where the usual easier process solved it?

I’m saying that it’d be nice to know which of the seven Canary releases in between changed it.
It’s a wish, but maybe not one you can grant, and maybe not one we can. It’s not reproducible.

The things you’re doing are what I would have suggested for good speed if you had equipment.

It’s good to know that the destination was OK, as errors there are harder to get straightened out.

I have in fact zipped up the Duplicati data folder from both computers. So if there are any logs in there, they are preserved.

Not at all. I just missed any steps you might have given toward that end.
I am open to learning how to do this and experimenting with it.
I now have a copy of the full 700 GB+ backup folder and config in its broken state backed up. I can mess with it and reset it back to broken as much as needed till I find a process that works. I expect this will happen more times in the future, and having a fast fix is worth the up-front effort. Especially if you are here to help me through it. I expect there are others who would benefit from this as well.

It retains everything that the endpoint uploads as JSON on the last failure. I presume this is the same information that can be gotten from the logs on the endpoint.

LJB-01 is now back online and backing up properly. I still have a copy of the AppData folder and the backup folder from when it was not. We can work with that to try and solve this directly, since it is smaller.

I am now rebuilding the database for KC-02 on the SSD local to the computer. I imaged a copy of Windows to a 2 TB SSD to get this going. I am timing how long it takes to complete. I imagine it will run all night. It has been running for 30 minutes and is a third done. I will be able to reset this back to its corrupt state as well.

Once KC-02 is complete I will follow any directions you have supplied with SQL commands and tools. I have HeidiSQL on my PC. I have not checked to see if it can open SQLite files… assuming that is what this is using. If you direct me toward any Windows or Linux tools, I will obtain and use them to complete whatever tasks you believe should be done to resolve this or to retrieve information toward that end. I am also not opposed to supplying you directly with the SQL database if it does not contain any personal information.

I have been testing an SQL method that definitely falls into the fast-and-dirty category.
Ideally an expert developer would help here, but lacking that, I’m testing on a backup.

The problem we’ve seen before is that a backup version references a file that got lost.
It’s unknown how they get lost – that’s what all the logs and bug reports might help on.

Unexpected difference in fileset version 42: 1/10/2024 10:03:11 AM (database id: 769), found 200397 entries, but expected 200401

for example, expresses surprise that 4 files that should be there had their records lost.
The Recreate approach would replace the records. The SQL removes the expectation.
At this point, Verify files might get happy, and backup might replace the 4 lost files.

Old versions of those files are still in the destination (one hopes), but does that matter?
Sometimes sacrificing broken files is a way out, and it’s the purge-broken-files way.
Unfortunately, neither that, nor Repair, notice this problem. Maybe a dev can solve that.

Meanwhile, HeidiSQL says it can do SQLite. I use DB Browser for SQLite, but the SQL part
is probably the same if you can figure out the rest of handling a DB for some SQL work.
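
As a read-only guess at where those lost records might show up (again, only on a copy, and with the same caveat that FilesetEntry and the File view are assumed names), something along these lines would look for fileset entries whose file record is gone. This is only one hypothesis about what the quick fix would target, not the SQL I’m testing:

    # Read-only check on a COPY: fileset entries whose FileID has no matching
    # row in the File view. A hypothesis about the lost records, nothing more.
    import sqlite3, sys

    con = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    rows = con.execute("""
        SELECT fe.FilesetID, fe.FileID
        FROM FilesetEntry fe
        LEFT JOIN File f ON f.ID = fe.FileID
        WHERE f.ID IS NULL
    """).fetchall()

    print(f"{len(rows)} fileset entries reference a missing file record")
    for fileset_id, file_id in rows[:20]:       # show a sample
        print(f"  fileset {fileset_id} -> missing FileID {file_id}")
    con.close()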

So I believe this may fall into what you just said. The rebuild took 16 hours… I can only imagine how long that would have taken over the WAN using SFTP. There were some warnings and an error.

WARNINGS:
2024-04-02 18:24:19 -07 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MismatchInBlocklistHashCount]: Mismatching number of blocklist hashes detected on blockset 17648. Expected 0 blocklist hashes, but found 1
2024-04-02 18:55:53 -07 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MissingVolumesDetected]: Found 1 missing volumes; attempting to replace blocks from existing volumes
2024-04-03 10:18:31 -07 - [Warning-Duplicati.Library.Main.Database.LocalRecreateDatabase-MissingVolumesDetected]: Found 1 missing volumes; attempting to replace blocks from existing volumes

ERROR: 
2024-04-02 18:49:07 -07 - [Error-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-MissingFileDetected]: Remote file referenced as duplicati-bcdc5669548e14bffaf0d907fa95b8c72.dblock.zip.aes by duplicati-i850e434dcda84ce4b8e489a4336c5f02.dindex.zip.aes, but not found in list, registering a missing remote file

I did verify that this file is in fact not in the backup. What should I do about this?
I did run purge-broken-files at the start to see if this resolved it, and it did not find anything wrong. I suspect if I ran it now it might. Should I give that a go again? I just ran list-broken-files and it returned nothing.

I am feeling like the delete and rebuild of the database is the safest bet and that I should not waste time on the modification of the database directly.

Did you look at progress bar and/or live log? If no dblock download, WAN matters less.

Except for having to test a new recreate, moving it to another folder will see if it matters.
If there’s any actual missing data after dindex reads, it will have to go looking in dblocks.
I’m still asking if it did. You might be able to look at <job> → Show log → Remote to see
get (download) operations (in large quantities). Did downloads end in dindex or dblock?

What time does “at the start” mean compared to “now” and relative to the recreate shown?

Then probably purge-broken-files will do nothing. These don’t see “Unexpected difference”.

It’s certainly the usual path, but you were hugely concerned about how much time it takes.
Was the 16 hour run the big backup or are we still waiting to see how long that one takes?

After deleting 10 files at random on my smaller backup, SQL quick fix took 0.124 seconds.
Maybe we’ll hear from an expert sometime on whether or not it seems safe enough to use.

At about 1.5 to 2 hours in, the progress bar hung at about 75% for a while, then I went to sleep.
There is no live log for this.

Yes… to both.

When I ran out of options, prior to initially posting, I tried running list-broken-files and it returned nothing, so I ran purge-broken-files and it also returned nothing.

Prior to today it never once occurred to me that I could just copy the whole backup data folder to a local computer and run it locally, then copy it back. 16 hours or even just a couple of days is an acceptable timeframe to rebuild a backup if needed. 90% of my backups are under 200 GB anyhow. On the servers, where most of my big backups exist, I break the folders up into individual backups to keep the sizes down.

I am going to continue reporting when this specific issue occurs, as I feel that it may be something new that needs to be addressed at a programming level. I will retain the broken endpoint databases for a while in case anyone on the Duplicati crew ever wants to examine them. I am looking forward to getting the extended error reporting from the Canary into the Beta.

The main potential hurt is an exhaustive search from 90% to 100%. It’s hard to say whether it did that part.
The job’s “Complete log” has “BytesDownloaded” and “FilesDownloaded” to compare to the destination.

Yes, the goal is always to prevent these unusual (for most people) problems. By the way, your
other posted Warning and Error messages are quite rare too. Try a web search, if you wish to.

I’d like to focus on the “Unexpected difference” and would still like the log info that I requested.
Problems typically come from somewhere, and the prior history sometimes sheds some clues.

EDIT:

An example of such clues is in Unexpected difference in fileset; unable to delete broken version,
where the March 15 job log showed a cleanup (but was it done well?) from a March 14 failure.
I’m trying quite hard to get the March 14 server log, but no luck, so it’s unknown how this began.

"BytesDownloaded": 8631852281
"FilesDownloaded": 12797

Looking at the error messages, there appears to be exactly 4 files missing in all sets.
Combined with the exception message and stack trace, it seems the failure happens during the backup process.
My best guess is that this has happened during the compact phase, and some logic has deleted the 4 files. I am not sure why the exception would cause this, but the compact phase should not remove the files if they are part of other filesets. I worry that this would have happened even if the exception did not occur, but it is also possible that the exception causes a rollback of the database state, which is what creates the inconsistency. Yet another guess is that the 4 files are actually “orphan” files, but are somehow associated with each fileset, which I can diagnose with a bug report.

Just so I understand the flow of things:

  1. You have 2.0.7.1-beta installed
  2. You run some 41 backups without issues
  3. You run backup 42, and it fails with the stack trace containing: A connection attempt failed...
  4. You get the error “Unexpected difference in fileset” when running again
  5. You install 2.0.7.101-canary, but see the same error

This is an error that is reported somewhat often, so I would be eager to dig into it some more.
If you still have the original database, it would be nice to see if there are any other errors reported or any other messages attached to the backup that can help pinpoint the code flow.
Unfortunately, additional error messages are not reported on exceptions until version 2.0.7.102-canary (that version would need to be running in the instance where the error happens) for the remote log to contain details.
If you still have access to the database, can you look in the LogData table and see if there is some log data for the run of (presumably) version 42? The data is in JSON and the relevant parts are likely the Messages entry that should contain all log-data for the run, ending with the A connection attempt failed .... If you can share this entry with me privately, I can have a look.
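
Roughly, something like this against a copy of that database should pull the relevant rows (I am writing the ID, OperationID, and Message column names from memory, so please check them in your copy first):

    # Dump LogData rows mentioning the connection failure, from a COPY of the
    # original (broken) database. Column names are assumptions; verify them first.
    import sqlite3, sys

    con = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    rows = con.execute("""
        SELECT ID, OperationID, Message
        FROM LogData
        WHERE Message LIKE '%connection attempt failed%'
        ORDER BY ID
    """)
    for log_id, op_id, message in rows:
        print(f"--- LogData {log_id} (operation {op_id}) ---")
        print(message[:2000])       # the JSON can be large; print the start
    con.close()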

That is good news, because it points to either a rollback issue or the “orphan” situation. Both can be resolved trivially, once we figure out exactly what happens.

The error says that it found a dblock file that was only referenced by a dindex file. The most likely cause of this is that the compact process was attempting to delete remote files when it failed, so the dindex file was supposed to be deleted but could not be. Due to the inconsistency error, it never got to the cleanup step in subsequent runs that would have cleared it.

I would venture you can either delete duplicati-i850e434dcda84ce4b8e489a4336c5f02.dindex.zip.aes and re-run the recreate (sorry, long wait, I know), or try the purge.

Yes, I would say, manual fixing of the database is easy to mess up, which will give unexpected problems later.

Greetings Ken,
Thank you so much for taking the time to respond and give me your input.

I have the original databases with the corruption for both computers. I am happy to let you examine it directly.

I am looking at the “Log Data” list and they are dated and they all have the ID of 1.

I will try the purge. If it does not purge the file I will try deleting the file and do a recreate again.
Edit: purge gave no results. I have deleted the file and am recreating the database.

I don’t have your destination to compare with. 8 GB seems a little small to be a whole backup. Working some math, 12797 files means about half are 50 MB dblock files, so roughly a 320 GB backup.

There were far more backups that had been run. The thought was that all the backups broke, however if you look closely, version 38 somehow did not, although it should exist. Notice gap:

Unexpected difference in fileset version 39: 1/13/2024 7:35:32 AM (database id: 772), found 200545 entries, but expected 200549
Unexpected difference in fileset version 37: 1/15/2024 9:00:00 PM (database id: 774), found 200993 entries, but expected 200997

Database ID having gotten up to 811 means this might be a very old backup, and those have concerns themselves, as Duplicati was worse years ago. Sometimes old defects stay around.

Those were dismayingly free of meaningful clues (at least to me) where in the backup it was.
Verification is done at both the start and end of the backup, but there’s no sense of time here.

Compact is a common trouble spot, and the log file devotes a whole section to it. Please look, which is again a request for logs, because we don’t have them and they can add needed info.

FWIW, the other recent case of this didn’t run a compact, so that’s why I suspect it either had something go wrong the prior day (the log I can’t get), or the cleanups at next backup broke it.

The current Beta unfortunately doesn’t make a job log on error, but the server log has some, and times sometimes help fill things in when piecing things together. The original database can also add info; however, we’d usually only want a bug report (preferably made by Canary), unless you’re sure you’re not leaking too much private info (mainly path names) with the original database.

There’s a whole lot hinging on the compact process that we don’t even know was ever done.
If we’re lucky enough to see it in a log, great. If not, sometimes a bug report will show a clue, because compact has such a recognizable pattern of remote operations of dblock download.

You see multiple rows where the ID column is 1? That seems impossible. It’s the primary key.
If you’re looking at the recreated database, you need to be looking at the original one instead.

Format obscures it a little, but version 43: 1/9/2024 11:07:40 AM (database id: 768) is the actual oldest affected version. I’m not sure what this is aiming at. Some latent error that caused trouble just now? Regardless, the easiest (though it must be done carefully) way to view a log if the original system is unavailable is by dropping the database into some dummy job just to see the logs.

Without that, one could just fish in the chronological logs for the right dates; otherwise, one could take the fileset database id (say 768) to the Fileset table’s ID, map it to the OperationID, and look at its LogData Message. Personally, I’m more interested in any logs at failure times or just before.
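
To make that mapping concrete, roughly (same caveats as before: run it only on a copy, and the column names are assumptions to verify):

    # Map a fileset database id (e.g. 768) to its OperationID, then print any
    # LogData rows for that operation. Run against a COPY of the original database.
    import sqlite3, sys

    db_path, fileset_id = sys.argv[1], int(sys.argv[2])     # e.g. 768
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)

    row = con.execute("SELECT OperationID FROM Fileset WHERE ID = ?", (fileset_id,)).fetchone()
    if row is None:
        raise SystemExit(f"no fileset with database id {fileset_id}")
    op_id = row[0]

    print(f"fileset {fileset_id} was written by operation {op_id}")
    for (message,) in con.execute(
        "SELECT Message FROM LogData WHERE OperationID = ? ORDER BY ID", (op_id,)
    ):
        print(message[:2000])
    con.close()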

Destination Store is: 614.05G

I have no memory of it, but it is possible I removed that version at some point to resolve another problem.

It has been around for a long while. As it is meant to be a temporary emergency cloud backup, I have only been actively storing 30 to 60 versions of daily backups.