...please verify the sha256 hash ... again

K32768 · September 27, 2021, 8:33pm

Dear all,

I’m aware that there are already some threads regarding that topic. I have read them, but the cause or the remedy - if any - does not fit my case.

I have been using Duplicati 2 for about three years, first without any such issue, only since about one year ago I have encountered these warnings directly after Backup: … remote file duplicati…dblock.zip.aes is listed as Uploaded with size xxx but should be yyy, please verify the sha256 hash zzz.

At first I really thought that something has gone terribly wrong, deleted the files, ran the purge-broken-files command etc. I thought it was a one-time glitch, but it kept coming back. Sometimes; about every third backup.

I then tried to check closer and the “remote” files (two meters away on a OpenSuse15.3 machine) were perfectly fine. Correct Filelength, and also correct hash256. It is a little bit difficult to check that hash, but generating the sha256 hash as text value, converting it to hex(binary) and then base64 encoded has given exactly the desired result. Those backup files are just fine. Usually when I run the “test all” command from the Duplicati machine (Windows 10 x64), there is NO problem. 100% successful test.

Duplicati 2.0.6.3_beta_2021-06-17, on Win10 x64 (same issue on two different machines)
Target OpenSuse15.3 Linux, current Samba 4.13.6.

It is always annoying being told that there was a problem with the backup, but even after checking that those files are fine, there stays a bad feeling if I can really trust my backup.

I’m not sure which additional info could help to track the issue down, but please let me know how I can help to provide you with useful information.

Thanks,
Paul

ts678 · September 27, 2021, 10:08pm

What are some xxx and corresponding yyy values?

Was this on OpenSuse, or on Windows over SMB?

OK, so this problem might be transient. The best view you can get of what Duplicati saw after backup is backup’s Show log → Remote. The default backup ends with 3 get, and below that is the list that saw (presumably) a length that it didn’t like. Click on the list line and see what length it saw for the “bad” file.

If the problem is occasional, scrolling back through history is possible but slow. You might do better with
DB Browser for SQLite on a copy of the local backup Database. Look in the RemoteOperation table, e.g.

You can identify the backup by its UTC timestamp on its dlist file. Click on Data in list just below that. You can copy it to a text editor of your choice and see what the size of the file was when Duplicati asked.

Corrupted files on shared folder in local LAN #4076 was a similar (but seemingly more persistent) issue where PowerShell couldn’t get the right size either, when operating over SMB. Lower-level observation is possible using tools like Sysinternals Process Monitor (not that hard) or Wireshark (more challenging…).

Above GitHub issue has what I mentioned, and more. If you’re good at SMB config, that’d be great too, but I’m not able to help much, First step is to see if Duplicati got a wrong size, and how “sticky” the size looks.

EDIT:

Controlling write-through behaviors in SMB has a Windows 10 setting that might help stop some potential “stuffed with buffers and caches” behaviors that possibly are helping get the unexpected file size returned.

NET USE /WRITETHROUGH

New-SMBMapping -UseWriteThrough

K32768 · September 28, 2021, 9:06pm

First, thank you very much for your quick answer!

I’m impressed with the details I’m confronted with and it took an hour or two to get my head around your explanations and to work through the database and the logs.

Here are some examples from the first “failed” backup after a fresh Windows 10 and also fresh Duplicati2 installation:

listed as Verified with size 209665501 but should be 209679533
listed as Verified with size 209692045 but should be 209690205
listed as Verified with size 197933 but should be 190365
listed as Verified with size 146797 but should be 155117
listed as Verified with size 209688861 but should be 209638173
listed as Verified with size 209673261 but should be 209709357
listed as Verified with size 141005 but should be 153725
listed as Verified with size 209692829 but should be 196157
listed as Verified with size 209693837 but should be 209710813
listed as Verified with size 161965 but should be 209679181
listed as Verified with size 209642077 but should be 209637245
listed as Verified with size 209713517 but should be 209666957
listed as Verified with size 209709645 but should be 209706541
listed as Verified with size 209669357 but should be 209712413
listed as Verified with size 209667885 but should be 209641101
listed as Verified with size 147885 but should be 149597
listed as Verified with size 209628925 but should be 209671725
listed as Verified with size 151437 but should be 158221
listed as Verified with size 209669501 but should be 209615821
listed as Verified with size 209616861 but should be 209626333

I checked manually on both platforms on selected files and got the correct file length and correct sha256. (tools linux: bash / mc / sha256sum, win10: totalcommander)

Indeed, I can see that the list contains wrong file lengths. But when running a “test all” command later, everything is fine again.
Actually, what caught my eye is that the allegedly “wrong” file sizes actually exist, just for other files. These are not random values, all “wrong” file sizes in the examples above exist for other files.
Those entries also have the “LastModification” Timestamp from that wrong file!

Example: “… is listed as Uploaded with size 173965 but should be 209703197”

Wrong length/Timestamp:

{“Name”:“duplicati-b65a56fc4af684918ae9c3e045b6957ea.dblock.zip.aes”,“LastAccess”:“2021-09-27T21:28:58.497245+02:00”,“LastModification”:“2021-09-27T21:28:58.5158671+02:00”,“Size”:173965,“IsFolder”:false},

Correct length/Timestamp:

{“Name”:“duplicati-b65a56fc4af684918ae9c3e045b6957ea.dblock.zip.aes”,“LastAccess”:“2021-09-27T21:28:58.497245+02:00”,“LastModification”:“2021-09-27T21:29:08.0671763+02:00”,“Size”:209703197,“IsFolder”:false},

Source of wrong length and Timestamp:

{“Name”:“duplicati-id6742be42e48469e85304009b4e9c930.dindex.zip.aes”,“LastAccess”:“2021-09-27T21:28:58.497245+02:00”,“LastModification”:“2021-09-27T21:28:58.5158671+02:00”,“Size”:173965,“IsFolder”:false},

Wow! But where does this stuff get mixed up?
So I think you will recommend to set up some mouse traps with ProcessMonitor and see if we can catch something?

ts678 · September 28, 2021, 11:12pm

Excellent finding. Excellent question. I was looking for a pattern in the GitHub issue, but didn’t have all the file information, so couldn’t find anything. Sizes just seemed to be going up and down, much like you got.

Please do. That’s the next step down. It’s a bit hard to read (driver level, I think), but might spot something.

EDIT:

It might be easier if you can get this to happen on a smaller backup. Regardless the search tool may help.

A brute-force way to possibly spot something (or maybe chase the glitch away – who knows?) is to run a constant dir loop of the folder into a file where you can use find and find /v to look for info mismatch. PowerShell could probably do this more elegantly if you’re good at it, otherwise a batch file should suffice.

Thanks for pursuing this. I did take another look at Duplicati code, and it looks very simple, basically doing

github.com

duplicati/duplicati/blob/47cbe2d3bee37c6b2929bbaf9619ea82c43bcbf5/Duplicati/Library/Common/IO/SystemIOWindows.cs#L429-L433


      
          public IFileEntry FileEntry(string path)

          {

              var fileInfo = new FileInfo(PrefixWithUNC(path));

              return new FileEntry(fileInfo.Name, fileInfo.Length, fileInfo.LastAccessTime, fileInfo.LastWriteTime);

          }

FileInfo Class shows what .NET Framework puts in there, but last line above is Duplicati pulling info out.
I’m not seeing how anything under Duplicati’s control could attach the Name to other info for another file.

K32768 · September 29, 2021, 9:40pm

Thank you again for your quick reply.

Unfortunately I don’t have much experience with ProcessMonitor, none with WireShark or Powershell.
I tried a few steps with ProcessMonitor and I can see when Duplicati lists the directory, but there is nowhere the file sizes to be seen. If any, the EndOfFile values are misinterpreted as a fixed date (1601), I have found some unanswered forum qestion from 3 years ago, if this is possibly a problem in ProcessMonitor. No dice here, yet. Any idea where to look for the file sizes?

However, I have followed the brute-force way and did some not so quick, but nevertheless rather dirty hacks, monitoring the target directory,
a) from Windows

:while

set FILENAME=%TARGETPATH%\LISTDIR_%date%_%time:~0,2%.%time:~3,2%.%time:~6,2%.%time:~9,2%.txt

dir /-C %SOURCEPATH% > %FILENAME%

fc %OLDFILENAME% %FILENAME% >null 2>&1

if %ERRORLEVEL% EQU 0 (
del %FILENAME%
) else (
set OLDFILENAME=%FILENAME%
)

goto :while

b) from Linux

while true; do

TIMESTAMP=$(date -Ins)
FILENAME=$TARGETPATH/LISTDIR_$TIMESTAMP.txt
ls -al --time-style=full-iso $SOURCEPATH > $FILENAME

diff $OLDFILENAME $FILENAME &>/dev/null
if [ $? -eq 0 ]; then
    rm $FILENAME
else
    OLDFILENAME=$FILENAME
fi

done

Both scripts basically do the same thing and dump the directory listing into a file with a timestamp as part of the filename repeatedly. If the latest file is identical, it’s deleted right away again so only changes are saved.

I have never seen any problems in those logfiles, in the way that the only size changes of files occurred when they were written. Nothing weird seen there today.

In order to run those scripts, I set up some new small test backup configuration, where I can force source changes without messing up my “good” data. A small total data size of below 30GB didn’t trigger any problems, but now after 10 iterations making it bigger (now close to 50GB) I have a persistent set of errors of the kind we are discussing here (first 22, now 24). They are listed after each backup run, seemingly always the same ones. Once they were also listed when running the “test all”, but that went fine several times afterwards. But when checking for the allegedly wrong file lengths in the directory listings, all file sizes are as expected. Seems those target files themselves are just fine.

This is really strange. I need to call off the search for tonight.

ts678 · September 30, 2021, 12:24am

Maybe the same issue from the Duplicati GitHub Issue I cited. Works for my Windows to Windows SMB.

Mine seem to be on QueryNetworkOpenInformationFile (below has a filter for that, to lower the noise):

It sounds like you’re on to something (strange as it may seem). Thanks for the investigation so far.
Are you seeing this on Linux directly? If so, maybe that would get SMB off the hook as the cause…

K32768 · September 30, 2021, 7:40pm

I honestly hope I can contribute to the resolution of this issue.

I don’t quite understand your question. I have run that script on the bash, kind of intermediate between HDD/RAID/encrypted filesystem and outside fileshare via SMB. I had my doubts that something was going wrong there, but good to have double-checked; so far I have no evidence that the basic Linux setup is wrong. In that part I would have expected persistent errors more like transient ones. If things go wrong in a filesystem, they ususally stick and don’t disappear.

Now next section of the link: Samba server and Ethernet. The connection is twice 2 meters of cable, and a new (unmanaged) switch, 1Gbit/s. The old one has just recently quit the job with a failed power-supply, so we can rule that out now, too. Other end: the Windows10 machine (actually an old one which has lived through all Windows10 revisions without re-installation) and also a brand-new one, less than 4 weeks in operation. Showing the same behavior, so I thought it should share my experience here in the forum.
As I have written yesterday, the ProcessMonitor has a problem showing the “EndOfFile” parameter, not only for network shares,also for local file systems. After trying it again today, I have quickly abandoned ProcessMonitor and installed Wireshark.

I have expected a tedious learning curve with wireshark, but I had my first piece of trace after a few minutes: In all SMB2 packets I can list in Wireshark, the file lenghts of the allegedly wrong files are correct. I have spent quite some time poking around with some examples and all the values were correct. I even went the other way around and tried to find the all files with a few sample “wrong” filesizes. But as I have reported earlier, those wrong file sizes did only show up with their correct files. So far into the analysis I also have no evidence that SMB is wrong.

Moving on, back into the SQL-database of my Duplicati test configuration. I have again looked into the “RemoteOperation” Table and looked into the “list” data. I have just looked at the very end when I assume the verification after a backup job is running. All “get” Data is correct for size and Hash; I think that’s what we see in the warning after “should be”.
However, there are two “list” entries, both with the same number of files, the first contains correct sizes and Filestamps, the last one the wrong ones.

Searching further up in the “RemoteOperation” Table I can find multiple “list” entries with wrong values for timestamps and lengths. However, they seem to change back and forth, not always all and the same are wrong.

Not sure where I should look next. Any idea?

ts678 · October 1, 2021, 1:25am

It was late and I misinterpreted some things. Sorry. Looking at the scripts, it seems like you’re making a deduplicated set of listing snapshots so that when Duplicati complains you can grep or find in that set.

Duplicati did complain, but snapshots on Windows and Linux look fine. Then you tested more (thanks!).

I wonder if there’s any significance to this failure happening on systems that also show list problem?

I usually run PortableApps.com Process Monitor, but I just tested both the full download and the live file.
Maybe some other person who’s following can volunteer a Windows system to see if it works for them.

Is that the screenshot? Regardless, I’ll be talking about this case because there’s something presented.

The OperationID column maps to an ID in Operation table, so you can see what did RemoteOperation.
You can also try to find your way from the Timestamp. Convert with epochconverter.com or something.

Creating a bug report and posting a link to it is also possible, if you’d like someone else to look at the DB.
You’re doing a great job though.

Operation 20 looks like a backup with no source data changes (is that possible)? The first list might be the one before the backup. The second would be the one after the backup that didn’t happen. The test of default 1 sample set (dlist, dindex, dblock) follows. Or at least that’s the sequence I get from such a test.

If this sounds like what your Operation 20 was (run around September 29, 2021 9:22:35 PM UTC), that’s pretty strange. I’m not sure what Operation 19 was, doing all those dblock downloads 6 minutes earlier…

You could post a DB bug report if you like. I’m wondering if a PowerShell test would be worthwhile. I don’t write enough PowerShell to help easily, but in the GitHub issue I think PowerShell showed a problem, but Total Commander did not. The reason PowerShell is interesting is it uses .NET Framework like Duplicati.

I wonder if I @drwtsn32 can at least advise? It might use Get-ChildItem, but rest of script is beyond me.

I’d kind of like a developer opinion on what to try next. I suppose a new issue could be filed, but this is very much like the other one, except going quite a bit further into the digging. We’re not well set up to hand out specially instrumented builds, and there’s no nightly build to slide debug code into. Running a debugger or building a debug version from source is difficult, and I’m not sure who could give you adequate directions.

I sure wish Process Monitor worked. One thing that could be looked into is what filter drivers you’re using.
fltmc.exe can show and administer this. This is new territory for me, but here’s mine while procmon is up:

C:\>fltmc filters

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
bindflt                                 1       409800         0
PROCMON24                               6       385200         0
WdFilter                                6       328010         0
storqosflt                              0       244000         0
wcifs                                   0       189900         0
CldFlt                                  1       180451         0
FileCrypt                               0       141100         0
luafv                                   1       135000         0
npsvctrig                               1        46000         0
Wof                                     3        40700         0
FileInfo                                6        40500         0

C:\>

Change Altitude of Process Monitor (ProcMon) says why altitude matters, although change looks hard enough that I’m not going to suggest doing it. The article also shows another sample filter conguration. Seeing if there’s anything “extra” in yours might be helpful. Also, maybe pause real-time virus scanner.

K32768 · October 1, 2021, 7:36am

Thank you very much for explaining what I’m seeing. It is hard to interpret all that data if you are not familiar with it.

Exactly. But it was run on Sept 30, not Sept 29. With the current condition, the error is triggered seemingly every time without the need of new backup data. That cuts the test case down and also the data that needs to be analyzed.

Acutally: one day earlier. Well, but I can explain it now: This was “test all” running successfully on Sept 29. All those “get” commands now make perfect sense.

I can try that one this evening and give it another shot.
Edit: I have tried it also with the Portableapps version, but nope. Any File size is shown as a
date, even totally normal files:

Edit2: I have tried diabling Sophos and there is a change (I have only tried on the local file system so far). If Sophos is active QueryNetworkOpenInformationFiles is sent (with the result shown), but there is a second Operation QueryStandardInformationFile, which delivers the correct EndOfFile. However, if I disable Sophos, there is no QueryNetworkOpenInformationFiles but only QueryStandardInformationFile. I honestly don’t understand why there is a change.

Well, there is the Sophos AV listed…

Filtername                     Anzahl von Instanzen    Höhe    Frame
-----------------------------  --------------------    ----    -----
bindflt                                 1       409800         0
Sophos Endpoint Defense                23       389220         0
PROCMON24                              11       385200         0
hmpalert                               10       345800         0
WdFilter                               11       328010         0
SAVOnAccess                            11       324000         0
storqosflt                              0       244000         0
wcifs                                   0       189900         0
CldFlt                                  0       180451         0
FileCrypt                               0       141100         0
luafv                                   1       135000         0
npsvctrig                               1        46000         0
Wof                                     7        40700         0
FileInfo                               11        40500         0

I will give the github thread another read. It indeed shows many similarities as we are getting deeper.

Well, I once tried to build the linux version on the Opensuse system, but failed miserably. I think, you are cross-compiling it? I was not prepared to go into building a debug version on Windows, but if there is a chance to help I would also look into that. Nevertheless I would need some guidance.

ts678 · October 1, 2021, 12:09pm

I was thinking of posting in that to see if anybody on that one might care to join in on with efforts here.
Maybe you can do a post. FWIW SMB destinations have a history of trouble. One user went to NFS.

EDIT: found it, so inserting a comment:

and there’s a note about CIFS with cache=none, which reminds me of net use /writethrough above.
Possibly some workarounds are good for problem debugging, but not good enough for production use…

A suitable developer would have to answer on the release builds. My builds usually have some trouble.

I might have wrestled both a Linux build and a Windows build to some state some years ago though…
How to build from source has some (old?) info, and a forum search will find some Visual Studio topics.

K32768 · October 1, 2021, 9:23pm

I have tried this evening again: Setup Wireshark+ProcessMonitor+Duplicati - same picture:

All EndOfFile sizes correct in Wireshark when checking the SMB commands
All EndOfFile sizes 1601-01-01 02:00:02 in ProcessMonitor (it does show only operation QueryNetworkOpenInformationFiles, but never QueryStandardInformationFile)
Same issue in Duplicati; complaining about 24 warnings, but the list is actually only 20 files in the log. Once also 48 warnings, but still only 20 files listed in the log. Just saying …
In most instances of the “Backup without new files” the first “list” operation is ok, the second has broken content, as shown in the screenshot from yesterday. Sometimes both are ok, sometimes neither. But the issue with the warning message has occurred every single time since yesterday.

Edit: I tried also

disabling Sophos AV
net use /writethrough
but no cigar.

By the way, I managed to get Duplicati-Debug up and running from the github sources, with the issue reproducible as in the current beta version. It tried to fight me, but it is running:

ts678 · October 2, 2021, 4:53pm

This is great news and great progress (thanks!), but the next question is how you would like to proceed.
One option is to modify the source to do some logging, following the many existing examples of logging.
Another would be to try to use a debugger attached to Duplicati to have tracepoints to display same info.

Tracepoints: Debug with less clutter
Log info to the Output window using tracepoints in Visual Studio

The problem for me is that I don’t use Visual Studio, but the earlier code I cited might be a place to start.
I haven’t proven it with a debugger (I run a different one), but possibly this is where the JSON is created:

github.com

duplicati/duplicati/blob/47cbe2d3bee37c6b2929bbaf9619ea82c43bcbf5/Duplicati/Library/Main/BackendManager.cs#L1061-L1084


      
          private void DoList(FileEntryItem item)

          {

              m_statwriter.SendEvent(BackendActionType.List, BackendEventType.Started, null, -1);

          

              var r = m_backend.List().ToList();

          

              StringBuilder sb = new StringBuilder();

              sb.AppendLine("[");

              long count = 0;

              foreach (var e in r)

              {

                  if (count != 0)

                      sb.AppendLine(",");

                  count++;

                  sb.Append(JsonConvert.SerializeObject(e));

              }

          

              sb.AppendLine();

              sb.Append("]");

              m_db.LogDbOperation("list", "", sb.ToString());

This file has been truncated. show original

K32768 · October 3, 2021, 10:18am

Thank you for the hints! It’s embarrassing, but it seems I’m stuck with what probably is a rookie problem:

Tracepoints don’t work: “the breakpoint will not currently be hit”.
What is even more strange, not even the “System.Diagnostics.Debug.WriteLine(count);” would provide any output…

There is no debugging output in the VS “Output” window.

Is there something I need to configure for debugging? I’m using the stock “Debug” version of “Duplicati.GUITrayIcon”

ts678 · October 3, 2021, 12:24pm

Because I don’t use Visual Studio, I can’t answer. Are there any Visual Studio users reading who can?

K32768 · October 8, 2021, 6:01pm

It was a busy week at work so I couldn’t do much. However, I found the solution of the debuggging issue today. It was in this forum, in another thread:

drwtsn32
Dec '19

I have found that you need to set AUTOUPDATER_Duplicati_SKIP_UPDATE=1 environment variable or realtime debugging doesn’t work.

Should be included as a note on the the [How to build from source], though.

I already had the idea that there was more than a single thread which the debugger could not monitor. I will try to dive deeper now. But beware, complete newbie to C#. I feel more at home with C on microcontrollers.

Even without tracing, I already see plenty of “Exception thrown: ‘System.IO.FileNotFoundException’ in mscorlib.dll”

warwickmm · October 8, 2021, 8:47pm