OneDrive: high levels of uploaded file corruption

Anyone else seeing this? I switched back to OneDrive after a while on Google Drive and Backblaze B2, and soon found lots of downloaded files (for example in the after-backup test sampling, or in a heavier manual test) failing their integrity checks. I tested with Duplicati.CommandLine.BackendTester.exe, with a URL adapted from Export As Command-line because this test wants an empty folder to start with. I saw errors on this test too.

Because errors are hard to isolate when one doesn’t know what the file is supposed to look like, I next used Duplicati.CommandLine.BackendTool.exe with a 64MiB test file that I had used before, to do a put, a get, and a compare. Downloading through OneDrive (and this is OneDrive Personal, in case it matters) showed the same corruption.

The corruption pattern is weird. The first odd thing is that it starts and ends at even binary boundaries, like some filesystem or drive sector problem might do. Even weirder is that the new content is valid Base64-encoded data, although it doesn’t decode (at cryptii.com) to valid text, and looking at the decoded bytes in hex didn’t help…

CrystalDiskInfo’s interpretation of the hard drive’s SMART results looks fine, and the raw values are hard to interpret.
If this were a drive error, I’d expect errors to other destinations too; nevertheless, I might try another system.
Other potential future work includes testing with OneDrive’s own client software, and testing with rclone.
This error seems to arise about once every GB of upload, which is far too high a rate to use for my backups.

I can test the same way… with a script that runs BackendTool to do a put, get, and compare. So far, after about 20 iterations, I don’t have any errors.

If you see the problem about once every 1GB, does that mean you see it in about 1 out of 16 iterations of this test with a 64MB file?

My script is below. Parts of it are there because I run versioned installs from .zip and don’t want an update to sneak in.

The network log needs a .config file set up to enable network tracing, which unfortunately doesn’t show all the data
when operations (or whatever its internal unit is) are long, so one might capture only 1024 bytes out of 4096.
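
For reference, this is the standard .NET Framework System.Net tracing setup, placed in a Duplicati.CommandLine.BackendTool.exe.config next to the tool; the maxdatasize attribute is what caps each logged operation at 1024 bytes. Roughly like this (the listener name and log file name are my own choices):

<configuration>
  <system.diagnostics>
    <sources>
      <source name="System.Net" tracemode="includehex" maxdatasize="1024">
        <listeners>
          <add name="NetworkLog"/>
        </listeners>
      </source>
    </sources>
    <switches>
      <add name="System.Net" value="Verbose"/>
    </switches>
    <sharedListeners>
      <add name="NetworkLog" type="System.Diagnostics.TextWriterTraceListener" initializeData="network.log"/>
    </sharedListeners>
    <trace autoflush="true"/>
  </system.diagnostics>
</configuration>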

@echo off

rem Keep the versioned .zip install from updating itself mid-test
set AUTOUPDATER_Duplicati_SKIP_UPDATE=true
set TOOL="C:\ProgramData\Duplicati\duplicati-2.0.6.100_canary_2021-08-11\Duplicati.CommandLine.BackendTool.exe"
set URL="onedrivev2://BackendTester?authid=REDACTED"
set FILE=64MiB.3.txt

rem Remove the old network trace so this run starts a fresh log,
rem then upload the known-good copy of the test file from put\
del network.log
cd put
%TOOL% delete %URL% %FILE%
%TOOL% put    %URL% %FILE%
cd ..
rem Download into get\ and compare byte-for-byte against the original
cd get
del %FILE%
%TOOL% get %URL% %FILE%
cd ..
fc /b put\%FILE% get\%FILE%

In this test, maybe 1 out of 8 or 10. In the Duplicati.CommandLine.BackendTester.exe test, I used its options to test various fixed sizes and quantities. Small files took a lot of time and passed. Normal-sized ones failed within about 1 GB.

A script modified to use copy into the OneDrive folder took some work, to persuade Files On-Demand to always upload the file; OneDrive was working quite hard to avoid that. So far I think I’ve had about 12 cycles come out OK.

The actual backup was about 10 GB, and testing all of its files found 21 bad dblock files. These statistics are a rough approximation.

Do you mean you didn’t see an error when testing just 64MB files? I thought you did, from your earlier post. I have been testing this PowerShell script with a 64MB file and so far have not had any errors:

$tool = 'C:\Program Files\Duplicati 2\Duplicati.CommandLine.BackendTool.exe'
$url = 'onedrivev2://DupTest?authid=xxx'

$file = 'test64MB.txt'

$good = 0
$bad = 0
For ($i = 1; $i -le 128; $i++)
{
    $tmpfile = "testfile-{0}.txt" -f ('{0:d4}' -f $i)

    ''
    "Iteration:  $i"
    "Testfile:   $tmpfile"
    "Good Count: $good"
    "Bad Count:  $bad"

    Copy-Item $file $tmpfile
    $hash1 = Get-FileHash $tmpfile

    'Putting ...'
    & $tool put $url $tmpfile
    Remove-Item $tmpfile

    'Getting ...'
    & $tool get $url $tmpfile
    $hash2 = Get-FileHash $tmpfile

    If ($hash1.Hash -eq $hash2.Hash)
    {
        $good++

        'Deleting ...'
        & $tool delete $url $tmpfile
        Remove-Item $tmpfile
    }
    Else
    {
        $bad++

        Write-Warning "$tmpfile - hash mismatch - leaving files for analysis"
    }
}

The 64 MB file (as shown in the script) did error eventually, after 8 or 10 cycles of uploading and then downloading the test file.
BackendTester ran fine with --min-file-size=1mb --max-file-size=1mb --reruns=100 (1 GB total), but 10 MB files failed.
Maybe I could have gotten a small file to fail if I’d worked at it long enough, but I only ran one test at that size.
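
For reference, the 1 MB run was an invocation roughly like this (the URL placeholder is of the kind Export As Command-line produces, pointed at an empty test folder):

Duplicati.CommandLine.BackendTester.exe "onedrivev2://BackendTester?authid=REDACTED" --min-file-size=1mb --max-file-size=1mb --reruns=100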

Tested on mono for a while (not a huge while) with success, went back to Windows, and got an error on the second upload.
A while later, after a Windows update (which included a .NET Framework update), it failed on about the 14th file.
I looked at this one more closely than before, moving it to Linux to take advantage of the analysis tools there.
The details are lost to history, but it was something like the steps below (doing it again might go differently); a combined sketch follows the list:

  • hexdump -v -e '16/ "%_p" "\n"' to dump it as 16 characters per line
  • egrep -v -e "0123456789ABCD" -e "BLOCK NUMBER" to remove the original undamaged lines
  • fgrep -f with the 256 wrong lines, run against the network trace’s partial hex dump, to see if any were there (no)
  • egrep -v '[0-9A-Za-z+/]{16}' to confirm the bad lines are in the Base64 alphabet, maybe without = pads
  • tr -d '\n' to join the 256 bad lines into a 4096-byte character stream
  • hexdump -v -e '5/ "%_p" "\n"' to dump 5 characters per line, to avoid an fgrep alignment concern
  • vi to chop off the last line, which has a single leftover character at the end
  • fgrep still can’t match anything in the partial network trace (it can at length 3, but that’s too easy)
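
Put together, the pass looked roughly like this; bad.txt and bad5.txt are names I’m using here for the intermediate files, and network.log is the partial trace from the tracing config:

# Dump the downloaded file as printable characters, 16 per line,
# then drop lines that still look like the original test data.
hexdump -v -e '16/ "%_p" "\n"' get/64MiB.3.txt \
  | egrep -v -e "0123456789ABCD" -e "BLOCK NUMBER" > bad.txt

# No output here means every damaged line is pure Base64 alphabet.
egrep -v '[0-9A-Za-z+/]{16}' bad.txt

# Rejoin the damaged lines into one stream and re-dump them 5 characters
# per line, so snippets can be matched without the alignment concern.
# (The single leftover character on the last line was trimmed off by hand.)
tr -d '\n' < bad.txt | hexdump -v -e '5/ "%_p" "\n"' > bad5.txt

# Search the partial network trace for any of those snippets (none found).
fgrep -f bad5.txt network.log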

This is an enhanced version of past manual work, copying snippets of the damage and searching for them in the trace.
I wish I could watch the unencrypted traffic on the wire, but I’m not set up with such a facility.
The damage this time began at 0x196C000, which is a 16 KB multiple, and ended at 0x196CFFF, so it is 4 KB long.
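
For the record, that damaged region can be cut out and decoded directly; this assumes the damaged download is get/64MiB.3.txt as in the earlier script, and that base64 and hexdump are available. As noted earlier, the decoded bytes didn’t reveal anything recognizable:

# 0x196C000 / 4096 = 6508: extract the 4 KB damaged region, decode the
# Base64 it contains, and look at the resulting bytes in hex.
dd if=get/64MiB.3.txt bs=4096 skip=6508 count=1 2>/dev/null | base64 -d | hexdump -C | head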

Maybe @tygill has some thoughts on this. I could never get network tracing to log Duplicati sending the special corruption pattern that I saw, but it doesn’t log the whole send. Still, I did look at a number of logs, although I only examined one with the special tools above. Earlier attempts were just searches for the bad chunks in an editor.

The reason I bring this up is user issues today on OneDrive Personal and yesterday on OneDrive for Business. Maybe more information will be coming, because right now it’s unclear whether they are seeing the problem described here.

The main candidate I would see for potential bugs is the chunking used to upload large files. That said, when I wrote the logic I tried to test the various edge cases and never saw any problems like this (though my tests may not have been as extensive). I don’t remember whether the default chunking boundaries are 16k and 4k, but if so, that would be another point in favor of that possibility. The logic for chunking is fairly straightforward, though; the type of errors I would expect would be bytes shifting around, not byte values being randomly changed.
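
For illustration only, and not Duplicati’s actual backend code: a chunked upload to the OneDrive/Graph API generally means creating an upload session and then PUTting fragments with Content-Range headers, which is why a bug in the offset arithmetic would tend to shift or duplicate bytes rather than rewrite them in place. A rough sketch (the token, the endpoint path under /me/drive/root, the 10 MiB fragment size, and the use of curl and jq are all placeholder assumptions):

FILE=64MiB.3.txt
SIZE=$(stat -c %s "$FILE")
CHUNK=$((10 * 1024 * 1024))        # fragment size; Graph requires a multiple of 320 KiB
BLOCKS=$(( (SIZE + CHUNK - 1) / CHUNK ))

# Create an upload session for the target path, returning a temporary upload URL.
UPLOAD_URL=$(curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{}' \
  "https://graph.microsoft.com/v1.0/me/drive/root:/BackendTester/$FILE:/createUploadSession" \
  | jq -r .uploadUrl)

i=0
while [ "$i" -lt "$BLOCKS" ]; do
  OFFSET=$((i * CHUNK))
  END=$((OFFSET + CHUNK - 1))
  [ "$END" -ge "$SIZE" ] && END=$((SIZE - 1))
  # An off-by-one in this arithmetic would shift or drop bytes, not rewrite them.
  dd if="$FILE" bs="$CHUNK" skip="$i" count=1 2>/dev/null \
    | curl -s -X PUT --data-binary @- \
        -H "Content-Range: bytes $OFFSET-$END/$SIZE" \
        "$UPLOAD_URL" > /dev/null
  i=$((i + 1))
done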

@ts678 @tygill

Hi, I am new to this forum and haven’t read the thread in detail, but I guess your OneDrive upload corruption with Base64 data could be related to this issue with OneDrive:

Welcome to the forum @Ole

Thanks so much for spotting that. I did a lot of web searching earlier for symptoms like mine, and this is it, although some parts vary a bit. I might contribute my symptoms to the issue; I’d need help with the code details.


Thanks, and I only succeeded due to your very thorough and spot-on analysis, which was very similar to this rclone post.

I’m not sure how to understand this, but let me give you a status update.

We managed to reproduce the issue using only the OneDrive web interface in Edge on Windows two days ago, and yesterday Microsoft confirmed a reproduction and apparently fixed or reduced the issue at the same time. This has, however, not been fully acknowledged yet…

The rclone people following this issue have been testing for the past 20 hours and haven’t found any failures, so it seems fixed, or drastically reduced in probability.

I guess your next step would be to check the integrity of all backups uploaded to OneDrive (assuming Duplicati doesn’t check the SHA-1 after upload).

I updated the GitHub issue to say mine is looking the same way, although my test time is short so far.

These are all from different people, and can’t really be checked. My own older backups weren’t showing this.
At least we’ll maybe have better advice if anybody on OneDrive shows up with file verification issues.

Duplicati supports so many different storage types that it takes a generic approach (checking the directory listing) as a quick file health check, and does a sampled download of files to verify their SHA-256, or even to open the files.
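
If you want a deeper check than the default sampling, something along these lines should download and verify every remote volume; this is the same test command mentioned above, the storage URL is a placeholder, and you would also pass the backup’s usual options (passphrase, dbpath) as exported from the job:

Duplicati.CommandLine.exe test "onedrivev2://BackendTester?authid=REDACTED" all --full-remote-verification=true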

md5 & sha1 inclusion in verification json #2189 is an enhancement request similar to your idea, but for an optional verification file that can be used with PowerShell or Python scripts when one has direct file access.

rclone’s Overview of cloud storage systems shows the variety of hash types (or none) that the storage may use, and because Duplicati backups are long-lasting and portable, it keeps SHA-256 in its backups and database.

Developer documentation

The backends encapsulate the actual communication with a remote host with a simple abstraction, namely that the backend can perform 4 operations: GET, PUT, LIST, DELETE. All operations performed by Duplicati rely on only these operations, enabling almost any storage to be implemented as a destination.

Apparently, using cloud storage features to verify transfers didn’t align with that design goal, or with the code. Retrofitting it at this point would be tough because the original backend developers are largely unavailable now.