I was the one who originally wrote most of the OneDrive v2 (and other Microsoft Graph backends), so I can provide a bit of context here.
The Microsoft Graph backend-specific retry logic comes from the OneDrive concept of ‘upload sessions’. Large files need to be uploaded to OneDrive in smaller pieces, so if a file is too large, a single IBackend.Put() call will cause these backends to create an upload session and then perform multiple calls to upload pieces of that file. By default, these fragments are ~10 megabytes. If any of those individual requests fail, these backends handle the retries internally rather than attempting to preserve the state of the upload session between multiple Put() calls from the Duplicati infrastructure.
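Roughly, the flow inside the backend looks something like the sketch below. This is a simplified illustration rather than the actual backend code; the helper names (CreateUploadSessionAsync, UploadFragmentAsync) and the fragmentSize / retry fields are made up for the example.

```csharp
// Simplified sketch of the upload session flow (not the actual backend code).
// A large file is split into ~10 MB fragments; a failed fragment is retried
// internally by the backend instead of failing the whole Put() call.
private async Task PutLargeFileAsync(string remotename, Stream source, CancellationToken cancelToken)
{
    // Hypothetical helper that POSTs to /createUploadSession and returns the upload URL
    string uploadUrl = await this.CreateUploadSessionAsync(remotename, cancelToken);

    var fragmentBuffer = new byte[this.fragmentSize]; // ~10 MB by default
    long offset = 0;
    int read;

    // Read one fragment at a time from the source stream (ignoring short reads for brevity)
    while ((read = await source.ReadAsync(fragmentBuffer, 0, fragmentBuffer.Length, cancelToken)) > 0)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                // Hypothetical helper that PUTs this byte range to the upload session URL
                await this.UploadFragmentAsync(uploadUrl, fragmentBuffer, read, offset, cancelToken);
                break;
            }
            catch (HttpRequestException) when (attempt < this.fragmentRetryCount)
            {
                // Retry the failed fragment internally rather than surfacing the error
                await Task.Delay(this.fragmentRetryDelay, cancelToken);
            }
        }

        offset += read;
    }
}
```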
I wonder if this ‘upload session’ behavior is also part of why throttling isn’t working as expected (and why network usage is somewhat ‘bursty’). I believe the way Duplicati handles automatic throttling is by introducing a Stream wrapper which only allows bytes to be read at the throttled rate (see ThrottledStream). However, because the upload session reads the full fragment size into a buffer and then uploads it directly, the throttling applies only to the process of reading and filling the buffer, not to the actual upload.

One way to work around this might be to set a small --fragment-size, so that even though the individual fragments are sent unthrottled, they are small enough to be spaced out to roughly match the desired throttling rate. Though this would only work if Duplicati applies the throttling to reading the source files on disk for uploading - if that isn’t how it applies throttling, this might not really do anything. I think that is what the following code in BackendUploader is doing, though:
```csharp
// A download throttle speed is not given to the ThrottledStream as we are only uploading data here
using (var fs = File.OpenRead(item.LocalFilename))
using (var ts = new ThrottledStream(fs, m_initialUploadThrottleSpeed, 0))
using (var pgs = new ProgressReportingStream(ts, pg => HandleProgress(ts, pg, item.RemoteFilename)))
    await streamingBackend.PutAsync(item.RemoteFilename, pgs, cancelToken).ConfigureAwait(false);
```
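To make the problem concrete, here is a simplified sketch (again, not the actual MicrosoftGraphBackend code) of roughly what happens to that throttled stream once it reaches the upload session logic; the method and variable names are hypothetical:

```csharp
// Simplified sketch of one fragment upload. The 'stream' parameter is the
// ThrottledStream/ProgressReportingStream chain shown in the snippet above.
private async Task UploadSingleFragmentAsync(HttpClient httpClient, string uploadUrl, Stream stream, int fragmentSize, CancellationToken cancelToken)
{
    var fragmentBuffer = new byte[fragmentSize];

    // The throttling takes effect here, while filling the in-memory buffer from the
    // (throttled) source stream - i.e., it only slows down reading the local file.
    int bytesInFragment = await stream.ReadAsync(fragmentBuffer, 0, fragmentBuffer.Length, cancelToken);

    // The actual network transfer then happens here, from the buffer, at full speed -
    // the ThrottledStream is no longer involved at this point.
    using (var request = new HttpRequestMessage(HttpMethod.Put, uploadUrl))
    {
        request.Content = new ByteArrayContent(fragmentBuffer, 0, bytesInFragment);
        using (var response = await httpClient.SendAsync(request, cancelToken).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
        }
    }
}
```

That is also why the observed network usage looks bursty: each fragment uploads at full speed, and the throttle only adds gaps between fragments while the next buffer is being filled.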
Code fixes to improve throttling behavior might include:
- Having the Microsoft Graph backends automatically scale the fragment size down to perform that workaround (this is probably the easiest but least desirable fix).
- Adding a method on ThrottledStream (which these backends could call explicitly if the stream they are given is a ThrottledStream) that reads as many bytes as possible without pausing and then returns, rather than continuing to read and buffer the entire fragment size. (It may still need to pause in some cases, but hopefully it would pause at most once per buffer returned.)
- The best option is probably to change the logic used for the individual upload session fragments so the fragments are written through a ThrottledStream (e.g., using a StreamContent with a ThrottledStream over the fragmentBuffer instead of directly using a ByteArrayContent on line 464, and similarly using a ThrottledStream on line 547 to write to the request stream); a rough sketch of this follows below the list. It might also be possible to write some sort of custom Stream implementation that wraps the input stream and does the fragmenting / limiting directly (i.e., instead of reading into an intermediate byte buffer), but that would be a bit more complex - in particular, I don’t know whether Duplicati manages throttling from a central place (e.g., a global counter of bytes written / read) or whether those limits are maintained independently on each ThrottledStream.
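As a rough sketch of that third option (simplified and partly hypothetical: it assumes the backend can somehow obtain the configured upload throttle rate, shown here as uploadThrottleSpeed, and it mirrors the ThrottledStream constructor usage from the BackendUploader snippet above):

```csharp
// Sketch of sending a fragment through a ThrottledStream via StreamContent, so the
// rate limit applies to the network write itself instead of to pre-filling a buffer.
// Not the actual backend code; names and the throttle-rate plumbing are assumptions.
private async Task UploadFragmentThrottledAsync(
    HttpClient httpClient, string uploadUrl, byte[] fragmentBuffer, int bytesInFragment,
    long offset, long totalFileSize, long uploadThrottleSpeed, CancellationToken cancelToken)
{
    using (var fragmentStream = new MemoryStream(fragmentBuffer, 0, bytesInFragment, writable: false))
    using (var throttled = new ThrottledStream(fragmentStream, uploadThrottleSpeed, 0))
    using (var request = new HttpRequestMessage(HttpMethod.Put, uploadUrl))
    {
        // HttpClient pulls from the ThrottledStream as it sends the request body,
        // so the throttling now paces the actual upload.
        request.Content = new StreamContent(throttled);
        request.Content.Headers.ContentLength = bytesInFragment;
        request.Content.Headers.ContentRange =
            new System.Net.Http.Headers.ContentRangeHeaderValue(offset, offset + bytesInFragment - 1, totalFileSize);

        using (var response = await httpClient.SendAsync(request, cancelToken).ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
        }
    }
}
```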
The backoff / retry strategy comes from the OneDrive documentation here. Specifically:
- Use an exponential back off strategy if any 5xx server errors are returned when resuming or retrying upload requests.
- For other errors, you should not use an exponential back off strategy but limit the number of retry attempts made.
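In code, that guidance translates into something along these lines (a simplified sketch of the documented strategy, not the backend’s exact retry loop; maxRetries and the delays are illustrative):

```csharp
// Sketch of the documented retry guidance: exponential back off for 5xx responses,
// a bounded number of plain retries for everything else.
private async Task<HttpResponseMessage> SendWithRetriesAsync(
    HttpClient httpClient, Func<HttpRequestMessage> makeRequest, int maxRetries, CancellationToken cancelToken)
{
    for (int attempt = 0; ; attempt++)
    {
        // A request message cannot be re-sent, so a factory builds a fresh one per attempt
        var response = await httpClient.SendAsync(makeRequest(), cancelToken).ConfigureAwait(false);
        if (response.IsSuccessStatusCode || attempt >= maxRetries)
            return response;

        bool isServerError = (int)response.StatusCode >= 500 && (int)response.StatusCode <= 599;
        response.Dispose();

        if (isServerError)
        {
            // 5xx: exponential back off (1s, 2s, 4s, ...)
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)), cancelToken);
        }
        else
        {
            // Other errors: no exponential back off, just a limited number of attempts
            await Task.Delay(TimeSpan.FromSeconds(1), cancelToken);
        }
    }
}
```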
When I initially wrote this, I didn’t realize that OneDrive sometimes adds Retry-After headers. (It looks like they are documented here, though.) A OneDrive-specific fix for this would be to look for this header in MicrosoftGraphBackend.CheckResponse() and insert a Thread.Sleep() call to wait that amount of time before continuing. A better fix might be to push that logic up into the OAuthHttpClient / OAuthHelper classes, so that any backend using them automatically gets Retry-After support, though I don’t know if that would have any other side effects.
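Something along these lines in CheckResponse() is what I have in mind (a simplified sketch; the real method does more, and handling the absolute-date form of Retry-After is left out):

```csharp
// Sketch of honoring Retry-After before the normal error handling continues.
private void CheckResponse(HttpResponseMessage response)
{
    if (!response.IsSuccessStatusCode)
    {
        // If the service says how long to back off, wait that long before continuing,
        // so the subsequent retry doesn't immediately hit the same throttling response.
        var retryAfter = response.Headers.RetryAfter;
        if (retryAfter != null && retryAfter.Delta.HasValue)
        {
            Thread.Sleep(retryAfter.Delta.Value);
        }

        // ... the existing error handling / exception throwing would follow here ...
    }
}
```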