With the current Duplicati bug, we don’t wait, go ahead and ask, data comes down – then it just stops…
See above. The stop is mysterious, but why would Duplicati not be running any timeout during that get?
We don’t know very well what is happening. In the OP’s case, the problem was solved by not using compact, so there is no more information. In the test by @ts678, it seems to be ‘stuck’ in a loop involving no network access at all (Duplicati stuck on Compact when Getting/Putting from OneDrive and results in missing files - #35 by ts678). This is all very confusing and lacks a well-defined reproducible problem (with the obvious exception of the Retry-After-not-respected problem that is cropping up everywhere). So my idea is to just fix this one and see what happens next.
Have to agree it’s confusing and hard to reproduce. That’s unfortunately often the case with backends.
I have little idea of the internals, but is parallel uploading or potential future parallel download an issue?
I don’t think the problem is the same for parallel uploading.
When putting data to a cloud server, you don’t expect to get anything in return, with all the APIs I know at least.
So there will be no problem reading the answer: what you get is a message saying ‘put succeeded’ or ‘put failed’, and Duplicati expects that.
What the existing software does is take into account a hint to not try a new put before X time has passed.
In the opposite case, when downloading data, you expect an answer AND data. The existing software gets the status for the Get, gets the hint that it should not try to get new data before X time, … and then crashes because the data is not at all what it expects: it expects nice binary data in the form of a Zip (optionally encrypted with AES), but instead it gets a text message from the server explaining what is happening. So an exception is generated, the backend manager recreates a new backend, and the hint to not send before X time is lost along with the backend.
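Just to make the idea concrete, here is a rough Python sketch of the concept (not Duplicati’s actual C# backend code; the URL and retry limit are placeholders): the point is simply that the Retry-After hint should survive a failed Get instead of disappearing with the recreated backend.

```python
import time
import requests

def get_with_retry_after(url, max_attempts=5):
    """Illustrative only: a download loop that keeps the server's
    Retry-After hint across failed attempts instead of dropping it."""
    for attempt in range(1, max_attempts + 1):
        response = requests.get(url)
        if response.ok:
            return response.content  # the expected binary zip payload
        # A throttled server answers 429 with a text body, not zip data.
        # Retry-After can also be an HTTP date; this sketch only handles
        # the plain seconds form.
        wait = int(response.headers.get("Retry-After", 0))
        if attempt < max_attempts and wait > 0:
            time.sleep(wait)  # honor the hint before the next attempt
    raise RuntimeError(f"Get failed after {max_attempts} attempts: {url}")
```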
Losing the hint this way would be very bad for an upload, since it would mean that other parallel uploads would crash when the backend is recreated, but that can’t happen because the problem is different: Duplicati will not crash in this case, it will just register that the put has failed, an entirely possible and managed outcome of sending data to the server.
As for parallel download, it would be very bad of course, but AFAIK we don’t do it.
Not really (if I understand you correctly). I have noticed a hang in situations other than compact, like a simple backup operation; I just find it easier to reproduce with compact. Originally I noticed it during the compact operation, but now I’m fairly confident it’s not just compact, and I also suspect it’s not even OneDrive but rather some general waiting for something that isn’t running, like a lost network call or an async task that was never started. Though I should state, I don’t remember why I concluded it probably wasn’t just OneDrive; maybe that was just to keep options open and not assume something false and dig deeply into the wrong rabbit hole.
I’m glad to see you have managed to reproduce it several times now, ts678 (if I also got that right).
I think the root cause is that you are exceeding the throttling limits for your account. The current problem Duplicati has here is that it doesn’t show this clearly, because when downloading data it ignores the message from OneDrive that it is supposed to stop new requests. Even with the (hacky) fix I will (maybe, for want of a better way) add to Duplicati, if OneDrive asks Duplicati to wait for 5 hours, it will wait and there is no way around it. The only difference is that if the user opens the Live window, the reason for the hang (“stuck” as you say) will be displayed immediately.
It would be great if there was a way around Duplicati hanging for the entire duration of the Retry-After header. However, I’m certain that is not the main issue here, since I have a good sense of when OneDrive starts throttling, and that happens MUCH later than the rate I’m pushing it at. I know this because rclone logs the header. With rclone I can download 300 GB in a day, throttling does not kick in until the 200 GB mark, and it has never sent a Retry-After of more than a day. Duplicati is stuck for longer than that; my guess is forever…
However, it’s a fine start if we can get clearer logging of the Retry-After header and then just continue from there. That header ought to be logged at info level anyway when it’s above a minute or so.
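Something like this would already help, assuming a threshold of a minute or so (purely a sketch of the logging policy, with a hypothetical logger name, not the actual Duplicati code):

```python
import logging
from datetime import timedelta

log = logging.getLogger("RetryAfterHelper")  # hypothetical logger name

def log_retry_after_wait(wait: timedelta) -> None:
    """Surface long Retry-After waits visibly, keep short ones quiet."""
    message = "Waiting for %s to respect Retry-After header"
    if wait >= timedelta(minutes=1):
        log.info(message, wait)   # long enough that the user should see it
    else:
        log.debug(message, wait)  # routine, keep it out of the default view
```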
I have built release installers for current Duplicati + the patch here:
They stay available for 3 days.
Be careful what you wish for. It now looks like below. Warning level might be too much for final code.
2023-08-29 10:13:00 -04 - [Warning-Duplicati.Library.RetryAfterHelper-RetryAfterWait]: Waiting for 00:04:44.8208913 to respect Retry-After header
Current Duplicati can log (but not wait). I set --console-log-filter="*RetryGet" to be selective:
Operation Get with file duplicati-bf2b17935960c428e9e74d49f2ce2cb35.dblock.zip attempt 1 of 5 failed with message: 429: Too Many Requests error from request https://public.dm.files.1drv.com …
This was after about 20 GB downloaded, so roughly an order of magnitude less than your rclone. YMMV; however, after a while I’ll have more of a feel for the rhythm here, and whether it seems less prone to stop.
Raising it to warning was not an accident. My view is that the server throttling the connection is an important event that deserves to be visible to the user at the default level. Setting it at an unimportant level such as verbose will ensure that some users miss it and don’t understand what is happening. If it is a warning, the job will be flagged, and I think that is what should happen; it’s not reasonable to expect users to carefully read the detailed result of every job that ends in success. That does not happen in the real world.
I’m not liking the trend so far. Anyone want to guess at why it keeps going up even when waiting nicely?
2023-08-29 10:13:00 -04 - [Warning-Duplicati.Library.RetryAfterHelper-RetryAfterWait]: Waiting for 00:04:44.8208913 to respect Retry-After header
Retry-After: 250
Retry-After: 250
2023-08-29 10:18:46 -04 - [Warning-Duplicati.Library.RetryAfterHelper-RetryAfterWait]: Waiting for 00:03:59.7833502 to respect Retry-After header
Retry-After: 1585
Retry-After: 1585
2023-08-29 10:25:22 -04 - [Warning-Duplicati.Library.RetryAfterHelper-RetryAfterWait]: Waiting for 00:26:14.8682799 to respect Retry-After header
Retry-After: 3418
Retry-After: 3418
2023-08-29 10:53:48 -04 - [Warning-Duplicati.Library.RetryAfterHelper-RetryAfterWait]: Waiting for 00:56:47.8725435 to respect Retry-After header
Retry-After: 4012
Retry-After: 4012
2023-08-29 11:53:56 -04 - [Warning-Duplicati.Library.RetryAfterHelper-RetryAfterWait]: Waiting for 01:06:41.8625229 to respect Retry-After header
It seems like it gets about 1 GB (about 20 dblocks of 50 MB each) between its throttling announcements.
There’s sort of an argument that throttling is a normal thing, but these delay levels are getting annoying.
Probably on purpose. That’s why I said there is not much to be done. I have seen a 5 hour delay and (not sure) even worse.
On my rclone syncs I got up to 8 h delays, but never more than that. But then rclone also does a crazily slow decremental delay, where it ‘voluntarily’ waits 5 h on the next download, then 3 h, and so on, slowly reducing the sleep one step at a time for each file.
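In other words, as far as I understand it from the observed behavior (not from rclone’s source), it acts something like this sketch, where the step size is invented for illustration:

```python
import time

class DecrementalPacer:
    """Sketch of the described behavior: start from whatever delay the
    server last requested, then shave it down step by step on every
    successful transfer. The step size here is made up."""

    def __init__(self, step_seconds=2 * 60 * 60):
        self.delay = 0.0
        self.step = step_seconds

    def on_retry_after(self, seconds):
        # Server asked us to back off; never shrink an existing delay.
        self.delay = max(self.delay, seconds)

    def before_transfer(self):
        if self.delay > 0:
            time.sleep(self.delay)  # 'voluntary' wait, e.g. 5 h, then 3 h, ...
            self.delay = max(0.0, self.delay - self.step)
```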
Have any of you seen it get stuck with this canary yet?
Btw, really nice with the Retry-After logging. I’ve started a test myself now.
Not me, but it’s still very bursty (almost filling the 100 Mbit/sec link it was on then) – then big pause.
I’m now going to add a --throttle-download=100kb (actually bytes IIRC), which is less than what the bursting and throttling was achieving before. I’ll find out whether it manages to do any more dblocks.
They have something called a pacer, which I don’t understand well. I’m faking that a bit with a throttle.
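For reference, a bandwidth throttle like --throttle-download is conceptually just this (a crude Python sketch of rate-capping a stream; nothing to do with Duplicati’s real implementation):

```python
import time

def throttled_chunks(stream, limit_bytes_per_sec, chunk_size=64 * 1024):
    """Read a stream in chunks and sleep as needed so the average rate
    stays at or below the limit. Real implementations are smarter
    about bursts; this only shows the basic idea."""
    start = time.monotonic()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        target_elapsed = total / limit_bytes_per_sec
        elapsed = time.monotonic() - start
        if target_elapsed > elapsed:
            time.sleep(target_elapsed - elapsed)
        yield chunk
```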
Was that the only upload/download you’ve done to that OneDrive account in the last 24 h? If so, yes, it’s an order of magnitude less than I experienced. I did get small Retry-After values before the 200 GB mark, I think, but nothing counted in many minutes or hours. I have the data in an Excel sheet with throughput per minute, though it’s hardly relevant.
Yeah, exactly. It does a fine job when pacing below 1 minute, but it’s insane when it paces each request starting at 8 h. I have some notes I’ll send along with my Excel history, but that’s another story for another time.
However, maybe it’s a bit interesting to hear that the daily throughput was not hampered by the pacer in rclone as long as it didn’t pace more than, say, hours. If I restarted rclone (nullifying the pacer) it would quickly soak up the remaining space towards the ‘invisible’ Microsoft OneDrive throttle limit, but the average throughput was clearly not affected. So I’m relatively confident that you’ll find no difference in average upload speed by adding a throttle to Duplicati, as long as Duplicati correctly sleeps when requested by the Retry-After header. There might be a difference for a service with tons of small requests, but Duplicati has very few, large requests because it just puts/gets dblocks and index files. However, I can understand why you feel inclined to test it; it drove me half insane trying to optimize it with rclone, until I realized it kind of worked great at low pacing.
Longer than that – I think. OneDrive backup is off, and I stopped the test over a week ago.
Possibly it holds a grudge for a long time, or maybe a slow pace will improve its attitude.
Duplicati throttling itself is kind of bursty – not at the network-packet level like a router might attain, but I get around a 6 minute delay after a fast 50 MB download. News here will take a while. If it doesn’t throttle at this slow pace, I might try increasing the pace to see what it will do.
Interesting. 1 week is very long; my tests indicate it’s probably more like 24 h. I think it’s a good idea to see if you can get more throughput before throttling with a network delay.
For the record, I’m on a 1000/1000 Mbit connection, and OneDrive downloaded at a max of 200 Mbit/s with rclone without any additional throttling from my side.
Any idea how many parallel download connections that used? Possibly spreading traffic helps some.
Whether or not it will make OneDrive happier, parallel connections (when possible) raise throughput.
Duplicati does them on upload because it just has to move pre-made files. Download may be harder, because files need to be worked on afterwards. One easier use might be the DB recreate dblock search, although it’s still not easy. We might be heading for download-limited (rather than SQL-limited) DB recreates when the dindex scheme falls short, from issues that are unclear due to the usual lack of information:
Unclear if this meant wait much more than a week then hit it without a delay, or tweak the delay to “get more throughput before throttling”. I plan the latter. It’s fine overnight, but at a slow 500 MB/hour.
Throughput speed was independent of how many connections I ran (after throttling). Initially I ran 15 simultaneous threads, and got those 200 GB unthrottled, with 300 GB daily throughput. Later on I reduced it to 1 or 2 threads, with the same effective speed (due to throttling). Though note that rclone’s pacer ensures that once the first Retry-After header has been received (with any small amount), it will use that pacer to throttle all the threads until zero delay is reached in the pacer.
Yeah, parallel connections may raise throughput for many small files, though in Duplicati’s case, where none of our files are crazy small, I find it likely that 2 parallel threads will be enough to reach max throughput. My rclone tests indicated 1 thread is enough for max throughput, except for small index files. However, once we get throttled, it is irrelevant how many threads we use; we won’t get higher throughput.
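To picture the “one pacer throttles all threads” part: the way I read it, it amounts to a single shared delay that every worker checks before its next request, roughly like this (illustration only, not rclone’s code):

```python
import threading
import time

class SharedPacer:
    """One Retry-After deadline shared by all transfer threads, so a
    single throttled response slows every thread down (sketch only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._resume_at = 0.0  # monotonic time before which we must not send

    def on_retry_after(self, seconds):
        with self._lock:
            self._resume_at = max(self._resume_at, time.monotonic() + seconds)

    def wait_before_request(self):
        with self._lock:
            resume_at = self._resume_at
        remaining = resume_at - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
```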
Here are my rclone Excel stats, if you are interested: rclone-stats.xlsx
Note though that the number of threads is not included. The tab ‘TransferSpeed fresh account’ was transfers from an account I hadn’t ‘violated’ yet…
I also just remembered an rclone forum thread where the poster said it took 2-3 days before his account was no longer throttled, due to crazy spamming of requests. Possibly that was a case where their codebase didn’t yet acknowledge the Retry-After header.
I’m sorry, that came out wrong. I meant “I think it’s a good idea to see if you can get more throughput BY throttling with a network delay FROM DUPLICATI.” Just in case Duplicati behaves differently than rclone. So adding some pacer-like throttling to your Duplicati config might give you some knowledge about OneDrive’s Retry-After throttling. However, I’m not expecting you to see higher throughput from Duplicati, as long as Duplicati correctly handles the Retry-After sleep header. But only if you feel like it.
I would just wait a day or two. Or use a different microsoft account.
Ah, I just realized you are probably on the free account (as you mentioned at the beginning). Quite possibly you will be throttled way before me, given that I’m on the Family 365 account with 1 TB of data (x 5 accounts). I didn’t connect those two dots before.
The new canary got stuck for me during the final endpoint validation:
See full logs here (except I removed all lines containing my filenames):
Compressed (zipped) Folder
It’s been stuck for 2.5 hours, while the initial run took only 5 minutes. In that run I backed up 9 GB with 1 MB dblocks, compacted 1 MB -> 6 MB, then compacted 6 MB -> 36 MB. No errors, no getting stuck. Then I started over with the same initial backup set and a new remote, and it got stuck at the end verification. Each run has been separated into its own log file.
These are the last logs:
2023-08-30 17:46:18 +02 - [Profiling-Timer.Begin-Duplicati.Library.Main.BackendManager-RemoteOperationList]: Starting - RemoteOperationList
2023-08-30 17:46:18 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started: ()
There were a few new retries but no Retry-After header. I also included the backup definition in the zip above, for completeness. I ran with profiling logs.
FYI: I ran this canary on my Windows machine, and all the previous tests I mentioned were with the current beta running on Linux.
Also note that this is not the first time I have seen it get stuck in a ‘normal’ backup operation, i.e. a non-compact backup, so it’s not a new thing for this canary build.