Duplicati stuck on Compact when Getting/Putting from OneDrive and results in missing files

Maybe several problems are muddying the picture: the Retry-After handling AND random failures caused by the network (the cloud, or maybe something on the link). Searching for ‘retry’ in your files sure turns up quite a few ‘retrying’ and ‘failed’ entries. Given the rather small data size, it’s quite strange to have so many network glitches.

So solving this could require fixing both problems, that is, the patch for the Retry-After AND http-timeout set to a value appropriate for your block size (since you have set it rather small and your network is fast, 1 min should do I think even in the worst cases - even if the upload speed seems very low in your trace, never over 2 Mb/s for a block, but maybe that’s because they are rather small).

If I am not mistaken, you tried to set http-timeout before but did not keep it for these last tests here, so in your place I’d try to set it again.

That is correct. Isn’t there a default timeout?

I’ll set a 1 min timeout on the next run.
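For reference, a hedged sketch of how that could look as advanced options on the command line (the destination URL and source path are placeholders; the option names are the ones discussed later in this thread):

Duplicati.CommandLine.exe backup "onedrivev2://backup/example-folder" "C:\data" --http-operation-timeout=1min --http-readwrite-timeout=1min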

This looks different – it failed on a list instead of a get – and OneDrive started out by forgetting that your folder was there. Do you think the folder was there at this time, but OneDrive somehow didn’t see it?

2023-08-30 17:03:28 +02 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.BackupHandler-PreBackupVerify]: Starting - PreBackupVerify
2023-08-30 17:03:28 +02 - [Profiling-Timer.Begin-Duplicati.Library.Main.BackendManager-RemoteOperationList]: Starting - RemoteOperationList
2023-08-30 17:03:28 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-08-30 17:03:31 +02 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationList]: RemoteOperationList took 0:00:00:02.753
2023-08-30 17:03:31 +02 - [Retry-Duplicati.Library.Main.BackendManager-RetryList]: Operation List with file  attempt 1 of 5 failed with message: The requested folder does not exist
Duplicati.Library.Interface.FolderMissingException: The requested folder does not exist ---> Duplicati.Library.Backend.MicrosoftGraph.DriveItemNotFoundException: Item at https://graph.microsoft.com/v1.0/me/drive/root:/backup/duplicatiMaya-dummytest/dummyTestWin2:/children was not found
Method: GET, RequestUri: 'https://graph.microsoft.com/v1.0/me/drive/root:/backup/duplicatiMaya-dummytest/dummyTestWin2:/children', Version: 1.1, Content: <null>, Headers:
{
  User-Agent: Duplicati/2.0.7.3
  Authorization: Bearer ABC...XYZ
}
StatusCode: 404, ReasonPhrase: 'Not Found', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
{
  Transfer-Encoding: chunked
  Strict-Transport-Security: max-age=31536000
  request-id: a90777d6-1b1f-4fb1-8db1-3528423fc547
  client-request-id: a90777d6-1b1f-4fb1-8db1-3528423fc547
  x-ms-ags-diagnostic: {"ServerInfo":{"DataCenter":"West Europe","Slice":"E","Ring":"5","ScaleUnit":"006","RoleInstance":"AM1PEPF00027E55"}}
  Date: Wed, 30 Aug 2023 15:03:32 GMT
  Content-Type: application/json
}
{
  "error": {
    "code": "itemNotFound",
    "message": "Item does not exist",
    "innerError": {
      "date": "2023-08-30T15:03:32",
      "request-id": "a90777d6-1b1f-4fb1-8db1-3528423fc547",
      "client-request-id": "a90777d6-1b1f-4fb1-8db1-3528423fc547"
    }
  }
}
   at Duplicati.Library.Backend.MicrosoftGraphBackend.CheckResponse(HttpResponseMessage response)
   at Duplicati.Library.Backend.MicrosoftGraphBackend.ParseResponse[T](HttpResponseMessage response)
   at Duplicati.Library.Backend.MicrosoftGraphBackend.SendRequest[T](HttpRequestMessage request)
   at Duplicati.Library.Backend.MicrosoftGraphBackend.SendRequest[T](HttpMethod method, String url)
   at Duplicati.Library.Backend.MicrosoftGraphBackend.<Enumerate>d__71`1.MoveNext()
   --- End of inner exception stack trace ---
   at Duplicati.Library.Backend.MicrosoftGraphBackend.<Enumerate>d__71`1.MoveNext()
   at Duplicati.Library.Backend.MicrosoftGraphBackend.<List>d__55.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Duplicati.Library.Main.BackendManager.DoList(FileEntryItem item)
   at Duplicati.Library.Main.BackendManager.ThreadRun()
2023-08-30 17:03:31 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Retrying:  ()
2023-08-30 17:03:32 +02 - [Profiling-Timer.Begin-Duplicati.Library.Main.BackendManager-RemoteOperationList]: Starting - RemoteOperationList
2023-08-30 17:03:32 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-08-30 17:03:32 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  ()
2023-08-30 17:03:32 +02 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationList]: RemoteOperationList took 0:00:00:00.241

disable-autocreate-folder is set to false by default.

That might explain it, although knowing the history would be better. Seemingly the creation isn’t logged.
Below is a profiling log for an initial backup into a not-initially-existing folder. It has that same “look”:

2023-08-30 17:37:17 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.Operation.BackupHandler-PreBackupVerify]: Starting - PreBackupVerify
2023-08-30 17:37:17 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.BackendManager-RemoteOperationList]: Starting - RemoteOperationList
2023-08-30 17:37:17 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-08-30 17:37:17 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationList]: RemoteOperationList took 0:00:00:00.000
2023-08-30 17:37:17 -04 - [Retry-Duplicati.Library.Main.BackendManager-RetryList]: Operation List with file  attempt 1 of 5 failed with message: The folder C:\Duplicati\duplicati-2.0.7.2_canary_2023-05-25\RUN\test 6\ does not exist
Duplicati.Library.Interface.FolderMissingException: The folder C:\Duplicati\duplicati-2.0.7.2_canary_2023-05-25\RUN\test 6\ does not exist
   at Duplicati.Library.Backend.File.<List>d__22.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Duplicati.Library.Main.BackendManager.DoList(FileEntryItem item)
   at Duplicati.Library.Main.BackendManager.ThreadRun()
2023-08-30 17:37:17 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Retrying:  ()
2023-08-30 17:37:17 -04 - [Profiling-Timer.Begin-Duplicati.Library.Main.BackendManager-RemoteOperationList]: Starting - RemoteOperationList
2023-08-30 17:37:17 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Started:  ()
2023-08-30 17:37:17 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: List - Completed:  ()
2023-08-30 17:37:17 -04 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationList]: RemoteOperationList took 0:00:00:00.000

The RemoteOperation table looks even emptier. Not even the list that worked just above is shown here:

Maybe it depends on which code path the operation hits. The Test connection button is also a bit quiet.
It runs a list whose log entry I would sometimes like to see. No immediate big deal, but it seems odd.

In this run I just bumped the remote folder in Destination and didn’t bother clicking ‘Test’ (which would have let Duplicati create the folder before saving the new backup definition). I noticed it worked, but didn’t notice that log. So it’s fine; it did its job and created the remote backup folder.

I don’t think we should worry about that pre-creation of the remote folder; after all, it works.

It was only later it got stuck on post-verify.

OK, one theory goes away then. FWIW my test all at 100 KB/sec finished with no Retry-After, then 400 KB/sec did too. I may or may not probe for a limit, but how to test and debug the list hang is unclear. This is probably a situation where more low-level logging would help, but I don’t think that’s there yet.

I tried searching the forum and GitHub for OneDrive Backend event: List to find other stuck lists.
I only found a couple from 2018, so whatever’s going on doesn’t seem like it’s common recently:

The outcome of the one below was less clear. Actually, it’s not clear which OneDrive version did the slow list.

I still think it’s highly probable this is a recent issue that wasn’t present (or at least not to this degree) in the old beta 2.0.6.3 or whatever it was called. It could be a library update; I noticed some of those. But I also have not yet rolled back, due to the need to edit the SQLite DBs.

I might find some time in a few months to get the code running and have a look myself.

I also have not yet seen the ‘normal’ stuck-on-get/put besides this one case with list, but it’ll probably show itself in more ways the more I test.

Although the thinking about this has been changing, and logs are few, the title of this topic is now:

Duplicati stuck on Compact when Getting/Putting from OneDrive

I would suggest keeping at the very least a Retry-level log file, which is pretty compact; Profiling is big.
Basic questions, such as whether the stuck operation is the get or (somehow) the list, will get clearer.
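For example, adding these two standard advanced options to the job (the log path is a placeholder):

--log-file=C:\logs\duplicati-retry.log --log-file-log-level=Retry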

Your job log’s Complete log also records a RetryAttempts number. You can even look over your old logs. These retries are at a higher level (the file operation), not the internal Retry-After type of retries, but I suppose it would be possible to redefine that, if we really want to. Logging at Retry level might also be possible.

Throwing emails and yellow Warning popups on routine throttling is likely to get lots of user complaints. Regardless, your testing with that test build, and Canary feedback when it goes out, will help us decide.

Microsoft might also have changed. They’re busy in their OneDrive beta adding new hints on throttling:
Microsoft OneDrive Service Limit Throttling, which leads to RateLimit headers - preview, and then the IETF.

Editing the Version number down is pretty easy, but editing it back up before returning to 2.0.7 will avoid complaints about indexes already existing. The problem could have been avoided by using IF NOT EXISTS.
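For example, a hedged sketch of an index-creating upgrade statement that would tolerate an index left behind by a newer version (the index, table, and column names are invented for illustration):

CREATE INDEX IF NOT EXISTS "ExampleIndex" ON "ExampleTable" ("ExampleColumn");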

Well, this is interesting. In my next backup test, with http-operation-timeout=10min and http-readwrite-timeout=8min, I got the following logs (which I haven’t seen before):

2023-08-31 17:38:51 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-i8b9630f2a2af4b27a6ef4a0a51bab124.dindex.zip.aes (1.29 KB)
2023-08-31 17:48:50 +02 - [Retry-Duplicati.Library.Main.Operation.Backup.BackendUploader-RetryPut]: Operation Put with file duplicati-iebf0e88d82e941528e314ba6f8ec51ab.dindex.zip.aes attempt 1 of 6 failed with message: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Backend.MicrosoftGraphBackend.<PutAsync>d__60.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.Backup.BackendUploader.<DoPut>d__24.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Main.Operation.Backup.BackendUploader.<>c__DisplayClass20_0.<<UploadFileAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at Duplicati.Library.Main.Operation.Backup.BackendUploader.<DoWithRetry>d__21.MoveNext()
2023-08-31 17:48:50 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Put - Started: duplicati-b84ba9523cb20405491ede4959e00a1eb.dblock.zip.aes (1,002.56 KB)

I can imagine that the above cancel fixes the hang issues. Since it’s a 10 min wait, it must be the http-operation-timeout=10min and not the read-write timeout that causes this line. But then it’s strange that it didn’t help when I tried it before. Or there are multiple issues, and this resolves one of them.

gpatel, was there a change in this canary related to this, or should Duplicati have behaved similarly in the old beta release?

FYI: I chose the 8/10 min timeouts so that it would be likely I could spot which timeout was used from the log timestamps.

Update: Another one:

2023-08-31 21:45:32 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Started: duplicati-b00c6ef1a7d3d4800b35f7e7638112c0d.dblock.zip.aes (1,002.56 KB)
2023-08-31 21:55:32 +02 - [Profiling-Timer.Finished-Duplicati.Library.Main.BackendManager-RemoteOperationGet]: RemoteOperationGet took 0:00:10:00.021
2023-08-31 21:55:32 +02 - [Retry-Duplicati.Library.Main.BackendManager-RetryGet]: Operation Get with file duplicati-b00c6ef1a7d3d4800b35f7e7638112c0d.dblock.zip.aes attempt 1 of 5 failed with message: HTTP timeout 00:10:00 exceeded.
System.TimeoutException: HTTP timeout 00:10:00 exceeded.
   at Duplicati.Library.OAuthHttpClient.<GetAsync>d__5.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Duplicati.Library.Backend.MicrosoftGraphBackend.Get(String remotename, Stream stream)
   at Duplicati.Library.Main.BackendManager.coreDoGetPiping(FileEntryItem item, IEncryption useDecrypter, Int64& retDownloadSize, String& retHashcode)
   at Duplicati.Library.Main.BackendManager.DoGet(FileEntryItem item)
   at Duplicati.Library.Main.BackendManager.ThreadRun()
2023-08-31 21:55:32 +02 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Retrying: duplicati-b00c6ef1a7d3d4800b35f7e7638112c0d.dblock.zip.aes (1,002.56 KB)
2023-08-31 21:55:32 +02 - [Profiling-Timer.Begin-Duplicati.Library.Main.Database.ExtensionMethods-ExecuteReader]: Starting - ExecuteReader: SELECT "A"."Hash", "C"."Hash" FROM (SELECT "BlocklistHash"."BlocksetID", "Block"."Hash", * FROM  "BlocklistHash","Block" WHERE  "BlocklistHash"."Hash" = "Block"."Hash" AND "Block"."VolumeID" = 18320) A,  "BlocksetEntry" B, "Block" C WHERE "B"."BlocksetID" = "A"."BlocksetID" AND  "B"."Index" >= ("A"."Index" * 3200) AND "B"."Index" < (("A"."Index" + 1) * 3200) AND "C"."ID" = "B"."BlockID"  ORDER BY "A"."BlocksetID", "B"."Index"

My previous logs have also had Retry-Duplicati.Library.Main.BackendManager-RetryGet, but never with these two causes (A task was canceled + HTTP timeout).

Best Alex

Besides the patch for Retry-After, nothing that could explain it. As I said, a possibility is that the two problems were causing so much trouble that it was difficult to separate the causes.


It is good to see that the timeout does work (at least sometimes). I was starting to doubt the entire HTTP request handling. It is maybe a bit counter-intuitive that there is no default timeout, but I’m not sure how that can be improved. Maybe 1h as a default could be long enough that no normal connection will ever hit it?

There could also be a smarter way to do that, maybe detect if nothing at all is transferred for a few minutes, but I don’t know if that is possible.
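One possible shape for that, as a minimal sketch (invented names, not Duplicati code): wrap the transfer stream and push a cancellation deadline forward whenever bytes actually move, so only a genuine stall trips it.

using System;
using System.IO;
using System.Threading;

// Hypothetical sketch of idle-based stall detection: the timer fires (and
// cancels the operation) only if no bytes move for the whole idle limit.
public class StallDetectingStream : Stream
{
    private readonly Stream inner;
    private readonly TimeSpan idleLimit;
    private readonly Timer timer;

    public StallDetectingStream(Stream inner, TimeSpan idleLimit, CancellationTokenSource cancelSource)
    {
        this.inner = inner;
        this.idleLimit = idleLimit;
        // One-shot timer; each successful read/write pushes it out again.
        this.timer = new Timer(_ => cancelSource.Cancel(), null, idleLimit, Timeout.InfiniteTimeSpan);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var n = inner.Read(buffer, offset, count);
        timer.Change(idleLimit, Timeout.InfiniteTimeSpan); // progress: reset the deadline
        return n;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        inner.Write(buffer, offset, count);
        timer.Change(idleLimit, Timeout.InfiniteTimeSpan); // progress: reset the deadline
    }

    // Remaining members delegate to the wrapped stream (timer disposal omitted for brevity).
    public override bool CanRead => inner.CanRead;
    public override bool CanSeek => inner.CanSeek;
    public override bool CanWrite => inner.CanWrite;
    public override long Length => inner.Length;
    public override long Position { get => inner.Position; set => inner.Position = value; }
    public override void Flush() => inner.Flush();
    public override long Seek(long offset, SeekOrigin origin) => inner.Seek(offset, origin);
    public override void SetLength(long value) => inner.SetLength(value);
}

The catch is that the CancellationTokenSource has to be one the HTTP call actually observes, which is exactly the plumbing that makes this non-trivial.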


No timeout is definitely crazy, and a bug. It’s a ‘normal’ error situation for network calls that you cannot be certain will return. Lots of crazy things can happen in the network, and Duplicati ought to be resilient to all of it.

I don’t see a big problem with using 1h or less, especially if the logs are clear about what caused the disconnect. The logs could even name the setting involved instead of just saying cancelled/HTTP timeout. Having something hang for no discernible reason is bound to lead to threads like this, especially if there are multiple underlying problems and/or the OP sometimes reports incorrectly, as humans tend to do.

It would also be nice if we had an info log with all the important settings (and none of the potentially user-private ones), so both you and I could verify from my logs that yes, I had indeed turned on the http-timeout when I claimed to. I’m pretty sure I did, but the current situation leaves me wondering… Though of course, as gpatel says, maybe it was the Retry-After issue I ran into after turning on the http-timeout.

gpatel, I just reviewed your source code update. I don’t know how to comment on GitHub when we don’t have an issue there and there’s no pull request, so I’ll write it here.

The hack does have the unfortunate effect that it’ll enforce the Retry-After even across different backup jobs (assuming the static value survives into the next backup job). Those different backup jobs can have different OneDrive accounts, with different timeouts. Though the user has to cancel the current job before running another job, so maybe that changes the situation, especially if cancelling the wait is currently impossible. But even if I’m correct in this conclusion, I would still say the change you made is far better than the original code, which can get stuck in a temporary OneDrive user ban due to the app not obeying Retry-After.
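To make that concern concrete, a hedged sketch of the pattern being described (not the actual patch; the field name is invented):

// A static field holds one value for the whole process, so a Retry-After
// received in one backup job would also delay backends in the next job:
private static DateTime retryAfterUntil;

var wait = retryAfterUntil - DateTime.UtcNow;
if (wait > TimeSpan.Zero)
    await Task.Delay(wait).ConfigureAwait(false);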

await Task.Delay(delay).ConfigureAwait(false);
It also looks like this cannot be cancelled by the user cancelling the job (or similar), but I think you already know that, if I recall correctly.
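For comparison, a minimal sketch of a cancellable wait, assuming a job CancellationToken reached this code (which, per the reply below, it currently does not):

// Hypothetical: cancelToken would have to come from the running job.
await Task.Delay(delay, cancelToken).ConfigureAwait(false);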

Yes, unfortunately. Job cancelling is very complicated in Duplicati and it will not be fixed tomorrow.


I tracked down the discrepancy between 100 seconds and infinite. The OneDrive timeout changed below:

That change is the reason I was looking, and I’m wondering if an infinite timeout is inviting network hang issues.
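For context, a small sketch of the two settings being compared (client is any .NET HttpClient; 100 seconds is the framework’s built-in default):

// HttpClient aborts a request after 100 seconds by default:
// client.Timeout == TimeSpan.FromSeconds(100)
// The change under discussion makes it infinite, so a stalled request
// is never abandoned by the client itself:
client.Timeout = System.Threading.Timeout.InfiniteTimeSpan;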

I have observed this with the SFTP backend (it has a specific timeout).

It also appears (at first glance) not to use HttpClient or WebRequest, which is the unification I linked. Basically, if anything’s wrong with SFTP, it would seem to be a different problem than the HTTP backends.

Sure; the point is that generally a network error can hang a backup if no timeout is set. It just seems wrong; unfortunately I have no clear idea what to do, especially since there seems to be no way at all to fix it for access that is mediated by the OS (mapped drives…). Probably some sort of general timeout could be cooked up to at least abort a job where nothing happens for 15 minutes. OTOH there are tons of other problems.