Duplicati lock up

Most recently on 2.0.5.114 Canary whose code should be identical to 2.0.6.1 Beta.

I posted an edit to my last reply citing the log with email detail, email info, and backup info.

It sounded like you had a pretty solid problem. Maybe the email transcript gives clues?

Thank you.

In looking at it further, the missing “The operation Backup has completed” log entries might be a misdirect. The entries are definitely missing from my profiling logs (and the earlier verbose logs) that I’ve captured to file… strangely though, I turned on Live logs in the Web UI and they happily display “The operation Backup has completed”. So there’s definitely a bug in there, but it mightn’t be showcasing the locking issue… it might just be a file stream flushing problem.
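For what it’s worth, that flushing theory is easy to picture. A minimal sketch (illustrative only, not Duplicati’s actual logging code) of how a buffered file writer can lose the last entries that the live log still shows:

using System.IO;

// Illustrative only: a buffered StreamWriter holds recent lines in memory,
// so the final entries (e.g. "The operation Backup has completed") can be
// missing from the file even though they were emitted to the live log.
class FileLogSink : System.IDisposable
{
    private readonly StreamWriter m_writer;

    public FileLogSink(string path)
    {
        m_writer = new StreamWriter(path, append: true)
        {
            AutoFlush = true // without this, trailing entries may never reach disk
        };
    }

    public void WriteMessage(string message) => m_writer.WriteLine(message);

    public void Dispose() => m_writer.Dispose(); // disposing flushes the buffer
}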

FYI… now that I have email notifications turned off Duplicati continues to work.

I just had a look at the email transcript. Unfortunately it doesn’t provide any clues. The problem referred to in the transcript was that Duplicati wasn’t able to send emails due to configuration/parameter errors. In my case Duplicati happily sends emails; in fact, because I was sending emails after every backup (and backups were every 15 minutes), I could easily receive 60 emails from Duplicati before a problem occurred.

Was that one of the rare (less than one in 60?) failures? Where did it end on the way to:

although earlier I guess you wondered if you got all the output. Did you get “Email sent”?
Sorry, I don’t have profiling logs today to compare more closely with yours…

Attempting to look at the source on GitHub: maybe OnOperationComplete runs OnFinish,
which runs SendMessage. After that it logs the (missing?) “Operation backup complete”.
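If that chain is right, a hang or exception inside the send could explain the missing completion entry. A minimal sketch of the idea (the three method names are from the source; everything else here is an illustrative assumption, not Duplicati’s actual code):

using System;

// Hypothetical sketch only: OnOperationComplete, OnFinish and SendMessage are
// names from the Duplicati source, but this wiring is an assumption.
class NotificationChainSketch
{
    public void OnOperationComplete(string result)
    {
        OnFinish(result);                          // notification module runs first
        Log("The operation Backup has completed"); // never reached if OnFinish hangs or throws
    }

    private void OnFinish(string result) => SendMessage(result);

    private void SendMessage(string body)
    {
        // Imagine a blocking SMTP send here that occasionally never returns.
    }

    private void Log(string message) => Console.WriteLine(message);
}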

The send-mail command could be used in a loop if you’d like to see if it can have trouble.

Sorry, did I miss something in that thread you sent me? I’ll go back and look a little later when I have time.

Yes, it is a very rare failure.

The code does not produce the output “Email Sent”.

You are correct… OnOperationComplete runs OnFinish which calls SendMessage… that’s where I joined the dots earlier :grinning:

I’ll have another look at that thread you sent through tonight.

I got all the default logs I could out of this and feel like I have to add some logging to the Duplicati source and install that build.

There are enough jumps, loops, and variables involved that it’s easier to add logs than it is to eyeball it. But I will quickly try eyeballing it one more time before that.

Added more logs. Unfortunately, when I went to add logs to DoPut() it would no longer sync, and demanded a db repair before it would allow it. Ran the db repair and so far it’s continuing to sync fine. (Noticed the manual sync was failing to complete every other time back then, but no more.)

However, it looks like there are multiple problem points involved here.

For me it was often getting stuck at the following, with the top line being the last log entry:

Backend event: Put - Failed: duplicati-be18777c522d74254bc730f3409a93b67.dblock.zip.aes (49.93 MB)

Operation Put with file duplicati-be18777c522d74254bc730f3409a93b67.dblock.zip.aes attempt 6 of 6 failed with message: disk I/O error disk I/O error

Additional logs show the loops never exit when this happens. Further, DoPut() appears to never exit, as no additional logs were being written.

Either that, or an Exception with no further logging was caught somewhere and made a big jump out of all the loops; but unless I missed something, that doesn’t appear to be the case here. And it would have to be something not caught as an Exception.
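To illustrate what I mean (this is just a sketch of the pattern, not Duplicati’s actual DoPut), a retry loop that doesn’t propagate the final failure leaves the rest of the pipeline waiting on an outcome that never arrives:

using System;
using System.Threading.Tasks;

// Illustrative only (not Duplicati's DoPut): a retry loop that swallows the
// final failure instead of rethrowing never signals a usable outcome, so a
// caller that only reacts to exceptions may wait forever.
static class RetrySketch
{
    public static async Task<bool> PutWithRetry(Func<Task> upload, int retries)
    {
        for (var attempt = 1; attempt <= retries; attempt++)
        {
            try
            {
                await upload();
                return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine($"attempt {attempt} of {retries} failed: {ex.Message}");
                // BUG being illustrated: on the last attempt this should
                // rethrow; silently falling out leaves the failure in a
                // state the rest of the pipeline never observes.
            }
        }
        return false; // a caller that only reacts to exceptions hangs here
    }
}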

So I have to wait now until the problem surfaces again. It is interesting that a db repair has improved the situation.

Still narrowing down but looks close now.

Now at BackupHandler RunAsync(): duplicati/BackupHandler.cs at master · duplicati/duplicati · GitHub

Ran into a different end-of-logs, stuck-UI case that triangulates to here.

But it goes from the BackendManager class directly to the BackendUploader class, and that’s what’s found in the above function.

So next step is to add logs to RunAsync() and wait again.

Just got this. Not much time to analyze it yet, but at least with

Operation Put with file duplicati-ib4e6856fa8a6408e9c406330441f0f2c.dindex.zip.aes attempt 6 of 6 failed with message: disk I/O error disk I/O error

It is exiting from DoWithRetry() with an Exception thrown:

// rethrow on the final attempt so the failure propagates to the caller
if (retryCount == m_options.NumberOfRetries)
    throw;

The finally {} block runs and the for loop exits, reaching:

// exit the upload loop once cancellation has been requested
if (cancelToken.IsCancellationRequested)
    return false;

The sync UI never exits despite that and the 6-of-6 failure. Nothing else shows up in the current logs after that.

From there I will need further logs, and/or to locate the code that should stop the sync, depending on where that’s done.

Additional. This tracks back to:

case DuplicatiOperation.Backup: in /Server/Runner, where it starts, but the log for it exiting is never reached. The Exception log there also doesn’t appear.

That’s a third issue for me to track. So at least three or four are causing the end result so far.

This backup is still “running” and the UI speed is only slightly dropping, but it’s very far past the normal completion time and doesn’t appear to really be doing anything.

Current end of log with UI stuck far past normal backup times...
CommitAddBlockToOutputFlush took 0:00:00:01.640
Starting - CommitAddBlockToOutputFlush

DoPut() ends: Backend event: Put - Completed: duplicati-i85dd6ff4d4944abeb0e50d8edf8cb677.dindex.zip.aes (30.62 KB)

CommitAfterUpload: CommitAfterUpload took 0:00:00:00.046…

DoWithRetry() returns from UploadFileAsync() with the finally {} block executing.

File changed, adding dir, etc. until “Starting - CommitAddBlockToOutputFlush”, then “CommitAddBlockToOutputFlush took …” and that’s it.

I will not be adding more to this thread until I find something that solves these.

Sorry Xavron, I’m snowed under at the moment (more than usual) with tight timelines… I’ll get back to this thread when I can but it will be at least a few days.

No problem :wink:

In the meantime, I narrowed down one problem point to what looks like the final issue, for that one anyway. Testing a tentative fix, and I don’t really want to post anything until I can confirm the fix works and the problem is as suspected.

Don’t know how it will relate to the others. Could only hope it solves all of them, but it probably won’t. Could cause another problem lol.

Edit - scratch that. Of course it happened right after I posted that I could no longer reproduce it lol.

This is the first occurrence during a backup, and where it’s last seen in my logs (last entry for 31 minutes): duplicati/DataBlockProcessor.cs at master · duplicati/duplicati · GitHub

Preceded by a bunch of passes through the while loop there, then “Starting - CommitAddBlockToOutputFlush” and “CommitAddBlockToOutputFlush took”, and then it hits that Exception.

31 minutes later it does a few of these: Duplicati.Library.Main.Operation.Backup.FileEnumerationProcess-IncludingPath]: Including path as no filters matched:

The transfer rate continues to drop for hours with no other UI movement.

Also, according to the logs, the code execution goes from:
var newBlock = await database.AddBlockAsync(b.HashKey, b.Size, blockvolume.VolumeID);
b.TaskCompletion.TrySetResult(newBlock);

And then right to

catch (Exception ex)

This also goes back to another issue I noticed. The thrown Exception goes right back to duplicati/BackupHandler.cs at master · duplicati/duplicati · GitHub

The Task.WhenAll() may not be dealing with Exceptions in a useful way for that.

Either way, Duplicati gets into a broken state right here.
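Here’s a minimal repro of that mechanism, using System.Threading.Channels as a stand-in for CoCoL (so a sketch of the shape of the problem, not Duplicati’s actual code). Task.WhenAll only surfaces the consumer’s exception once every task completes, and a producer left blocked on a full channel means that never happens:

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Stand-in repro: System.Threading.Channels instead of CoCoL.
class WhenAllHangDemo
{
    static async Task Main()
    {
        var channel = Channel.CreateBounded<int>(1); // tiny buffer, like a rendezvous channel

        var producer = Task.Run(async () =>
        {
            for (var i = 0; ; i++)
                await channel.Writer.WriteAsync(i); // blocks forever once the consumer dies
        });

        var consumer = Task.Run(async () =>
        {
            await channel.Reader.ReadAsync();
            throw new Exception("Disk I/O error"); // the injected fault described below
        });

        // Never returns and never throws: the consumer has faulted, but
        // WhenAll also waits for the (now permanently blocked) producer.
        await Task.WhenAll(producer, consumer);
    }
}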

(Edit - The actual Exception when Duplicati gets stuck is “Disk I/O error” in most if not all recent cases (not related to the Exception seen below). Currently drilling down further into the reason for that Exception and attempting some changes to see how Duplicati responds. The one below appears to at least be passable so far from what I can see. The I/O error has not appeared with the current code.)

Duplicati is definitely using the CoCoL project here, which is throwing Exceptions (still waiting to see what the Exception is when it gets messed up).

PaulHop above also highlights a line with CoCoL showing up in his run-through of this on his side: Duplicati lock up - #30 by PaulHop

An error I see matches what’s found here:

Alright, so it’s 100% reproducible now. Just throw an Exception in the while loop here: duplicati/DataBlockProcessor.cs at master · duplicati/duplicati · GitHub

eg

var newBlock = await database.AddBlockAsync(b.HashKey, b.Size, blockvolume.VolumeID);
throw new Exception("Disk I/O error"); // injected fault to reproduce the lock up

and the UI gets completely stuck, and code execution stops, beyond what looks like some tiny possibility of code in other threads outside of there still running. But this completely breaks Duplicati, so anything else that might still run is of no importance.

Unfortunately CoCoL is thrown into the mix, which might be the problem. It almost seems like some weird issue between CoCoL, Exception throwing, and the use of Task.WhenAll().

Throwing an Exception should not cause the code to stop executing, but that’s exactly what it appears to do, since nothing appears to happen after it.

I will have to try to write CoCoL out of it.

Partially solved! It’s not CoCoL. Apparently, adding a timeout to WhenAll() allows the UI to continue and the backup to finish; in my test it finished as a failed backup, but at least it frees up the UI.

So halfway there lol :slight_smile:
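For reference, the timeout shape I used is roughly this (a sketch; the timeout value and names are placeholders, not anything from Duplicati):

using System;
using System.Threading.Tasks;

// Race WhenAll against a timeout so a permanently blocked task can no
// longer freeze the caller.
static class WhenAllWithTimeout
{
    public static async Task AwaitAllOrFail(TimeSpan timeout, params Task[] tasks)
    {
        var all = Task.WhenAll(tasks);
        var first = await Task.WhenAny(all, Task.Delay(timeout));
        if (first != all)
            throw new TimeoutException("Backup tasks did not complete in time");
        await all; // propagates any exceptions the tasks actually threw
    }
}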

Updates added to here for now:

A test to improve the Disk i/o error’s bad effect has proven successful (a timeout is not necessary with this one): the backup fails but completes within the normal incremental backup time of a few minutes. The next time the backup runs, it is successful. Both unblock the UI. Still needs further testing.

Two steps to go at this time: find out the real reason that error is being thrown (whether it’s SQL or an actual disk problem; either way Duplicati must deal with it), and make sure all tasks have the checks and exits they need so WhenAll()’s use isn’t horribly negative when an Exception is thrown unexpectedly (a sketch of that second part follows after these updates).

  • Could be mistaken at the moment, but the disk i/o error is looking like at least one cause is within Duplicati itself, in the block processing for the sqlite db. Will have to run some tests.

  • It definitely looks like it should be the code. The disk i/o error appears with Duplicati’s rename event for two of its own backup files, along with the sqlite-journal in there as well (in other words, it affects Duplicati’s own files, and the next backup runs fine, so it’s going to be very interesting).
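As mentioned above, here’s the shape of the check-and-exit idea (illustrative only, with placeholder names): cancel a shared token as soon as any worker faults, so the siblings unblock and WhenAll can actually finish:

using System;
using System.Threading;
using System.Threading.Tasks;

// Every worker takes a shared CancellationToken that gets cancelled as soon
// as any sibling faults; workers that honor the token then unwind instead of
// staying blocked, letting WhenAll complete and surface the real exception.
static class CooperativeShutdown
{
    public static async Task RunAll(params Func<CancellationToken, Task>[] workers)
    {
        using var cts = new CancellationTokenSource();

        Task Wrap(Func<CancellationToken, Task> w) =>
            w(cts.Token).ContinueWith(t =>
            {
                if (t.IsFaulted)
                    cts.Cancel(); // a failure in one worker releases the rest
                return t;
            }).Unwrap();

        await Task.WhenAll(Array.ConvertAll(workers, Wrap));
    }
}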

The WhenAll() UI-blocking issue I’m now attaching to this on GitHub: Backup job hangs in the middle of the work · Issue #3683 · duplicati/duplicati · GitHub

For the “Disk i/o error”, I’m still waiting for Duplicati to rename files, which the error appears to be connected to. No rename, no “Disk i/o error”. I have a possible fix in testing for that, but until it happens again (it’s taking its good old time now that I want it most), I can’t be sure, as the fix may require a bit more.

Either way, I’m happy to see two long-standing bad issues with Duplicati (people complained about both of these years ago) coming to an end.

Updates attached to this:

  • The disk i/o error’s appearance changed when I added the attempted fix. It is definitely throwing at that spot, but seemingly unknowingly now. Spotted another log entry after that where it throws the error, then later during the same backup it’s suddenly working again. So this one will take a bit of time between waiting and testing. But it looks like this is what causes the renaming to be attempted, as the log now shows it comes before that.

  • The latest log shows it happened with CommitAfterUpload. The upload was successful, so the info should be committed to the db just fine. The db also works fine. I’m currently working on the theory of a race condition on lower-performance hardware, based on what I see happening in the logs compared to a higher-performance machine. That could take much longer. I did notice someone years ago felt it was file accessibility, but it is not that (I tried to force that and it didn’t cause this issue) - I tested the same setup here as on the other computer and it doesn’t happen here. Duplicati should deal with it anyway.

  • Have to put this on pause for now. Technically, it’s “solved” for me with the various checks I have in it; I just don’t know to what extent. It’s at least not solved to the extent that I want. The backup does complete “successfully”, but I haven’t looked into what that actually means with the checks alone. Still, Duplicati now works like it should for me, instead of the way it does for everyone else.

Further on the disk i/o error (the one normally seen causing the lock up for me):

It looks to be caused only by very specific files (and only those files), such as the Duplicati db in /AppData, and only on certain systems as far as I can tell. One system never seems to have a problem backing it up, but the other one often throws disk i/o on it, and that’s one error thrown during the WhenAll() that causes the total lockup of the Duplicati UI.

I intend to look into why certain files do that, as there’s probably something Duplicati should be checking so that it doesn’t attempt to back such a file up. In the worst case it should at least throw the disk i/o error, then remove that file from the backup and continue (a sketch of that handling follows below). Since there’s a ton of stuff in the code here, it would take me a while to sort through what exactly needs to be done, so I may not go that far with this.
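Roughly, the shape I have in mind (a sketch, not Duplicati code; all names here are placeholders) is to contain the failure per file:

using System;
using System.Collections.Generic;

// Log the failure, drop the file, keep going: one unreadable file should
// fail on its own instead of stalling the whole backup.
static class SkipUnreadableFiles
{
    public static void ProcessFiles(IEnumerable<string> paths, Action<string> backupOne)
    {
        foreach (var path in paths)
        {
            try
            {
                backupOne(path);
            }
            catch (System.IO.IOException ex)
            {
                // Record the failure and move on to the next file.
                Console.WriteLine($"Skipping {path}: {ex.Message}");
            }
        }
    }
}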

For now, just excluding the files where this happens, such as Duplicati’s AppData folder, from the backup appears to fix the situation until a better way to handle it is implemented. I have backups running without anything bad showing up for well over a week now, and that wasn’t possible before.

Theory: at least with Duplicati’s db file, it could in theory be getting into a situation where it’s attempting to write and read the file at the same time. Perhaps on a slow enough system (e.g. the one where it’s happening for me), it runs into a problem and the disk i/o error gets thrown. The cause could also include, or be, another difference, such as the larger backups that I didn’t duplicate on the systems without the problem.
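The self-interference half of that theory is easy to demonstrate in isolation (a hedged illustration only; the path is a placeholder, and whether SQLite hits exactly this is unproven). Two handles on the same file with incompatible sharing produce an IOException:

using System.IO;

// Open one handle with no sharing, then try to open a second.
class SelfInterferenceDemo
{
    static void Main()
    {
        const string dbPath = "example.sqlite"; // placeholder, not Duplicati's real db path
        File.WriteAllText(dbPath, "placeholder");

        using var writer = new FileStream(dbPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);

        // Throws IOException ("being used by another process") because the
        // first handle allows no sharing - the same shape of conflict that
        // could surface as a disk I/O error inside SQLite.
        using var reader = new FileStream(dbPath, FileMode.Open, FileAccess.Read);
    }
}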

Thanks for continuing to look into this. There have been some forum reports where having Duplicati NOT try to back up its own database area (I guess that’s what you mean) has been suggested. It’s pointless trying to back up the database from the running job, because it’ll be instantly obsolete. It might also suffer some weird interactions (as I think you’re suggesting) where it interferes with itself over something like DB file access…

There was a brief attempt at ignoring Duplicati database files here:
