Duplicati lock up

Hi,

I am currently using Duplicati (2.0.6.1_beta_2021-05-03) in Windows from the System Tray running on Windows Server 2019. I’m trialling/testing the software and have backups running every 15 minutes of a small MailEnable post office. Every day or so the backup locks up and no longer responds to scheduling or manual execution via the web interface.

I turned on verbose logging and have captured where it stopped on this occasion:

There’s no error message. The status bar at the top says “Backup complete!”… but it isn’t fully complete as normally it will send me a completion email and write that to the log. As noted above it no longer responds to scheduling or manual “run now” commands from the web interface.

In looking at the details of the backup job it says that the last successful backup was at 9:36pm… yet this one stated at 9:45pm, so the backup summary doesn’t regard the latest backup as successful.

In looking at restores, a 9:45pm backup set has been created for restoration purposes.

The previous run of successful backups ran from 3 June 1:38am to 4 June 1:00am. This run of successful backups was from 4 June 7:55am to 4 Jun 9:45pm… so the timing of how long the backups will successfully run for is unpredictable.

I’ll switch over to running Duplicati as a service as I suspect it will avoid this issue. I hope I’ve provided enough information for you to track down why the backups stop.

Thank you.

Welcome to the forum @PaulHop

Some things here don’t line up. The screenshot shows a download not completing. Example good end:

Do these hangs typically see downloads not complete? What Storage Type is on Destination screen?

My backup log (which is by log-file at log-file-log-level-profiling, therefore long) ends with Information line:

2021-06-04 16:42:27 -04 - [Information-Duplicati.Library.Main.Controller-CompletedOperation]: The operation Backup has completed

and this wasn’t until after status bar “Verifying files” finished. The last download of a sample is a dblock.

2021-06-04 16:42:26 -04 - [Information-Duplicati.Library.Main.BasicResults-BackendEvent]: Backend event: Get - Completed: duplicati-b99eba1a5cc5641069c7137dde1cac412.dblock.zip.aes (4.93 MB)

Please check your logs to see if you’re really having downloads hang, however if one gets stuck, then a
Backup complete! status is surprising. As a side note, isn’t your status usually “Next scheduled task”?

Seeing the same sort of thing but haven’t gotten to diagnosing it yet. The “lockup” appears to be the code getting stuck somewhere without a timeout and requires removing Duplicati from memory and starting it again but that’s my educated guess as a programmer and not hard fact until I get around to looking into it. There are potential other reasons.

I don’t know if there are a lot of timeouts other than in network code to the backend.
While I’m asking people about logs and destination, can you add any info on yours?

Can go based off memory as looked into it quickly this week but I need to deal with the one Linux issue first…

Backup starts and is running. Difference from here is that it shows still in progress. Last log entry is about working with *.dindex.zip.aes file and that’s it.

It will never ever move from there. It stays stuck forever until Duplicati is removed from memory. Of course can’t cancel it because the Duplicati cancel/stop cannot do it.

These can very well be two different issues but they are similar in nature. They do both seem to have a file as the last entry as well.

Just to add again there’s various other things that can cause that as well. Just seems like a timeout but possibly a thread is no longer running and the code is stuck or whatever. Could be in 3rd party code even and need to get that fixed or workaround it. Really need to get a debugger on it but I need to set all that up and wait or additional logs…

I think there was a way to get really heavy amount of logs now that I think about that. Might be a bit too taxing for that computer but I could look into trying it. I could be thinking of something else though.

That’s exactly what I’m experiencing. I’m then ending all Duplicati tasks and restarting them to work around the issue. Due to competing priorities I haven’t tried diagnosing it further.

Hi ts678,

Yes the hangs typically see the operation not fully complete, although the status displays as “Backup Complete”. The downloads (i.e. the puts) do complete, it’s a Get on the dblock.zip file where its failing… after which it sends me a completion email.

The destination is a Google Drive.

Normally my status bar will go back to “Next Scheduled Task”, but when the problem occurs it displays “Backup Complete!”.

I have now switched to running Duplicati as a service… but unfortunately the problem is still occurring.

Thank you.

I have enabled file logging at the “Verbose” level. I might end up with a massive log file (23MB for one run) , it’ll be over 1GB when the problem happens again… we’ll see what I find out.

I’m not sure what that would mean, but all the available clues might help. If it hangs again, please check:

About → System info → Server state properties → lastPgEvent

to see if it’s blank of it has “Phase”:“Verify_Running” or something similar in its too-long-for-screen line.

glogg can open big files. For a mainstream editor, EditPad Lite.

You might consider adjusting download levels if you think that’s the problem (but if so, it’s not common).

backup-test-samples can raise downloads, but you can also do just downloads.with the test command.

You can even tie up your bandwidth (but not your Duplicati server) by Export As Command-line, change syntax into choice of test, and let it run separately in Command Prompt to see how downloading goes.

The destination is a Google Drive.

I’ve seen it with local ssh (not out of the network) and possibly with local drive.

Either way, being Google Drive and SSH… it doesn’t appear to be tied to a particular service use but in something shared between them if the two issues are the same code problem.

However, it could be two problems with the same result. Can’t really rule that out until its narrowed down futher.

Narrowing is the usual challenge. Ideal is to get it down to “Steps to reproduce” that any developer can run, then write up “Steps to reproduce” on an Issue (the clearer/simpler the better) and wait for an investigation.

Problems that only show up when there’s no developer waiting to dissect them can be extremely difficult…

There’s often not any built-in logging at the deep-level networking, but a profiling log can sometimes help if the problem is not actually in the networking. That’s as good as it might get, short of doing network tracing.

1 Like

Build from source and a bunch of new well placed logs should do it. When debugging fails or doesn’t work well enough, I fall back to this as it shows exactly where things are going wrong. It should work here as its one of the situations when I use that tactic with.

One way or another it will get solved :slight_smile:

Works well if expert can reproduce the failure. It’s tough in field (though I haven’t asked @PaulHop :wink:).
Have at it for whatever hangs you think you see, and it would be wonderful if you can track a bug down…

Yeah, its more of a developer only thing to do. Will do so this month or next probably. :+1: :+1:

Edit - will try for later this month since I intend to work on Duplicati’s code coming up. Just have some projects to get done first.

It hasn’t fallen over again yet. I’ve been renaming the log file after a backup when it gets large so that the final one isn’t too huge.

I’m wondering if it has something to do with multiple Duplicati instances running at once. In task manager if I start the Duplicati tray it starts two Duplicati instances… even though I’m using the “–no-hosted-server” option… I don’t know why as yet. In any case I only have one instance open at the moment it’s just Duplicati.Server and I’m staying away from starting the tray icon. If it runs without issue for the next couple of days then I’ll conclude that the Duplicati.Server by itself is fine and there is possibly a file locking issue caused by operating the tray icon.

Worst case scenario is that I can crawl through the code and also try it on my development box through the debugger and Xavron and I can compare notes.

Ok, it locked up again.

Here’s the end of a normal execution:

Here’s the end of the incomplete (locked up) execution:

So last time when I logged to the screen it didn’t log the completion of dblock.zip… but it had started it. This time when I’m logging to a file it logs completion of dblock.zip.

From Process Explorer, here’s what is running:

I can’t see any file locking issues in Process Explorer, and the server has plenty of memory and disk available.

It’s possible that the issue is with the notification email. Most people wouldn’t email successful execution to themselves (especially every 15 minutes), so there might be a very infrequently occurring bug there (maybe a race condition). I’ll set it up again to run so I can get a third snapshot of the log to confirm the location of the issue then I think I’ll need to look at the code… I’m not sure when, I’m pretty busy on a project at the moment.

Just some further information… I’ve just opened the Web UI. The backup again is stopped in this state:

When I select the cross next to “Backup complete!” it displays the following:
image

So something thinks that a file is still being uploaded to the server. Note, the backup halted around 4 1/2 hours ago and normally takes less than 2 minutes to complete.

Looks like it might be tracing back to here: duplicati/BackendManager.cs at beaf03562fdcf4425e962085bdf7175d6a465f49 · duplicati/duplicati · GitHub

Plenty of loops here susceptible to forever looping and not completing correctly on anything missed.

Some extra logs added here should see what’s going on.

All the current logs are behind if’s and catch’s so the logging info can currently miss potential issues.

That correlates to the first post also…

get - started…

Never has a get - complete duplicati/BackendManager.cs at beaf03562fdcf4425e962085bdf7175d6a465f49 · duplicati/duplicati · GitHub

and is tied into the above loop:
case OperationType.Get:
DoGet(item);
break;

1 Like

We can discard my “Stop running backup” note above. The software doesn’t still have a file open when it locks up, the message is misleading.

I have found a couple of bugs in the software:

  • Bug 1: StateController.js, line 66… delete the “Backup_WaitForUpload” condition. It has been duplicated from line 54… Wait For Upload does not represent 100% progress through the backup as suggested on this line.

  • Bug 2: StateController.js, line 66 sets lastPgEvent.Phase to “Backup_Complete”. In looking at the backup handler this state is reached when the backup is truly completed, i.e. all files are uploaded. However on line 87 this variable is moved into the variable “txt” and one line 101 anything starting with “Backup_” will display the “Stop running backup” message referred to in my earlier post. This stop running backup message is not correct when the backup is in a state of “Backup_Complete” as the uploads have finished.

Neither of these bugs fix my issue, but I note them here so they can be corrected as I progress through the software.

1 Like

I’m not sure, that might be where you’re having an issue but I think our two issues are slightly different.

Thank you for finding where the get operation is started and completed… I was looking for that! :slight_smile: