Activate downloaded upgrade sometimes fail


#21

No, it is not turned on by default. I modified our Windows installer to always install as a service, along with those recovery options:

Since Duplicati does not support a Windows service from the installer, you have to manually install it yourself, I guess the user is also responsible for defining their own recovery settings


#22

Since we have two different reports of the same behavior, I think the problem is not with the debugging stuff.

I think that what happens is that the first WindowsService is doing nothing other than spawn the second WindowsService. Once the second instance starts, it will register with the Windows Service stuff and work as it should. When this second instance then quits, the first instance correctly restarts it, but this is no good as it cannot re-register with the service manager and this stops it all.

This is consistent with the error message, which I have seen when the child process does not register with the service manager, or stops responding to service manager requests.

If I am correct, then we cannot use my fix to restart the service.

One solution would be to not allow WindowsService to use the autoupdater, so it would always run the base install version. When it starts Duplicati.Server it will then auto-start the updated version of Duplicati.Server and restarts will happen in the first instance of Duplicati.Server.

If @ts678 or @mikaelmello has a test setup, you can try this by simply changing this call to go directly to RealMain:

Change that line to:

return RealMain(args);

Not sure the ping-pong system works in this setup, but if it does, then that is a fairly simple fix.

Does it work well enough that we should add something like that to the default installer?

I know many users requests this.


#23

If @ts678 or @mikaelmello has a test setup, you can try this by simply changing this call to go directly to RealMain :

I’m going to test it.

Does it work well enough that we should add something like that to the default installer?

Yes it does. It is installed and initialized on install, stopped and removed on uninstall, etc, etc. However I had to add --no-hosted-server to the TrayIcon shortcusts, besides the fact that they are decoupled from the service, so when you uninstall Duplicati with the TrayIcon running, it won’t delete some files being used by the TrayIcon.

Also, I have no idea of how to make ‘install as a service’ optional along with the different commands for the shortcuts. I just went full service


#24

By the way, how did you paste embed Github code here?


#25

Works wonders @kenkendk


#26

Proof-of-concept working fine here, however RealMain returns void, so I had to hardcode a return 0 from Main. Tested service start and stop (previously in forcing the dual-Server mode, I think the “shutdown” never made it into the child Server – and I haven’t let this sit long enough to know if ping-pong is healthy). I did run an Install, Activate, and did my turned-off-while-debugging scheduled backup. So all feels somewhat better in many ways.


#27

Same here. Tested a lot of operations and all is running well up until now. How do I know if ping-pong is working as it should?


#28

I might be wrong about this, but you can see the code waiting for exit at https://github.com/duplicati/duplicati/blob/9bc1fe636a7dd31f3755502d74ab5ab4f028fe07/Duplicati/Service/Runner.cs#L103 is combining exit waiting with a 15 minute “ping” for life, and I think there’s a minute allowed for a pong reply in PingProcess(); Possibly all is well because more time has passed while I’m typing this, and I still see my Duplicati processes. Besides, if forwarding to the child were broken, “shutdown” wouldn’t have made it through to do service stop.


#29

Well, it didn’t made it through here. Shutdown is not working, WindowsService is constantly spawning a new Server process that is then terminated, possibly because there already is a Server listening on port 8200, also probably from the previous service I shutdown to test the proof-of-concept a second time. Possibly related

Edit: I terminated all processes related to Duplicati and started the service again from scratch. WindowsService launched a Server that launched a Server, all is well. Then I stopped the service from the Services application and all three processes were properly terminated. So it probably works, don’t know why the above happened though.


#30

A closer look at Task Manager showed three Server processes, and Process Explorer red/green highlighting shows a continual respawn. I wonder how things held together long enough to do a backup? Needs a look…


#31

I managed to understand why it worked and it didn’t.

  1. Server started when service started: Works as expected, ping-pong and shutdown events are always working.
  2. Server started after update: the I/O fails, it seems the newest Server processes are kind of detached from Windows Service. So when I stop the service, the Server processes continue to run.

Case 1

Case 2, the service was turned off after a update that launched the ‘detached’ Server processes.


I’m curious because from my perspective the code being executed is the same, or is it not? How do I check what is being logged with m_reportMessage()?


#32

(If you like, pick up your old install options from Windows service display) uninstall, install with --debug-service to get events made (although there aren’t a whole lot of them). Still testing here, but at least service stop was OK here (all Duplicati processes gone) whether or not 2.0.3.13 update was installed, however the case where I ran Install then Activate started looking normal, then started spewing Server after minutes. Need to time that better. Maybe a ping problem, and doing shutdown did not terminate all Duplicati. Server processes remained, the original 2.0.3.12, and the original 2.0.3.13 after Activate. Cycling of additional Server processes did end…


#33

What I do is:

  • go to the file contents on GitHub
  • click the first line number (actual line NUMBER on the left) which will show an ellipses next to it
  • shift click the last line number (highlights the block)
  • click the ellipses and select “Copy permalink”
    image

#34

I haven’t timed it yet. I had an idea that I would try to launch Server from Command Prompt, talk to it, kill the child, change the exit code (by debugger) that parent saw (to get a restart), and see if it still answered ping. Unfortunately even a pure base 2.0.3.12 fails to talk to me, which is a lot different than when I explained I/O. Forwarder was redone here which might be 2.0.3.8, and I think I was running a 2.0.3.6 base for a long time. Regardless of what happens on Command Prompt, I wonder if the code here will work out if run repeatedly, because the value of proc will change each time as the old one goes away and a new one gets generated.


#35

Constant Server respawning happens for me after the 15 minute ping timer from WindowsService to Server expires, provided the Server is not the original one which pings OK. See my speculation above that maybe forwarder code isn’t coded to run twice. I’m not certain why second Server pair dies, but a port 8200 conflict would be a plausible guess because lock conflict returns exit 200, whereas I’m getting exit 100 from the new child Server, passing up through new parent Server. The question was why the ping failure didn’t kill the old parent Server (and I wonder what would become of old child Server?), however the reason seems to be that the incomplete read task after 1 minute tries to be destroyed at “using” block end, but can’t be. I’m not sure why m_writeMessage didn’t log this to the event log (it logged Starting event but hasn’t logged later events), however debugger Output (which might be Console.Writeline) has this to say from the exception catch block:

16:59:25.336 Exception thrown: ‘System.InvalidOperationException’ in mscorlib.dll
16:59:25.336 Additional information: A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or Canceled).

What this means is that the stack of original WindowsService, original Server, and updated Server is still hanging around, however WindowsService thinks Server went unresponsive, failed to kill it, and started restarting additional Server pairs which come up, go down, and keep on doing this (rate-limited though).

Note that all this is happening on the modified proof-of-concept version that’s running RealMain directly. Fixing forwarding was said to be a “fairly simple fix”, but other fixes include read timeout exception, return code out of Main, and possibly the oddity about why --debug-service seems to only work for a short time.

Recap of setup is base 2.0.3.12 with a test 2.0.3.13 WindowsService set for better debuggability, calling RealMain from Main. Remove 2.0.3.13 update, start service, Install 2.0.3.13, Activate, wait > 15 minutes before respawning extra Server stack starts, but forwarding (e.g. to shutdown) was probably bad earlier.


#36

Is this specific to the updates not activating or is it a general issue you’re seeing (this deserving it’s own thread)?

Personally, I haven’t seen the symptoms you describe - or perhaps I’m just not noticing them…


#37

To reproduce the behavior you must have a Duplicati instance with the latest WindowsService (2.0.13) and with the changes described above.

Then, when you must download the update and activate it. The Windows Service will kill the original Server processes and launch a new pair. This new pair is not responsive to the parent Windows Service, therefore WindowsService will fail to ping after some time and terminate. However, the new pair of Server processes will stay alive because WindowsService fails to shutdown them.


#38

This is tough to answer because there are several errors, however at main level @mikaelmello is correct.

First, see earlier disclaimer about the test being a proof-of-concept aimed at getting update activation, and actually getting it, but,then having issues. So overall the test isn’t a general issue, but some results may be.

The seeming mis-handling (exception) of the read timeout seems possible but unlikely in an ordinary build, however the lack of response could probably be forced by “Suspend” of a Server, e.g. in Process Explorer. Basically, error handling of rare errors can often have latent defects. There are probably more in Duplicati, contributing to the situation where things work most of the time, but every now and then a rare issue arises. Because the exception I got isn’t easy to see even if it’s raised, I tried a symptom search but saw no others.

The possible mis-handling of --debug-service is also obscure, but it might also be visible on a normal build, and now would be a good time to compare notes with @mikaelmello because I had suggested using that.

I’m not sure how best to handle obscure issues. Whether in forum or GitHub, no instant action seems likely, and even if I made an exact recipe for you to reproduce, would it help? I guess confirmation is always good,


#39

WIth --debug-service activated, the logs go as far as m_reportMessage(string.Format("Starting process {0} with cmd args {1}", exec, cmdargs), false);, fair enough.

However when I activate an update, nothing else is logged until I stop the service.


#40

We’re likely seeing the same oddity which just hampers debugging. My surprisingly-last event message:

Starting process C:\Program Files\Duplicati 2\Duplicati.Server.exe with cmd args "--ping-pong-keepalive=true" "--log-file=C:\ProgramData\Duplicati\server.log" "--log-level=Profiling" --unencrypted-database

See how --debug-service isn’t passed to Server (where it wouldn’t make sense anyway). Service config:

"C:\Program Files\Duplicati 2\Duplicati.WindowsService.exe" "--log-file=C:\ProgramData\Duplicati\server.log" "--log-level=Profiling" --unencrypted-database --debug-service

But this is enough of a diversion from the real article topic, so I think I’m just going to let it be a mystery.

In terms of article topic (activate of upgrade), it looks like plan A (WindowsService restart) and plan B (Server restart) are both having issues, however possibly the second path is more fixable than service re-registration. Meanwhile, I kind of wonder if most people would be better off if the 2.0.3.13 fix just got retracted until it’s well.