Summary: Before starting a kernel, Jupyter picks some unused port numbers and writes them into the connection file. Then it releases those port numbers, so the starting kernel can use them. During that interval, some other process opening a socket connection can allocate one (or more) of the ports. The starting kernel finds a port blocked and runs into some error condition. In the best of cases, Jupyter detects that the kernel doesn't come up and restarts it. But the restart uses the same connection file and port numbers, so the problem persists.
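The race described above can be reproduced with plain sockets. This is a minimal sketch, not the code Jupyter actually runs: `pick_free_port` and the "intruder" socket are hypothetical names used only for illustration, and the intruder stands in for any unrelated process that grabs the port during the gap.

```python
import socket


def pick_free_port(ip="127.0.0.1"):
    """Ask the OS for an unused port, then release it again (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((ip, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


# Step 1: a port is chosen and released, as if it had been written to a connection file.
port = pick_free_port()

# Step 2: during the gap, another process binds the same port.
intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
intruder.bind(("127.0.0.1", port))
intruder.listen(1)

# Step 3: the starting kernel tries to bind the port it was told to use.
kernel_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    kernel_sock.bind(("127.0.0.1", port))
    bind_failed = False
except OSError:
    # The port is blocked; retrying with the same port number cannot succeed
    # for as long as the intruder keeps it.
    bind_failed = True
finally:
    kernel_sock.close()
    intruder.close()
```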
This is a reincarnation of jupyter-server/kernel_gateway#131, which was closed there because it is an upstream problem. @minrk suggested restarting the kernel with newly selected port numbers if this error condition (failure on initial start of the kernel) is detected.
Unless there are other suggestions to deal with the situation, I'd like to work on a PR that implements this behavior. I'll be on vacation next week, but I'm opening the issue now to gather as much feedback as possible.
Looking at the code, I think that the ConnectionFileMixin in connect.py will need some attention. It has to remember which of the port numbers are configured and which are chosen at random, so that only the latter are regenerated.
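To make the bookkeeping idea concrete, here is a rough sketch of remembering which ports were configured and which were picked at random. `PortPlan` and `pick_free_port` are hypothetical names and not part of `ConnectionFileMixin`; the sketch assumes that a port value of 0 means "choose at random", and uses the port field names from the connection file.

```python
import socket

# Port fields found in a kernel connection file.
PORT_NAMES = ("shell_port", "iopub_port", "stdin_port", "hb_port", "control_port")


def pick_free_port(ip="127.0.0.1"):
    """Ask the OS for an unused port, then release it again (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((ip, 0))
        return sock.getsockname()[1]


class PortPlan:
    """Hypothetical helper: tracks which ports were configured and which were random."""

    def __init__(self, ip="127.0.0.1", **configured):
        self.ip = ip
        # Assumption: a missing or zero port means "pick one at random".
        self.ports = {name: configured.get(name, 0) for name in PORT_NAMES}
        # Remember the random ones before they are filled in.
        self.random_names = {name for name, port in self.ports.items() if port == 0}
        self.regenerate()

    def regenerate(self):
        """Pick new values for the random ports only; configured ports stay as they are."""
        for name in self.random_names:
            self.ports[name] = pick_free_port(self.ip)


plan = PortPlan(shell_port=50001)  # shell_port is configured, the rest are random
plan.regenerate()                  # e.g. after a failed initial start
```

After `regenerate()`, `plan.ports["shell_port"]` is still the configured value, while the other four entries hold freshly selected ports.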
Another place I need to look at is the KernelManager in manager.py. I thought about deleting the kernel and starting a different one, with possibly a different ID. But there are event handling methods in the class, so it's probably easier to keep the instance and generate a new connection file. Maybe I should introduce a relaunch_kernel, in addition to start_kernel and restart_kernel?
But I'm not sure where to detect and handle the actual error condition. Is this something that should be done in jupyter_client itself? Or should it rather be left to notebook, kernel_gateway, and other applications using jupyter_client? Your suggestions would be most welcome :-)
consoleapp.py starts kernels, so maybe that's the place to detect and handle failure of the initial kernel startup. jupyter/notebook and jupyter/kernel_gateway call the start_kernel method in the kernel manager, though.
Maybe I should write re-usable relaunch logic that can be called explicitly from different places, instead of trying to change the default behavior.
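Such reusable logic could look roughly like the sketch below. It deliberately avoids the real jupyter_client API: `launch_with_fresh_ports` and its three callables are hypothetical, and the usage at the bottom simulates a blocked heartbeat port instead of starting a kernel.

```python
def launch_with_fresh_ports(pick_ports, launch, is_alive, max_attempts=3):
    """Launch with newly picked ports until the kernel comes up (hypothetical helper).

    pick_ports() returns a fresh set of ports, launch(ports) starts a kernel and
    returns a handle, is_alive(handle) reports whether the kernel came up.
    """
    last_ports = None
    for attempt in range(1, max_attempts + 1):
        last_ports = pick_ports()        # never reuse the ports of a failed attempt
        handle = launch(last_ports)
        if is_alive(handle):
            return handle, last_ports, attempt
    raise RuntimeError(
        f"kernel failed to start after {max_attempts} attempts; "
        f"last ports tried: {last_ports}"
    )


# Simulated usage: the first heartbeat port is blocked, the second one is free.
port_sets = iter([{"hb_port": 50001}, {"hb_port": 50002}])
blocked = {50001}


def fake_pick_ports():
    return next(port_sets)


def fake_launch(ports):
    return ports  # in this simulation the "handle" is just the ports dict


def fake_is_alive(handle):
    return handle["hb_port"] not in blocked


handle, ports, attempts = launch_with_fresh_ports(
    fake_pick_ports, fake_launch, fake_is_alive
)
```

Passing the liveness check in as a callable leaves it to each application (notebook, kernel_gateway, consoleapp) to decide how startup failure is detected.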
There's a design limitation to the solution in PR #279. It relies on detecting whether a kernel is alive, which in turn relies on the heartbeat ZMQ channel, afaik. If the port for the heartbeat channel is taken by another kernel, and that other kernel happens to run its own heartbeat channel there, then the wrong kernel will be probed.