
asyncio and concurrent.futures.ProcessPoolExecutor prevent kernel restarts #274

Closed
jbweston opened this issue Oct 27, 2017 · 9 comments · Fixed by #290
Comments

@jbweston

The problem is illustrated in this notebook.

If I use the asyncio event loop integration introduced in #21 together with concurrent.futures.ProcessPoolExecutor, the "restart" button in the notebook interface is broken. The notebook server attempts 5 times to restart the kernel, then gives up and creates a "new" one. The previous kernel process is then orphaned, but still exists on the system (with the init process as its parent).

This is a problem because there is a sizeable delay between hitting "restart" and being able to use a new kernel. It also produces many orphaned processes that consume resources and do not go away even after the notebook server is stopped.
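
For reference, a minimal sketch of the kind of setup that triggers this (the linked notebook may differ in its details):

```python
# In a notebook cell, first enable the integration:
#
#   %gui asyncio
#
import asyncio
from concurrent.futures import ProcessPoolExecutor

def work(x):
    return x * x

executor = ProcessPoolExecutor()
loop = asyncio.get_event_loop()
future = loop.run_in_executor(executor, work, 2)

# While the pool's worker processes are still alive, hitting "restart"
# fails repeatedly and the old kernel is left orphaned.
```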

@jbweston
Author

The problem is not present if any ProcessPoolExecutors are shut down prior to hitting "restart".
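
A sketch of that workaround (`executor` here stands in for whatever pool the notebook created):

```python
from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor()
# ... submit work via the asyncio integration ...

# Reap the worker processes before hitting "restart":
executor.shutdown(wait=True)
```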

@minrk
Member

minrk commented Nov 15, 2017

There's a good chance that jupyter/jupyter_client#279 alleviates this. It is not uncommon for a restart to fail to fully clean up subprocesses; the result is a forked subprocess holding on to its parent's file descriptors, which the new kernel needs in order to restart, ultimately running into EADDRINUSE.

Does it require asyncio and ProcessPool, or just ProcessPool to run into this? It's possible that we can do more to clean up process trees, either here or in jupyter_client.
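
For illustration only (not the actual kernel code), a small script showing how a forked child that inherits a bound socket keeps the port busy, assuming the fork start method on POSIX:

```python
import socket
import time
from multiprocessing import Process

def child():
    # The forked child inherits the parent's open FDs, including the socket.
    time.sleep(60)

sock = socket.socket()
sock.bind(("127.0.0.1", 5555))
sock.listen(1)

p = Process(target=child)
p.start()
sock.close()  # the parent closes its copy...

# ...but rebinding still fails while the child holds the inherited FD.
try:
    socket.socket().bind(("127.0.0.1", 5555))
except OSError as exc:
    print(exc)  # e.g. [Errno 98] Address already in use
finally:
    p.terminate()
```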

@jbweston
Author

Does it require asyncio and ProcessPool, or just ProcessPool to run into this? It's possible that we can do more to clean up process trees, either here or in jupyter_client.

It requires both, and it requires that the asyncio event loop integration be activated with %gui asyncio. Without this the kernel restarts happily.

@basnijholt

@minrk unfortunately jupyter/jupyter_client#279 does not solve the issue (also I am using #281).

@minrk
Member

minrk commented Nov 27, 2017

I don't have a fix yet, but I've narrowed this down to a couple of things:

  1. on clean exit, ProcessPools clean up their children
  2. %gui asyncio (%gui anything, actually) prevents clean shutdown, thus preventing atexit cleanup from firing
  3. when clean shutdown fails, jupyter-client kills the kernel more forcefully
  4. when killing the kernel, we do not kill the process group; even though we have code to do this in signal_kernel, a different code path is used for kill.

I think we should fix both the gui-registration preventing clean exit (an ipykernel issue) and the failure to kill the process group (a jupyter-client issue), but fixing either one ought to resolve this.
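
For the jupyter-client side, a minimal sketch of what killing the whole process group could look like (this is not jupyter_client's actual code, and it assumes the kernel was started in its own process group, e.g. via preexec_fn=os.setsid on POSIX):

```python
import os
import signal

def kill_kernel_tree(pid):
    """Kill the kernel and everything in its process group."""
    try:
        pgid = os.getpgid(pid)
        os.killpg(pgid, signal.SIGTERM)  # ask the whole group to exit
    except ProcessLookupError:
        pass                             # already gone
```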

FWIW, I was able to recover my kernel thanks to jupyter/jupyter_client#279. It does fail to restart initially, but after a few retries it gets new ports and is able to come back.

@basnijholt

Fixing it sounds like a lot of work.

But the problem is quite a bit more annoying than you might think: after sufficiently many restarts, the whole machine becomes unusable until you either manually kill all the leftover processes or restart the Docker container.

Would all of those things have to be solved to get it to behave?

@minrk
Member

minrk commented Dec 12, 2017

Would all of those things have to be solved to get it to behave?

No, I think if any of those is fixed, it will behave in this case. But fixing them all would ensure similar issues don't crop up again.

The first fix should really be in ipykernel, figuring out why we don't exit cleanly for simple cases like this. With that fixed, jupyter-client solidly shutting down process groups is a bit less urgent. It shouldn't be a lot of work, but it's a relatively complex bit of logic. I'll try to find a window to look into it.

@minrk
Member

minrk commented Dec 12, 2017

The central issue is that exit is handled here, where we directly stop the tornado event loop. This isn't enough when another event loop (asyncio in this case) is also running. Registering an additional hook to stop the integrated event loop at the same time should do it.
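
Roughly the idea (a hypothetical helper, not the actual ipykernel code): stop the integrated asyncio loop alongside the tornado loop on shutdown.

```python
import asyncio
from tornado.ioloop import IOLoop

def stop_all_loops():
    # What ipykernel already does: stop tornado's loop...
    IOLoop.current().stop()
    # ...and additionally stop an asyncio loop that %gui asyncio may have
    # integrated, if one is running in this thread.
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        return
    if loop.is_running():
        loop.call_soon_threadsafe(loop.stop)
```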

@minrk
Member

minrk commented Dec 15, 2017

This should be doubly fixed:

basnijholt added a commit to python-adaptive/adaptive that referenced this issue Feb 15, 2018
See the following issues / MR:
- ipython/ipykernel#274
- ipython/ipykernel#263
- jupyter/jupyter_client#314
