Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pending kernels #593

Merged
merged 5 commits into from
Nov 23, 2021
Merged

Conversation

blink1073
Copy link
Contributor

@blink1073 blink1073 commented Oct 19, 2021

Fixes #592.
Fixes jupyterlab/jupyterlab#11338
Requires jupyter/jupyter_client#712 in order to use the new feature.

We cannot make use_pending_kernels default to True before a major version bump because clients like JupyterLab are expecting kernel process startup errors to be given as part of the response to a POST on /api/sessions.
This PR needs an alternate way of surfacing that error as discussed in #592.
There is also a behavior change of a restart being ignored (and erroring) if the process has not yet started.

  • Do not depend on use_pending_kernels being available so we maintain compat with older jupyter_client versions
  • Wait for the kernel to be ready in the WebSocket handler and catch the error appropriately.
    • Set the execution_state to "dead" and set the "reason" based on the startup error
    • Add handling of the "reason" field to the kernel response model
    • Make sure to clear the "reason" field at kernel startup
  • Cull dead kernels in the kernel culler and add test
  • Do not wait for kernel shutdown when switching kernels in a session when use_pending_kernels is set
  • Better handling of synchronous process startup errors - fixes Error Starting Kernel Message jupyterlab/jupyterlab#11338
  • Make all kernel/session tests work with and without use_pending_kernels
  • Add tests for bad kernelspecs in kernel and sessions API
  • Add prototype in JupyterLab - Fix Handling of WebSocket Startup Errors jupyterlab/jupyterlab#11358

image

@blink1073 blink1073 marked this pull request as draft October 19, 2021 18:15
@blink1073
Copy link
Contributor Author

blink1073 commented Oct 21, 2021

TODO: test how a failing start affects the websocket GET. Also see what JupyterLab does today.

@blink1073
Copy link
Contributor Author

blink1073 commented Oct 23, 2021

Problem

A 500 error on a WebSocket open doesn't give a usable error response in the error Event, so we need another way to handle a kernel startup error.

Possible Solution

Expand the kernel model to include an error message. Currently the kernels that use use_pending_kernels end up in a permanent "starting" execution_state. We could set the execution_state to "dead" and add a "reason" field to the kernel model with the error message captured during startup.

The JupyterLab client code would have to be update to handle an error in the initial WebSocket connect by fetching an updated kernel model and setting its status to "dead" if appropriate, and provide a "reason" property on the kernel object. We then handle the "dead" status in SessionContext and display the "reason" to the user, similar to the message shown for POST errors.

There is already similar logic in classic notebook.

@blink1073
Copy link
Contributor Author

blink1073 commented Oct 23, 2021

TODO:

  • verify how a kernel gets garbage collected. Make sure a kernel that died at startup can be handled in MultiKernelManager.shutdown_all and is handled gracefully by the SessionManager. Dead kernels should be culled by the culler.

@blink1073 blink1073 force-pushed the use-pending-kernels branch from 72b500d to 1b9ca23 Compare October 25, 2021 12:44
@codecov-commenter
Copy link

codecov-commenter commented Oct 28, 2021

Codecov Report

Merging #593 (4c8a054) into master (c5c515b) will increase coverage by 0.37%.
The diff coverage is 85.79%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #593      +/-   ##
==========================================
+ Coverage   77.33%   77.71%   +0.37%     
==========================================
  Files         110      110              
  Lines       10255    10402     +147     
  Branches     1260     1287      +27     
==========================================
+ Hits         7931     8084     +153     
+ Misses       1927     1919       -8     
- Partials      397      399       +2     
Impacted Files Coverage Δ
jupyter_server/base/zmqhandlers.py 59.09% <0.00%> (-1.18%) ⬇️
jupyter_server/pytest_plugin.py 84.97% <52.38%> (-3.10%) ⬇️
jupyter_server/services/kernels/handlers.py 62.40% <58.33%> (+1.58%) ⬆️
jupyter_server/services/kernels/kernelmanager.py 83.06% <92.00%> (+3.27%) ⬆️
jupyter_server/tests/services/kernels/test_api.py 97.60% <92.30%> (+0.57%) ⬆️
jupyter_server/tests/services/sessions/test_api.py 98.16% <95.38%> (-0.75%) ⬇️
jupyter_server/services/sessions/handlers.py 76.99% <100.00%> (+3.84%) ⬆️
jupyter_server/tests/services/kernels/test_cull.py 100.00% <100.00%> (ø)
...yter_server/tests/services/kernelspecs/test_api.py 96.55% <100.00%> (ø)
jupyter_server/base/handlers.py 65.51% <0.00%> (+1.21%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c5c515b...4c8a054. Read the comment docs.

@blink1073
Copy link
Contributor Author

Current failures are unrelated (nbconvert). This is ready for discussion at today's meeting.

@blink1073
Copy link
Contributor Author

Note: this can be tested in JupyterLab using pip install --pre --upgrade jupyterlab

@blink1073 blink1073 added this to the 1.11 milestone Nov 19, 2021
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the activity-monitor update Steve - this looks good.

@blink1073
Copy link
Contributor Author

Thanks for the review! I'll make a release with this on Monday

@blink1073
Copy link
Contributor Author

Kicking the PR to pick up the new version of jupyter_client.

@blink1073 blink1073 closed this Nov 22, 2021
@blink1073 blink1073 reopened this Nov 22, 2021
Steven Silvester and others added 2 commits November 22, 2021 13:48
yup

wip

wip

handle getattr

use ensure_future

add more backwards compat

clean up tests

more cleanup

Fix ping handling for shut down kernels

fix handling of kernel startup

lint
@blink1073 blink1073 merged commit 758dba6 into jupyter-server:master Nov 23, 2021
@blink1073 blink1073 deleted the use-pending-kernels branch November 23, 2021 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error Starting Kernel Message Slow Starting Kernels Proposal
3 participants