
New KernelLauncher API for kernel discovery system #308

Closed
wants to merge 55 commits

Conversation

takluyver (Member)

@minrk I'd like to get your thoughts on the design of this before I get into integrating this machinery with KernelManager. It's meant to address #301.

The design we worked out in #261 remains: kernel providers are classes, discovered by entry points, which can tell Jupyter about kernel types from different systems (e.g. kernelspecs, conda environments, remote machines...).

The make_manager() method defined in #261 is gone, replaced by launch() and launch_async(). These return kernel launcher objects (better names welcome), which offer a subset of Popen methods, plus get_connection_info(), which returns a dictionary of connection info (the same info you get from a connection file).
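A minimal sketch of that launcher interface as described (method names beyond launch(), wait() and get_connection_info() are assumptions about which parts of the Popen API are kept, not the exact code in this PR):

    from abc import ABCMeta, abstractmethod

    class KernelLauncherBase(metaclass=ABCMeta):
        """A subset of the Popen API, plus connection info (illustrative)."""

        @abstractmethod
        def launch(self):
            """Start the kernel process."""

        @abstractmethod
        def wait(self):
            """Wait for the kernel process to exit."""

        @abstractmethod
        def poll(self):
            """Return the exit code if the kernel has exited, else None."""

        @abstractmethod
        def send_signal(self, signum):
            """Deliver a signal to the kernel process."""

        @abstractmethod
        def get_connection_info(self):
            """Return a dict with the same keys as a connection file:
            transport, ip, the port numbers, key, signature_scheme."""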

Why a new interface? I wanted to use KernelManager, and split off the subclasses QtKernelManager and IOLoopKernelManager as separate functionality outside of the manager. But KernelManager has grown all kinds of complexity, like sending messages over the control channel, which you can do even if you didn't start the kernel. So I plan to make KernelManager work with owned kernels (where it has a KernelLauncher) and non-owned kernels (where it does not).

Async: so far, it has been OK for kernel control to be mostly synchronous. With increasing flexibility in how kernels are launched, this may be more painful. But I don't want to make N kernel providers support M event loops. So, asyncio. Tornado is moving in that direction, there's an asyncio interface for Qt event loops (quamash), and we're planning for our applications to require Python 3 in the next couple of years. I have also implemented an async wrapper, which runs the blocking kernel manager in a separate thread, so kernel providers only need to implement the blocking launcher interface - but there may be efficiency/reliability benefits to implementing an async interface rather than wrapping a blocking interface.
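A condensed sketch of that thread-based wrapper (the in_default_executor and wrapped names appear in the diff excerpts quoted below; the rest is illustrative):

    import asyncio

    class AsyncLauncherWrapper:
        """Expose coroutine versions of a blocking launcher's methods by
        running them in the event loop's default executor (a thread pool)."""
        def __init__(self, wrapped, loop=None):
            self.wrapped = wrapped
            self.loop = loop or asyncio.get_event_loop()

        def in_default_executor(self, func):
            # run_in_executor(None, ...) uses the default ThreadPoolExecutor
            return self.loop.run_in_executor(None, func)

        @asyncio.coroutine
        def launch(self):
            return (yield from self.in_default_executor(self.wrapped.launch))

        @asyncio.coroutine
        def wait(self):
            return (yield from self.in_default_executor(self.wrapped.wait))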

Where next?

  1. Complete the refactoring: make KernelManager use the (blocking) PopenKernelLauncher, so that existing code which starts kernels through KernelManager keeps working. Allow passing a KernelLauncher into a KernelManager, for code using the discovery mechanism to start kernels.
  2. Non-owned kernels: discovery mechanisms, and support creating a KernelManager for an already-running kernel.
  3. New launch protocol: doing this reminded me that the way we bind ports, then release them and start a kernel to bind them again, is cumbersome and error-prone. I would like to design a mechanism for the kernel to pick its own random ports and then tell the parent process about them (see the sketch after this list).
  4. KernelLauncher socket(s), AKA return of the revenge of the undead 'kernel nanny' - protocol to ask a remote kernel launcher to deliver signals to a kernel, and for it to notify other clients when the kernel dies.
  5. Capturing stdout/stderr - one day.
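For point 3, the kernel side could look roughly like this (a hypothetical sketch; how the ports get reported back is exactly the part still to be designed):

    import json
    import zmq

    ctx = zmq.Context.instance()
    shell = ctx.socket(zmq.ROUTER)
    # Let pyzmq/the OS pick a free port instead of the parent pre-binding it
    shell_port = shell.bind_to_random_port('tcp://127.0.0.1')
    # ... likewise for iopub, stdin, control and hb ...
    # Then report the chosen ports back to the parent process, e.g. on stdout
    # or over a pre-arranged socket (the channel is the open design question):
    print(json.dumps({'shell_port': shell_port}))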

Code duplication: The new launcher2 module contains quite a bit of duplicated code for launching kernels in a subprocess. The steps for launching a kernel are currently split between the manager, connect and launcher modules, and pulling them all together was the only way to get a clear view of what's actually happening. I hope to eliminate the duplication again, but it's non-trivial because the code is written with a lot more flexibility than it probably needs.

Version 6: These changes will become jupyter_client 6.0.

takluyver and others added 25 commits October 9, 2017 15:23
MetaKernelFinder -> KernelFinder
Prototype new kernel discovery machinery
The old URL points to a "This page has moved" page
Updated URL for Jupyter Kernels in other languages
- use IOLoop.current over IOLoop.instance
- drop removed `loop` arg from PeriodicCallback
- deprecate now-unused IOLoopKernelRestarter.loop
- interrupt_mode="signal" is the default and current behaviour
- With interrupt_mode="message", instead of a signal, an
  `interrupt_request` message on the control port will be sent
In addition to the actual signal, send a message on the control port
@takluyver takluyver added this to the 6.0 milestone Nov 30, 2017
@takluyver (Member Author)

The test failure on Python 3.3 is due to a problem with pytest: pytest-dev/pytest#2966

The latest pytest release dropped support for Python 3.3, but an as-yet-unidentified packaging problem means that this is not showing up in the metadata, so pip tries and fails to install the latest version.

As this branch is intended to be for jupyter_client 6.0, I'm inclined to drop the 3.3 tests, but we may need to work around it for 5.x if pytest doesn't fix it.
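(One plausible workaround for 5.x, my suggestion rather than something decided in this thread, would be pinning pytest for that interpreter in the test requirements with an environment marker:

    pytest < 3.3 ; python_version == "3.3"
)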

@minrk (Member) left a comment

Nice! I like the simple launcher API.

So I plan to make KernelManager work with owned kernels (where it has a KernelLauncher) and non-owned kernels (where it does not).

I'm not sure about this. I think KernelManager should only work with owned Kernels. Everything that works with remote Kernels should be part of KernelClient. So rather than having Manager, Launcher, and Client, we should have just two: Manager + Client in early forms, or Launcher + Client in this new API. The primary motivation for the original KernelNanny proposal was for KernelClient to get all functionality for dealing with a Kernel, regardless of remote or local (interrupt, restart being the main missing pieces), and KernelManager would only be the implementation of managing a local process. This new Launcher API could take us there but it seems to me like it should mean dropping KernelManager entirely, rather than adding a third API. What do you think?

Or do you think there's enough logic that belongs in KernelManager and not in Client that Manager should get these extra abstractions around Launchers and stick around?

Async

👍 I'd love lots of test coverage for the new APIs.

Code duplication

I think code duplication is a good route to go for an upgrade to a new API. It allows us to clearly isolate and improve what the new implementation does without fear of breaking what the older implementations did. And gives us a clearer path to deprecation and eventual removal of the older APIs.

def wait(self):
    """Wait for the kernel process to exit."""
    raise NotImplementedError()
minrk (Member):
Why is only this method NotImplemented, while the others pass?

minrk (Member):
A spec for the return value of wait would be useful.

"""
buf = os.urandom(16)
return u'-'.join(b2a_hex(x).decode('ascii') for x in (
    buf[:4], buf[4:]
minrk (Member):
4 + 4 = 8, not 16. Typo?

        return (yield from self.in_default_executor(self.wrapped.launch))

    @asyncio.coroutine
    def wait(self):
minrk (Member):
make sure we inherit docstrings

@minrk (Member) commented Dec 4, 2017

Also +1 to dropping Python 3.3 in 6.0

@takluyver (Member Author)

It does make sense for KernelManager to be for only an owned kernel. I'll have a look at moving some pieces from KernelManager to KernelClient. Maybe that will be enough to allow unifying kernel launchers with kernel managers.

@takluyver (Member Author)

Thanks Yuvi. To clarify, would that be one kernel per container(/pod), so docker run ... or an equivalent command would start the kernel directly? Or would the container be a longer-lasting thing that can start and stop multiple kernels inside itself?

@yuvipanda commented Feb 12, 2018 via email

@rgbkrk (Member) commented Feb 13, 2018

I should probably be watching this PR more regularly and contributing / reviewing where I can. 😄

@takluyver (Member Author)

@yuvipanda I've made a rough prototype kernel provider to start a docker container locally and connect to it: https://github.com/takluyver/jupyter_docker_kernels

This is very much a prototype, and it uses docker directly rather than any of the higher level management tools, but hopefully it gives you some idea of what it would take to use this API for docker.

@takluyver (Member Author)

I'm starting to wonder whether, rather than having KernelManager2, KernelClient2, JupyterConsoleApp2 inside jupyter_client version 6, we should develop these new APIs as a separate package with a new name (like jupyter_client2). That might give us more freedom to make some releases while we're still experimenting with the APIs.

@jankatins commented Feb 20, 2018

I converted https://github.com/Cadair/jupyter_environment_kernels to use the new infrastructure: Cadair/jupyter_environment_kernels#35

I've implemented it with two providers: one for conda, one for virtualenv. conda actually searches for python and IRkernel kernels.

Here are some observations:

  • Currently we use the normal LoggingConfigurable and traitlets to configure the environment discovery (blacklisting envs, setting some paths, ...), but this is no longer possible, as the two metaclasses seem to clash (a possible workaround is sketched after this list).

    For now I have locally removed the ABCMeta from KernelProviderBase to make it work. Not sure if there is a better way. I would really love to keep using traitlets as a config mechanism in the environment kernel providers.

    Error:

Error loading kernel provider
Traceback (most recent call last):
  File "/home/js/external/jupyter_client/jupyter_client/discovery.py", line 133, in from_entrypoints
    provider = ep.load()()  # Load and instantiate
  File "/home/js/.binaries/miniconda3/envs/environment-kernel-test/lib/python3.6/site-packages/entrypoints.py", line 77, in load
    mod = import_module(self.module_name)
  File "/home/js/.binaries/miniconda3/envs/environment-kernel-test/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/js/external/jupyter_environment_kernels/environment_kernels/env_kernel_provider.py", line 19, in <module>
    class BaseEnvironmentKernelProvider(KernelProviderBase, LoggingConfigurable):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
  • We periodically update the list of environment kernels in a tornado background task, as this takes quite a while (calling conda, iterating over several dirs and trying to start a py/R kernel in each). This is fine when we are running in a notebook server, but does not make sense when started via e.g. jupyter kernel whatever. I think it would be nice if the KernelFinder could take over the triggering (via an explicit start_updater() or similar, which the notebook would call) and KernelProviderBase got an update_cache() method to do the updating in the background. Another idea would be for KernelProviderBase to take a keyword arg like longrunning, so that the provider can implement the updater itself when the calling app needs it.

  • Activating an environment before running a kernel is now much cleaner :-)

  • Why are the resource dirs no longer part of the interface? I think a logo is needed, and IRkernel has some nice javascript files which enhance the keyboard layout in an R notebook (it will be an interesting challenge to get these served from an SSH kernel or a docker container :-) ).

  • I've locally implemented jupyter kernel --list to get a list of all available kernels. Would you take such a change?
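On the first point, one possible workaround (an assumption, not code from this PR) is to define a combined metaclass rather than removing ABCMeta:

    from abc import ABCMeta
    from traitlets.config import LoggingConfigurable
    from jupyter_client.discovery import KernelProviderBase

    # Derive a metaclass from both parents' metaclasses, so Python can pick
    # one consistent metaclass for the subclass below.
    class ProviderMeta(ABCMeta, type(LoggingConfigurable)):
        pass

    class BaseEnvironmentKernelProvider(KernelProviderBase, LoggingConfigurable,
                                        metaclass=ProviderMeta):
        """Configurable kernel provider: traitlets config plus the ABC API."""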

env.pop('PYTHONEXECUTABLE', None)

if extra_env:
    print(extra_env)
@jankatins (Feb 20, 2018):
debug print...

@kevin-bates (Member)

Sorry for the timing (and verbosity)...

tl;dr We're doing similar things in order to support remote kernels and are very interested in this effort.

Regarding the use case for remote kernels, this is almost entirely what Jupyter Enterprise Gateway provides - remote kernel management.

Kernels used in data sciences tend to consume large amounts of resources. By supporting remote kernels (accessed via remote notebooks using NB2KG) we're able to better leverage cluster resources by spreading the kernels across the cluster. Since we cater to Spark-based analytics, we currently support the YARN resource manager in both client and cluster mode. In cluster mode, we let YARN determine where the kernel is going to land within the cluster. We also have an ssh-based "distributed" implementation.

Because we want to avoid having to modify kernels, we wrap kernels in language-specific kernel launchers (yes, the term is overloaded). I believe these provide functionality similar to the Nanny proposal. We also create a sixth "communication" port that is used for invoking interrupts and as another means of conveying shutdown actions, etc. The message-based interrupts give us most of this functionality, but not all kernels support them.

Each of the types of Resource Managers can be plugged into the framework via the notion of process proxies - which, I believe, are akin to the kernel providers and abstracts the process. The process proxy is responsible for startup confirmation (discovery), monitoring (poll), interrupt conveyance and termination (should the normal mechanisms fail). Which kind of process proxy should be used for a given request is conveyed via extensions to the kernelspec format. These extensions also provide a means of conveying per-kernel configuration values - which I believe is similar to the newly added metadata entry.

If you look into our repo, you'll notice we derive from KernelManager, MultiKernelManager and KernelSpec in order to fit this pluggability and discovery into the existing framework. Our process-proxy instance essentially replaces the 'proc' assigned to self.kernel during launch.

As a result, we are very interested in this work.

@jankatins

@takluyver What is the roadmap for this and https://github.com/takluyver/jupyter_kernel_mgmt + https://github.com/takluyver/jupyter_protocol ? Will jupyter_client go away?

Also: how do I get jupyter_kernel_mgmt into a normal notebook server?

@takluyver (Member Author)

@takluyver What is the roadmap for this and https://github.com/takluyver/jupyter_kernel_mgmt + https://github.com/takluyver/jupyter_protocol ? Will jupyter_client go away?

My thinking is that jupyter_client won't go beyond version 5.x releases, and the two new packages (jupyter_protocol and jupyter_kernel_mgmt) will gradually replace it. I couldn't keep enough stuff in my head at once to build what I think we need while respecting backwards compatibility, so there's an API break, and downstream code will have to adapt to use the new system.

Also: how do I get jupyter_kernel_mgmt into a normal notebook server?

It will need changes to the notebook server code. And there's an extra bit that I haven't really worked out for the notebook server: how to pick a kernel to start when opening an existing notebook. I think this is at the heart of why so many people get confused by our current system.

I'm planning to start trying to integrate this system into nbconvert first, because that's a relatively simple, self-contained use case for running a kernel.

@jankatins commented May 7, 2018

@takluyver Could you specify what "won't go beyond version 5.x releases" means?

I'm trying to decide whether it makes sense to integrate this into jupyter_environment_kernels now, or whether I should wait (e.g. until it is testable in a Jupyter notebook; we have a use case where the kernel list must update at runtime, so that new environments are picked up during the lifetime of a notebook server).

@takluyver (Member Author)

I mean that, if we follow this plan, there will probably never be a jupyter_client version 6, but we'll likely still do some more bugfix releases of jupyter_client.

I'd be keen for you to try updating jupyter_environment_kernels to the new API, to see if it makes sense for someone other than me. See jupyter_ssh_kernels and jupyter_docker_kernels for examples of how it can work. But it's not something you can give to users yet, and the APIs might still change before it's ready.

@jankatins

OK, I'll try to get that done and only run tests with jupyter kernel --list or so.

@takluyver (Member Author)

Thanks! I'm working now on integrating it with a branch of nbconvert for a more interesting test case. Once that's working, I'll also get those packages on PyPI so that it's a bit easier to test with them.

Feel free to ask about the new APIs; it's all still rather messy, and I haven't written much about it. The readmes of j_protocol and j_kernel_mgmt have a few details.

@takluyver (Member Author)

@jankatins I've now got a branch of nbconvert working with the jupyter_kernel_mgmt API instead of jupyter_client: jupyter/nbconvert@master...takluyver:jupyter_kernel_mgmt

It needs an up-to-date version of jupyter_kernel_mgmt from my GitHub repo, because I've been discovering and fixing problems in that code as I worked on nbconvert. There were some hard-to-debug async issues to work out.

@takluyver (Member Author)

And I've just put jupyter_protocol and jupyter_kernel_mgmt on PyPI, both at version 0.1 to emphasise that they're not yet stable.

@mpacer (Member) commented May 8, 2018

@takluyver if there are hard-to-debug async issues… would it be easier if we were to use stdlib asyncio and async/await and make the new kernel mechanisms (jupyter_protocol and jupyter_kernel_mgmt) python3 only?

From what I understand, notebook 6.0 is going to be Python 3 only, and IPython >6 is already Python 3 only.

On top of that… the biggest reason I could imagine for wanting to keep it Python 2 compatible is if we wanted to move all our Python 2 dependencies to the new kernel management system and deprecate the jupyter_client system. Maybe I'm being pessimistic, but it seems unlikely that we'll be able to completely drop jupyter_client support before 2020. That means we could leave jupyter_client as the mechanism for people who want Python 2 support for kernels, and new things could instead use the great new Python-3-only libraries :).

@takluyver (Member Author)

Yup, I agree, and I'm actually already using asyncio. The two new packages currently require Python 3.4 or above. In fact the code is a bit of an ugly mixture of asyncio parts and tornado parts, since they now run on the same event loop. pyzmq has more convenient integration (ZMQStream) with tornado's API than it does with asyncio's.

The main problem that I eventually figured out was the classic ZMQ slow-subscriber issue, where you miss some messages on a PUB-SUB socket because they're sent before the subscription updates. We were declaring the client 'ready' when it got a kernel info reply, but that doesn't necessarily mean the iopub socket is getting output. So I've now made it repeatedly send kernel_info_requests until it gets something (probably a status message) on iopub, as sketched below. That does rely on the kernel sending status messages.
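The retry loop is roughly this shape (a paraphrased sketch; method names like send_kernel_info_request and iopub_received are placeholders, not the real jupyter_kernel_mgmt API):

    import asyncio

    @asyncio.coroutine
    def wait_for_ready(client):
        while True:
            # A kernel_info reply proves the shell channel works, but only
            # traffic on iopub proves our SUB subscription is actually live.
            client.send_kernel_info_request()
            try:
                yield from asyncio.wait_for(client.iopub_received(), timeout=0.5)
                return  # got something (probably a status message) on iopub
            except asyncio.TimeoutError:
                continue  # slow subscriber: retry until the subscription takes effect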

@SylvainCorlay (Member)

@takluyver I presume these kernel_discovery PRs to jupyter_client ought to be closed now that this work is being done in jupyter_kernel_mgmt and jupyter_protocol?

@blink1073 (Contributor)

Thanks again for pushing on this @takluyver. Closing in favor of the Kernel Provisioning and Parameterized Kernel Launch work.

@blink1073 blink1073 closed this Aug 23, 2021