
WIP: Use new kernel management APIs in notebook server #4170

Closed
wants to merge 50 commits

Conversation

takluyver
Member

This adapts the notebook server to use jupyter_kernel_mgmt and jupyter_protocol for interacting with kernels. This includes the 'kernel providers' discovery mechanism.

The discovery was intended as one half of a reworking of how kernels are chosen for notebooks. I don't think it makes sense to record the kernel name in notebook metadata: that name might not mean the same thing on your computer as it does on mine. But I don't yet have a good answer for how it should choose which kernel to start for a notebook.

A lot of the incidental changes here are because the kernel startup now includes getting the client connected; this exposed a number of race conditions where things happened before the kernel finished starting.

The inheritance stack was making this harder to follow.
Member

@kevin-bates kevin-bates left a comment


@takluyver - I really like where this is heading. Just had a few comments, mostly in the areas of consumability and backwards compatibility.

if self.kernel_providers:
    self.kernel_finder = KernelFinder(self.kernel_providers)
else:
    self.kernel_finder = KernelFinder.from_entrypoints()
Member


This seems a little awkward to me. How would an application be able to use providers from both the property and the entry points? If there were a way to get the list of entrypoint-based providers, you could require the property to be the composite.

Alternatively, if there were a separate way to get the providers from entry points, you could just call the constructor and, if the argument is None, have it call from_entrypoints(); otherwise the argument is the complete set.
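The pattern being suggested could look something like this. This is a hypothetical sketch: `Finder` and `_providers_from_entrypoints` are illustrative stand-ins, not the real `KernelFinder` API.

```python
# Sketch of the suggested pattern: the constructor composes providers itself,
# falling back to entry-point discovery when none are supplied.
# All names here are illustrative, not the real jupyter_kernel_mgmt API.

def _providers_from_entrypoints():
    # Stand-in for real entry-point discovery.
    return ['spec', 'pyimport']

class Finder:
    def __init__(self, providers=None):
        # None means "discover everything"; an explicit list means
        # "use exactly these" (e.g. for testing).
        if providers is None:
            providers = _providers_from_entrypoints()
        self.providers = list(providers)

# Default: discover from entry points.
assert Finder().providers == ['spec', 'pyimport']
# Override: the caller fully controls kernel discovery.
assert Finder(providers=['fake']).providers == ['fake']
```

With this shape, the calling code never needs the if/else shown above; it always calls the constructor.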

Member Author


This was essentially just for testing, though I was planning to add some sort of inclusion/exclusion lists for real configurability. The idea is that it will always normally discover kernel providers from entry points, but in certain odd situations (like testing) you might want to override it so that you can completely control kernel discovery.

Member


Sounds good. I would recommend these approaches get pushed into the constructor and let it determine how to compose/filter the providers.

"""
if path is not None:
kwargs['cwd'] = self.cwd_for_path(path)
kernel_id = str(uuid.uuid4())
Member


Please restore the call to new_kernel_id(). It's important that applications be able to specify their own kernel_id. Another option may be to let the kernel provider determine its kernel id - assuming the provider also has access to kwargs (see next comment).
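A minimal sketch of the hook being requested, so a subclass (e.g. in a gateway application) can choose its own kernel IDs. Class and method names below mirror the discussion but are illustrative, not the actual notebook server code.

```python
import uuid

class BaseManager:
    """Illustrative base: new_kernel_id() is the overridable hook."""

    def new_kernel_id(self, **kwargs):
        # Default behavior: a fresh random UUID.
        return str(uuid.uuid4())

    def start_kernel(self, **kwargs):
        # The hook runs before launch, so subclasses control the id.
        kernel_id = self.new_kernel_id(**kwargs)
        return kernel_id

class GatewayManager(BaseManager):
    """Sketch of a subclass letting clients seed the kernel id."""

    def new_kernel_id(self, **kwargs):
        # Clients may pass a desired id through the request's env dict.
        env = kwargs.get('env', {})
        return env.get('KERNEL_ID') or super().new_kernel_id(**kwargs)

# Client-specified id wins; otherwise a UUID is generated.
assert GatewayManager().start_kernel(env={'KERNEL_ID': 'abc'}) == 'abc'
assert len(BaseManager().start_kernel()) == 36
```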

Member Author


That should be easy enough. Can you point me to the code where you're overriding it, out of interest?

Member


Member Author


That code confuses me a bit. Normally the point of a MappingKernelManager is to manage multiple kernels with different IDs, but if you're pulling the kernel ID from an environment variable, it will always be the same (unless it's being changed in-process, which hopefully isn't the case). So you can only ever have one kernel in the manager, in which case, why do you need the MappingKernelManager?

Member


(Sorry for the delayed response)
The MappingKernelManager (well, actually, the previous MultiKernelManager) is responsible for associating a kernel manager instance with a kernel_id.

The environment from which this value is derived is part of the kernel creation request that comes from clients of Enterprise Gateway. In these applications, clients can influence the request; it is not the environment of the process itself.

As was noted in the next comment, by specifying things like kernel_id and kernel namespace, for example, clients can "seed" environments in which their kernels will be invoked. Since kernel_id is a perfect "key", allowing clients to specify their kernel_id is critical to enabling them to build applications as they desire based on that value.

Please restore that functionality as it is current behavior.

elif '/' not in kernel_name:
    kernel_name = 'spec/' + kernel_name

kernel = KernelInterface(kernel_name, self.kernel_finder)
Member


We should add kwargs to the KernelInterface() constructor. The gateway projects (Kernel and Enterprise) allow for users to specify arguments that influence kernel creation. For example, users can specify a pre-existing namespace (via KERNEL_NAMESPACE) that instructs Enterprise Gateway to launch kubernetes-based kernels into the specified namespace rather than having EG create one for them. Since KernelInterface() is the "path" to the kernel provider, we should flow kwargs.
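The requested flow could be sketched as follows. Everything here is a hypothetical stand-in (not the real jupyter_kernel_mgmt API) to show launch kwargs passing from the interface down to the provider, so that parameters like a client-specified namespace reach kernel creation.

```python
# Illustrative sketch: kwargs given to the interface flow through the
# finder to the provider's launch. All class names are stand-ins.

class FakeProvider:
    def launch(self, kernel_type, **kwargs):
        # A real provider would start a kernel; here we just record
        # what it was asked to do, to show the kwargs arrived intact.
        return {'type': kernel_type, 'kwargs': kwargs}

class FinderSketch:
    def launch(self, kernel_type, **kwargs):
        # Forward everything to the provider unchanged.
        return FakeProvider().launch(kernel_type, **kwargs)

class KernelInterfaceSketch:
    def __init__(self, kernel_type, finder, **kwargs):
        # The interface keeps no kwargs of its own; it passes them on.
        self.connection = finder.launch(kernel_type, **kwargs)

k = KernelInterfaceSketch('spec/python3', FinderSketch(),
                          env={'KERNEL_NAMESPACE': 'team-a'})
assert k.connection['kwargs']['env']['KERNEL_NAMESPACE'] == 'team-a'
```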

Member Author


Thanks. I may think more about the MappingKernelManager and KernelInterface APIs, and I'll try to keep this in mind if I do. I'm not entirely happy with them at present.

@takluyver
Member Author

Just thought about this a bit more in the context of a discussion on nbconvert.

I think the crucial question is whether and how we associate notebooks with an environment.

Normal code files are associated with a language by their file extension (.py -> Python) and if they're designed to be a command-line entry point, can be associated with an interpreter (~= environment) by a shebang line like #!/usr/bin/python. They can also be associated with whatever environment you're currently in by using #!/usr/bin/env python, and you can always override the shebang by running with an explicit interpreter: path/to/python foo.py. Shebang lines are often rewritten during installation (e.g. by pip or conda), because it's understood that they are only meaningful on a specific system.

Notebooks are associated with a language by language info in the notebook metadata. This doesn't need to change. Code will only make sense in the right language, and the claim that 'this is Python code' means the same thing everywhere. This is our equivalent of a .py extension.

We originally designed kernelspecs as if there would be one per language (or per major language version, like Python 2 or 3). Their evolution has led to them often being used to distinguish environments for the same language. So the kernelspec name in the notebook metadata effectively associates a notebook with an environment, like a shebang line.

This leads to some problems:

  • Lots of people don't know about the kernelspec name in the notebook metadata, and consequently don't expect a notebook to be tied to an environment. Unlike a shebang, it's added automatically whenever you save the file, and it's not visible in the editor.
  • It's not very obvious how to override the association if you want to run it in a different environment. Different tools have different mechanisms (e.g. nbval's --current-env flag).
  • Kernelspec names are even more brittle than shebangs: a shebang only depends on the filesystem path, whereas kernelspecs depend on the environment and go through an extra layer of indirection.
  • Notebooks are distributed without any install step where a tool could modify the metadata (like rewriting a shebang).

I think we should at least get rid of automatically saving an association with a particular environment. That leaves a couple of questions:

  1. What happens when you open a notebook and there are multiple kernels available for its language? For command line tools like nbconvert, I'm inclined to say the user should be forced to pick. But that seems awkward for interactive use. Should we remember locally an association between paths and kernels (another kind of hidden state)? Should we have some notion of a default kernel for each language - and if so how is that chosen?
  2. Should there be some way of explicitly associating a notebook or a directory or a 'project' (however we define that) with a particular kernel? E.g. if you do jupyter kernel-assoc . conda/foo, all notebooks opened or created under the CWD would have conda/foo as the default kernel? Where would this be stored, and how would we make it visible?

@dhirschfeld
Contributor

I'm inclined to say the user should be forced to pick

I think that's best in all cases. Yes, that does introduce some friction in interactive use but I think a well designed UX can alleviate that problem.

  • If each notebook stores only the language it uses then when opening it for the first time a dialog can pop up listing all the kernels available on the system for that language
  • As mentioned elsewhere, the dialog can have a "remember my choice" option which would store the kernel association locally
  • There would need to be a UI to view/change/delete kernel associations
  • Even if the association isn't stored permanently, the last kernel used could be remembered so that it is pre-selected as the default the next time the same notebook is opened

`KernelInterface` uses `kernel_type` instead of `kernel_name`, which deviates from the previous `KernelManager`. As a result, enabling culling in the "kernel provider era" triggered exceptions once an active kernel was available.
When a kernel is culled, its shutdown needs to go through the MappingKernelManager so that it can clean up its list of kernels. Otherwise, the manager will attempt to cull the same kernel every period thereafter.
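The fix described above can be sketched like this. The class and method names are illustrative stand-ins, not the actual notebook server code; the point is only that the culler must go through the manager's own shutdown path so the kernel's entry is removed from the map.

```python
# Sketch: culling via the mapping manager's shutdown_kernel(), which
# also removes the entry from its kernel map. Shutting the kernel down
# directly would leave a stale entry that gets re-culled every period.
# All names are illustrative.

class MappingManagerSketch:
    def __init__(self):
        # kernel_id -> kernel object (a placeholder here)
        self._kernels = {'kid-1': object()}

    def shutdown_kernel(self, kernel_id):
        kernel = self._kernels.pop(kernel_id)  # clean up the mapping too
        # ... actual shutdown of `kernel` would happen here ...

    def cull(self, kernel_id):
        # Go through the manager, not the kernel directly, so the id
        # isn't seen (and re-culled) on the next culling pass.
        self.shutdown_kernel(kernel_id)

m = MappingManagerSketch()
m.cull('kid-1')
assert 'kid-1' not in m._kernels
```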
takluyver and others added 4 commits June 13, 2019 14:28
Use appropriate attribute in cull debug
Culling should go through MappingKernelManager
@takluyver
Member Author

Superseded by PR #4837.

3 participants