Proposal: Transmit Cell Metadata #70

Open: wants to merge 5 commits into base: master
Binary file added zz-cell-metadata/allthekernels.png
Binary file added zz-cell-metadata/pick-2.png
Binary file added zz-cell-metadata/pick.png
Binary file added zz-cell-metadata/sos-1.png
Binary file added zz-cell-metadata/sos-2.png
348 changes: 348 additions & 0 deletions zz-cell-metadata/transmit-cell-metadata.md
@@ -0,0 +1,348 @@
---
title: Transmitting Cell Metadata in Jupyter Execute Requests
authors: John Lam (jflam@microsoft.com), Matthew Seal (matt@noteable.io), Carol Willing (willingc@gmail.com)
issue-number: <pre-proposal-issue-number>
pr-number: <proposal-pull-request-number>
date-started: 2021-02-10
---

# Summary

This proposal discusses the **transmission of cell metadata** with execute
message requests and would modify the Jupyter Messaging Protocol. Individual
kernels would interpret or ignore this metadata. This enables flexibility in
different usage scenarios implemented in various front-end clients.

# Motivation

By transmitting cell metadata inline with the `execute_request`,
`inspect_request`, and `complete_request` messages, Jupyter implementations
will have a reliable, standard channel for sending additional metadata to the
kernel.

Notebook extensions can also use this channel to transmit additional
information that has often been transmitted using magic commands.

Some use cases that motivated this proposal are:

- Route requests automatically to an appropriate kernel via libraries like
[allthekernels](https://github.com/minrk/allthekernels) without the need for
additional metadata within the cell itself
- Create or find a conda environment without needing to use magics, like
[pick](https://github.com/nteract/pick)
- Support polyglot (more than one language/kernel within a single notebook)
scenarios, like [sos](https://vatlab.github.io/sos-docs/)
- Provide hints to the kernel for localization purposes, like how the HTTP
`Accept-Language` header works
- Provide hints to the kernel about client capabilities, similar to how
hints of a web browser client's capabilities work

# Guide-level explanation

Transmitting cell metadata enables many scenarios as described briefly in the
Motivation section. In this section, we consider one scenario in more detail:
running a code cell using a specific kernel.

Today a typical approach is for the user to include a magic command in the
cell that identifies the kernel. This approach interferes with other
extensions that may want to use the contents of the cell, e.g., autocomplete
providers would now need to be aware of and ignore the syntax of magics.

## Simple example

For example, in the [allthekernels](https://github.com/minrk/allthekernels)
project, users select the kernel using a `><language>` command:

```R
>python3
1+1
```

But in our example, let's imagine that we use cell metadata to specify the
kernel instead. Now, let's consider a minimal JSON fragment for the above
cell:

```json
{
  "cell_type": "code",
  "execution_count": 1,
  "metadata": {
    "kernel": "python3"
  },
  "source": "1+1"
}
```

The cell metadata dict contains an entry that specifies that the `kernel` is
`python3`. But where did the `"kernel": "python3"` metadata come from? What
wrote it into the cell metadata in the first place?

Elaborating a bit more on the user experience here, you could imagine a client
extension providing some additional UI elements such as a cell drop-down that
lets the user pick from a list of installed kernels on the user's machine. The
user picks one, and the kernelspec or its identifier is written to that cell's
metadata.
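
As a rough sketch of that flow, a client extension might record the user's choice like this. The function name `set_cell_kernel` and the plain-dict cell are assumptions made for illustration; a real extension would go through its front end's own cell model.

```python
import json


def set_cell_kernel(cell: dict, kernel_name: str) -> dict:
    """Record the kernel picked in the (hypothetical) drop-down in the cell metadata."""
    # setdefault keeps any metadata that other extensions have already written.
    cell.setdefault("metadata", {})["kernel"] = kernel_name
    return cell


# The minimal cell from the example above, before the user picks a kernel.
cell = {"cell_type": "code", "execution_count": 1, "metadata": {}, "source": "1+1"}
set_cell_kernel(cell, "python3")
print(json.dumps(cell["metadata"]))
```

Writing into the existing dict, rather than replacing it, leaves unrelated keys such as `collapsed` untouched.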

In this example, there is also a corresponding `allthekernels` kernel that is
installed on the user's machine that knows how to multiplex between different
kernel processes that are running on the user's machine. When the user runs
the cell, the Jupyter implementation will send an
[execute](https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute)
message to the kernel.

Here's a minimal representation of the execute message for the above cell:

```js
{
  "header" : {
    "msg_id": "...",
    "msg_type": "...",
    "metadata": {
      "kernel": "python3",
    },
    //...
  },
  "parent_header": {},
  "content": {
    "code": "1+1",
  },
  "buffers": [],
}
```

Review comment (Member): In the Reference-level explanation below, the metadata is not in the header. I think it should not be in the header, since the header will be transmitted back and forth many times (since parent_header is a copy of the original message's header). Can this example be changed to conform to the reference below, to move the metadata to the top level of the message rather than in the header?

In this case the `allthekernels` kernel sees the `"kernel": "python3"` entry
in the message, locates and activates a child kernel to handle the request,
and passes the message on to the child kernel for processing.
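
The routing step can be sketched as follows. This is a minimal illustration, not the `allthekernels` implementation: the names `select_child_kernel` and `route`, the `children` mapping of kernel names to handler callables, and the `python3` fallback are all assumptions. The sketch reads the top-level `metadata` dict of the message, which is where the Reference-level explanation places cell metadata.

```python
def select_child_kernel(message: dict, default: str = "python3") -> str:
    """Return the name of the child kernel that should handle this message."""
    # Assumption: the client copied the cell metadata into the message's
    # top-level "metadata" dict. A message without it falls back to `default`.
    metadata = message.get("metadata") or {}
    return metadata.get("kernel", default)


def route(message: dict, children: dict):
    """Pass the message on to the selected child; `children` maps names to handlers."""
    name = select_child_kernel(message)
    if name not in children:
        raise KeyError(f"no child kernel named {name!r}")
    return children[name](message)


# Hypothetical stand-ins for real child kernel processes.
children = {
    "python3": lambda msg: "python3 received " + msg["content"]["code"],
    "ir": lambda msg: "ir received " + msg["content"]["code"],
}
message = {"metadata": {"kernel": "python3"}, "content": {"code": "1+1"}}
print(route(message, children))
```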

There could be other cell metadata that was transmitted from the client as
well. Some of that metadata could have been put there by client extensions,
like in the case of `allthekernels`. Other metadata could be put there by the
Jupyter implementation itself, e.g., language or client capabilities like
screen size.

## Metadata Key Conflicts and Namespacing

The potential for conflicts exists across extensions that want to add their
own cell metadata to the notebook file. We recommend that extensions namespace
their metadata keys to minimize the possibility of conflicts between
extensions. For example, in the `allthekernels` case it could look like:

```json
{
  "cell_type": "code",
  "execution_count": 1,
  "metadata": {
    "allthekernels:kernel": "python3"
  },
  "source": "1+1"
}
```
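
A small sketch of how an extension might apply such a prefix when writing its keys. The helper names and the second extension, `otherext`, are hypothetical; only the `prefix:key` form comes from the example above.

```python
def namespaced(prefix: str, key: str) -> str:
    """Build a metadata key of the form '<prefix>:<key>'."""
    return f"{prefix}:{key}"


def add_extension_metadata(cell_metadata: dict, prefix: str, values: dict) -> dict:
    """Return a copy of the cell metadata with one extension's values added."""
    merged = dict(cell_metadata)
    for key, value in values.items():
        merged[namespaced(prefix, key)] = value
    return merged


metadata = {"collapsed": True}
metadata = add_extension_metadata(metadata, "allthekernels", {"kernel": "python3"})
# A second (hypothetical) extension can also use a key named "kernel"
# without overwriting the first one.
metadata = add_extension_metadata(metadata, "otherext", {"kernel": "ir"})
print(sorted(metadata))
```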

## Kernels declaring the need for Cell Metadata

Kernels should have a way to declare that they require metadata to be sent.
A kernel like `allthekernels`, for example, *needs* cell metadata that
specifies the available options. On receipt of the metadata, the kernel can
take the appropriate action or warn that it requires additional information.
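
The kernel-side check could look roughly like the sketch below. How a kernel would declare its requirements is still an open question in this proposal, so the `REQUIRED_METADATA_KEYS` tuple and both function names are purely illustrative assumptions.

```python
import warnings

# Hypothetical declaration of the metadata this kernel cannot work without.
REQUIRED_METADATA_KEYS = ("allthekernels:kernel",)


def missing_metadata(message: dict, required=REQUIRED_METADATA_KEYS) -> list:
    """List the required keys that are absent from the message metadata."""
    metadata = message.get("metadata") or {}
    return [key for key in required if key not in metadata]


def check_metadata(message: dict) -> bool:
    """Return True if the kernel can act, otherwise warn and return False."""
    missing = missing_metadata(message)
    if missing:
        warnings.warn("missing required cell metadata: " + ", ".join(missing))
        return False
    return True


complete = {"metadata": {"allthekernels:kernel": "python3"}, "content": {"code": "1+1"}}
print(check_metadata(complete))
```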

# Reference-level Explanation

Cell metadata will be transmitted to the kernel in messages that are
associated with the cell. Some examples of messages include:
[execute](https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute),
[inspect_request](https://jupyter-client.readthedocs.io/en/stable/messaging.html#introspection),
and
[complete_request](https://jupyter-client.readthedocs.io/en/stable/messaging.html#completion)
messages.

The general form of a message is:

```js
{
  "header" : {
    "msg_id": "...",
    "msg_type": "...",
    //...
  },
  "parent_header": {},
  "metadata": {},
  "content": {},
  "buffers": [],
}
```

We propose adding cell metadata to the existing `message.metadata` dict (see
the [Jupyter client
docs](https://jupyter-client.readthedocs.io/en/stable/messaging.html#metadata)).
This will be used to transmit the cell metadata for the executed cell.

In cases where Jupyter extensions generate their own metadata, the keys for
the metadata should be namespaced using an extension-specific prefix. The
prefix is ideally human-readable and identifies the extension that wrote the
metadata. There is no current provision to guarantee global uniqueness for
these prefixes in the way that other technologies, e.g., XML Namespaces, do
using URIs.

Review comment (Member): It would probably be a good idea to recommend a sub-dictionary as well for this. There's no requirement (or even recommendation) that metadata be a one-level dict. If an extension has more than one or two settings, it probably makes sense to nest a dict.

Review comment (Author): I like this idea; do you think that existing implementations might make the incorrect assumption that there aren't nested dicts, i.e., this change would break existing implementations?

Review comment (Contributor): At least in the python layer all of the library interactions within jupyter / nteract don't expect single layer dicts. I'd be surprised if the UIs did either since there's already multi-layer dicts present in nbformat fields.

Review comment: There should be nothing that prevents transmission (and a space-limited journal; storage) of nested dicts i.e. with schema: JSONschema-validateable [nested] JSON and/or W3C SHACL-validateable JSON-LD with URIs for migrateable Linked Data.

Review comment (Member): I don't think there's any risk associated with it. Dicts are generally nested throughout Jupyter and the only spec for metadata is that it's a JSONable dict, which can have arbitrary depth, just like is already true for header, content, etc. which can also have nested fields.
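
To make the two layouts concrete, the sketch below reads an extension's settings from either flat prefixed keys or the nested sub-dictionary suggested in the review discussion of this paragraph. The proposal does not define the nested layout, so treat `extension_settings` and the nested shape as assumptions.

```python
def extension_settings(metadata: dict, prefix: str) -> dict:
    """Collect one extension's settings from either metadata layout."""
    # Nested layout (reviewer suggestion): {"allthekernels": {"kernel": "python3"}}
    nested = metadata.get(prefix)
    settings = dict(nested) if isinstance(nested, dict) else {}
    # Flat layout (this proposal): {"allthekernels:kernel": "python3"}
    marker = prefix + ":"
    for key, value in metadata.items():
        if key.startswith(marker):
            # Nested values win if both layouts define the same setting.
            settings.setdefault(key[len(marker):], value)
    return settings


flat = {"allthekernels:kernel": "python3", "collapsed": True}
nested = {"allthekernels": {"kernel": "python3"}, "collapsed": True}
print(extension_settings(flat, "allthekernels") == extension_settings(nested, "allthekernels"))
```

Both layouts are plain JSON-able dicts, so either can travel in the message `metadata` field unchanged.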

Below is a nominal example of these proposals, cell metadata and execute
requests, in action. This fragment of a notebook contains a cell to be
executed. Note that the `kernel` attribute is namespaced using `allthekernels`
and the existing Jupyter attributes `collapsed` and `scrolled` are not
namespaced.

```js
{
  "cell_type" : "code",
  "execution_count": 1,
  "metadata" : {
    "allthekernels:kernel" : "python3",
    "collapsed" : true,
    "scrolled": false,
  },
  "source" : "1+1",
  "outputs": [{
    "output_type": "stream",
    //...
  }],
}
```

Below is the corresponding EXECUTE message:

```js
{
  "header" : {
    "msg_id": "...",
    "msg_type": "...",
    //...
  },
  "parent_header": {},
  "metadata": {
    "allthekernels:kernel": "python3",
    "collapsed": true,
    "scrolled": false,
  },
  "content": {
    "code": "1+1",
  },
  "buffers": [],
}
```
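
On the client side, producing such a message from a notebook cell might look like the following sketch. It uses plain dicts rather than any Jupyter library, the function name `build_execute_request` is hypothetical, and the header fields beyond `msg_id` and `msg_type` are left out just as they are elided in the examples above.

```python
import copy
import uuid


def build_execute_request(cell: dict) -> dict:
    """Build an execute_request message that carries the cell's metadata."""
    source = cell["source"]
    # nbformat allows "source" to be a single string or a list of lines.
    code = "".join(source) if isinstance(source, list) else source
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "msg_type": "execute_request",
            # ... remaining header fields omitted in this sketch
        },
        "parent_header": {},
        # Deep copy so later edits to the cell do not change a built message.
        "metadata": copy.deepcopy(cell.get("metadata", {})),
        "content": {"code": code},
        "buffers": [],
    }


cell = {
    "cell_type": "code",
    "execution_count": 1,
    "metadata": {"allthekernels:kernel": "python3", "collapsed": True, "scrolled": False},
    "source": "1+1",
}
message = build_execute_request(cell)
print(message["metadata"])
```

Keeping the metadata at the top level, outside `header`, means it is not duplicated into the `parent_header` of every reply.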

# Rationale and Alternatives

## Rejected alternative: Metadata in content

We considered another approach, content-level cell metadata, before we arrived
at this JEP's proposed recommendation.

Transmitting the metadata as a dict in the content of an EXECUTE message is
illustrated here:

```js
{
  "header" : {
    "msg_id": "...",
    "msg_type": "...",
    //...
  },
  "parent_header": {},
  "content": {
    "code": "1+1",
    "metadata": {
      "allthekernels:kernel": "python3",
      "collapsed": true,
      "scrolled": false,
    },
  },
  "buffers": [],
}
```

We decided against this pattern because some types of metadata that could be
transmitted to the kernel are not logically associated with the content;
the examples below describe capabilities of the client:

- Provide hints to the kernel for localization purposes, like how the HTTP
`Accept-Language` header works
- Provide hints to the kernel about client capabilities, similar to how
hints of a web browser client's capabilities work

## Rejected approach: Allow-List Pattern

In looking at metadata that should or shouldn't be sent, we investigated
whether the fields to be passed should be filtered by an allow-list or
block-list pattern, e.g., allowing `allthekernels:kernel` metadata only. The
issue is that this greatly complicates existing applications compared with the
current proposal: kernels would need to indicate the metadata fields they
accept, and clients would then need to track that and filter the fields sent
during execution. The attributes within the metadata today are (a) small in
size and (b) not harmful to send across the wire, so the proposal prefers the
simpler solution.
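
For comparison, the rejected allow-list pattern would add a filtering step like the one sketched here to every client, on top of a mechanism (not shown) for learning each kernel's accepted keys. The name `filter_metadata` and the `accepted_keys` set are hypothetical.

```python
def filter_metadata(cell_metadata: dict, accepted_keys: set) -> dict:
    """Keep only the metadata keys that the kernel has declared it accepts."""
    return {key: value for key, value in cell_metadata.items() if key in accepted_keys}


cell_metadata = {"allthekernels:kernel": "python3", "collapsed": True, "scrolled": False}
# Rejected approach: the client must know this set for each kernel in advance.
accepted_keys = {"allthekernels:kernel"}
print(filter_metadata(cell_metadata, accepted_keys))
# Proposed approach: send cell_metadata unchanged and let the kernel ignore
# whatever it does not understand.
```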

## Impact

This proposal will add a new foundational capability to the Jupyter Messaging
Protocol: the ability to transmit additional information to the kernel which
the kernel can use to make better decisions about execution of user code. This
makes it much more straightforward to have independent collaboration on
polyglot notebooks (notebooks that contain code in more than one programming
language).

If the proposal is accepted, clients gain a standard way to send out-of-band
information to the kernel with the EXECUTE message. This would enable
scenarios like polyglot notebooks, or adaptive rendering based on changes to
the user's browser window size or graphics settings.

# Prior Art

## allthekernels

`allthekernels` uses a special syntax ("> __kernelspec__") within the cell to
specify the kernel to use to run the code in the cell. This would be replaced
by writing the kernelspec as cell metadata and transmitting it to the kernel
as described earlier in this document.

![allthekernels screenshot](./allthekernels.png)

[GitHub](https://github.com/minrk/allthekernels)

## Script of Scripts (SoS)

`SoS` is a combination of a meta-kernel (the authors call it a "super kernel")
that controls a set of child kernels and magic commands to identify the kernel
to target in a cell. It also provides a shared context in the "super kernel"
to share variables and data between different kernels. It requires an
extension to manage language metadata (see the screenshot below).

![sos architecture](./sos-1.png)
![sos screenshot](./sos-2.png)

[GitHub](https://github.com/vatlab/sos-notebook)
[JupyterCon Presentation](https://www.youtube.com/watch?v=U75eKosFbp8)
[Documentation](https://vatlab.github.io/sos-docs/notebook.html#content)

## nteract pick

`pick` is a kernel proxy that uses magics to specify an existing conda
environment to use or an environment to create to run code in the notebook.

![pick architecture](pick.png)
![pick screenshot](pick-2.png)

[GitHub](https://github.com/nteract/pick)

# Open Questions

We have an opinion around some decision points but would be open to
suggestions around:

- Whether the cell metadata is transmitted as a new dict in the EXECUTE
message, or whether it is transmitted as a new dict in the content field of
the EXECUTE message.
- Whether kernels need to explicitly declare the metadata that they need, and
if so, the mechanism for communicating that declaration to the Jupyter
implementation.