Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tags: an attempted overview #2826

Closed
mpacer opened this issue Sep 1, 2017 · 27 comments
Closed

Tags: an attempted overview #2826

mpacer opened this issue Sep 1, 2017 · 27 comments

Comments

@mpacer
Copy link
Member

mpacer commented Sep 1, 2017

Jupyter-wide policy regarding tags in the core projects

We really need to make a decision about core projects handle cell tags before the final release of notebook 5.1.

There has been ongoing discussion amongst @rgbkrk @takluyver @jasongrout @ellisonbg @Carreau and others regarding the purpose of tags… Right now we are not shipping any magic values with core Jupyter projects.

N.B.: by "magic values", I mean values that are hard-coded and that map onto specific behaviour by default, usually with no ability for users to override this behaviour mapping.

Notebook 5.1 includes a magic value (#2549) which violates some of the expectations that had arisen in earlier discussions around allowed magic vlaues.

psuedo-namespaces with colons

@jasongrout and @ellisonbg had been extremely concerned about stepping on users' namespaces in any privileged jupyter tags. This would allow other projects/users to assign their own "namespace" by prefixing their tags with the namespace value postceded by a colon (e.g., namespace:tag).

We had discussed the possibility of saving for jupyter the "null string" namespace (which would then require no :. This would make the raises_exception tag a candidate for meeting that namespace requirement. Unfortunately that is not the only value that matches the current "magic value".

Potential problem: "*raises_exception*" matching

Right now, #2549 makes the namespaces convention impossible. It includes any string that contains the characters raises_exception anywhere inside of it, regardless of the namespace+: that might prefix it.

The value is not customisable, and is specified deep within the codecell execute functionality meaning it will not be able to be easily overwritten:

if (stop_on_error === undefined) {
if (this.metadata !== undefined &&
this.metadata.tags !== undefined) {
if (this.metadata.tags.indexOf('raises-exception') !== -1) {
stop_on_error = false;
} else {
stop_on_error = true;
}
} else {
stop_on_error = true;
}
}

Compromise to resolve issues around #2549

I think the simplest suggestion would be to allow raises_exception and nbval_raises_exception as the two tags that the core notebook "knows" about. This greatly reduces namespace stompage while not breaking the behaviour that @takluyver and @mscuthbert introduced in merging #2549. It's less than ideal to have a non-core project determining magic values inside the core project… but without making it configurable via a traitlet (or something), I can't think of another way to handle this.

Even so it would seem to be a good idea to modularise this chunk of functionality so that it can be overwritten by external libraries without needing to rewrite the entirety of CodeCell.prototype.execute just to access this one feature.

Document how tags are used outside of changelogs.

Now that tags are supported in at least a couple of different ways in the core projects and in 3rd party projects, we should explicitly document those ways of using them. I list the ones I know about at the bottom of this issue.

We need to start this effort and its closely tied to how we decide to handle the policy in general. Here are a couple of my initial thoughts:

  1. nbformat should document the overall policy regarding tags.
    1. We should have a policy for 3rd party libraries to know how to specify the tags they want to use; if we want to use namespaces with colons, we need to start promulgating that now.
    2. We should specify whether we intend that people should use tags as features that can apply to multiple cells or as unique identifiers
      a. nameis specified in cell.metadata as an explicit unique id across notebook
      i. we should have an interface similar cell tags that allows assigning a single unique name to each cell that will enforce the uniqueness condition as a part of the UI
  2. notebook docs should include screenshots of assigning tags to cells.
    1. notebook docs should mention any magic values that tags have and their matching scope
  3. jupyterlab docs should include screenshots of assigning tags to cells.
    1. jupyterlab docs should mention any magic values that tags have and their matching scope
  4. nbconvert docs should describe how to specify tags for any of its purposes.

Examples of tags as user-defined values

Examples of tags as hard-coded magic values

Are there other projects that the core in any way maintains that use tags in either of these ways?

Regarding jupyterlab: @blink1073 @ian-r-rose @afshin are tags being used for anything meaningful in jupyterlab as of yet? Is there an official policy of how to handle them there?

@mscuthbert
Copy link
Contributor

It wouldn't be too hard to change this.metadata.tags.indexOf('raises-exception') !== -1 to something like const raises_exception_index = this.metadata.tags.indexOf('raises-exception'); if (raises_exception_index !== -1 and raises_exception_index > this.metadata.tags.indexOf(':')) to make this special value conscious of namespaces.

I think that moving to a namespace for most items would be a good thing. We can create a core: or jupyter: namespace to which eventually raises-exception could be moved to. But I don't think that we should make the namespaces absolutely private (in that they can't be accessed by other programs), since I'm already using the nbval-skip, nbval-ignore-output, etc. in other programs rather than need to duplicate tags for the same functionality multiple times.

@mpacer
Copy link
Member Author

mpacer commented Sep 2, 2017

The most recent suggestion was that the namespace for core jupyter tags would in theory be the null string with no colons. The goal is not to make other package namespaces inaccessible (that would be basically impossible), but rather to make their use as explicit as possible.

I know that we could modify the code to respect the convention that was proposed, but it's dangerous to have magic values cover so much area in the space of potential strings. I'd much prefer if we're going to allow both of those in, that we restrict it to the exact two strings that need to be covered to maintain the current use-case.

More generally, though, my point was that we need to come to a consensus on this now before 5.1 is released. Once 5.1 is released, we have to live with that decision for backwards compatibility reasons.

As you can see it's already really complicated; I fear it will only get more complicated if we aren't pretty strict about enforcing this as soon as possible.

And regarding duplication… as long as it's a thing we need to do once and never again I think we can probably avoid this problem by switching to conventions and making some way to overwrite the code that you just wrote. The hard part is that nbval would then need to also be a notebook_extension… which those devs might not want to do.

It's worth noting that for switching conventions, it wouldn't be hard to write a preprocessor in nbconvert that would automatically convert metadata from one convention to another (something like this in preprocess.cell).

old_conv=re.compile(r"nbval-(.*)")
new_conv= "nbval:"
cell.metadata.tags = [new_conv +  old_conv.match(tag).group(0) if old_conv.match(tag) else tag for tag in cell.metadata.tags ]

…which you could apply once to all the notebooks in your project and be done with it.

Similarly you could write a custom save_hook that would apply a similar sort of transform so that you wouldn't need to actually type redundant tags even if they needed to be stored redundantly in order to handle separation of concerns appropriately.

cc @gnestor sorry I didn't pull you in before.

@takluyver
Copy link
Member

I think you've misunderstood what the code is doing: it's testing for membership in a list, not doing a substring match:

// In this list
a = ['raises-exception']
a.indexOf('raises-exception')  // -> 0

// Not in this list
b = ['foo:raises-exception']
b.indexOf('raises-exception') // -> -1

More generally, can we please not wrap tags up in specifications and namespaces and overthinking? Yes, we should try to avoid clashes, but a simple prefix like we use for nbval-check-output should be sufficient. I really meant tags to be a practical system for doing stuff, not a theoretically beautiful abstraction for semantic labelling of cells.

@mpacer
Copy link
Member Author

mpacer commented Sep 2, 2017

Sorry I was thrown off… I thought the feature was specifically implemented so that nbval-raises-exception would pass as well. I thought it was a weird way to test for that.

In that case, I support leaving things as is but also establishing what our rules and expectations around tags are, whether for user space reasons or cyclomatic complexity reasons.

@takluyver
Copy link
Member

To further confuse things, the initial version of the PR did support using nbval-raises-exception. I asked the contributor to change that, because the nbval- tags should only be used by nbval. However, nbval does now also recognise the non-prefixed tag, since we're effectively defining that as a common tag which can be interpreted by different tools to mean that an error is expected.

@rgbkrk
Copy link
Member

rgbkrk commented Sep 2, 2017

/cc @ewmassey for papermill + tags

@rgbkrk
Copy link
Member

rgbkrk commented Sep 2, 2017

Thank you for the summary @mpacer. As is pointed out, I'm both against frontends using tags as part of their state AND an offender as a papermill contributor / user. To bring clarity to my hypocrisy, I'll explain:

  • The UI/UX for setting new kinds of metadata tends to be hard for users (since the built-in option is the JSON editor if new UI isn't created for it).
  • I really wanted papermill to use top level metadata for parameters. However, the notebook wouldn't necessarily be executable then. We need to be able to have default parameters recognized by frontends that don't know our format. The easiest way to do that is with a plain code cell. We could easily switch to a boolean flag for metadata on the cell once we've started customizing a bit more.
  • As a frontend implementer, the metadata object can be well typed and specced. Arrays of arbitrary strings of arbitrary order can't.

@blink1073
Copy link
Contributor

JupyterLab currently does not use any tag data. We opened jupyterlab/jupyterlab#2412 as a feature parity issue to implement the raises-exception behavior that was added to classic notebook.

@rgbkrk
Copy link
Member

rgbkrk commented Sep 5, 2017

raises-exception needs to go into nbformat, both the schema and the documentation if it is to be expected.

@ellisonbg
Copy link
Contributor

ellisonbg commented Sep 7, 2017 via email

@dsblank
Copy link
Member

dsblank commented Sep 7, 2017

Is this correct:

  1. tags are easy to add/change/delete for the user
  2. tags that do things (other than filtering) are only used by 3rd party software
  3. standard jupyter functionality (like hiding a cell, tagging for special CSS) has to be done through another means (metadata)

Wouldn't it be easier to have tags that DO have special functionality in the core software? Or am I confused?

@ellisonbg
Copy link
Contributor

ellisonbg commented Sep 7, 2017 via email

@dsblank
Copy link
Member

dsblank commented Sep 7, 2017

Here is another issue:

  • what language would the tag be in? how would internationalization be handled?

But I hope that there is a UI/UX developed that is as easy as tags for doing official functionality. Perhaps that is what is missing from the discussion.

@mpacer
Copy link
Member Author

mpacer commented Sep 7, 2017

@dsblank My understanding is that we need modifications to 2. & 3. above:

from: 2. tags that do things (other than filtering) are only used by 3rd party software

to: 2. specific tag values are only used by 3rd party software (caveat raises-expectation)

from: 3. standard jupyter functionality (like hiding a cell, tagging for special CSS) has to be done through another means (metadata)

to: 3. standard jupyter functionality (like hiding a cell, tagging for special CSS) can be done through generic metadata or via a generic tag-based interface that requires explicit user-specification

The key to both nuances is that we can create ways for users to access specific functionality (hiding a cell, tagging for special CSS) but where to do so requires that the users both specify what tags they want to use and then apply those particular tags to their cells. We don't have a general way of surfacing that kind of functionality (today) in the notebook front end, but that's a matter of implementation. For example, it should be possible (once I rework #2413) to access tag-based element filtering from the "Download As…" menu item by uploading traitlets values via a json config file that specifies which tags indicate which kind of filtering.

I imagine other front-end features could similarly use loading configuration to enable these kinds of generic tag interfaces. However, that seems like way too much overhead to be the only mechanism for exposing this functionality. My thoughts as to how to do this more simply are still evolving.

@rgbkrk
Copy link
Member

rgbkrk commented Sep 8, 2017

I've been trying to think of a forms-focused editor for metadata, at least for certain acknowledged ones:

screen shot 2017-09-08 at 10 50 26 am

Either as something that pops out of a cell or appears to be on the "backside" of a cell.

@rgbkrk
Copy link
Member

rgbkrk commented Sep 8, 2017

Here's an interpretation for notebook wide metadata, largely targeting the authors, descriptions, and tag fields:

image

This comes from nteract/nteract#1503

@mscuthbert
Copy link
Contributor

I think that a separate editor for metadata and tags could work, especially if it were easy to assign the metadata/formatting/tags data for a certain cell as a keyboard shortcut so that it can be applied to another cell with a few keystrokes.

@gnestor
Copy link
Contributor

gnestor commented Sep 12, 2017

Just to clarify: Is this blocking us from releasing 5.1 or have we decided that the current implementation of raises-exception in notebook is acceptable?

@takluyver
Copy link
Member

To me it's totally acceptable and exactly the kind of thing tags are there for. I don't see any advantage of saying that only third-party projects can define tags. I'm already frustrated by how long it's taken to get 5.1 out the door, so I really hope we don't want to block it on this issue.

@rgbkrk
Copy link
Member

rgbkrk commented Sep 13, 2017

It's not in saying that only third-party projects and users can create tags. People are saying if you want defined expected Jupyter behavior, it better be well specified and likely put in the metadata.

@rgbkrk
Copy link
Member

rgbkrk commented Sep 13, 2017

As for the release being blocked, just ship with raises-exception and support it as a quirk forever.

@takluyver
Copy link
Member

Tags are in metadata, and I don't see anything wrong with specifying some particular ones, like raises-exception to have more meanings more general than one specific tool. It's a pragmatic solution to let users easily set certain kinds of metadata without needing to build a special UI for each attribute.

@minrk
Copy link
Member

minrk commented Sep 14, 2017

I agree with @rgbkrk that we should ship 5.1 with raises-exception tag and continue this discussion post-5.1. Defining the interpretation of a single tag is not committing to the general pattern of using tags for behavior.

In general, I agree with @takluyver that tags were created with the specific purpose of providing a single UI for defining labelling and behavior without having to create new, dedicated UI for every flag, which is required for any other metadata structure we come up with. Of course, most metadata won't fit in tags, and some will have logical UI other than tags, but things like raises-exception are precisely the use case tags were created to solve.

@jasongrout
Copy link
Member

jasongrout commented Sep 14, 2017

The distinction between "expected Jupyter behavior" (leading to a format spec change) and "supported by a few tools" (could be implemented by a tag convention) can be confusing when one official reference frontend is by far most popular and implements a specific convention. I think we want to avoid "the implementation is the standard" - but of course it can be difficult to set that expectation about what is implementation and what is the standard.

@takluyver - it sounds like you don't expect all frontends to implement the raises-exception behavior the notebook is implementing, correct?

@takluyver
Copy link
Member

I don't consider it an imperative, but where frontends have similar 'execute all' functionality, I wouldn't be surprised if users ask for similar behaviour around the tag. We are defining the tag as a marker that the tagged cell is expected to raise an error, but it's up to tools how and whether to use that.

I think we want to avoid "the implementation is the standard"

In principle, yes. But let's not stifle ourselves by trying to write the standards before we allow ourselves to use anything. Let's make tags useful first, and then try to formalise them.

Function annotations in Python are an example of what I mean. The syntax has been supported since Python 3.0, but core developers said "we won't define any uses, we'll let third parties come up with them". So they went basically unused until those core devs came back and defined them as type hints.

@mpacer
Copy link
Member Author

mpacer commented Sep 14, 2017

It sounds like everyone on the different sides of the issue agree on this release going out either way, so I'm going to close this and I think we can start on a new issue in nbformat.

@mpacer mpacer closed this as completed Sep 14, 2017
@ellisonbg
Copy link
Contributor

ellisonbg commented Sep 15, 2017 via email

@minrk minrk added this to the Reference milestone Sep 13, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants