Tags: an attempted overview #2826

mpacer · 2017-09-01T23:27:03Z

Jupyter-wide policy regarding tags in the core projects

We really need to make a decision about core projects handle cell tags before the final release of notebook 5.1.

There has been ongoing discussion amongst @rgbkrk @takluyver @jasongrout @ellisonbg @Carreau and others regarding the purpose of tags… Right now we are not shipping any magic values with core Jupyter projects.

N.B.: by "magic values", I mean values that are hard-coded and that map onto specific behaviour by default, usually with no ability for users to override this behaviour mapping.

Notebook 5.1 includes a magic value (#2549) which violates some of the expectations that had arisen in earlier discussions around allowed magic vlaues.

psuedo-namespaces with colons

@jasongrout and @ellisonbg had been extremely concerned about stepping on users' namespaces in any privileged jupyter tags. This would allow other projects/users to assign their own "namespace" by prefixing their tags with the namespace value postceded by a colon (e.g., namespace:tag).

We had discussed the possibility of saving for jupyter the "null string" namespace (which would then require no :. This would make the raises_exception tag a candidate for meeting that namespace requirement. Unfortunately that is not the only value that matches the current "magic value".

Potential problem: `"raises_exception"` matching

Right now, #2549 makes the namespaces convention impossible. It includes any string that contains the characters raises_exception anywhere inside of it, regardless of the namespace+: that might prefix it.

The value is not customisable, and is specified deep within the codecell execute functionality meaning it will not be able to be easily overwritten:

notebook/notebook/static/notebook/js/codecell.js

Lines 314 to 325 in d4c6fe4

    
           if (stop_on_error === undefined) { 
        
               if (this.metadata !== undefined &&  
        
                       this.metadata.tags !== undefined) { 
        
                   if (this.metadata.tags.indexOf('raises-exception') !== -1) { 
        
                       stop_on_error = false; 
        
                   } else { 
        
                       stop_on_error = true; 
        
                   } 
        
               } else { 
        
                  stop_on_error = true; 
        
               } 
        
           }

Compromise to resolve issues around #2549

I think the simplest suggestion would be to allow raises_exception and nbval_raises_exception as the two tags that the core notebook "knows" about. This greatly reduces namespace stompage while not breaking the behaviour that @takluyver and @mscuthbert introduced in merging #2549. It's less than ideal to have a non-core project determining magic values inside the core project… but without making it configurable via a traitlet (or something), I can't think of another way to handle this.

Even so it would seem to be a good idea to modularise this chunk of functionality so that it can be overwritten by external libraries without needing to rewrite the entirety of CodeCell.prototype.execute just to access this one feature.

Document how tags are used outside of changelogs.

Now that tags are supported in at least a couple of different ways in the core projects and in 3rd party projects, we should explicitly document those ways of using them. I list the ones I know about at the bottom of this issue.

We need to start this effort and its closely tied to how we decide to handle the policy in general. Here are a couple of my initial thoughts:

nbformat should document the overall policy regarding tags.
1. We should have a policy for 3rd party libraries to know how to specify the tags they want to use; if we want to use namespaces with colons, we need to start promulgating that now.
2. We should specify whether we intend that people should use tags as features that can apply to multiple cells or as unique identifiers
  a. nameis specified in cell.metadata as an explicit unique id across notebook
  i. we should have an interface similar cell tags that allows assigning a single unique name to each cell that will enforce the uniqueness condition as a part of the UI
notebook docs should include screenshots of assigning tags to cells.
1. notebook docs should mention any magic values that tags have and their matching scope
jupyterlab docs should include screenshots of assigning tags to cells.
1. jupyterlab docs should mention any magic values that tags have and their matching scope
nbconvert docs should describe how to specify tags for any of its purposes.

Examples of tags as user-defined values

nbconvert, via traitlet values ,
- TagRemovePreprocessor.remove_cell_tags: removes cells with specified tags (Tag remove preprocessor nbconvert#640)
- TagRemovePreprocessor.remove_input_tags: removes inputs with specified tags (enables input removal via preprocessor, template and invalid nb intermediate representation nbconvert#643)
- TagRemovePreprocessor.remove_all_outputs_tags: removes outputs with specified tags (Tag remove preprocessor nbconvert#640)

Examples of tags as hard-coded magic values

notebook
- "*raises-exception*": allows exceptions to pass allow raises-exception to continue #2549
computationalmodelling/nbval:
- "pytest-validate-ignore-output" : avoids testing cell (deprecated)
- "nbval-ignore-output": avoids testing cell
- "nbval-check-output": ensures testing cell even with --nbval-lax
- "nbval-raises-exception": allows exceptions to pass (analogous to notebook value)
- "nbval-skip": skips cell during testing
nteract/papermill
- "parameters": identifies a single unique cell (https://github.com/nteract/papermill/blob/master/papermill/execute.py#L157)

Are there other projects that the core in any way maintains that use tags in either of these ways?

Regarding jupyterlab: @blink1073 @ian-r-rose @afshin are tags being used for anything meaningful in jupyterlab as of yet? Is there an official policy of how to handle them there?

The text was updated successfully, but these errors were encountered:

mscuthbert · 2017-09-02T02:12:36Z

It wouldn't be too hard to change this.metadata.tags.indexOf('raises-exception') !== -1 to something like const raises_exception_index = this.metadata.tags.indexOf('raises-exception'); if (raises_exception_index !== -1 and raises_exception_index > this.metadata.tags.indexOf(':')) to make this special value conscious of namespaces.

I think that moving to a namespace for most items would be a good thing. We can create a core: or jupyter: namespace to which eventually raises-exception could be moved to. But I don't think that we should make the namespaces absolutely private (in that they can't be accessed by other programs), since I'm already using the nbval-skip, nbval-ignore-output, etc. in other programs rather than need to duplicate tags for the same functionality multiple times.

mpacer · 2017-09-02T04:00:38Z

The most recent suggestion was that the namespace for core jupyter tags would in theory be the null string with no colons. The goal is not to make other package namespaces inaccessible (that would be basically impossible), but rather to make their use as explicit as possible.

I know that we could modify the code to respect the convention that was proposed, but it's dangerous to have magic values cover so much area in the space of potential strings. I'd much prefer if we're going to allow both of those in, that we restrict it to the exact two strings that need to be covered to maintain the current use-case.

More generally, though, my point was that we need to come to a consensus on this now before 5.1 is released. Once 5.1 is released, we have to live with that decision for backwards compatibility reasons.

As you can see it's already really complicated; I fear it will only get more complicated if we aren't pretty strict about enforcing this as soon as possible.

And regarding duplication… as long as it's a thing we need to do once and never again I think we can probably avoid this problem by switching to conventions and making some way to overwrite the code that you just wrote. The hard part is that nbval would then need to also be a notebook_extension… which those devs might not want to do.

It's worth noting that for switching conventions, it wouldn't be hard to write a preprocessor in nbconvert that would automatically convert metadata from one convention to another (something like this in preprocess.cell).

old_conv=re.compile(r"nbval-(.*)")
new_conv= "nbval:"
cell.metadata.tags = [new_conv +  old_conv.match(tag).group(0) if old_conv.match(tag) else tag for tag in cell.metadata.tags ]

…which you could apply once to all the notebooks in your project and be done with it.

Similarly you could write a custom save_hook that would apply a similar sort of transform so that you wouldn't need to actually type redundant tags even if they needed to be stored redundantly in order to handle separation of concerns appropriately.

cc @gnestor sorry I didn't pull you in before.

takluyver · 2017-09-02T11:52:53Z

I think you've misunderstood what the code is doing: it's testing for membership in a list, not doing a substring match:

// In this list
a = ['raises-exception']
a.indexOf('raises-exception')  // -> 0

// Not in this list
b = ['foo:raises-exception']
b.indexOf('raises-exception') // -> -1

More generally, can we please not wrap tags up in specifications and namespaces and overthinking? Yes, we should try to avoid clashes, but a simple prefix like we use for nbval-check-output should be sufficient. I really meant tags to be a practical system for doing stuff, not a theoretically beautiful abstraction for semantic labelling of cells.

mpacer · 2017-09-02T17:40:17Z

Sorry I was thrown off… I thought the feature was specifically implemented so that nbval-raises-exception would pass as well. I thought it was a weird way to test for that.

In that case, I support leaving things as is but also establishing what our rules and expectations around tags are, whether for user space reasons or cyclomatic complexity reasons.

takluyver · 2017-09-02T19:37:39Z

To further confuse things, the initial version of the PR did support using nbval-raises-exception. I asked the contributor to change that, because the nbval- tags should only be used by nbval. However, nbval does now also recognise the non-prefixed tag, since we're effectively defining that as a common tag which can be interpreted by different tools to mean that an error is expected.

rgbkrk · 2017-09-02T23:43:24Z

/cc @ewmassey for papermill + tags

rgbkrk · 2017-09-02T23:56:58Z

Thank you for the summary @mpacer. As is pointed out, I'm both against frontends using tags as part of their state AND an offender as a papermill contributor / user. To bring clarity to my hypocrisy, I'll explain:

The UI/UX for setting new kinds of metadata tends to be hard for users (since the built-in option is the JSON editor if new UI isn't created for it).
I really wanted papermill to use top level metadata for parameters. However, the notebook wouldn't necessarily be executable then. We need to be able to have default parameters recognized by frontends that don't know our format. The easiest way to do that is with a plain code cell. We could easily switch to a boolean flag for metadata on the cell once we've started customizing a bit more.
As a frontend implementer, the metadata object can be well typed and specced. Arrays of arbitrary strings of arbitrary order can't.

blink1073 · 2017-09-05T12:39:59Z

JupyterLab currently does not use any tag data. We opened jupyterlab/jupyterlab#2412 as a feature parity issue to implement the raises-exception behavior that was added to classic notebook.

rgbkrk · 2017-09-05T16:35:36Z

raises-exception needs to go into nbformat, both the schema and the documentation if it is to be expected.

ellisonbg · 2017-09-07T20:05:52Z

My own understanding of tags is this: * For third party users only - official jupyter projects shouldn't use them so nbconvert shouldn't use them. * Third parties *should* use some sort of namespacing * Third parties can interpret them in any way they want * Frontends can only use tags in a completely generic manner (tag based filtering for example) and should not have particular UI/UX behavior associated with "special tags". The counterside of this is: * Anything tag like that we want to (official jupyter projects) do should go into cell metadata and be added to the notebook format. This will prevent us from creating informal pseudo-specifications. Based on this, I agree with Kyle that `raises-exception` should be added to the schema and documentation. But I would prefer it be moved to metadata.

dsblank · 2017-09-07T20:21:23Z

Is this correct:

tags are easy to add/change/delete for the user
tags that do things (other than filtering) are only used by 3rd party software
standard jupyter functionality (like hiding a cell, tagging for special CSS) has to be done through another means (metadata)

Wouldn't it be easier to have tags that DO have special functionality in the core software? Or am I confused?

ellisonbg · 2017-09-07T20:35:12Z

My understanding is that that 1,2,3 are correct. There are a couple of challenges with using tags for special functionality in the core software: * Tags alone are not sufficient as they can only encode very simple information. If we embrace the usage of tags for official functionality, we will have to answer the "tag or metadata?" question for each and every bit of official data. From an organizational and communication perspective, it is much easier to make the decision a single time. * Any special functionality of core jupyter software needs to be in the notebook format and official documentation. It is simpler conceptually to not have a portion of our spec that get's into particular tags. Also, we don't want to communicate to others that they can ask us to support additional official tags. * It is easier to deal with metadata than tags from a code perspective.

…

On Thu, Sep 7, 2017 at 1:21 PM, Doug Blank ***@***.***> wrote: Is this correct: 1. tags are easy to add/change/delete for the user 2. tags that do things (other than filtering) are only used by 3rd party software 3. standard jupyter functionality (like hiding a cell, tagging for special CSS) has to be done through another means (metadata) Wouldn't it be easier to have tags that DO have special functionality in the core software? Or am I confused? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2826 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AABr0NQo0Bl8j2bJYzKdW5OA5bO3_Dz7ks5sgFBIgaJpZM4PKuHt> .

-- Brian E. Granger Associate Professor of Physics and Data Science Cal Poly State University, San Luis Obispo @ellisonbg on Twitter and GitHub bgranger@calpoly.edu and ellisonbg@gmail.com

dsblank · 2017-09-07T20:41:54Z

Here is another issue:

what language would the tag be in? how would internationalization be handled?

But I hope that there is a UI/UX developed that is as easy as tags for doing official functionality. Perhaps that is what is missing from the discussion.

mpacer · 2017-09-07T21:36:36Z

@dsblank My understanding is that we need modifications to 2. & 3. above:

from: 2. tags that do things (other than filtering) are only used by 3rd party software

to: 2. specific tag values are only used by 3rd party software (caveat raises-expectation)

from: 3. standard jupyter functionality (like hiding a cell, tagging for special CSS) has to be done through another means (metadata)

to: 3. standard jupyter functionality (like hiding a cell, tagging for special CSS) can be done through generic metadata or via a generic tag-based interface that requires explicit user-specification

The key to both nuances is that we can create ways for users to access specific functionality (hiding a cell, tagging for special CSS) but where to do so requires that the users both specify what tags they want to use and then apply those particular tags to their cells. We don't have a general way of surfacing that kind of functionality (today) in the notebook front end, but that's a matter of implementation. For example, it should be possible (once I rework #2413) to access tag-based element filtering from the "Download As…" menu item by uploading traitlets values via a json config file that specifies which tags indicate which kind of filtering.

I imagine other front-end features could similarly use loading configuration to enable these kinds of generic tag interfaces. However, that seems like way too much overhead to be the only mechanism for exposing this functionality. My thoughts as to how to do this more simply are still evolving.

rgbkrk · 2017-09-08T17:51:45Z

I've been trying to think of a forms-focused editor for metadata, at least for certain acknowledged ones:

Either as something that pops out of a cell or appears to be on the "backside" of a cell.

rgbkrk · 2017-09-08T17:56:44Z

Here's an interpretation for notebook wide metadata, largely targeting the authors, descriptions, and tag fields:

This comes from nteract/nteract#1503

mscuthbert · 2017-09-08T18:40:33Z

I think that a separate editor for metadata and tags could work, especially if it were easy to assign the metadata/formatting/tags data for a certain cell as a keyboard shortcut so that it can be applied to another cell with a few keystrokes.

gnestor · 2017-09-12T22:01:53Z

Just to clarify: Is this blocking us from releasing 5.1 or have we decided that the current implementation of raises-exception in notebook is acceptable?

takluyver · 2017-09-13T09:06:54Z

To me it's totally acceptable and exactly the kind of thing tags are there for. I don't see any advantage of saying that only third-party projects can define tags. I'm already frustrated by how long it's taken to get 5.1 out the door, so I really hope we don't want to block it on this issue.

rgbkrk · 2017-09-13T14:57:18Z

It's not in saying that only third-party projects and users can create tags. People are saying if you want defined expected Jupyter behavior, it better be well specified and likely put in the metadata.

rgbkrk · 2017-09-13T14:58:24Z

As for the release being blocked, just ship with raises-exception and support it as a quirk forever.

takluyver · 2017-09-13T15:03:11Z

Tags are in metadata, and I don't see anything wrong with specifying some particular ones, like raises-exception to have more meanings more general than one specific tool. It's a pragmatic solution to let users easily set certain kinds of metadata without needing to build a special UI for each attribute.

minrk · 2017-09-14T13:00:06Z

I agree with @rgbkrk that we should ship 5.1 with raises-exception tag and continue this discussion post-5.1. Defining the interpretation of a single tag is not committing to the general pattern of using tags for behavior.

In general, I agree with @takluyver that tags were created with the specific purpose of providing a single UI for defining labelling and behavior without having to create new, dedicated UI for every flag, which is required for any other metadata structure we come up with. Of course, most metadata won't fit in tags, and some will have logical UI other than tags, but things like raises-exception are precisely the use case tags were created to solve.

jasongrout · 2017-09-14T15:24:07Z

The distinction between "expected Jupyter behavior" (leading to a format spec change) and "supported by a few tools" (could be implemented by a tag convention) can be confusing when one official reference frontend is by far most popular and implements a specific convention. I think we want to avoid "the implementation is the standard" - but of course it can be difficult to set that expectation about what is implementation and what is the standard.

@takluyver - it sounds like you don't expect all frontends to implement the raises-exception behavior the notebook is implementing, correct?

takluyver · 2017-09-14T16:13:01Z

I don't consider it an imperative, but where frontends have similar 'execute all' functionality, I wouldn't be surprised if users ask for similar behaviour around the tag. We are defining the tag as a marker that the tagged cell is expected to raise an error, but it's up to tools how and whether to use that.

I think we want to avoid "the implementation is the standard"

In principle, yes. But let's not stifle ourselves by trying to write the standards before we allow ourselves to use anything. Let's make tags useful first, and then try to formalise them.

Function annotations in Python are an example of what I mean. The syntax has been supported since Python 3.0, but core developers said "we won't define any uses, we'll let third parties come up with them". So they went basically unused until those core devs came back and defined them as type hints.

mpacer · 2017-09-14T16:41:57Z

It sounds like everyone on the different sides of the issue agree on this release going out either way, so I'm going to close this and I think we can start on a new issue in nbformat.

ellisonbg · 2017-09-15T18:30:30Z

Thanks!

…

On Thu, Sep 14, 2017 at 9:42 AM, M Pacer ***@***.***> wrote: It sounds like everyone on the different sides of the issue agree on this release going out either way, so I'm going to close this and I think we can start on a new issue in nbformat. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2826 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AABr0B06qty5TtmhinUMoj8owLZCeZRXks5siVdfgaJpZM4PKuHt> .

-- Brian E. Granger Associate Professor of Physics and Data Science Cal Poly State University, San Luis Obispo @ellisonbg on Twitter and GitHub bgranger@calpoly.edu and ellisonbg@gmail.com

mpacer closed this as completed Sep 14, 2017

minrk added this to the Reference milestone Sep 13, 2018

github-actions bot added the status:resolved-locked label Apr 8, 2021

github-actions bot locked as resolved and limited conversation to collaborators Apr 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tags: an attempted overview #2826

Tags: an attempted overview #2826

mpacer commented Sep 1, 2017

mscuthbert commented Sep 2, 2017

mpacer commented Sep 2, 2017

takluyver commented Sep 2, 2017

mpacer commented Sep 2, 2017

takluyver commented Sep 2, 2017

rgbkrk commented Sep 2, 2017

rgbkrk commented Sep 2, 2017

blink1073 commented Sep 5, 2017

rgbkrk commented Sep 5, 2017

ellisonbg commented Sep 7, 2017 via email

dsblank commented Sep 7, 2017

ellisonbg commented Sep 7, 2017 via email

dsblank commented Sep 7, 2017

mpacer commented Sep 7, 2017 •

edited

Loading

rgbkrk commented Sep 8, 2017

rgbkrk commented Sep 8, 2017

mscuthbert commented Sep 8, 2017

gnestor commented Sep 12, 2017

takluyver commented Sep 13, 2017

rgbkrk commented Sep 13, 2017

rgbkrk commented Sep 13, 2017 •

edited

Loading

takluyver commented Sep 13, 2017

minrk commented Sep 14, 2017

jasongrout commented Sep 14, 2017 •

edited

Loading

takluyver commented Sep 14, 2017

mpacer commented Sep 14, 2017

ellisonbg commented Sep 15, 2017 via email

Tags: an attempted overview #2826

Tags: an attempted overview #2826

Comments

mpacer commented Sep 1, 2017

Jupyter-wide policy regarding tags in the core projects

psuedo-namespaces with colons

Potential problem: "*raises_exception*" matching

Compromise to resolve issues around #2549

Document how tags are used outside of changelogs.

Examples of tags as user-defined values

Examples of tags as hard-coded magic values

mscuthbert commented Sep 2, 2017

mpacer commented Sep 2, 2017

takluyver commented Sep 2, 2017

mpacer commented Sep 2, 2017

takluyver commented Sep 2, 2017

rgbkrk commented Sep 2, 2017

rgbkrk commented Sep 2, 2017

blink1073 commented Sep 5, 2017

rgbkrk commented Sep 5, 2017

ellisonbg commented Sep 7, 2017 via email

dsblank commented Sep 7, 2017

ellisonbg commented Sep 7, 2017 via email

dsblank commented Sep 7, 2017

mpacer commented Sep 7, 2017 • edited Loading

rgbkrk commented Sep 8, 2017

rgbkrk commented Sep 8, 2017

mscuthbert commented Sep 8, 2017

gnestor commented Sep 12, 2017

takluyver commented Sep 13, 2017

rgbkrk commented Sep 13, 2017

rgbkrk commented Sep 13, 2017 • edited Loading

takluyver commented Sep 13, 2017

minrk commented Sep 14, 2017

jasongrout commented Sep 14, 2017 • edited Loading

takluyver commented Sep 14, 2017

mpacer commented Sep 14, 2017

ellisonbg commented Sep 15, 2017 via email

Potential problem: `"raises_exception"` matching

mpacer commented Sep 7, 2017 •

edited

Loading

rgbkrk commented Sep 13, 2017 •

edited

Loading

jasongrout commented Sep 14, 2017 •

edited

Loading