Display data as a cell type #1123

rgbkrk · 2016-02-21T05:31:40Z

On the heels of #621 and IRkernel/IRkernel#260, I'm wondering about a cell that matches display data semantics. We already have a way to display {mimetype: data} bundles (mimebundles).

Here's an example UX flow. A user goes to insert an image either via drag and drop or a menu:

Insert
    --> Image
    --> HTML

The image then ends up embedded in the document as if it was injected using magics or running code.

Thinking of @lbustelo's use case for declarative widgets, this would allow writing HTML cells directly, in any kernel language, without the use of magics. The cell would then have two states, edit and view, just like markdown cells.

This also sidesteps the garbage collection that #621 would require.

/cc @lbustelo @parente @julienr @jdfreder


ellisonbg · 2016-02-21T07:32:26Z

Can you clarify why a Markdown cell doesn't accomplish that? Do you want to display mime types other than HTML? What would edit mode look like for those other mime types? Not opposed to this idea, just trying to think it through...

takluyver · 2016-02-21T10:46:20Z

@lbustelo has run into some sort of sanitisation that we do on markdown cells, so you can't write arbitrary HTML in them.

I think that was prompted by the security discussion - we decided to sanitise markdown all the time, so that the signatures and trust mechanism only deal with outputs from code cells.

So I think the crucial thing we need to work out is the user interface for trusting arbitrary HTML in something like a markdown cell, equivalent to running a code cell to trust it. Should we present untrusted cells unrendered and let the user render them to trust them?

willingc · 2016-02-21T16:54:26Z

Ping @minrk

julienr · 2016-02-21T19:24:02Z

If I understand this correctly, I think this can be orthogonal to #621. Even if you could insert a special "image" cell, I don't think it would cover all the use cases for inline images. For example, you might want to lay out your inline images (e.g. put them in a table), which you cannot do with a simple image cell type.

ellisonbg · 2016-02-21T23:33:52Z

@takluyver great point, yes I guess we do treat the code/markdown cells differently from a security standpoint. This clarifies an important part of this. The idea of using Markdown cells for this purpose, but treating them more like code cells, seems like a pretty straightforward approach. What are the downsides of that?

Creating a new cell type that supports arbitrary display mime types, but can still be human-edited like a Markdown cell, still seems a bit awkward. Trying to get my head around it. Would this be like an input-less code cell? Or a code cell whose input is its output?

Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger@calpoly.edu and ellisonbg@gmail.com

rgbkrk · 2016-02-22T02:38:31Z

Here's what I imagine an HTML cell would look like to users while they're editing it:

[Screenshot: HTML cell in edit mode]

Followed by what it looks like when rendered:

[Screenshot: the same cell rendered]

In this example I'm showing a pencil icon to switch back to edit mode, though we can refine that.

As for inserting it, users wouldn't see the underlying representation of this cell, which I imagine would be:

{
  "cell_type": "data",
  "data": {
    "text/html": [
      "<div>\n",
      "<table border=\"1\" class=\"dataframe\">\n",
      "  <thead>\n",
      "    <tr style=\"text-align: right;\">\n",
      "      <th></th>\n",
      "      <th>Name</th>\n",
      "      <th>Email</th>\n",
      "    </tr>\n",
      "  </thead>\n",
      "  <tbody>\n",
      "    <tr>\n",
      "      <th>0</th>\n",
      "      <td>Jane Doe</td>\n",
      "      <td>jane@doe.com</td>\n",
      "    </tr>\n",
      "    <tr>\n",
      "      <th>1</th>\n",
      "      <td>John Doe</td>\n",
      "      <td>john@doe.com</td>\n",
      "    </tr>\n",
      "  </tbody>\n",
      "</table>\n",
      "</div>"
    ]
  }
}
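For illustration, here is a minimal Python sketch that builds a cell in that shape from a mimebundle. The "data" cell_type is the hypothetical one proposed here, not part of nbformat v4; text/* values are split into lists of lines the way nbformat stores multiline strings, and the empty metadata dict is an assumption added because every v4 cell carries one.

```python
import json


def make_data_cell(mimebundle: dict) -> dict:
    """Build a cell dict for the proposed (hypothetical) "data" cell type.

    text/* values are stored as lists of lines with their newlines kept,
    mirroring how nbformat serializes multiline strings; other values
    (e.g. base64 image strings or JSON objects) are stored unchanged.
    """
    data = {}
    for mimetype, value in mimebundle.items():
        if mimetype.startswith("text/") and isinstance(value, str):
            data[mimetype] = value.splitlines(keepends=True)
        else:
            data[mimetype] = value
    # "metadata" is an assumption: every nbformat v4 cell has one.
    return {"cell_type": "data", "metadata": {}, "data": data}


html = "<div>\n<table border=\"1\" class=\"dataframe\">\n</table>\n</div>"
cell = make_data_cell({"text/html": html})
print(json.dumps(cell, indent=1))
```

Joining the stored lines back together gives the original HTML string, so a front end can round-trip the cell without loss.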

Similarly for images: this would embed base64-encoded images inline, while the UX would follow the same flow as dragging and dropping images, or selecting them via a menu, in other UIs.
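As a rough sketch of the image case (assuming PNG and the same {mimetype: data} layout as above), the inserted image would be base64-encoded, the same way binary image data is stored in notebook outputs:

```python
import base64


def image_bundle(image_bytes: bytes, mimetype: str = "image/png") -> dict:
    """Return a {mimetype: base64 text} bundle for an inserted image."""
    return {mimetype: base64.b64encode(image_bytes).decode("ascii")}


def image_bytes_from_bundle(bundle: dict, mimetype: str = "image/png") -> bytes:
    """Recover the original bytes, e.g. to hand them to an image editor."""
    return base64.b64decode(bundle[mimetype])


# Stand-in input: just the 8-byte PNG signature, not a complete image.
# In a real drag-and-drop flow these bytes would come from the dropped file.
png_signature = b"\x89PNG\r\n\x1a\n"
bundle = image_bundle(png_signature)
print(bundle)
```

With a real file the call would be along the lines of `image_bundle(Path("plot.png").read_bytes())`, where the filename is purely illustrative.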

minrk · 2016-02-22T10:29:16Z

A summary of the security/sanitization we do and why:

- Untrusted HTML/JS must never be displayed without explicit user action.
- Code cells have an obvious action to take: execute.
- Markdown cells also have an obvious action to take: render.
- Instead of tracking whether the user has explicitly trusted markdown cells, we treat them as always untrusted, which means sanitizing markdown results, but we can display them even when the notebook is untrusted.
- A dedicated display-data cell would need to have the same trust semantics as other cells (explicit trust before display, or sanitization).

If we treat markdown cells as we do code cells, that means that an untrusted notebook needs to open with all markdown cells unrendered, and they must be 'executed' manually by the user before they are allowed to render on the page. The same would be true if we had dedicated mimebundle cells.

So the question to me is, really, a user-experience one on which I don't have an informed opinion. Should we:

- sanitize markdown, in which case we always get to display it (what we do now)
- allow arbitrary HTML/JS in markdown, in which case we cannot display it until it is trusted

There are technically other options, like "display but sanitize only untrusted markdown", which is attractive because it will do what people want most often, but I think it's the most confusing: when sanitizing untrusted markdown produces a materially different result, we have to communicate to users that they must "unrender, then re-render the cell to see it how it's meant to be."
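These options can be summarized as a small decision table. This is a hypothetical sketch of what a markdown cell would show on load under each policy (the names are made up for illustration, not notebook APIs), assuming "rendered by the user" counts as explicit trust:

```python
from enum import Enum


class Policy(Enum):
    ALWAYS_SANITIZE = "always sanitize"          # current behavior
    TRUST_GATED = "arbitrary HTML, trust-gated"  # like code cells
    SANITIZE_UNTRUSTED = "sanitize only if untrusted"


def markdown_view(policy: Policy, trusted: bool) -> str:
    """What a markdown cell shows when the notebook is opened.

    `trusted` is True if the notebook signature matched or the user
    explicitly rendered the cell.
    """
    if policy is Policy.ALWAYS_SANITIZE:
        return "rendered, sanitized"  # always displayable, never raw
    if policy is Policy.TRUST_GATED:
        return "rendered, raw" if trusted else "unrendered source"
    # SANITIZE_UNTRUSTED: the output can differ from what the author saw,
    # which is the confusing case described above.
    return "rendered, raw" if trusted else "rendered, sanitized"


for policy in Policy:
    print(policy.value, "->", markdown_view(policy, trusted=False))
```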

takluyver · 2016-02-22T10:53:35Z

Is it possible to detect whether there's anything in a block of markdown/html which would be affected by sanitisation? I.e. could we check if each markdown cell is 'safe' and decide to display it rendered or unrendered?

minrk · 2016-02-22T10:58:46Z

Yes, that is possible.
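One way to make that check, sketched under loud assumptions: the cell's markdown has already been rendered to HTML, and the toy sanitizer below (drop script/style/iframe/object/embed tags, on* event-handler attributes, and javascript: URLs) stands in for whatever sanitizer the notebook actually uses. Both passes tokenize with the standard library's html.parser, so comparing token streams avoids false positives from quoting or whitespace differences in re-serialized HTML.

```python
from html.parser import HTMLParser

# Toy policy for illustration, not the notebook's sanitizer.
DROPPED_TAGS = {"script", "style", "iframe", "object", "embed"}
RAW_TEXT_TAGS = {"script", "style"}  # their contents are code, not text


class _Tokenizer(HTMLParser):
    def __init__(self, sanitize: bool):
        super().__init__(convert_charrefs=True)
        self.sanitize = sanitize
        self.tokens = []
        self._in_raw_text = False

    def _attrs(self, attrs):
        if not self.sanitize:
            return tuple(attrs)
        return tuple(
            (name, value)
            for name, value in attrs
            if not name.startswith("on")
            and not (value or "").strip().lower().startswith("javascript:")
        )

    def handle_starttag(self, tag, attrs):
        if self.sanitize and tag in DROPPED_TAGS:
            self._in_raw_text = tag in RAW_TEXT_TAGS
            return
        self.tokens.append(("start", tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        if self.sanitize and tag in DROPPED_TAGS:
            self._in_raw_text = False
            return
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        if not self._in_raw_text:
            self.tokens.append(("data", data))


def _tokens(html: str, sanitize: bool) -> list:
    parser = _Tokenizer(sanitize)
    parser.feed(html)
    parser.close()
    return parser.tokens


def sanitization_changes(html: str) -> bool:
    """True if the toy sanitizer would alter this HTML."""
    return _tokens(html, sanitize=True) != _tokens(html, sanitize=False)
```

A cell for which `sanitization_changes` returns False can safely be shown rendered even when untrusted, because sanitizing it would be a no-op.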

takluyver · 2016-02-22T11:06:45Z

If that's doable, I think that would be preferable. But it would probably also cause some confusion - why are some of these cells showing up rendered and others unrendered? Maybe we could add a little untrusted indicator by the unrendered cells, and offer a bit of explanation on mouseover/click.

minrk · 2016-02-22T11:59:17Z

Sure, so then the plan would be:

1. track 'trusted' on markdown cells, just like code cells
2. sanitize if untrusted
3. show an indicator / more info if sanitization made any changes
4. explicitly rendering a markdown cell indicates trust, just like cell execution

Step 1 requires a change in nbformat to keep track of the trusted flag on markdown cells. Perhaps that trust->sign code really belongs in the notebook repo anyway (not 100% sure), since it's specifically a transform from the nbformat file format to 'live' document state that's specific to this webapp.
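To make step 1 concrete, here is a simplified Python sketch of trust tracking extended to markdown cells. It is not nbformat's actual signing code: it assumes an HMAC-SHA256 over a canonical JSON dump of the notebook, a plain set of known signatures standing in for the signature store, and a per-cell metadata["trusted"] flag that is live state and is therefore excluded from what gets signed.

```python
import copy
import hashlib
import hmac
import json

TRUST_TRACKED = ("code", "markdown")  # markdown added, per step 1


def _strip_trust(nb: dict) -> dict:
    """Copy of the notebook without the live 'trusted' flags."""
    nb = copy.deepcopy(nb)
    for cell in nb["cells"]:
        cell.get("metadata", {}).pop("trusted", None)
    return nb


def compute_signature(nb: dict, secret: bytes) -> str:
    payload = json.dumps(_strip_trust(nb), sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def mark_cells(nb: dict, trusted: bool) -> None:
    for cell in nb["cells"]:
        if cell["cell_type"] in TRUST_TRACKED:
            cell.setdefault("metadata", {})["trusted"] = trusted


def on_load(nb: dict, secret: bytes, known: set) -> bool:
    """Trust every tracked cell iff the notebook's signature is known."""
    trusted = compute_signature(nb, secret) in known
    mark_cells(nb, trusted)
    return trusted


def on_save(nb: dict, secret: bytes, known: set) -> None:
    """Sign only if the user has trusted (executed/rendered) every cell."""
    if all(
        cell.get("metadata", {}).get("trusted", False)
        for cell in nb["cells"]
        if cell["cell_type"] in TRUST_TRACKED
    ):
        known.add(compute_signature(nb, secret))
```

Editing a markdown cell's source outside the app changes the signature, so on the next load every markdown and code cell starts out untrusted again.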

One (lazy) version of that indicator could be to leave the cell unrendered if it's untrusted and sanitization would make some change.

@rgbkrk is there anything display cells would provide that this proposal would not, or do you still think that display cells are something we should add?

takluyver · 2016-02-22T12:02:48Z

> sanitize if untrusted

To be precise, my proposal is not to sanitise (or not to display sanitised output), but to display the markdown cell unrendered if sanitisation would make any changes and it's untrusted.
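This variant can be written as a tiny decision function. In this hypothetical sketch, `would_change` is any predicate reporting whether sanitization would alter the cell's rendered HTML (the crude check below is only a placeholder), and rendering the cell explicitly is what marks it trusted:

```python
from typing import Callable


def initial_markdown_view(html: str, trusted: bool,
                          would_change: Callable[[str], bool]) -> str:
    """Never show sanitized output: either render the cell as written, or
    leave it unrendered with an 'untrusted' indicator."""
    if trusted or not would_change(html):
        return "rendered"
    return "unrendered (untrusted indicator)"


def render_explicitly(cell: dict) -> str:
    """The user pressing render counts as trusting the cell."""
    cell.setdefault("metadata", {})["trusted"] = True
    return "rendered"


def crude_check(html: str) -> bool:
    """Placeholder only; a real check would run the sanitizer and compare
    its result against the unsanitized rendering."""
    return "<script" in html.lower()


print(initial_markdown_view("<b>bold</b>", False, crude_check))
print(initial_markdown_view("<script>x()</script>", False, crude_check))
```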

minrk · 2016-02-22T12:32:37Z

Gotcha, that ought to be doable.

minrk · 2016-02-22T13:23:31Z

I got the basics going for that at #1126.

rgbkrk · 2016-02-22T16:00:37Z

Trust on markdown cells is a good starting point and we should see how it works in practice, via @lbustelo.

There's a larger question about what the markdown cells do and how they're specified:

- Does the grammar for CommonMark completely respect any embedded HTML?
- What's the specification for embedded math? Right now it seems to be a function of the notebook's implementation at any given time (see issues on nbviewer, GitHub's rendering of notebooks, etc.)

If we couple ourselves to always using markdown cells for embedded HTML, these are the primary user experience and developer experience cons:

- Unable to support a WYSIWYM editor like ProseMirror? If we want the notebook to be approachable to a lot more analysts, this is a big one.
- In contexts where the outputs need to be sandboxed (O'Reilly Media's site, any future revisions of nbviewer), where is this HTML declared? Is it a global context on the page? How do we keep it from interfering with the rest of the page? For multi-user collaboration, how do you reason about a document that can change underneath you?

takluyver · 2016-02-22T16:09:16Z

> Unable to support a WYSIWYM editor like ProseMirror?

It would make this trickier, but I don't think it should be impossible. I expect that most markdown cells would still contain relatively simple Markdown which could be edited like that.

Of course, if there are two different behaviours for markdown cells based on their contents, maybe they should be two cell types. That would be a bigger change to the notebook format, though.

lbustelo · 2016-02-22T16:43:00Z

Using the Markdown cell (once the sanitization issues are solved) addresses two of my main concerns:

- HTML magic (and others like JavaScript) places a requirement on the kernel that for many use cases seems unreasonable and, for certain languages, is not practical to implement (e.g. IRkernel/IRkernel#260, "Support %% cell magic syntax and more specifically %%HTML").
- From a UX point of view, it is beneficial to have a well-defined place to author client-side content. Having to type %%HTML always felt kludgy.

Having said that... there are so many ways to author client side content that may not fit as nicely in the Markdown cell and might be better suited if we had some level of extensibility around cell types. @bollwyvl brought up https://github.com/pugjs/jade as another alternative to avoid typing HTML. There is always some new flavor of the month.

Also, as @rgbkrk hinted at with the WYSIWYG comment, the maturation of the notebook space is going to lead to higher-level authoring experiences. Jupyter and the notebook format should somehow accommodate that, to avoid being overshadowed by the countless alternatives popping up everywhere.

I understand the importance of the downstream tools (i.e. nbviewer) and the hesitation of an open set of cell types, but as notebooks become platforms for solution development, I think this issue is going to become more and more important to solve.

rgbkrk · 2016-02-23T04:20:07Z

Quick demo for you using draft-js and KaTeX, for a rich editor that has block maths:

[Demo: draft-js editor with KaTeX block math]

bollwyvl · 2016-02-23T06:33:07Z

Great, glad to see conversation about this topic! I've been tracking it for some time now, and am really interested in how this round turns out! I was indeed convinced then that making a cell type for everything is not the answer... but perhaps it is time to start thinking about some of the UI pieces around rich output that haven't fundamentally changed since then.

Background
For a while now, I've been making magics/extensions/widgets/tricks/fancies to get closer to authoring particular kinds of WYSIWYM, à la LyX/ProseMirror. We've come a long way, if you are willing to jump through many hoops!

Front End Stuff
+1 common mark as a container for robot-authored Stuff (JS, HTML, CSS, SVG), even if we have to push forward the maths. If you read/write markdown and all those things, you're kind of a robot, so you should be fine.

Even if adopting commonmark for all that Stuff, and getting our sanitization house in order, the authoring of said Stuff in the browser (in the notebook) probably does need some love.

- CodeMirror has our back, and has completion and linting for all of those, and for many of the transpiled languages, which we'd want to see with full npm support.
- This would be backed by the nascent "front end ʞernel", and is what we'd need anyway to make JupyterLab the best place to build Lab/Notebook extensions.
- Just ditching out to the text editor would lose the immediacy of being in-browser...
  - and heck, maybe there's something to this stuff-embedded-in-JSON idea: there is something kind of beautiful in vulcanized Polymer components: here's your nbextension... as a notebook!

User-authored text
The exception to this is, I think, user-authored complex prose documents.

If we were to embark on this path, I think it would end up having:

- a typography-focused approach to the UI of rich text authoring
  - much like @rgbkrk's demo (:heart: it!): fade out all the UI, embrace the keyboard shortcuts, and start writing your journal paper. Or letter. Or contract. Or whatever.
- a document-oriented data structure for the content, likely via ProseMirror...
  - footnotes!
  - sidebars!
  - comments!
  - images with captions!
  - numbered equations!
- a no-fooling, extensible layout engine (nbpresent just hacks the CSS)
  - papers, posters, banners
- robust themes that look publication-grade out of the box

Then the question is... are these then even cells? or is this another, prose-native view of your notebook you have chosen, which can include cells embedded in it. Do you show them next to each other? Does this even go in an ipynb, or is this a separate file type altogether, or a wrapper around both kinds of file, or a PDF with a local file store?

Very exciting stuff, and hopefully a topic for the dev meeting!

minrk · 2016-02-23T18:48:57Z

That seems really appealing, and points to either a display-data cell as discussed here, or just an HTML cell, whose editor can be any HTML-authoring magic.

I believe with CommonMark's behavior, anything inside a <div> should be treated as raw HTML, which would mean that wrapping the entire markdown cell in a single div makes it a pure HTML cell. That should mean that people can experiment with WYSIWYG HTML editors on top of markdown cells while we figure out whether a cell type is useful.

jasongrout · 2016-02-23T19:02:43Z

My understanding with commonmark is that any html tags should treat the enclosed material as html. That seems to be how the reference implementation behaves too: http://spec.commonmark.org/dingus/

bollwyvl · 2016-02-23T21:14:00Z

> I believe with CommonMark's behavior, anything inside a <div> should be treated as raw HTML...

Yep, seems so. div, and these buddies:

Start condition 6: line begins with the string < or </ followed by one of the strings (case-insensitive) address, article, aside, base, basefont, blockquote, body, caption, center, col, colgroup, dd, details, dialog, dir, div, dl, dt, fieldset, figcaption, figure, footer, form, frame, frameset, h1, head, header, hr, html, iframe, legend, li, link, main, menu, menuitem, meta, nav, noframes, ol, optgroup, option, p, param, section, source, summary, table, tbody, td, tfoot, th, thead, title, tr, track, ul, followed by whitespace, the end of the line, the string >, or the string />.
End condition: line is followed by a blank line.

Start condition 7: line begins with a complete open tag or closing tag (with any tag name other than script, style, or pre) followed only by whitespace or the end of the line.
End condition: line is followed by a blank line.
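A simplified Python reading of those two rules shows the blank-line behavior: an HTML block that starts on a line matching condition 6 or 7 runs until the next blank line, so markdown after a blank line inside a <div> is no longer raw HTML. This is a sketch, not a conforming parser: it ignores conditions 1 to 5 and all other block types, and its condition 7 regex is looser than the spec's attribute grammar.

```python
import re

# Tag names from start condition 6, as quoted above.
BLOCK_TAGS = set(
    "address article aside base basefont blockquote body caption center col "
    "colgroup dd details dialog dir div dl dt fieldset figcaption figure footer "
    "form frame frameset h1 head header hr html iframe legend li link main menu "
    "menuitem meta nav noframes ol optgroup option p param section source "
    "summary table tbody td tfoot th thead title tr track ul".split()
)
# Condition 6: '<' or '</', a tag name, then whitespace, '>', '/>' or end of line.
COND6 = re.compile(r"^ {0,3}</?([A-Za-z][A-Za-z0-9]*)(?=\s|/?>|$)")
# Condition 7 (simplified): a single open or closing tag alone on the line.
COND7 = re.compile(r"^ {0,3}</?([A-Za-z][A-Za-z0-9-]*)(?:\s[^<>]*)?/?>\s*$")


def starts_html_block(line: str, in_paragraph: bool) -> bool:
    m = COND6.match(line)
    if m and m.group(1).lower() in BLOCK_TAGS:
        return True
    m = COND7.match(line)
    # Condition 7 blocks cannot interrupt a paragraph.
    return bool(m) and not in_paragraph and m.group(1).lower() not in {"script", "style", "pre"}


def html_blocks(text: str) -> list:
    """Return each HTML block as text; each ends at the next blank line."""
    blocks, current, in_paragraph = [], None, False
    for line in text.splitlines():
        blank = not line.strip()
        if current is not None:
            if blank:
                blocks.append("\n".join(current))
                current = None
            else:
                current.append(line)
        elif not blank and starts_html_block(line, in_paragraph):
            current = [line]
        in_paragraph = current is None and not blank
    if current is not None:
        blocks.append("\n".join(current))
    return blocks


print(html_blocks("<div>\n*one*\n*two*\n</div>"))    # one HTML block
print(html_blocks("<div>\n*one*\n\n*two*\n</div>"))  # *two* becomes markdown
```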

> or just an HTML cell, whose editor can be any HTML-authoring magic

So, today, an inline prosemirror could serialize its JSON document model to cell metadata, and treat the source as its output.

Though a bit tubby on bytes, this is nice, as then you've still got a "dead pixel" version of the content if someone doesn't have nbJadeDustUnderscoreHandlebarsReactHAMLJinjaLiquidLessSCSSStylusCoffeeTypeScript.

I guess you'd have a helpful message at the top that suggested thou shalt not edit this cell, but since we don't lock cells, they'd be free to go on about their business if they didn't have your editor. Some translations even have reverse engineering capabilities, such as http://html2jade.org/, though this falls apart once you actually start using template features...
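A minimal sketch of that arrangement, assuming a markdown cell, a made-up metadata key ("rich_editor"), and a ProseMirror-style JSON document (nodes with "type", "content", and "text"). The toy renderer handles only paragraphs and text; a real editor would produce the HTML itself. The point is that the source holds plain HTML, so anyone without the editor still gets the "dead pixel" version:

```python
import html

EDITOR_KEY = "rich_editor"  # hypothetical metadata namespace
NOTICE = "<!-- Generated by a rich-text editor; edits here may be overwritten. -->"


def render(node: dict) -> str:
    """Toy renderer for a ProseMirror-style JSON tree (doc/paragraph/text only)."""
    if node["type"] == "text":
        return html.escape(node["text"])
    inner = "".join(render(child) for child in node.get("content", []))
    return f"<p>{inner}</p>" if node["type"] == "paragraph" else inner


def save_rich_cell(doc: dict) -> dict:
    """Editor model goes in metadata; the rendered HTML goes in source."""
    return {
        "cell_type": "markdown",
        "metadata": {EDITOR_KEY: {"format": "prosemirror-json", "doc": doc}},
        "source": NOTICE + "\n" + render(doc),
    }


def load_rich_cell(cell: dict):
    """Return the editor model if present; None means the front end falls
    back to editing the plain source."""
    return cell.get("metadata", {}).get(EDITOR_KEY, {}).get("doc")


doc = {"type": "doc", "content": [
    {"type": "paragraph", "content": [{"type": "text", "text": "Fish & chips"}]},
]}
cell = save_rich_cell(doc)
print(cell["source"])
```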

> My understanding with commonmark is that any html tags should treat the enclosed material as html. That seems to be how the reference implementation behaves too: http://spec.commonmark.org/dingus/

This is the behavior of nbviewer, inherited from mistune, as discussed here: jupyter/nbviewer#526 (comment)

As described there, nbconvert's (or nbviewer's) configuration could be changed to match the behavior of marked, used in the live browser, and of the spec.

rgbkrk · 2016-02-23T21:38:05Z

The editor posted above comes straight out of draft-js, which has been used in production at Facebook for the last couple of years and was open-sourced yesterday. It's React-centric, though you can have React target any DOM element for rendering. I'm enjoying the APIs so far and the model underneath (people are using the same model in native apps now too).

That's a diversion from the real problem I'm worried about: what our specification is for the markdown cells themselves. It isn't specced; it reflects the way the current user's notebook server implements it. When used on nbviewer, GitHub, or other static renderings, if we wanted consistency, we'd have to match the versions of marked, mistune, commonmark, etc., as well as MathJax, used by the notebook server the notebook came from. If we keep the spec consistent with CommonMark and suggest HTML somewhere else, we reduce the rendering bugs that get reported elsewhere and can build a clean model (a necessity for a WYSIWYG editor).

fperez · 2016-02-24T03:04:08Z

I don't have an answer here yet, just starting to digest the questions... But I want to throw one more data point into this topic: the Broad Institute's GenePattern Notebook exposes special input cells via an extension and a custom cell type, this page shows some examples.

This is a pattern that is also used by KBase for its computational biology apps and methods, with a slightly more complex approach b/c it was created earlier, when our notebook infrastructure was less mature (so the KBase team had to hack more).

This KBase/GenePatternNB approach fits certain use cases in biology really well, and it has made me think that we really need to find a clean, generic solution for it, as biology is not the only place where it's useful.

So I think we should approach this question trying to solve these slightly different, but ultimately related, use cases in a unified way...

If we don't get to a solution here, this should definitely be something to brainstorm on at the dev meeting! I've put it on the agenda.

jasongrout · 2016-02-24T14:36:18Z

As @bollwyvl pointed out in #1123 (comment), the important thing in commonmark for html blocks is that there are no blank lines (e.g., it doesn't do html between div tags, it just pays attention to blank lines).

minrk · 2016-02-24T14:57:41Z

Aha, interesting note about the blank lines, thanks. That seems a bit weird, but does point to us making a specific HTML cell (mimebundle or otherwise) that wysiwyzards can sit on. I wouldn't want to be saying "Make sure you don't add any empty lines in your HTML, or it'll interpret it as another language".

jasongrout · 2016-02-24T15:02:52Z

To throw another thing into the discussion, I also worked on wysiwyg tools for generating code directly in code cells: http://bl.ocks.org/jasongrout/5378313.

takluyver · 2016-02-24T15:10:34Z

@jasongrout that's very cool!

fperez · 2016-02-24T15:49:50Z

@jasongrout +1!

tonyfast · 2016-04-09T03:21:31Z

I was trying to find out how to use the cell type to create special cells and I found this awesome thread.

I have been building presentations with nbpresent for data-driven interactive documents. My first lesson was that too much code makes creating a presentation from a single notebook unwieldy.

There needs to be a tighter way to connect variables on the kernel with HTML and JavaScript. I came up with a small cell magic called literacy. %%literate processes the cell as literate markdown, where the code fences are executed if the language is understood. Moreover, the entire cell is a jinja2 template (well, a few templates), which means that data can be passed from the kernel to the presentation and made interactive.

This notebook showcases literacy. Some cool features are:

- Markdown cells that create variables in Python and insert them into the template.
- CoffeeScript, JavaScript, and PyScript to change state.
- A DataFrame inserted directly into markdown after making a request.
- Hidden code cells and Disqus comments. Click the toggle input button to show the source.
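The core of the literate idea can be sketched with the standard library alone. This is not literacy's implementation: string.Template stands in for jinja2, only python fences are executed, and fences in other languages are passed through untouched. Running cell text through exec is only acceptable for content the user already trusts.

```python
import re
from string import Template

FENCE = "`" * 3  # built indirectly so it can't close this example's own fence
_FENCED = re.compile(FENCE + r"(\w+)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def literate(cell_text: str, namespace: dict) -> str:
    """Run python fences in `namespace`, then fill $placeholders from it."""

    def run(match) -> str:
        lang, code = match.group(1), match.group(2)
        if lang == "python":
            exec(code, namespace)  # trusted input only
            return ""              # executed fences vanish from the output
        return match.group(0)      # unknown language: leave the fence as-is

    body = _FENCED.sub(run, cell_text)
    # safe_substitute leaves unknown or malformed $-placeholders untouched.
    return Template(body).safe_substitute(namespace)


cell = "Total: $total\n" + FENCE + "python\ntotal = 2 + 3\n" + FENCE + "\nDone."
ns = {}
print(literate(cell, ns))
```

Because the template is filled after all fences run, a placeholder can refer to a variable defined further down the cell.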

I feel like there is a cool editor in this idea.

lbustelo · 2016-04-09T12:57:04Z

@tonyfast check out jupyter-declarativewidgets. The main focus of that work is to connect data and functions from a kernel with visual, interactive areas in the notebook, authored using the HTML magic.

rgbkrk · 2016-04-09T14:01:10Z

Mega 👍 to the declarative widgets

takluyver · 2016-04-09T16:15:05Z

Neat, thanks for posting that, @tonyfast

fperez · 2016-04-13T23:05:00Z

Very neat, @tonyfast!

twavv · 2017-08-26T22:49:43Z

Looks like this issue has stalled...

I've been pestering @minrk about this yesterday and today as it would make my life (as someone developing an extension that wants to add some fundamentally new functionality to the notebook) a whole lot easier. Currently, the only option I see (short of forking the notebook) is to hijack markdown cells with certain attributes and override the render method and create my own DOM elements outside of the codemirror/rendered output areas.

Is the plan still to implement a mimebundle type cell in the notebook (outside of jupyterlab)? This would also make #1999 more-or-less trivial to implement as an extension which would be a huge plus; just set mime type to image/png and have an extension include some metadata with the cell to provide its own editor (as an html canvas editor).

rgbkrk · 2017-08-27T01:24:54Z

These are the steps that would need to be taken, ignoring whether or not people want this in the core document format. Many likely need to be done in parallel across the repos.

1. Create a PR including it in the core nbformat spec; this would (possibly) be a minor bump in the format: https://github.com/jupyter/nbformat/blob/master/nbformat/v4/nbformat.v4.schema.json
2. Write integration tests in the nbformat repo for this new cell type
3. Verify that a plain, unmodified Jupyter Notebook server works properly with this new cell type
4. Patch the notebook to handle this new type in a non-obtrusive way (if broken above)

All that being said, your workaround is to establish a way to do this in a raw cell. Stick whatever you want in metadata, including the serialized version of what you want to do.

minrk · 2017-08-27T01:49:05Z

v4 of the notebook format is officially designed such that a new notebook cell type is a minor revision of the notebook format. UIs that see unrecognized cell types in minor format revisions newer than the latest they support should handle it, even if they can't display them.

This would be the first change to exercise this behavior, though, so it will be interesting to see if the upgrade experience goes as promised.
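The forward-compatibility rule can be sketched as follows. Here a reader that (by assumption) understands nbformat up to 4.2 keeps unknown cell types from a newer minor revision as opaque cells to round-trip unchanged, but rejects them in a revision it claims to fully understand. The supported-version constants are illustrative, not taken from any particular release.

```python
SUPPORTED_MAJOR = 4
SUPPORTED_MINOR = 2  # illustrative: the newest v4 minor this reader knows
KNOWN_CELL_TYPES = {"code", "markdown", "raw"}


def classify_cells(nb: dict) -> list:
    """Label each cell 'known' or 'unrecognized', or raise if invalid."""
    major, minor = nb["nbformat"], nb["nbformat_minor"]
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported nbformat major version {major}")
    labels = []
    for cell in nb["cells"]:
        cell_type = cell.get("cell_type")
        if cell_type in KNOWN_CELL_TYPES:
            labels.append("known")
        elif minor > SUPPORTED_MINOR:
            # Newer minor revision: keep the cell as-is and show a
            # placeholder instead of failing to open the notebook.
            labels.append("unrecognized")
        else:
            raise ValueError(
                f"invalid cell_type {cell_type!r} for nbformat {major}.{minor}"
            )
    return labels
```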

rgbkrk · 2017-08-27T19:48:32Z

Whoops, corrected the above. I see how unrecognized_cell is declared now.

ellisonbg · 2017-09-08T04:09:09Z

Overall, I like this idea. I am a bit worried about the unintended side effects of the decision. This change would make the Jupyter notebook format essentially a sequence of arbitrary content specified by MIME types (standard ones and made-up ones). That is getting dangerously close to saying "put anything you want in a notebook" and not having any ability to reason about what you might find in a given notebook. At the same time, this already perfectly describes output, so maybe it isn't a big deal. I can clearly imagine *many* use cases for it. Because of the broad impact across the entire project, I would prefer this be proposed as a Jupyter Enhancement Proposal first.


minrk mentioned this issue Feb 22, 2016: [WIP] add trust to markdown cells #1126 (closed)

bollwyvl mentioned this issue Feb 23, 2016: Add PDF rendering jupyter/nbviewer#577 (open)

fperez mentioned this issue Feb 24, 2016: Support %% cell magic syntax and more specifically %%HTML IRkernel/IRkernel#260 (closed)

Carreau added this to the wishlist milestone Jun 27, 2016

rgbkrk mentioned this issue Dec 21, 2016: Inline Whiteboard #1999 (open)

minrk mentioned this issue Aug 28, 2017: August 2017 Team Meeting Agenda Topics jupyter/governance#36 (closed)
