
Display data as a cell type #1123

Open
rgbkrk opened this issue Feb 21, 2016 · 39 comments

@rgbkrk
Member

rgbkrk commented Feb 21, 2016

On the heels of #621 and IRkernel/IRkernel#260, I'm wondering about a cell that matches display data semantics. We already have a way to display mimetype: data bundles.

Here's an example UX flow. A user goes to insert an image either via drag and drop or a menu:

Insert
    --> Image
    --> HTML

The image then ends up embedded in the document as if it was injected using magics or running code.

Thinking on @lbustelo's use case for the declarative widgets, this would allow cross-language support for writing direct HTML cells without the use of magics. The cell then has two states - edit and view, just like the markdown cells.

This would also spare us from having to do the garbage collection discussed in #621.

/cc @lbustelo @parente @julienr @jdfreder

@ellisonbg
Contributor

ellisonbg commented Feb 21, 2016 via email

@takluyver
Member

@lbustelo has run into some sort of sanitisation that we do on markdown cells, so you can't write arbitrary HTML in them.

I think that was prompted by the security discussion - we decided to sanitise markdown all the time, so that the signatures and trust mechanism only deal with outputs from code cells.

So I think the crucial thing we need to work out is what the user interface for trusting arbitrary HTML in something like a markdown cell is, equivalent to running a code cell to trust it. Should we present untrusted cells unrendered and let the user render them to trust them?

@willingc
Member

Ping @minrk

@julienr
Contributor

julienr commented Feb 21, 2016

If I understand this correctly, I think this can be orthogonal to #621 . Even if you could insert a special "image" cell, I don't think this would cover all the use cases for inline images. For example, you might want to do layout on your inline images (e.g. put them in a table) that you cannot do with a simple image cell type.

@ellisonbg
Contributor

@takluyver great point, yes I guess we do treat the code/markdown cells
differently from a security standpoint. This clarifies an important part of
this. The idea of using Markdown cells for this purpose, but treating them
more like code cells seems like a pretty straightforward approach. What are
the downsides of that?

Creating a new cell type that supports arbitrary display mime-types, but
can still be human edited like a Markdown seems a bit awkward still. Trying
to get my head around it. Would this be like an input-less code cell? Or a
code-cell whose input is its output?


Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger@calpoly.edu and ellisonbg@gmail.com

@rgbkrk
Member Author

rgbkrk commented Feb 22, 2016

Here's what I imagine an HTML cell would look like to users while they're editing it:

[Image "group": mockup of the HTML cell in edit mode]

Followed by what it looks like when rendered:

[Image "group 2": mockup of the same cell rendered]

In this example I'm showing a pencil icon to switch back to edit mode, though we can refine that.

As for inserting it, users wouldn't see the underlying representation of this cell, which I imagine to be:

{
  "cell_type": "data",
  "data": {
    "text/html": [
      "<div>\n",
      "<table border=\"1\" class=\"dataframe\">\n",
      "  <thead>\n",
      "    <tr style=\"text-align: right;\">\n",
      "      <th></th>\n",
      "      <th>Name</th>\n",
      "      <th>Email</th>\n",
      "    </tr>\n",
      "  </thead>\n",
      "  <tbody>\n",
      "    <tr>\n",
      "      <th>0</th>\n",
      "      <td>Jane Doe</td>\n",
      "      <td>jane@doe.com</td>\n",
      "    </tr>\n",
      "    <tr>\n",
      "      <th>1</th>\n",
      "      <td>John Doe</td>\n",
      "      <td>john@doe.com</td>\n",
      "    </tr>\n",
      "  </tbody>\n",
      "</table>\n",
      "</div>"
    ]
  }
}
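For reference, nbformat stores multiline strings such as a `text/html` payload either as one string or as a list of lines that readers concatenate. A minimal sketch of building and reading a cell of the proposed `"data"` type (hypothetical: it is not part of nbformat v4), with helper names that are illustrative only:

```python
import json


def make_data_cell(mimetype: str, text: str) -> dict:
    """Build a cell of the proposed (hypothetical) "data" cell type.

    The text is split into lines that keep their trailing newlines,
    mirroring how nbformat serializes multiline strings.
    """
    return {"cell_type": "data", "data": {mimetype: text.splitlines(keepends=True)}}


def read_mimedata(cell: dict, mimetype: str) -> str:
    """Return the payload for `mimetype`, accepting a str or a list of lines."""
    value = cell["data"][mimetype]
    return value if isinstance(value, str) else "".join(value)


html = '<div>\n<table border="1" class="dataframe">\n</table>\n</div>'
cell = make_data_cell("text/html", html)
print(json.dumps(cell, indent=1))
```

Joining the list reproduces the original HTML exactly, so a frontend can treat both storage forms the same way.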

Similarly for images, this would end up with base64-encoded images inline, while the UX for it is similar in flow to dragging and dropping images or selecting them via a menu in other UIs.
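A sketch of what a drag-and-drop or menu insert might produce, assuming the same hypothetical `"data"` cell type. Binary mimetypes such as `image/png` are stored as base64 text in notebook mimebundles, and the helper name below is made up for illustration:

```python
import base64


def image_data_cell(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes (e.g. from a drop event) in a hypothetical "data" cell."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {"cell_type": "data", "data": {"image/png": encoded}}


# The 8-byte PNG file signature stands in for a real image here.
fake_png = b"\x89PNG\r\n\x1a\n"
cell = image_data_cell(fake_png)
print(cell["data"]["image/png"])  # iVBORw0KGgo=
```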

@minrk
Member

minrk commented Feb 22, 2016

A summary of the security/sanitization we do and why:

  1. untrusted HTML/js must never be displayed without explicit user action
  2. Code cells have an obvious action to take: execute
  3. Markdown cells also have an obvious action to take: render
  4. Instead of tracking whether the user has explicitly trusted markdown cells, we treat them as always untrusted, which means sanitizing markdown results, but we can display them even when the notebook is untrusted
  5. A dedicated display-data cell would need to have the same trust semantics as other cells (explicit trust before display, or sanitization)

If we treat markdown cells as we do code cells, that means that an untrusted notebook needs to open with all markdown cells unrendered, and they must be 'executed' manually by the user before they are allowed to render on the page. The same would be true if we had dedicated mimebundle cells.

So the question to me is, really, a user-experience one on which I don't have an informed opinion. Should we:

  1. sanitize markdown, in which case we always get to display it (what we do now)
  2. allow arbitrary HTML/js in markdown, in which case we cannot display it until it is trusted

There are technically other options, like "display but sanitize only untrusted markdown", which is attractive because it will do what people want most often. But I think it's the most confusing option when sanitization of untrusted markdown produces a materially different result, because then we have to communicate to users that they must "unrender, then re-render the cell to see it how it's meant to be."

@takluyver
Copy link
Member

Is it possible to detect whether there's anything in a block of markdown/html which would be affected by sanitisation? I.e. could we check if each markdown cell is 'safe' and decide to display it rendered or unrendered?

@minrk
Copy link
Member

minrk commented Feb 22, 2016

Yes, that is possible.
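One way to make that check is to scan the HTML that a markdown cell renders to and report whether a sanitizer would strip anything. The sketch below uses a deliberately small, hypothetical blocklist built on Python's `html.parser`; it is not the notebook's actual sanitizer, which is far more careful:

```python
from html.parser import HTMLParser

# Toy blocklist for illustration only.
UNSAFE_TAGS = {"script", "iframe", "object", "embed", "style"}


class _UnsafeFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.unsafe = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us; self-closing
        # tags like <img ... /> are routed here by the default handler too.
        if tag in UNSAFE_TAGS:
            self.unsafe = True
        for name, value in attrs:
            # Inline event handlers (onclick, onerror, ...) run script.
            if name.startswith("on"):
                self.unsafe = True
            # javascript: URLs also run script; value is None for bare attributes.
            if value and value.strip().lower().startswith("javascript:"):
                self.unsafe = True


def would_sanitize(html: str) -> bool:
    """Return True if the toy sanitizer would remove something from `html`."""
    finder = _UnsafeFinder()
    finder.feed(html)
    finder.close()
    return finder.unsafe


print(would_sanitize("<b>bold</b> and <em>safe</em>"))       # False
print(would_sanitize('<img src="x.png" onerror="alert(1)">'))  # True
```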

@takluyver
Copy link
Member

If that's doable, I think that would be preferable. But it would probably also cause some confusion - why are some of these cells showing up rendered and others unrendered? Maybe we could add a little untrusted indicator by the unrendered cells, and offer a bit of explanation on mouseover/click.

@minrk
Copy link
Member

minrk commented Feb 22, 2016

Sure, so then the plan would be:

  1. track 'trusted' on markdown, just like code cells
  2. sanitize if untrusted
  3. show indicator / more info if sanitization made any changes
  4. explicit render of markdown cell indicates trust just like cell execution

Step 1 requires a change in nbformat to keep track of the trusted flag on markdown cells. Perhaps that trust->sign code really belongs in the notebook repo anyway (not 100% sure), since it's specifically a transform from the nbformat file format to 'live' document state that's specific to this webapp.
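The key property of step 1 is that the per-cell `trusted` flag is live document state, so it must not affect the notebook's signature. A sketch of that split, loosely in the spirit of the notebook's HMAC-SHA256 signatures (nbformat's real signing code differs in its details; `TRUSTABLE`, `mark_cells` and `sign` here are hypothetical):

```python
import copy
import hashlib
import hmac
import json

# Assumption: markdown cells join code cells as trust-tracked cell types.
TRUSTABLE = {"code", "markdown"}


def mark_cells(nb: dict, trusted: bool) -> None:
    """Record the live trust state on every trust-tracked cell."""
    for cell in nb["cells"]:
        if cell["cell_type"] in TRUSTABLE:
            cell.setdefault("metadata", {})["trusted"] = trusted


def sign(nb: dict, key: bytes) -> str:
    """HMAC-SHA256 over canonical JSON, ignoring the transient trusted flags."""
    stripped = copy.deepcopy(nb)
    for cell in stripped["cells"]:
        cell.get("metadata", {}).pop("trusted", None)
    payload = json.dumps(stripped, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


nb = {"cells": [{"cell_type": "markdown", "metadata": {}, "source": "<b>hi</b>"}]}
before = sign(nb, b"local-secret")
mark_cells(nb, True)
# Marking trust leaves the signature unchanged; editing the source would not.
print(sign(nb, b"local-secret") == before)  # True
```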

One (lazy) version of that indicator could be to leave the cell unrendered if it's untrusted and sanitization would make some change.

@rgbkrk is there anything display cells would provide that this proposal would not, or do you still think that display cells are something we should add?

@takluyver
Copy link
Member

  2. sanitize if untrusted

To be precise, my proposal is not to sanitise (or not to display sanitised output), but to display the markdown cell unrendered if sanitisation would make any changes and it's untrusted.
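A minimal sketch of that decision, assuming the sanitization check is passed in as any predicate over the cell's rendered HTML. The function names and the naive predicate below are hypothetical:

```python
from typing import Callable


def initial_markdown_state(
    html: str, trusted: bool, would_sanitize: Callable[[str], bool]
) -> str:
    """Decide how a markdown cell first appears when a notebook is opened.

    Show it rendered unless it is untrusted AND sanitization would change it;
    in that case leave it unrendered, so the user's explicit render action
    is what establishes trust.
    """
    if trusted or not would_sanitize(html):
        return "rendered"
    return "unrendered"


def render_on_user_action(cell: dict) -> None:
    """An explicit render counts as trust, just like executing a code cell."""
    cell.setdefault("metadata", {})["trusted"] = True


def naive_check(html: str) -> bool:
    # Deliberately naive stand-in predicate, for the demo only.
    return "<script" in html.lower()


cell = {"cell_type": "markdown", "metadata": {}, "source": "<script>alert(1)</script>"}
print(initial_markdown_state(cell["source"], cell["metadata"].get("trusted", False), naive_check))  # unrendered
render_on_user_action(cell)
print(initial_markdown_state(cell["source"], cell["metadata"]["trusted"], naive_check))  # rendered
```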

@minrk
Copy link
Member

minrk commented Feb 22, 2016

Gotcha, that ought to be doable.

@minrk
Copy link
Member

minrk commented Feb 22, 2016

I got the basics going for that at #1126.

@rgbkrk
Copy link
Member Author

rgbkrk commented Feb 22, 2016

Trust on markdown cells is a good starting point and we should see how it works in practice, via @lbustelo.

There's a larger question about what the markdown cells do and how they're specified:

  • Does the grammar for commonmark completely respect any embedded html?
  • What's the specification for embedded Math? Right now it seems like it's a function of the implementation in the notebook at any given time (see issues on nbviewer, github's rendering of notebooks, etc.)

If we couple ourselves to always using markdown cells for embedded HTML, these are the primary user experience and developer experience cons:

  • Unable to support a WYSIWYM editor like ProseMirror? If we want the notebook to be approachable to a lot more analysts, this is a big one.
  • In contexts where the outputs need to be sandboxed (O'Reilly Media's site, any future revisions on nbviewer), where is this HTML declared? Is it a global context on the page? How do we make it not interfere with the rest of the page? For multi-user collaboration, how do you reason about a document that can change underneath you?

@takluyver
Copy link
Member

Unable to support a WYSIWYM editor like ProseMirror?

It would make this trickier, but I don't think it should be impossible. I expect that most markdown cells would still contain relatively simple Markdown which could be edited like that.

Of course, if there are two different behaviours for markdown cells based on their contents, maybe they should be two cell types. That would be a bigger change to the notebook format, though.

@lbustelo
Copy link

Using the Markdown cell (once sanitization issues are solved) addresses 2 of my main concerns:

  1. HTML magic (and others like JavaScript) places a requirement on the kernel that for many use cases seems unreasonable and, for certain languages, is not practical to implement (i.e., supporting %% cell magic syntax, and more specifically %%HTML; see IRkernel/IRkernel#260).
  2. From a UX point of view, it is beneficial to have a well defined place to author client side content. Having to type %%HTML always felt kludgy.

Having said that... there are so many ways to author client side content that may not fit as nicely in the Markdown cell and might be better suited if we had some level of extensibility around cell types. @bollwyvl brought up https://github.com/pugjs/jade as another alternative to avoid typing HTML. There is always some new flavor of the month.

Also as @rgbkrk hinted at with WYSIWYG comment, the maturation of the Notebook space is going to lead to higher level authoring experiences. Jupyter and the NB format should somehow accommodate for that to avoid being overshadowed by the countless alternatives that are popping up all over.

I understand the importance of the downstream tools (i.e. nbviewer) and the hesitation of an open set of cell types, but as notebooks become platforms for solution development, I think this issue is going to become more and more important to solve.

@rgbkrk
Copy link
Member Author

rgbkrk commented Feb 23, 2016

Quick demo for you using draft-js and KaTeX, for a rich editor that has block maths:

[Image "latex": draft-js editor demo rendering block math with KaTeX]

@bollwyvl
Copy link
Contributor

Great, glad to see conversation about this topic! I've been tracking it for some time now, and am really interested in how this round turns out! I was indeed convinced then that making a cell type for everything is not the answer... but perhaps it is time to start thinking about some of the UI pieces around rich output that haven't fundamentally changed since then.

Background
For a while now, I've been making magics/extensions/widgets/tricks/fancies to get closer to authoring particular kinds of wysiwyMean, à la LyX/ProseMirror. We've come a long way, if you are willing to jump through many hoops!

Front End Stuff
+1 to CommonMark as a container for robot-authored Stuff (JS, HTML, CSS, SVG), even if we have to push forward the maths. If you read/write markdown and all those things, you're kind of a robot, so you should be fine.

Even if adopting commonmark for all that Stuff, and getting our sanitization house in order, the authoring of said Stuff in the browser (in the notebook) probably does need some love.

  • CodeMirror has our back, and has completion and linting for all of those, and many of the transpiled languages which we'd want to see with full support of npm.
  • This would be backed by the nascent "front end ʞernel", and is what we'd need anyway to make JupyterLab into the best place to build Lab/Notebook extensions.
  • Just ditching out to the text editor would lose the immediacy of being in-browser...
    • and heck, maybe there's something to this stuff-embedded-in-json idea: there is something kind of beautiful in the vulcanized polymer components: here's your nbextension... as a notebook!

User-authored text
The exception to this is, I think, user-authored complex prose documents.

If we were to embark on this path, I think it would end up having:

  • typography-focused approach to the UI of rich text authoring
    • much like @rgbkrk's demo (:heart: it!) fade out all the UI, embrace the keyboard shortcuts, and start writing your journal paper. or letter. or contract. or whatever.
  • a document-oriented data structure for the content, likely via prosemirror...
    • footnotes!
    • sidebars!
    • comments!
    • images with captions!
    • numbered equations!
  • a no-fooling, extensible layout engine (nbpresent just hacks the css)
    • papers, posters, banners
  • robust themes that look publication grade out of the box

Then the question is... are these then even cells? or is this another, prose-native view of your notebook you have chosen, which can include cells embedded in it. Do you show them next to each other? Does this even go in an ipynb, or is this a separate file type altogether, or a wrapper around both kinds of file, or a PDF with a local file store?

Very exciting stuff, and hopefully a topic for the dev meeting!

@minrk
Copy link
Member

minrk commented Feb 23, 2016

That seems really appealing, and points to either a display-data cell as discussed here, or just an HTML cell, whose editor can be any HTML-authoring magic.

I believe with CommonMark's behavior, anything inside a <div> should be treated as raw HTML, which would mean that wrapping the entire markdown cell in a single div makes it a pure HTML cell. That should let people experiment with WYSIWYG HTML editors on top of markdown cells while we figure out whether a cell type is useful.

@jasongrout
Copy link
Member

My understanding with commonmark is that any html tags should treat the enclosed material as html. That seems to be how the reference implementation behaves too: http://spec.commonmark.org/dingus/

@bollwyvl
Copy link
Contributor

I believe with CommonMark's behavior, anything inside a <div> should be treated as raw HTML...

Yep, seems so. div, and these buddies:

Start condition: line begins the string < or </ followed by one of the strings (case-insensitive) address, article, aside, base, basefont, blockquote, body, caption, center, col, colgroup, dd, details, dialog, dir, div, dl, dt, fieldset, figcaption, figure, footer, form, frame, frameset, h1, head, header, hr, html, iframe, legend, li, link, main, menu, menuitem, meta, nav, noframes, ol, optgroup, option, p, param, section, source, summary, table, tbody, td, tfoot, th, thead, title, tr, track, ul, followed by whitespace, the end of the line, the string >, or the string />.
End condition: line is followed by a blank line.

Start condition: line begins with a complete open tag or closing tag (with any tag name other than script, style, or pre) followed only by whitespace or the end of the line.
End condition: line is followed by a blank line.
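To make the quoted rules concrete, here is a rough sketch that splits text into HTML and markdown chunks using only start condition 6 and its blank-line end condition. The tag list is the one quoted above (later spec versions amend it), and this is an illustration, not a CommonMark parser:

```python
import re

# Block-level tag names from start condition 6, as quoted above.
BLOCK_TAGS = (
    "address article aside base basefont blockquote body caption center col "
    "colgroup dd details dialog dir div dl dt fieldset figcaption figure footer "
    "form frame frameset h1 head header hr html iframe legend li link main menu "
    "menuitem meta nav noframes ol optgroup option p param section source "
    "summary table tbody td tfoot th thead title tr track ul"
).split()

# Up to 3 spaces of indent, "<" or "</", a listed tag name, then
# whitespace, ">", "/>", or end of line.
START_6 = re.compile(r"^ {0,3}</?(?:%s)(?:\s|/?>|$)" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def split_blocks(text):
    """Return a list of ("html" | "markdown", chunk) pairs."""
    blocks, current = [], []
    in_html = False

    def flush(kind):
        if any(line.strip() for line in current):
            blocks.append((kind, "\n".join(current).strip("\n")))
        current.clear()

    for line in text.split("\n"):
        if in_html:
            if line.strip() == "":
                # End condition: a blank line closes the HTML block.
                flush("html")
                in_html = False
            else:
                current.append(line)
        elif START_6.match(line):
            flush("markdown")
            current.append(line)
            in_html = True
        else:
            current.append(line)
    flush("html" if in_html else "markdown")
    return blocks


# A blank line inside the <div> hands the middle back to the markdown parser.
print(split_blocks("<div>\n\n*emphasis*\n\n</div>"))
# [('html', '<div>'), ('markdown', '*emphasis*'), ('html', '</div>')]
```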

or just an HTML cell, whose editor can be any HTML-authoring magic

So, today, an inline prosemirror could serialize its JSON document model to cell metadata, and treat the source as its output.
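A sketch of that arrangement: the editor's own document model lives in cell metadata and the rendered HTML goes into the markdown source, so frontends without the editor still show the "dead pixel" version. The metadata key, the toy `{"paragraphs": [...]}` model and the helper names are all hypothetical, not ProseMirror's actual schema:

```python
import html

# Assumed metadata namespace; not a real extension.
EDITOR_KEY = "example_rich_editor"


def save_editor_state(doc_model: dict, render) -> dict:
    """Return a markdown cell whose source is the rendered form of `doc_model`."""
    return {
        "cell_type": "markdown",
        "metadata": {EDITOR_KEY: {"doc": doc_model}},
        # Wrapping in a div keeps CommonMark treating the source as raw HTML,
        # provided the rendered HTML contains no blank lines.
        "source": "<div>" + render(doc_model) + "</div>",
    }


def load_editor_state(cell: dict):
    """Recover the editor model, or None if the cell was never editor-managed."""
    return cell.get("metadata", {}).get(EDITOR_KEY, {}).get("doc")


def render_paragraphs(doc: dict) -> str:
    # Toy renderer for the toy model; escapes text so it stays literal.
    return "".join("<p>%s</p>" % html.escape(p) for p in doc["paragraphs"])


cell = save_editor_state({"paragraphs": ["a < b", "done"]}, render_paragraphs)
print(cell["source"])  # <div><p>a &lt; b</p><p>done</p></div>
```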

Though a bit tubby on bytes, this is nice, as then you've still got a "dead pixel" version of the content if someone doesn't have nbJadeDustUnderscoreHandlebarsReactHAMLJinjaLiquidLessSCSSStylusCoffeeTypeScript.

I guess you'd have a helpful message at the top that suggested thou shalt not edit this cell, but since we don't lock cells, they'd be free to go on about their business if they didn't have your editor. Some translations even have reverse engineering capabilities, such as http://html2jade.org/, though this falls apart once you actually start using template features...

My understanding with commonmark is that any html tags should treat the enclosed material as html. That seems to be how the reference implementation behaves too: http://spec.commonmark.org/dingus/

This is the behavior of nbviewer, inherited from mistune, as discussed here: jupyter/nbviewer#526 (comment)

As described there, nbconvert's (or nbviewer's) configuration could be changed to mimic the live browser's marked, and the spec.

@rgbkrk
Copy link
Member Author

rgbkrk commented Feb 23, 2016

The editor posted above comes straight out of draft-js, which has been used in production at Facebook for the last couple years and open sourced yesterday. It's React centric, though you can have React target any DOM element for rendering. I'm enjoying the APIs so far and the model underneath (people are using the same model in native apps now too).

That's a diversion from the real problem I'm worried about: what our specification is for the markdown cells themselves. It's not specced and is a reflection of the way the current user's notebook server implements it. When used on nbviewer, GitHub, or other static renderings, if we wanted consistency, we'd have to match the versions of marked, mistune, commonmark, etc., as well as MathJax, used by the notebook server they came from. If we keep the spec consistent with commonmark and suggest HTML somewhere else, we lessen the rendering bugs that get reported elsewhere and can build a clean model (a necessity for a WYSIWYG editor).

@fperez
Copy link
Member

fperez commented Feb 24, 2016

I don't have an answer here yet, just starting to digest the questions... But I want to throw one more data point into this topic: the Broad Institute's GenePattern Notebook exposes special input cells via an extension and a custom cell type, this page shows some examples.

This is a pattern that is also used by KBase for its computational biology apps and methods, with a slightly more complex approach b/c it was created earlier, when our notebook infrastructure was less mature (so the KBase team had to hack more).

This KBase/GenePatternNB approach fits certain use cases in biology really well, and it has made me think that we really need to find a clean, generic solution for it, as biology is not the only place where it's useful.

So I think we should approach this question trying to solve these slightly different, but ultimately related, use cases in a unified way...

If we don't get to a solution here, this should definitely be something to brainstorm on at the dev meeting! I've put it on the agenda.

@jasongrout
Copy link
Member

As @bollwyvl pointed out in #1123 (comment), the important thing in commonmark for html blocks is that there are no blank lines (e.g., it doesn't do html between div tags, it just pays attention to blank lines).
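Given that rule, one stopgap an author could apply today is to remove blank lines from a fragment that starts with a block-level tag such as `<div>`, so CommonMark sees a single HTML block. A minimal sketch (the helper name is hypothetical); note that it also alters whitespace-sensitive content such as `<pre>` or `<textarea>` bodies:

```python
def as_single_html_block(fragment: str) -> str:
    """Drop blank (or whitespace-only) lines so the HTML block never ends early.

    Assumes the fragment begins with a start-condition-6 tag like <div>,
    whose HTML block runs until the first blank line.
    """
    return "\n".join(line for line in fragment.split("\n") if line.strip())


print(as_single_html_block("<div>\n\n*x*\n\n</div>"))
```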

@minrk
Copy link
Member

minrk commented Feb 24, 2016

Aha, interesting note about the blank lines, thanks. That seems a bit weird, but does point to us making a specific HTML cell (mimebundle or otherwise) that wysiwyzards can sit on. I wouldn't want to be saying "Make sure you don't add any empty lines in your HTML, or it'll interpret it as another language".

@jasongrout
Copy link
Member

To throw another thing into the discussion, I also worked on wysiwyg tools for generating code directly in code cells: http://bl.ocks.org/jasongrout/5378313.
[Image "inlinewidgets": screenshot of the inline widgets demo]

@takluyver
Copy link
Member

@jasongrout that's very cool!

@fperez
Copy link
Member

fperez commented Feb 24, 2016

@jasongrout +1!

@tonyfast
Copy link
Collaborator

tonyfast commented Apr 9, 2016

I was trying to find out how to use the cell type to create special cells, and I found this awesome thread.

I have been building presentations with nbpresent for data-driven interactive documents. My first lesson was that too much code makes creating a presentation from a single notebook unwieldy.

There needs to be a tighter way to connect variables on the kernel with HTML and JavaScript. I came up with this small cell magic called literacy. %%literate processes the cell as literate markdown where the code fences are executed if the language is understood. Moreover, the entire cell is a jinja2 template (well, a few templates), which means that data can be passed from the kernel to the presentation and made interactive.
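The execute-fences-then-template idea can be sketched in a few lines. This toy is not how literacy works internally: it assumes only `python` fences are run, all in one shared namespace, and it swaps Jinja2 for the standard library's `string.Template` (`$name` placeholders) so the example stays dependency-free:

```python
import re
from string import Template

# Built programmatically so this example nests cleanly inside Markdown.
TICKS = "`" * 3
FENCE = re.compile(re.escape(TICKS) + r"python\n(.*?)" + re.escape(TICKS), re.DOTALL)


def render_literate(cell_source: str) -> str:
    """Run the python fences, strip them, then fill $placeholders in the prose."""
    namespace: dict = {}
    for code in FENCE.findall(cell_source):
        exec(code, namespace)  # runs author-supplied code: trusted input only
    prose = FENCE.sub("", cell_source)
    values = {k: v for k, v in namespace.items() if not k.startswith("__")}
    return Template(prose).safe_substitute(values).strip()


cell = TICKS + "python\nx = 6 * 7\n" + TICKS + "\nThe answer is $x."
print(render_literate(cell))  # The answer is 42.
```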

This notebook showcases literacy. Some cool features are:

  • Markdown cells that create variables in python and insert them into the template.
  • Coffeescript, javascript, and pyscript to change state.
  • A DataFrame inserted directly into markdown after making a request.
  • Hidden code cells and disqus comments. Click the toggle input button to show the source.

I feel like there is a cool editor in this idea.

@lbustelo
Copy link

lbustelo commented Apr 9, 2016

@tonyfast check out jupyter-declarativewidgets. The main focus of that work is to connect data and functions from a kernel with visual interactive areas in the notebook authored using the html magic.

@rgbkrk
Copy link
Member Author

rgbkrk commented Apr 9, 2016

Mega 👍 to the declarative widgets

@takluyver
Copy link
Member

Neat, thanks for posting that, @tonyfast

@fperez
Copy link
Member

fperez commented Apr 13, 2016

Very neat, @tonyfast!

Carreau added this to the wishlist milestone Jun 27, 2016
@twavv
Copy link

twavv commented Aug 26, 2017

Looks like this issue has stalled...

Over the last two days I've been pestering @minrk about this, as it would make my life (as someone developing an extension that wants to add some fundamentally new functionality to the notebook) a whole lot easier. Currently, the only option I see (short of forking the notebook) is to hijack markdown cells with certain attributes, override the render method, and create my own DOM elements outside of the CodeMirror/rendered output areas.

Is the plan still to implement a mimebundle type cell in the notebook (outside of jupyterlab)? This would also make #1999 more-or-less trivial to implement as an extension which would be a huge plus; just set mime type to image/png and have an extension include some metadata with the cell to provide its own editor (as an html canvas editor).

@rgbkrk
Copy link
Member Author

rgbkrk commented Aug 27, 2017

These are the steps that would need to be taken, ignoring whether or not people want this in the core document format. Many likely need to be done in parallel across the repos.

  1. Create a PR including it in the core nbformat spec -- this would (possibly) be a minor bump in the format - https://github.com/jupyter/nbformat/blob/master/nbformat/v4/nbformat.v4.schema.json
  2. Write integration tests for the nbformat repo for this new cell type
  3. Verify that a plain unmodified jupyter notebook server works properly with this new cell type
  4. Patch the notebook to handle this new type in a non-obtrusive way (if broken above)

All that being said, your workaround is to establish a way to do this in a raw cell. Stick whatever you want in metadata, including the serialized version of what you want to do.
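A sketch of that raw-cell workaround for the canvas-editor case above: the extension keeps its editable model and a PNG snapshot in the raw cell's metadata under its own namespace. The key name and stroke format are hypothetical; the only nbformat facts relied on are that raw cells carry `metadata` and `source`, and that images are stored as base64 text:

```python
import base64
import json

# Assumed namespace for the extension's private state.
EXT_KEY = "example_canvas_extension"


def make_canvas_cell(png_bytes: bytes, strokes: list) -> dict:
    """Build a raw cell whose metadata carries the extension's serialized state."""
    return {
        "cell_type": "raw",
        "metadata": {
            EXT_KEY: {
                "strokes": strokes,  # editable model the extension reloads
                "image/png": base64.b64encode(png_bytes).decode("ascii"),
            }
        },
        "source": "",  # nothing for frontends without the extension to render
    }


cell = make_canvas_cell(b"\x89PNG\r\n\x1a\n", [[[0, 0], [10, 10]]])
print(json.dumps(cell, indent=1))
```

Because everything is plain JSON in metadata, the cell survives a save and load round trip untouched by frontends that don't know about the extension.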

@minrk
Copy link
Member

minrk commented Aug 27, 2017

v4 of the notebook format is officially designed such that a new notebook cell type is a minor revision of the notebook format. UIs that see unrecognized cell types in minor format revisions newer than the latest they support should handle it, even if they can't display them.
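A sketch of how a frontend might apply that rule. The supported minor version and the "render"/"placeholder" labels are assumptions for illustration: an unknown cell type is tolerated (shown as a placeholder and preserved on save) only when the file declares a newer v4 minor revision than the frontend supports; otherwise it is treated as invalid:

```python
# Assumption: the newest v4 minor revision this frontend knows about.
SUPPORTED_MINOR = 2
KNOWN_TYPES = {"code", "markdown", "raw"}


def classify_cells(nb: dict) -> list:
    """Pair each cell with a display decision, keeping unknown cells intact."""
    newer = nb.get("nbformat") == 4 and nb.get("nbformat_minor", 0) > SUPPORTED_MINOR
    result = []
    for cell in nb["cells"]:
        if cell["cell_type"] in KNOWN_TYPES:
            result.append((cell, "render"))
        elif newer:
            # Unrecognized cell from a newer minor revision: keep, don't display.
            result.append((cell, "placeholder"))
        else:
            raise ValueError("unknown cell type %r" % cell["cell_type"])
    return result


nb = {
    "nbformat": 4,
    "nbformat_minor": 3,
    "cells": [{"cell_type": "markdown", "source": "hi"}, {"cell_type": "data", "data": {}}],
}
print([decision for _, decision in classify_cells(nb)])  # ['render', 'placeholder']
```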

This would be the first change to exercise this behavior, though, so it will be interesting to see if the upgrade experience goes as promised.

@rgbkrk
Copy link
Member Author

rgbkrk commented Aug 27, 2017

Whoops, corrected the above. I see how unrecognized_cell is declared now.

@ellisonbg
Copy link
Contributor

ellisonbg commented Sep 8, 2017 via email
