Changes to how stream outputs are handled in Notebooks #161023

DonJayamanne · 2022-09-16T00:36:43Z

Related issues #146768, microsoft/vscode-jupyter#11031

Multiple Output Items per Output

Today we can have multiple output items per output
However all output items in an output must have a unique mime type.
Thus its not possible to have two stderr output items in the same output.
This is check is done in extHostTypes.ensureUniqueMimeTypes
I.e. if you were to call appendOutputItem with stdout multiple times, only the first stdout is stored in the model.
E.g. if you were to call appendOutputItem with stdout twice with the values 1 and 2 respectively, then you'd end up with a single output item with the value 1 in the model. The second stdout output item is ignored.

Changes to the API

We will support calling appendOutputItem multiple times for the same stream mime types per output.
However we'd merge the stream output items into one and we'd still end up with a single output item per mime type.
E.g. if you were to call appendOutputItem with stdout twice with the values 1 and 2 respectively, then you'd end up with a single output item with the value 12 in the model.

This change applies only to the stream mime types which include:
- 'application/vnd.code.notebook.stdout',
- 'application/x.notebook.stdout',
- 'application/x.notebook.stream',
- 'application/vnd.code.notebook.stderr',
- 'application/x.notebook.stderr'

Current approach

Summary: Jupyter extension gets new output for each cell and compresses/merges the outpu items, and replaces the existing output with new output. E.g. if you get a new stream output, we replace the previous output items with a new one.

Draw backs:

Jupyter extension is always replacing existing output with new output
As Jupyter gets new output, the output grows and this large output is sent to Renderer process and form Renderer process back to Ext Host.
- I.e. we send large strings back and forth (data sent back and forth grows over time)
- Sending this ever growing large output back and forth has the potential to crash vs code.

sequenceDiagram
  participant Jupyter
  participant Extension Host
  participant VSCode Renderer Proc
  Jupyter->>Extension Host: Append Kernel Output `Line1` using `Task.appendOutput` 
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._spliceNotebookCellOutputs2` 
  Note right of VSCode Renderer Proc: Model updated to contain <br/>Output with OutputItem `Line1`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for new Output `Line1`
  Note right of Extension Host: Model updated to contain <br/>Output with OutputItem `Line1`
  Jupyter->>Extension Host: Kernel output is `Line2`<br/>Merge previous outputs with new to get `Line1Line2`<br/>Replace output with new value `Line1Line2`<br/>Using `Task.replaceOutputItems`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._replaceNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Model updated to replace output <br/>with new value `Line1Line2`.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for replace with new output item `Line1Line2`
  Note right of Extension Host: Model updated to replace output <br/>with new value `Line1Line2`.
  Jupyter->>Extension Host: Kernel output is `Line3`<br/>Merge previous outputs with new to get `Line1Line2Line3`<br/>Replace output with new value `Line1Line2Line3`<br/>Using `Task.replaceOutputItems`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._replaceNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Model updated to replace output <br/>with new value `Line1Line2Line3`.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for replace with new output item `Line1Line2Line3`
  Note right of Extension Host: Model updated to replace output <br/>with new value `Line1Line2Line3`.

Why do we need to merge/compress outputs

Assume we get output from the kernel as follows:

First kernel output is Line1.
Second kernel output is ESC[ALine2
Third kernel output is ESC[ALine3

Note: Assume ESC[A is a special terminal escape sequence that tells the terminal to remove the previous line.

Thus when we get the output ESC[ALine3, what needs to be displayed is Line3 and same Line3 saved in *.ipynb file (the prevous lines need to be dropped).
This compression/merging of outputs needs to happen in realtime, so users can see updates from terminal progress bars & the like (very common in Jupyter, installing packages and running experiments).

Proposed Solution

Summary: We send the incremental updates to Renderer process. And instead of appending the new items we compress the new output items with the existing items and replace the existing items with a single output item. I.e. we always have one output item in the cell output.

Draw backs:

Changes to behaviour of existing API or need a new API.
Current API method is appendOutputItems. Today this only supports having multiple output items with distinct mime types. If one were to call the API with the same mime type, the the new items get dropped.
The change here allows us to call appendOutputItems for the same mime type again without losing this data and this applies to streams mime types.

Optionally we could have a new method appendStreamOutputItems to be more explicit about the fact that we have special handling for stream output items.

Benefits:

We only send the new output items (incremental updates) to Renderer process and Extension host.
No additional processing required when rendering output in notebook renderer (output items are already merged/compressed)
No additional processing required when serializing the output in Notebook Serializer (output items are already merged/compressed)

sequenceDiagram
  participant Jupyter
  participant Extension Host
  participant VSCode Renderer Proc
  Jupyter->>Extension Host: Append Kernel Output `Line1` using `Task.appendOutput` 
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._spliceNotebookCellOutputs2` 
  Note right of VSCode Renderer Proc: Model updated to contain <br/>Output with OutputItem `Line1`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for new Output `Line1`
  Note right of Extension Host: Model updated to contain <br/>Output with OutputItem `Line1`
  Jupyter->>Extension Host: Kernel output is `Line2`<br/>Append new output item with value `Line2` using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: We now have 2 output items in the output model. `Line`, and `Line2`
  Note right of VSCode Renderer Proc: When sending data to `Notebook Renderer` we need to loop through 2 output items to merge into one.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line2`
  Note right of Extension Host: Model contains two output items, `Line` and `Line2`.
  Note right of Extension Host: Notebook Serializer needs to loop through 2 output items to merge into one.
  Jupyter->>Extension Host: Kernel output is `Line3`<br/>Append new output item with value `Line3` using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: We now have 3 output items in the output model. `Line`, `Line2`, and `Line3`
  Note right of VSCode Renderer Proc: When sending data to `Notebook Renderer` we need to loop through 3 output items to merge into one.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line3`
  Note right of Extension Host: Model contains two output items, `Line1`,`Line2` and `Line3`.
  Note right of Extension Host: Notebook Serializer needs to loop through 3 output items to merge into one.

Solution that we tried and found to not work well

Summary: We send the incremental updates to Renderer process. I.e. we always append output items. As we get new items this list of output items will grow.

Draw backs:

VS Code renderer process needs to merge/compress all output items into one for Notebook Output Renderer
IPynb Serializers Extension (in extension host) needs to merge/compress all output items to save to *.ipynb file.
Thus as the output items grow form 1 to n, we end up spending more and more time processing the same output items over & over in both places:
- I.e. renderer process is processing a large list for Notebook Renderer and as the list grows, this process is busy
- I.e. extension host is processing a large list for Serialization and as the list grows, this process is busy

Note: We have tried this approach and the results are not good. Basically VS Code is busy processing a large list of output items and this slows other parts of notebook.

Benefits:

We only send the new output items to Renderer process and Extension host.
- i.e. only send incremental updates

sequenceDiagram
  participant Jupyter
  participant Extension Host
  participant VSCode Renderer Proc
  Jupyter->>Extension Host: Append Kernel Output `Line1` using `Task.appendOutput` 
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._spliceNotebookCellOutputs2` 
  Note right of VSCode Renderer Proc: Model updated to contain <br/>Output with OutputItem `Line1`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for new Output `Line1`
  Note right of Extension Host: Model updated to contain <br/>Output with OutputItem `Line1`
  Jupyter->>Extension Host: Kernel output is `Line2`<br/>Append new output item with value `Line2`<br/>using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2`.<br/>Now we have one output item `Line1Line2`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line2`
  Note right of Extension Host: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2`.<br/>Now we have one output item `Line1Line2`
  Jupyter->>Extension Host: Kernel output is `Line3`<br/>Append new output item with value `Line3`<br/>using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2Line3`.<br/>Now we have one output item `Line1Line2Line3`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line3`
  Note right of Extension Host: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2Line3`.<br/>Now we have one output item `Line1Line2Line3`

The text was updated successfully, but these errors were encountered:

DonJayamanne · 2022-09-16T00:38:18Z

@roblourens @rebornix /cc

jrieken · 2022-09-20T07:35:15Z

Thanks for the nice write-up.

Some follow questions/idea

I am confused by the application/x.notebook.XYZ mime-types. We don't spell them out anywhere and officially (by the docs) the only custom mime types are application/vnd.code.notebook.[stdout|stderr|error]
Merging repeated mime-types (of special types) makes sense. Do we recall why we/I made the restriction in the first place? Is that because of non-mergable types like images or audio files?

I see the ensureUniqueMimeTypes-call but it is also local to a single invocation of appendOutputItems. Did you try to simply call appendOutputItems-twice? Since the actual implementation is debounced merging should happen in the same debounced way, e.g here in update, so that calling appendOutputItems with repeated mime-types or repeated calls without repeated mimes should be any different

DonJayamanne · 2022-09-20T08:12:14Z

confused by the application/x.notebook.XYZ mime-types. We don't spell them out anywhere and officially (by the docs) the only custom mime types are application/vnd.code.notebook.[stdout|stderr|error]

I only included those because we seemed to have a few of those streams. I agree the only one we need to merge are for application/vnd.code.notebook.[stdout|stderr] (the ones exposed via the shared methods on NotebookCellOutputItem)

Do we recall why we/I made the restriction in the first place? Is that because of non-mergable types like images or audio files?

Unfortunately I don't recall this.
Possible because, its just not possible to have multiple mime types repeated. For Jupyter we map each Jupyter output to a single NotebookCell output with single output items., and in Jupyter each Output is an Object and the keys are the mime types, which in turn map to an individual output item. Hence technically one cannot have 2 output items for the same mime type, as that would equate to two duplicate keys in a JS object, which isn't possible.

Did you try to simply call appendOutputItems-twice?

No didn't, i was primarily focused on merging the stream output items into single item.

DonJayamanne self-assigned this Sep 16, 2022

roblourens added api-proposal notebook-execution Issues related to running cells in a notebook labels Sep 20, 2022

roblourens added this to the September 2022 milestone Sep 20, 2022

lszomoru modified the milestones: September 2022, October 2022 Oct 5, 2022

DonJayamanne mentioned this issue Oct 9, 2022

Compress streams in notebook outputs #160946

Merged

DonJayamanne closed this as completed in #160946 Oct 11, 2022

VSCodeTriageBot added unreleased Patch has not yet been released in VS Code Insiders insiders-released Patch has been released in VS Code Insiders and removed unreleased Patch has not yet been released in VS Code Insiders labels Oct 11, 2022

DonJayamanne mentioned this issue Oct 26, 2022

Running notebook cells uses all available memory #146768

Closed

github-actions bot locked and limited conversation to collaborators Nov 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes to how stream outputs are handled in Notebooks #161023

Changes to how stream outputs are handled in Notebooks #161023

DonJayamanne commented Sep 16, 2022 •

edited

Loading

DonJayamanne commented Sep 16, 2022

jrieken commented Sep 20, 2022 •

edited

Loading

DonJayamanne commented Sep 20, 2022

Changes to how stream outputs are handled in Notebooks #161023

Changes to how stream outputs are handled in Notebooks #161023

Comments

DonJayamanne commented Sep 16, 2022 • edited Loading

Multiple Output Items per Output

Changes to the API

Current approach

Why do we need to merge/compress outputs

Proposed Solution

Solution that we tried and found to not work well

DonJayamanne commented Sep 16, 2022

jrieken commented Sep 20, 2022 • edited Loading

DonJayamanne commented Sep 20, 2022

DonJayamanne commented Sep 16, 2022 •

edited

Loading

jrieken commented Sep 20, 2022 •

edited

Loading