Large timeseries #1205

Closed
wants to merge 20 commits into from

Conversation

hoxbro
Member

@hoxbro hoxbro commented Nov 22, 2023

This adds a notebook that explains the different ways of working with large time-series datasets with HoloViz.

Member

@maximlt maximlt left a comment

While this guide is useful on its own, I believe it needs to be significantly reworked before being integrated into the docs. It feels to me like it was written as a standalone guide. Actually, I believe it could pretty easily be turned into a nice blog post!

  • The guide should be more integrated into the docs, probably moving some of its content to the Time Series Data guide.
  • The guide focuses too much on Bokeh; hvPlot supports Matplotlib and Plotly too.
  • The guide could link more to resources in HoloViews and Datashader.

More specifically:

  • "The old way: Bokeh's custom Canvas rendering" and "WebGL: new baseline for timeseries plotting": I'm not sure we should mention how things used to be. Ideally, we'd have a Bokeh-specific guide, like Plotting with Bokeh in HoloViews' docs, that mentions WebGL.
  • Datashader rasterizing: I just realized that explaining Datashader is difficult. I don't know how many notebook users will understand this sentence: "Datashader works in a different way, rendering the data into a frame buffer on the server, and then sending that buffer to the web browser rather than the individual data points." We're also defining Datashader in multiple places in hvPlot's docs. Ideally, again, we could have a "Large data" guide that would be the only place where we define and explain Datashader, and link to it from other places. I also find that the guide doesn't go into enough detail on anti-aliasing; I bet most users aren't familiar with it and need more explanation.
  • Minimap: What should be the main reference place to introduce the RangeToolLink in hvPlot's docs? Currently it's only used in the OHLC guide. Should it be in the yet-to-be-made Plotting with Bokeh guide? It seems to me the minimap approach could already be introduced in the Timeseries Data guide.
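
To make the frame-buffer sentence quoted above more concrete, here is a minimal pure-Python sketch of the idea: however many samples the series has, they are aggregated into a fixed-size grid, and only that grid would need to reach the browser. This is not Datashader's implementation (Datashader draws connected line segments and is far more optimized); the `rasterize_points` helper and the 120 by 50 grid size are assumptions made for illustration.

```python
import math


def rasterize_points(xs, ys, width, height):
    """Aggregate (x, y) samples into a height x width grid of counts.

    Hypothetical helper for illustration only; it bins individual points
    and does not draw the line segments between them.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values on an axis are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map each sample to a pixel, clamping the maximum onto the last bin.
        col = min(int((x - x_min) / x_span * width), width - 1)
        row = min(int((y - y_min) / y_span * height), height - 1)
        grid[row][col] += 1
    return grid


n = 100_000
xs = [i / n for i in range(n)]
ys = [math.sin(2 * math.pi * 5 * x) for x in xs]

grid = rasterize_points(xs, ys, width=120, height=50)
# The payload is 50 x 120 cells regardless of how large n is.
print(len(grid), len(grid[0]), sum(map(sum, grid)))
```

The point of the sketch is the size of the result: increasing `n` changes the counts inside the grid but not the number of cells that would be sent for display.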

cc @droumis @jbednar

@jbednar
Member

jbednar commented Dec 1, 2023

Reviewing the published page https://holoviz-dev.github.io/hvplot/user_guide/visualizing_large_timeseries.html rather than the source code:

visualizing large timeseries

Title of notebook needs to be capitalized to match others and have a reasonable title. hvPlot is inherently a plotting library, so "visualizing" seems redundant. Just "Large_Timeseries", maybe?

  1. Datashader rasterizing

When pages are built for our website, the build uses the default Datashader image size. The default is intentionally set to a low value to avoid generating a large image that is then thrown away in interactive usage, where the image is updated to the actual display resolution via a RangeXY callback. Here, because the callback is never invoked, the image is rendered at a very low resolution, which looks bad on the website. I think the images can be improved by including a cell like this early in the notebook:

from holoviews.operation.resample import ResampleOperation2D
ResampleOperation2D.width = 1200
ResampleOperation2D.height = 500

[The example above needs rasterize, plus instant inspection. Also needs to illustrate what happens when very large numbers of traces overlap.]

Presumably this note should be omitted, and an issue opened instead.

  1. Minimap

The example shows no data by default; presumably we should put some sort of initial range in there that causes data to display when exported to HTML?

The instructions also don't seem to match the plot; there's no grey box visible, and once you pan to find one, it's not a small rectangle but something larger than the plot, which doesn't seem right. Plus panning and zooming in the bottom plot make it very easy to get lost; I would think that the minimap should not have any y axis panning and no zooming, just x panning. It should be hard to shoot yourself in the foot or get lost.

> While this guide is useful on its own, I believe it needs to be significantly reworked before being integrated into the docs. It feels to me like it was written as a standalone guide. Actually, I believe it could pretty easily be turned into a nice blog post!

Yes, this was a standalone guide, and we decided that hvPlot was where it should end up. I agree it will make a nice blog post when we are done, but it should also have a permanent home in our docs so that people can figure out the best way to deal with their large timeseries data.

> The guide should be more integrated into the docs, probably moving some of its content to the Time Series Data guide.

I could be convinced otherwise, but my first guess would be that the Time Series guide should lose its LTTB section and instead it should have a section at the end suggesting that people look at this separate guide if they have large timeseries or want to look at many of them together. It's a lot of content already and I don't think it's relevant to people with small timeseries.
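
For readers unfamiliar with the LTTB (Largest Triangle Three Buckets) downsampling mentioned here, a minimal pure-Python sketch follows. It keeps the first and last points, splits the rest into buckets, and from each bucket keeps the point forming the largest triangle with the previously kept point and the average of the next bucket. This is an illustrative implementation written for this thread, not the code used by hvPlot or HoloViews; the `lttb` function name and the synthetic data are assumptions.

```python
import math


def lttb(points, n_out):
    """Downsample a list of (x, y) points to n_out points with LTTB (sketch)."""
    n = len(points)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return list(points)

    # The first and last points are always kept, so n_out - 2 buckets remain.
    bucket_size = (n - 2) / (n_out - 2)
    sampled = [points[0]]
    a = 0  # index of the most recently selected point

    for i in range(n_out - 2):
        # The average of the next bucket acts as the third triangle vertex.
        next_start = int((i + 1) * bucket_size) + 1
        next_end = min(int((i + 2) * bucket_size) + 1, n)
        next_bucket = points[next_start:next_end]
        avg_x = sum(p[0] for p in next_bucket) / len(next_bucket)
        avg_y = sum(p[1] for p in next_bucket) / len(next_bucket)

        # Pick the point in the current bucket with the largest triangle area.
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1
        ax, ay = points[a]
        best, best_area = start, -1.0
        for j in range(start, end):
            bx, by = points[j]
            area = abs((ax - avg_x) * (by - ay) - (ax - bx) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area
        sampled.append(points[best])
        a = best

    sampled.append(points[-1])
    return sampled


# Synthetic series with one sharp spike that a naive stride would likely miss.
data = [(float(i), math.sin(i / 20)) for i in range(1000)]
data[500] = (500.0, 100.0)

reduced = lttb(data, 50)
print(len(reduced), (500.0, 100.0) in reduced)
```

The design choice worth noting is that LTTB selects existing samples rather than averaging them, which is why visually prominent features such as the spike tend to survive the reduction.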

> The guide focuses too much on Bokeh; hvPlot supports Matplotlib and Plotly too.

That's a general tension in the hvPlot docs that I believe remains unresolved -- how do we show how the backends differ, as well as how the various data sources differ? I don't think this one is particularly different in that respect, but if it is, it can include an explicit statement that these examples focus on Bokeh but that in some cases similar functionality is available for the other backends.

> "The old way: Bokeh's custom Canvas rendering" and "WebGL: new baseline for timeseries plotting": I'm not sure we should mention how things used to be. Ideally, we'd have a Bokeh-specific guide, like Plotting with Bokeh in HoloViews' docs, that mentions WebGL.

Maybe don't say it's the old way, then, but just mention that it's an option and that it's not recommended any more.

> Datashader rasterizing: I just realized that explaining Datashader is difficult. I don't know how many notebook users will understand this sentence: "Datashader works in a different way, rendering the data into a frame buffer on the server, and then sending that buffer to the web browser rather than the individual data points."

I probably wrote that; any suggestions on how to make it clearer?

> We're also defining Datashader in multiple places in hvPlot's docs. Ideally, again, we could have a "Large data" guide that would be the only place where we define and explain Datashader, and link to it from other places.

Sounds good. I hear you volunteering to write that! :-)

> I also find that the guide doesn't go into enough detail on anti-aliasing; I bet most users aren't familiar with it and need more explanation.

I think we can put in a link that explains it.

> Minimap: What should be the main reference place to introduce the RangeToolLink in hvPlot's docs?

Good question! It seems to me that the minimap is primarily useful for large timeseries, and so to me it belongs here, in the large timeseries notebook.

@hoxbro, can you link to the issues that detail the remaining warts and areas for improvement in this notebook? I think you mentioned that they existed but I don't see how to get to them from here.

@droumis
Member

droumis commented Dec 12, 2023

I'm starting to address the points raised above. I've collected the tasks in a board.

@droumis
Member

droumis commented Dec 13, 2023

I've added it, along with an explainer admonition, but it's a bit awkward to add the following to the notebook. Ideally, we could either run this in the CI workflow somehow or use a hidden cell (not sure how).

from holoviews.operation.resample import ResampleOperation2D
ResampleOperation2D.width = 1200
ResampleOperation2D.height = 500

@maximlt
Member

maximlt commented Dec 21, 2023

@droumis I made a couple of small changes to attempt to hide the cell we were talking about the other day (the one setting the resampling dimensions). This is usually supported by MyST-NB by adding the hide-cell tag to the cell and I was happy to see that nbsite doesn't affect that. I changed the config on the clean_notebook hook to ignore the tags key in the cell metadata. The cell is correctly hidden on the site :)
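
For reference, the tag lives in the cell's metadata in the `.ipynb` JSON. The sketch below builds a tiny notebook-like dictionary and adds the `hide-cell` tag to its first cell. The `add_tag` helper and the example notebook contents are illustrative assumptions, not part of nbsite or MyST-NB, and the clean_notebook hook configuration is not shown.

```python
import json

# A minimal notebook-like structure (nbformat 4 layout) with one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Hidden on the website: sets the static render resolution\n",
                "from holoviews.operation.resample import ResampleOperation2D\n",
                "ResampleOperation2D.width = 1200\n",
                "ResampleOperation2D.height = 500",
            ],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}


def add_tag(cell, tag):
    """Add a tag to a cell's metadata without duplicating it (hypothetical helper)."""
    tags = cell["metadata"].setdefault("tags", [])
    if tag not in tags:
        tags.append(tag)


add_tag(notebook["cells"][0], "hide-cell")
print(json.dumps(notebook["cells"][0]["metadata"]))
```

Because the tag is stored in the metadata, any notebook-cleaning step that strips cell metadata would also strip the tag, which is why the hook had to be told to leave the `tags` key alone.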

image

I added a small comment to make it clear there's something special with this cell, it's not so obvious otherwise when you work from JupyterLab/Notebook.


I'm planning to release hvPlot 0.9.1 today. How do you feel about this PR? It seems to me it's in a much better state and could go in as is. I'm even fine with having some sections marked as WIP. It's up to you to tell me what you think; there's no urgency to merge this either, at least from my side.

@droumis
Member

droumis commented Dec 21, 2023

re: hidden cell, that's great to see, @maximlt! I think that will help us in several other places across holoviz docs.

re: merging now, let's wait for Bokeh 3.4 and the next HoloViews release, as this notebook requires Bokeh #13603 and benefits from HoloViews #6030. I'm also actively working on the things marked WIP. I'd also really like to see auto-ranging for multiple lines fixed before this is released, which @jlstevens will hopefully have time to address in early January.

@maximlt maximlt marked this pull request as draft December 21, 2023 21:34
@droumis
Member

droumis commented Dec 28, 2023

I processed some real spike waveform data to create a new Datashader section on plotting many lines across multiple categories. As far as I could figure out, until we resolve the relevant data format issues in HoloViews, the simplest way for hvPlot is to add NaN separators to a dataframe, so I did that step prior to uploading the data and just explained it in the notebook. It's up on the dev website.
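
As a rough illustration of the NaN-separator step described above, the sketch below flattens several lines into one table-like list of rows and appends a NaN row after each line, so a renderer that treats NaN as a break does not connect the end of one line to the start of the next. It uses plain Python rather than a dataframe; the `join_with_nan_separators` helper, the category names, and the sample values are made up for illustration and are not the preprocessing actually applied to the spike waveform data.

```python
import math


def join_with_nan_separators(lines):
    """Flatten (category, xs, ys) lines into rows, with a NaN row after each line.

    Hypothetical helper: each row is (x, y, category). The separator row keeps
    the category so per-category aggregation still sees a consistent column.
    """
    rows = []
    for category, xs, ys in lines:
        rows.extend((x, y, category) for x, y in zip(xs, ys))
        rows.append((math.nan, math.nan, category))
    return rows


# Three short made-up waveforms in two categories.
waveforms = [
    ("unit_a", [0, 1, 2], [0.0, 1.5, 0.2]),
    ("unit_a", [0, 1, 2], [0.1, 1.2, 0.0]),
    ("unit_b", [0, 1, 2], [0.0, -0.8, 0.1]),
]

rows = join_with_nan_separators(waveforms)
for row in rows:
    print(row)
```

The same rows could be loaded into a dataframe with `x`, `y`, and category columns; the separator rows are what allow many lines to share a single flat table.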

image

Unless there are any further comments, I think we are just waiting on Bokeh 3.4 and the next HoloViews release to merge this PR. If autoranging, Datashader inspections, or this NaN issue get resolved before then, great, but those can also be follow-ups.

@droumis droumis marked this pull request as ready for review December 29, 2023 15:29
@jbednar
Member

jbednar commented Jan 7, 2024

@droumis, that new plot looks great! So nice to see that after years of just imagining it. :-)

Are the new issues you found when doing that now part of https://github.com/orgs/holoviz/projects/14/views/2 ? If not, please add them there. We've come up with a nice, comprehensive set of issues to address; now we just need to address them!

@maximlt
Member

maximlt commented Jan 23, 2024

@droumis
Member

droumis commented Apr 8, 2024

superseded by #1302

@droumis droumis closed this Apr 8, 2024
4 participants