Large timeseries #1205

hoxbro · 2023-11-22T12:44:08Z

This adds a notebook that explains the different ways of working with large time-series datasets using HoloViz.

maximlt

While this guide is useful on its own, I believe it needs to be significantly reworked before being integrated into the docs. It feels to me like it was written as a standalone guide. Actually, I believe it could pretty easily be turned into a nice blog post!

The guide should be more integrated into the docs, probably moving some of its content to the Time Series Data guide.
The guide focuses too much on Bokeh; hvPlot supports Matplotlib and Plotly too.
The guide could link more to resources in HoloViews and Datashader.

More specifically:

"The old way: Bokeh's custom Canvas rendering" and "WebGL: new baseline for timeseries plotting": I'm not sure we should mention how things used to be. Ideally, we'd have a Bokeh-specific guide, like Plotting with Bokeh in HoloViews' docs, that mentions WebGL.
Datashader rasterizing: I just realized explaining Datashader is difficult. I don't know how many notebook users will understand this sentence: "Datashader works in a different way, rendering the data into a frame buffer on the server, and then sending that buffer to the web browser rather than the individual data points." We're also defining Datashader in multiple places in hvPlot's docs. Ideally, again, we could have a "Large data" guide that would be the only place where we define and explain Datashader, and link to it from elsewhere. I also find the guide doesn't go into enough detail on anti-aliasing; I bet most users aren't familiar with it and need more explanation.
Minimap: What should be the main reference place to introduce the RangeToolLink in hvPlot's docs? Currently it's only used in the OHLC guide. Should it be in the yet-to-be-made Plotting with Bokeh guide? It seems to me the minimap approach could already be introduced in the Timeseries Data guide.

cc @droumis @jbednar

jbednar · 2023-12-01T02:59:49Z

Reviewing the published page https://holoviz-dev.github.io/hvplot/user_guide/visualizing_large_timeseries.html rather than the source code:

visualizing large timeseries

The notebook title needs to be capitalized to match the others, and shortened. hvPlot is inherently a plotting library, so "visualizing" seems redundant. Just "Large_Timeseries", maybe?

Datashader rasterizing

When pages are built for our website, Datashader renders at its default image size. That default is intentionally small, to avoid generating a large image that is then thrown away in interactive usage, where the image is updated to the actual display resolution via a RangeXY callback. Here, because the callback is never invoked, the image is rendered at a very low resolution, which looks bad on the website. I think the images can be improved by including a cell like this early in the notebook:

from holoviews.operation.resample import ResampleOperation2D
# Larger default resolution for static pages, where no RangeXY callback resizes the image
ResampleOperation2D.width = 1200
ResampleOperation2D.height = 500

[The example above needs rasterize, plus instant inspection. Also needs to illustrate what happens when very large numbers of traces overlap.]

Presumably this note should be omitted, and an issue opened instead.
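The low-resolution behaviour described above (a small default image size that is only replaced once a RangeXY callback reports the real display size) can be modelled in a few lines. This is a hypothetical sketch, not HoloViews code: `resolve_resolution`, `DEFAULT_WIDTH` and `DEFAULT_HEIGHT` are illustrative names, and the defaults are placeholders rather than ResampleOperation2D's actual values.

```python
from typing import Optional, Tuple

# Hypothetical placeholder defaults; NOT ResampleOperation2D's real values.
DEFAULT_WIDTH = 400
DEFAULT_HEIGHT = 400


def resolve_resolution(
    reported: Optional[Tuple[int, int]],
    default: Tuple[int, int] = (DEFAULT_WIDTH, DEFAULT_HEIGHT),
) -> Tuple[int, int]:
    """Pick the frame-buffer size used to rasterize a plot.

    In a live session, a RangeXY-style callback reports the real display
    size (`reported`). In a static HTML export the callback never fires,
    so the intentionally small default is used instead.
    """
    if reported is None:
        return default
    return reported


# Live notebook: the browser reports the plot size, so the image matches it.
print(resolve_resolution((1200, 500)))  # (1200, 500)
# Static export: no callback, so the small default is used.
print(resolve_resolution(None))  # (400, 400)
```

Overriding the class-level defaults, as in the cell above, corresponds to changing `default` globally so that static pages get a larger image.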

Minimap

The example shows no data by default; presumably we should put some sort of initial range in there that causes data to display when exported to HTML?

The instructions also don't seem to match the plot; there's no grey box visible, and once you pan to find one, it's not a small rectangle but something larger than the plot, which doesn't seem right. Plus panning and zooming in the bottom plot make it very easy to get lost; I would think that the minimap should not have any y axis panning and no zooming, just x panning. It should be hard to shoot yourself in the foot or get lost.
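The constraints suggested here (x panning only, no y panning or zooming, and an initial range so data shows up on export) can be expressed as a small update rule. This is a hypothetical sketch independent of Bokeh's RangeTool: `MinimapSelection`, `initial_selection` and `pan_x` are illustrative names, and only the constraint logic is shown. The selection keeps a fixed width, moves along x only, and is clamped to the data bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimapSelection:
    """Hypothetical x-only selection box on a minimap (y extent is fixed)."""

    start: float
    end: float

    @property
    def width(self) -> float:
        return self.end - self.start


def initial_selection(data_min: float, data_max: float, fraction: float = 0.1) -> MinimapSelection:
    """Start with the first `fraction` of the data visible, so that a
    statically exported page shows data instead of an empty plot."""
    span = data_max - data_min
    return MinimapSelection(data_min, data_min + span * fraction)


def pan_x(sel: MinimapSelection, dx: float, data_min: float, data_max: float) -> MinimapSelection:
    """Shift the selection by dx along x only, keeping its width (no zoom)
    and clamping it inside [data_min, data_max] so the user can't get lost."""
    width = sel.width
    new_start = min(max(sel.start + dx, data_min), data_max - width)
    return MinimapSelection(new_start, new_start + width)


sel = initial_selection(0.0, 1000.0, fraction=0.25)
print(sel.start, sel.end)  # 0.0 250.0
sel = pan_x(sel, 900.0, 0.0, 1000.0)  # dragging past the right edge
print(sel.start, sel.end)  # 750.0 1000.0
```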

While this guide is useful on its own, I believe it needs to be significantly reworked before being integrated into the docs. It feels to me like it was written as a standalone guide. Actually, I believe it could pretty easily be turned into a nice blog post!

Yes, this was a standalone guide, and we decided that hvplot was where it should end up. I agree it will make a nice blog post when we are done, but it should also have a permanent home in our docs so that people can figure out the best way to deal with their large timeseries data.

The guide should be more integrated into the docs, probably moving some of its content to the Time Series Data guide.

I could be convinced otherwise, but my first guess would be that the Time Series guide should lose its LTTB section and instead it should have a section at the end suggesting that people look at this separate guide if they have large timeseries or want to look at many of them together. It's a lot of content already and I don't think it's relevant to people with small timeseries.
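For readers unfamiliar with LTTB (Largest-Triangle-Three-Buckets), here is a minimal pure-Python sketch of the published algorithm. It is an illustration only, not hvPlot's or HoloViews' implementation. It keeps the first and last points, splits the interior into `n_out - 2` buckets, and from each bucket keeps the point forming the largest triangle with the previously kept point and the average of the next bucket.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lttb(points: Sequence[Point], n_out: int) -> List[Point]:
    """Downsample `points` (sorted by x) to `n_out` points with LTTB."""
    n = len(points)
    if n_out >= n:
        return list(points)  # nothing to reduce
    if n_out < 3:
        raise ValueError("LTTB needs n_out >= 3 (first, last, and one bucket)")

    n_buckets = n_out - 2  # one bucket per selected interior point
    interior = n - 2

    def edge(k: int) -> int:
        # Index where bucket k starts; integer arithmetic keeps edges exact
        return k * interior // n_buckets + 1

    sampled = [points[0]]  # always keep the first point
    a = 0  # index of the most recently selected point
    for i in range(n_buckets):
        start, end = edge(i), edge(i + 1)
        # Average of the next bucket; the final bucket uses the last point
        next_bucket = points[end:min(edge(i + 2), n - 1)] or [points[-1]]
        avg_x = sum(p[0] for p in next_bucket) / len(next_bucket)
        avg_y = sum(p[1] for p in next_bucket) / len(next_bucket)

        ax, ay = points[a]
        best_idx, best_area = start, -1.0
        for j in range(start, end):
            px, py = points[j]
            # Twice the triangle area; the factor 1/2 doesn't change the argmax
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay))
            if area > best_area:
                best_idx, best_area = j, area
        sampled.append(points[best_idx])
        a = best_idx

    sampled.append(points[-1])  # always keep the last point
    return sampled


spike = [(0.0, 0.0), (1.0, 0.0), (2.0, 10.0), (3.0, 0.0), (4.0, 0.0)]
print(lttb(spike, 3))  # [(0.0, 0.0), (2.0, 10.0), (4.0, 0.0)]
```

Note that the spike survives downsampling. This is the property that makes LTTB attractive for timeseries compared with naive decimation.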

The guide focuses too much on Bokeh; hvPlot supports Matplotlib and Plotly too.

That's a general tension in the hvPlot docs that I believe remains unresolved -- how do we show how the backends differ, as well as how the various data sources differ? I don't think this one is particularly different in that respect, but if it is, it can have an explicit statement that these examples focus on Bokeh but in some cases similar functionality is available for the other backends.

"The old way: Bokeh's custom Canvas rendering" and "WebGL: new baseline for timeseries plotting": I'm not sure we should mention how things used to be. Ideally, we'd have a Bokeh-specific guide, like Plotting with Bokeh in HoloViews' docs, that mentions WebGL.

Maybe don't say it's the old way, then, but just mention that it's an option and that it's not recommended any more.

Datashader rasterizing: I just realized explaining Datashader is difficult, I don't know how many notebook users will understand this sentence: Datashader works in a different way, rendering the data into a frame buffer on the server, and then sending that buffer to the web browser rather than the individual data points.

I probably wrote that; any suggestions on how to make it clearer?
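One way to make that sentence concrete is a toy aggregation. Instead of shipping every point to the browser, count how many points fall into each cell of a fixed-size grid (the "frame buffer") and send only that grid. The sketch below is plain Python and only illustrates the idea: `rasterize_counts` is a hypothetical name. Datashader does this with compiled, vectorized code and then maps the counts to colors.

```python
from typing import List, Sequence, Tuple


def rasterize_counts(
    points: Sequence[Tuple[float, float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    width: int,
    height: int,
) -> List[List[int]]:
    """Aggregate points into a height x width grid of counts (row 0 = lowest y)."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the visible range: not drawn
        # Map data coordinates to pixel indices; the upper edge falls in the last pixel
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


pts = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (5.0, 5.0)]
print(rasterize_counts(pts, (0.0, 1.0), (0.0, 1.0), 2, 2))  # [[2, 0], [0, 1]]
```

The payload sent to the browser is `width * height` numbers, no matter how many points were aggregated.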

We're also defining Datashader in multiple places in hvPlot's docs. Ideally again, we could have a "Large data" guide that would be the only place where we would define and explain Datashader, and link to it from other places.

Sounds good. I hear you volunteering to write that! :-)

I also find the guide doesn't go into enough detail on anti-aliasing; I bet most users aren't familiar with it and need more explanation.

I think we can put in a link that explains it.
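Alongside such a link, a tiny example may help. Anti-aliased rendering spreads a line's intensity across neighbouring pixels in proportion to how close the ideal line passes to each one, rather than lighting a single pixel per column. The sketch below is a simplified form of Xiaolin Wu's classic algorithm. Endpoint handling is omitted and only shallow lines (|slope| <= 1) are supported. It is a generic illustration and does not claim anything about how Datashader implements its anti-aliasing.

```python
import math
from typing import Dict, Tuple


def wu_line(x0: float, y0: float, x1: float, y1: float) -> Dict[Tuple[int, int], float]:
    """Return {(x, y): intensity} for an anti-aliased line with |slope| <= 1.

    Each integer x column gets up to two pixels whose intensities sum to 1,
    weighted by how far the ideal line lies from each pixel.
    """
    if x1 < x0:  # always draw left to right
        x0, y0, x1, y1 = x1, y1, x0, y0
    dx = x1 - x0
    gradient = (y1 - y0) / dx if dx else 0.0
    if abs(gradient) > 1:
        raise ValueError("sketch only handles |slope| <= 1")
    pixels: Dict[Tuple[int, int], float] = {}
    for x in range(int(round(x0)), int(round(x1)) + 1):
        y = y0 + gradient * (x - x0)  # exact y of the ideal line in this column
        y_floor = math.floor(y)
        frac = y - y_floor  # how far the line sits above the lower pixel
        for row, weight in ((y_floor, 1.0 - frac), (y_floor + 1, frac)):
            if weight > 0:
                pixels[(x, row)] = pixels.get((x, row), 0.0) + weight
    return pixels


print(wu_line(0, 0, 4, 2))
# {(0, 0): 1.0, (1, 0): 0.5, (1, 1): 0.5, (2, 1): 1.0, (3, 1): 0.5, (3, 2): 0.5, (4, 2): 1.0}
```

Where the line passes between two pixel rows, both get half intensity instead of one snapping on. This is what smooths out the jagged "staircase" look.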

Minimap: What should be the main reference place to introduce the RangeToolLink in hvPlot's docs?

Good question! It seems to me that the minimap is primarily useful for large timeseries, and so to me it belongs here, in the large timeseries notebook.

@hoxbro, can you link to the issues that detail the remaining warts and areas for improvement in this notebook? I think you mentioned that they existed but I don't see how to get to them from here.

droumis · 2023-12-12T00:06:13Z

I'm starting to address the points raised above. I've collected the tasks on a project board.

droumis · 2023-12-13T20:35:32Z

I've added the resampling-resolution cell, along with an explainer admonition, but it's a bit awkward to include the following in the notebook. Ideally, we could either run it in the CI workflow somehow or put it in a hidden cell (not sure how).

from holoviews.operation.resample import ResampleOperation2D
ResampleOperation2D.width = 1200
ResampleOperation2D.height = 500

maximlt · 2023-12-21T10:52:43Z

@droumis I made a couple of small changes to attempt to hide the cell we were talking about the other day (the one setting the resampling dimensions). This is usually supported by MyST-NB by adding the hide-cell tag to the cell and I was happy to see that nbsite doesn't affect that. I changed the config on the clean_notebook hook to ignore the tags key in the cell metadata. The cell is correctly hidden on the site :)

I added a small comment to make it clear there's something special about this cell; otherwise it's not obvious when you work from JupyterLab/Notebook.
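For reference, the tag lives in the cell's `metadata.tags` list inside the `.ipynb` JSON, so it can also be added programmatically. Below is a minimal standard-library sketch. `add_hide_cell_tag` is a hypothetical helper, and the notebook path and cell index are whatever you supply; `nbformat` would be the more robust choice. Any clean-up hook that strips cell metadata must leave the `tags` key alone, which is what the clean_notebook change above ensures.

```python
import json
from pathlib import Path


def add_hide_cell_tag(notebook_path: str, cell_index: int, tag: str = "hide-cell") -> None:
    """Add a MyST-NB display tag (e.g. "hide-cell") to one notebook cell, in place."""
    path = Path(notebook_path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    cell = nb["cells"][cell_index]
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if tag not in tags:  # avoid duplicating the tag on repeated runs
        tags.append(tag)
    # Jupyter itself writes notebooks with a one-space indent
    path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```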

I'm planning to release hvPlot 0.9.1 today; how do you feel about this PR? It seems to me it's in a much better state and could go in as is. I'm even fine with having some sections marked as WIP. Let me know what you think; there's no urgency to merge this either, at least from my side.

droumis · 2023-12-21T13:48:30Z

re: hidden cell, that's great to see, @maximlt! I think that will help us in several other places across holoviz docs.

re: merging now, let's wait for Bokeh 3.4 and the next HoloViews release, as this notebook requires Bokeh #13603 and benefits from HoloViews #6030. I'm also actively working on the items marked WIP. I'd also really like to see auto-ranging for multiple lines fixed before this is released, which @jlstevens will hopefully have time to address in early January.
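For context on the auto-ranging item: after zooming along x, the y axis should rescale to fit only the data visible in the current x window, across all lines. Below is a hypothetical pure-Python sketch of that computation. `autorange_y` and the default 5% padding are assumptions, not the Bokeh or HoloViews implementation.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Series = Sequence[Tuple[float, float]]


def autorange_y(
    lines: Dict[str, Series],
    x_range: Tuple[float, float],
    padding: float = 0.05,
) -> Optional[Tuple[float, float]]:
    """Return (y_min, y_max) covering every line's points inside x_range."""
    x0, x1 = x_range
    ys = [
        y
        for series in lines.values()
        for x, y in series
        # NaN x fails the comparison; NaN y is skipped explicitly
        if x0 <= x <= x1 and not math.isnan(y)
    ]
    if not ys:
        return None  # nothing visible: keep the current y range
    lo, hi = min(ys), max(ys)
    pad = (hi - lo) * padding or 1.0  # arbitrary fallback for flat data
    return lo - pad, hi + pad


lines = {"a": [(0.0, 1.0), (1.0, 5.0), (2.0, 100.0)], "b": [(0.0, -3.0), (1.0, 2.0), (2.0, 50.0)]}
print(autorange_y(lines, (0.0, 1.0), padding=0.25))  # (-5.0, 7.0)
```

The outliers at x=2 are ignored because they fall outside the zoomed x window.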

droumis · 2023-12-28T19:23:53Z

I processed some real spike waveform data to create a new Datashader section on plotting many lines across multiple categories. As far as I could figure out, until we resolve the relevant data format issues in HoloViews, the simplest way with hvPlot is to add NaN separators to a dataframe, so I did that step before uploading the data and explained it in the notebook. It's up on the dev website.
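The NaN-separator trick can be shown in a few lines. Many short traces are concatenated into one long x/y series, with a NaN row after each trace, so the line renderer breaks the line there instead of joining the end of one trace to the start of the next. Below is a standard-library sketch. `concat_with_nan_separators` is a hypothetical helper; with pandas, one would build a DataFrame from the same three columns.

```python
import math
from typing import List, Sequence, Tuple


def concat_with_nan_separators(
    traces: Sequence[Tuple[Sequence[float], Sequence[float]]],
    categories: Sequence[str],
) -> Tuple[List[float], List[float], List[str]]:
    """Flatten (x, y) traces into single columns, adding a NaN row after each trace."""
    xs: List[float] = []
    ys: List[float] = []
    cats: List[str] = []
    for (x, y), cat in zip(traces, categories):
        xs.extend(x)
        ys.extend(y)
        cats.extend([cat] * len(x))
        # Separator row: NaN coordinates break the line; the category is kept
        # so that grouping by category still works
        xs.append(math.nan)
        ys.append(math.nan)
        cats.append(cat)
    return xs, ys, cats


xs, ys, cats = concat_with_nan_separators(
    [([0.0, 1.0, 2.0], [5.0, 6.0, 7.0]), ([0.0, 1.0], [1.0, 2.0])], ["a", "b"]
)
print(cats)  # ['a', 'a', 'a', 'a', 'b', 'b', 'b']
```

The resulting columns can then be plotted as one line element per category rather than one glyph per trace, which is what keeps very large numbers of traces tractable.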

Unless there are any further comments, I think we are just waiting on Bokeh 3.4 and the next HoloViews release to merge this PR. If auto-ranging, Datashader inspections, or this NaN issue get resolved before then, great, but those can also be follow-ups.

jbednar · 2024-01-07T00:45:55Z

@droumis , that new plot looks great! So nice to see that after years of just imagining it. :-)

Are the new issues you found when doing that now part of https://github.com/orgs/holoviz/projects/14/views/2 ? If not please add them there. We've come up with a nice, comprehensive set of issues to address, now we just need to address them!

maximlt · 2024-01-23T16:48:20Z

Need to document that hvPlot can take advantage of tsdownsample when it's installed (see "Use tsdownsample library for downsampling if available", holoviews#6059).
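The "use tsdownsample if it is installed" behaviour follows the usual optional-dependency pattern. This is a hedged sketch, not how HoloViews wires it up internally. It assumes tsdownsample's `LTTBDownsampler().downsample(x, y, n_out=...)` interface, which returns the indices of the points to keep. When the package is missing, it falls back to naive evenly spaced decimation, a crude stand-in that is not LTTB. `select_indices` is a hypothetical name.

```python
import importlib.util
from typing import List, Sequence

HAS_TSDOWNSAMPLE = importlib.util.find_spec("tsdownsample") is not None


def select_indices(x: Sequence[float], y: Sequence[float], n_out: int) -> List[int]:
    """Return sorted indices of at most n_out points to keep."""
    if n_out <= 0:
        return []
    n = len(y)
    if n <= n_out:
        return list(range(n))  # already small enough
    # LTTB needs room for the fixed first and last points plus one bucket
    if HAS_TSDOWNSAMPLE and n_out >= 3:
        import numpy as np
        from tsdownsample import LTTBDownsampler  # assumed interface, see lead-in

        idx = LTTBDownsampler().downsample(np.asarray(x), np.asarray(y), n_out=n_out)
        return [int(i) for i in idx]
    # Fallback: evenly spaced indices that include the first and last point
    if n_out == 1:
        return [0]
    step = (n - 1) / (n_out - 1)
    return sorted({round(k * step) for k in range(n_out)})
```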

droumis · 2024-04-08T18:29:10Z

superseded by #1302

hoxbro added 2 commits November 22, 2023 11:58

First addition of large_timeseries notebook

c907234

Update datapath and add to index.rst

f406eb6

maximlt requested changes Nov 28, 2023


droumis added 2 commits December 11, 2023 17:36

rename to Large Timeseries

153f316

add datashader resample resolution defaults

2c0cacc

droumis added 6 commits December 12, 2023 13:55

update index for new file name

8fae784

justify focus on Bokeh and webgl

57e4f55

update LTTB section

f63beb4

remove autorange y for multi sensor lttb

439d077

Update datashader and minimap sections

70fc75a

sample df

75428ce

droumis and others added 5 commits December 13, 2023 15:02

improve lttb comparison and fix minimap opts

f2ae8f9

alpha lttb and remove colorbar from ds

d038680

clarify LTTB, DS, minimap text

a8da1a6

bump clean_notebook hook

0340da1

hide the cell setting the resampling dims

33adb50

droumis mentioned this pull request Dec 21, 2023

Move ResampleOperation2D settings in docs notebooks to hidden cell holoviz/holoviews#6044

Open

maximlt marked this pull request as draft December 21, 2023 21:34

droumis added 2 commits December 28, 2023 13:15

add many ds lines example

84d1cc2

fix typos

18ad934

droumis added 3 commits December 28, 2023 14:35

link from ts to large ts user guide

7c0ca85

revise datashader section

473d9f3

clarify Bokeh Canvas rendering

6fbcf39

droumis marked this pull request as ready for review December 29, 2023 15:29

maximlt marked this pull request as draft January 28, 2024 09:42

droumis mentioned this pull request Apr 8, 2024

Add user guide for working with large time-series datasets #1302

Merged

droumis closed this Apr 8, 2024
