# benchmark: Benchmark speed of initial display and interaction (zoom, pan)

## Key Benchmarking Metrics

- Latency to initial display (of anything useful)
- Latency for interaction updates (primarily pan and zoom)
  - How much zoom/pan should we test? I think this depends on the modality/workflow and should mimic a reasonable user action. For instance, say we start by rendering a typical EEG display frame of 20 seconds (out of a 1-hour recording): we could test zooming out to half the total duration (30 minutes), and then panning from the first half of the dataset to the second half (the second 30 minutes). A rough Python-side measurement sketch follows this list.
- Memory
- CPU
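Below is a minimal sketch of how the Python-side portion of one run could be timed and its memory/CPU sampled; `render_fn` stands in for a hypothetical function that builds and displays a workflow plot (e.g. the 20 s EEG frame above), and psutil is assumed to be available. Nothing here is part of an existing module.

```python
# Minimal sketch of a Python-side metrics harness; `render_fn` is a hypothetical
# placeholder for whatever builds/displays a workflow plot.
import time

import psutil


def measure_python_side(render_fn, *args, **kwargs):
    """Time one rendering call and sample process memory/CPU around it."""
    proc = psutil.Process()
    proc.cpu_percent(interval=None)           # prime the CPU counter
    mem_before = proc.memory_info().rss

    start = time.perf_counter()
    result = render_fn(*args, **kwargs)
    latency = time.perf_counter() - start

    return {
        "latency_s": latency,                              # Python-side latency only
        "cpu_percent": proc.cpu_percent(interval=None),    # CPU used since priming
        "mem_delta_mb": (proc.memory_info().rss - mem_before) / 1e6,
        "result": result,
    }
```

Note that this only captures the Python/server side; the time until pixels actually appear in the browser, and the pan/zoom latency described above, have to be measured in the page itself, e.g. with the tools listed under Software for Benchmarking.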
## Test Scenarios

- Test scenarios are the workflow notebooks (or dev versions of them)
## Benchmarking Dimensions/Parameters

- Tweaks to the workflow code
  - This includes any code that lives outside the workflow notebooks but is created specifically for them, such as anything that would go into the `hvneuro` module.
  - Each benchmarking run would ideally be labeled with the commit hash of the code it tested (see the metadata sketch after this list).
- Backend/approach employed
  - For example: WebGL, Datashader, LTTB, caching
- HoloViz/Bokeh versions
  - Let's start with a single (latest) version of each package; once things are further along, we can look at expanding the test matrix. I'm mostly thinking of the case where a new Bokeh release might impact benchmarking results and we'd want that noted.
- Dataset size
  - For each modality, we should test at least a lower, mid, and upper dataset size.
- Use of CPU and/or GPU for computing
  - This is highly dependent on the approach, but likely impacts the metrics enough that runs should be distinguished by device.
  - Ideally, we would rely only on the CPU for computing, but that's not a hard requirement at this time.
- Environment
  - Jupyter Lab vs VS Code
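To make runs comparable across these dimensions, each result could carry a small metadata record. A sketch, assuming benchmarks run from a git checkout of the workflow/`hvneuro` code; the field names are illustrative, not a fixed schema:

```python
# Sketch of per-run labels so results can be compared across commits, backends,
# dataset sizes, and package versions. Field names are illustrative only.
import subprocess
from importlib.metadata import version


def run_metadata(backend: str, dataset_size: str, device: str = "cpu") -> dict:
    """Collect the labels identifying one benchmarking run."""
    commit = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return {
        "commit": commit,              # workflow / hvneuro code under test
        "backend": backend,            # e.g. "webgl", "datashader", "lttb", "caching"
        "dataset_size": dataset_size,  # e.g. "lower", "mid", "upper"
        "device": device,              # "cpu" or "gpu"
        "bokeh": version("bokeh"),
        "holoviews": version("holoviews"),
    }
```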
## Other thoughts

- We want benchmarking set up in such a way that we can trigger it somewhat automatically in the future.
  - Maybe this means incorporating it into the CI?
- We want to specify a threshold of diminishing returns: at what point is interaction latency good enough that further improvements provide little additional value to the user?
- We want to generate a report of the benchmarking results so we can note how particular approaches or improvements made over time impacted the benchmarking scores.
- If we can't achieve a reasonable latency to initial display of something useful for the largest datasets, I could imagine an approach where the initial render is slow (and we provide info, apologies, and a loading indicator) but subsequent loading and interaction are very fast.
## Software for Benchmarking

- Bokeh custom JavaScript callbacks that capture the time before and after user interaction events (sketched below)
- Playwright (see the driver sketch below)
- airspeed velocity (asv)
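For the interaction timing, one option (a rough sketch, not a worked-out implementation) is to attach a `CustomJS` callback to a plot range so the browser logs a high-resolution timestamp whenever a pan or zoom updates the range; the log lines can then be harvested by whatever drives the browser:

```python
# Rough sketch: log a browser timestamp whenever a pan/zoom changes the x-range.
# The figure here is a trivial stand-in for a real workflow plot.
from bokeh.models import CustomJS
from bokeh.plotting import figure

p = figure(tools="pan,wheel_zoom")
p.line([0, 1, 2], [1, 3, 2])

# performance.now() gives a high-resolution, monotonically increasing timestamp
p.x_range.js_on_change(
    "start",
    CustomJS(code="console.log('range-update', performance.now());"),
)
```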
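And a sketch of driving the rendered app with Playwright's Python API, using the first canvas as a proxy for initial display and simulating a pan by dragging; the URL, selector, and drag coordinates are placeholders:

```python
# Sketch of a Playwright driver: waits for the first canvas as a proxy for
# "initial display", collects console timestamps (e.g. from a CustomJS callback),
# and simulates a pan. URL, selector, and coordinates are placeholders.
import time

from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch()
    page = browser.new_page()
    page.on("console", lambda msg: print("page log:", msg.text))

    t0 = time.perf_counter()
    page.goto("http://localhost:5006/app")     # placeholder app URL
    page.wait_for_selector("canvas")           # first canvas ~= initial display
    print("initial display (s):", time.perf_counter() - t0)

    # Simulate a pan by dragging across the plot area
    page.mouse.move(400, 300)
    page.mouse.down()
    page.mouse.move(200, 300, steps=10)
    page.mouse.up()

    browser.close()
```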
## Benchmark comparisons

- fastplotlib
- napari