Commit

Merge pull request #62 from ianthomas23/benchmark_framework
Benchmarking framework using ASV and Playwright
droumis authored Jul 26, 2023
2 parents b29301a + 21b55b9 commit fdce0a1
Showing 7 changed files with 409 additions and 1 deletion.
5 changes: 4 additions & 1 deletion .gitignore
@@ -138,4 +138,7 @@ dmypy.json

.vscode

**/*.pt
**/*.pt

# Benchmarks
.asv/
55 changes: 55 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,55 @@
# Benchmarking

`hvneuro` uses [Playwright for Python](https://playwright.dev/python/docs/intro) and [ASV](https://asv.readthedocs.io) for benchmarking. Playwright automates interaction with the web browser, while ASV controls the benchmarking process so that it is statistically valid and repeatable.

## Installing ASV

Benchmarks must be run from a clone of the `hvneuro` GitHub repo. ASV creates and uses isolated virtual environments for benchmarking, so benchmarks must be run from within a Python environment that has access to both `asv` and `virtualenv`. This could be a `conda`, `pyenv` or `venv` environment, for example.

Example setup using `conda`:
```
conda create -n hvneuro_asv python=3.11
conda activate hvneuro_asv
conda install -c conda-forge asv virtualenv "nodejs>=18"
```

## Running benchmarks

To run all benchmarks:
```
cd benchmarks
asv run -e
```

The first time this is run it creates a machine file to store information about your machine. Then a virtual environment is created and each benchmark is run multiple times to obtain a statistically valid benchmark time.

The virtual environment contains `hvneuro` and its dependencies as defined in the top-level `pyproject.toml` file. It also contains `playwright`, the latest version of `chromium` as installed by `playwright`, and a particular branch of `bokeh` that contains extra code to record when the canvas is rendered. The latter is compiled from source, and extra dependencies may be required for this to work on all test machines (to be determined).

The `-e` flag catches and displays stderr after the benchmark results. This should be free of errors but may contain some warnings.
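
As a rough illustration of why each benchmark is run multiple times, the sketch below times a function repeatedly and reports the median. This is not ASV's actual algorithm, which is more involved; the helper name `time_repeatedly` and the workload are hypothetical.

```python
import statistics
import time


def time_repeatedly(func, repeats: int = 5) -> list[float]:
    """Time `func` several times and return the individual durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical workload; a real benchmark would drive the browser instead.
durations = time_repeatedly(lambda: sum(range(100_000)))
print(f"median: {statistics.median(durations):.6f} s over {len(durations)} runs")
```

A single run can be skewed by caching or background load, so a summary over several runs gives a more stable figure.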

## Viewing benchmark results

To list benchmark runs use:
```
asv show
```

Initially this will just list the `hvneuro` commit that the benchmarks are run against. To display the benchmark timings for this commit use:
```
asv show <commit hash>
```
using enough of the commit hash to uniquely identify it.

ASV ships with its own simple web server to interactively display the results in a web browser. To use this:
```
asv publish
asv preview
```
and then open a web browser at the URL specified.

## Configuration

ASV configuration information is stored in `benchmarks/asv.conf.json`. This includes a list of branches to benchmark. If you are using a feature branch and wish to benchmark the code in that branch rather than `main`, edit `asv.conf.json` to change the line:
```
"branches": ["main"],
```
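
Because `asv.conf.json` contains `//` comments, it cannot be read directly with `json.load`. To check which branches are currently configured before running, a minimal sketch such as the following may help. The helper name `configured_branches` is hypothetical, and it assumes the `"branches"` entry sits on a single uncommented line.

```python
import json
import re


def configured_branches(conf_text: str) -> list[str]:
    """Return the uncommented "branches" list from asv.conf.json-style text."""
    # Match only lines that start with the key, so `//` comment lines are skipped.
    match = re.search(r'^[ \t]*"branches"[ \t]*:[ \t]*(\[[^\]]*\])', conf_text, re.MULTILINE)
    if match is None:
        return []
    return json.loads(match.group(1))


# Small inline sample; in practice read the text of benchmarks/asv.conf.json.
sample = '''
{
    // List of branches to benchmark.
    // "branches": ["commented_out"],
    "branches": ["benchmark_framework"],
}
'''
print(configured_branches(sample))  # ['benchmark_framework']
```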
196 changes: 196 additions & 0 deletions benchmarks/asv.conf.json
@@ -0,0 +1,196 @@
{
// The version of the config file format. Do not change, unless
// you know what you are doing.
"version": 1,

// The name of the project being benchmarked
"project": "hvneuro",

// The project's homepage
"project_url": "https://github.com/holoviz-topics/neuro",

// The URL or local path of the source code repository for the
// project being benchmarked
"repo": "..",

// The Python project's subdirectory in your repo. If missing or
// the empty string, the project is assumed to be located at the root
// of the repository.
// "repo_subdir": "",

// Customizable commands for building, installing, and
// uninstalling the project. See asv.conf.json documentation.
//
// "install_command": ["in-dir={env_dir} python -mpip install {wheel_file}"],
// "uninstall_command": ["return-code=any python -mpip uninstall -y {project}"],
// "build_command": [
// "python setup.py build",
// "PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}"
// ],
"build_command": [
"python -m pip install --upgrade build pip",
"python -m build --wheel -o {build_cache_dir} {build_dir}"
],
"install_command": [
"in-dir={env_dir} python -mpip install {wheel_file}",
// Install bokeh from specific repo branch containing the console.log of render count
"python -m pip install bokeh git+https://github.com/bokeh/bokeh.git@ianthomas23/log_render_count#egg=bokeh",
// Install browsers for playwright
"playwright install chromium"
],

// List of branches to benchmark. If not provided, defaults to "master"
// (for git) or "default" (for mercurial).
"branches": ["benchmark_framework"],

// The DVCS being used. If not set, it will be automatically
// determined from "repo" by looking at the protocol in the URL
// (if remote), or by looking for special directories, such as
// ".git" (if local).
"dvcs": "git",

// The tool to use to create environments. May be "conda",
// "virtualenv" or other value depending on the plugins in use.
// If missing or the empty string, the tool will be automatically
// determined by looking for tools on the PATH environment
// variable.
"environment_type": "virtualenv",

// timeout in seconds for installing any dependencies in environment
// defaults to 10 min
//"install_timeout": 600,

// the base URL to show a commit for the project.
"show_commit_url": "https://github.com/holoviz-topics/neuro/commit/",

// The Pythons you'd like to test against. If not provided, defaults
// to the current version of Python used to run `asv`.
// "pythons": ["2.7", "3.6"],

// The list of conda channel names to be searched for benchmark
// dependency packages in the specified order
//"conda_channels": ["conda-forge", "defaults"],

// A conda environment file that is used for environment creation.
// "conda_environment_file": "environment.yml",

// The matrix of dependencies to test. Each key of the "req"
// requirements dictionary is the name of a package (in PyPI) and
// the values are version numbers. An empty list or empty string
// indicates to just test against the default (latest)
// version. null indicates that the package is to not be
// installed. If the package to be tested is only available from
// PyPi, and the 'environment_type' is conda, then you can preface
// the package name by 'pip+', and the package will be installed
// via pip (with all the conda available packages installed first,
// followed by the pip installed packages).
//
// The ``@env`` and ``@env_nobuild`` keys contain the matrix of
// environment variables to pass to build and benchmark commands.
// An environment will be created for every combination of the
// cartesian product of the "@env" variables in this matrix.
// Variables in "@env_nobuild" will be passed to every environment
// during the benchmark phase, but will not trigger creation of
// new environments. A value of ``null`` means that the variable
// will not be set for the current combination.
//
// "matrix": {
// "req": {
// "numpy": ["1.6", "1.7"],
// "six": ["", null], // test with and without six installed
// "pip+emcee": [""] // emcee is only available for install with pip.
// },
// "env": {"ENV_VAR_1": ["val1", "val2"]},
// "env_nobuild": {"ENV_VAR_2": ["val3", null]},
// },
"matrix": {
"playwright": []
},

// Combinations of libraries/python versions can be excluded/included
// from the set to test. Each entry is a dictionary containing additional
// key-value pairs to include/exclude.
//
// An exclude entry excludes entries where all values match. The
// values are regexps that should match the whole string.
//
// An include entry adds an environment. Only the packages listed
// are installed. The 'python' key is required. The exclude rules
// do not apply to includes.
//
// In addition to package names, the following keys are available:
//
// - python
// Python version, as in the *pythons* variable above.
// - environment_type
// Environment type, as above.
// - sys_platform
// Platform, as in sys.platform. Possible values for the common
// cases: 'linux2', 'win32', 'cygwin', 'darwin'.
// - req
// Required packages
// - env
// Environment variables
// - env_nobuild
// Non-build environment variables
//
// "exclude": [
// {"python": "3.2", "sys_platform": "win32"}, // skip py3.2 on windows
// {"environment_type": "conda", "req": {"six": null}}, // don't run without six on conda
// {"env": {"ENV_VAR_1": "val2"}}, // skip val2 for ENV_VAR_1
// ],
//
// "include": [
// // additional env for python2.7
// {"python": "2.7", "req": {"numpy": "1.8"}, "env_nobuild": {"FOO": "123"}},
// // additional env if run on windows+conda
// {"platform": "win32", "environment_type": "conda", "python": "2.7", "req": {"libpython": ""}},
// ],

// The directory (relative to the current directory) that benchmarks are
// stored in. If not provided, defaults to "benchmarks"
"benchmark_dir": "benchmarks",

// The directory (relative to the current directory) to cache the Python
// environments in. If not provided, defaults to "env"
"env_dir": ".asv/env",

// The directory (relative to the current directory) that raw benchmark
// results are stored in. If not provided, defaults to "results".
"results_dir": ".asv/results",

// The directory (relative to the current directory) that the html tree
// should be written to. If not provided, defaults to "html".
"html_dir": ".asv/html",

// The number of characters to retain in the commit hashes.
// "hash_length": 8,

// `asv` will cache results of the recent builds in each
// environment, making them faster to install next time. This is
// the number of builds to keep, per environment.
// "build_cache_size": 2,

// The commits after which the regression search in `asv publish`
// should start looking for regressions. Dictionary whose keys are
// regexps matching to benchmark names, and values corresponding to
// the commit (exclusive) after which to start looking for
// regressions. The default is to start from the first commit
// with results. If the commit is `null`, regression detection is
// skipped for the matching benchmark.
//
// "regressions_first_commits": {
// "some_benchmark": "352cdf", // Consider regressions only after this commit
// "another_benchmark": null, // Skip regression detection altogether
// },

// The thresholds for relative change in results, after which `asv
// publish` starts reporting regressions. Dictionary of the same
// form as in ``regressions_first_commits``, with values
// indicating the thresholds. If multiple entries match, the
// maximum is taken. If no entry matches, the default is 5%.
//
// "regressions_thresholds": {
// "some_benchmark": 0.01, // Threshold of 1%
// "another_benchmark": 0.5, // Threshold of 50%
// },
}
1 change: 1 addition & 0 deletions benchmarks/benchmarks/__init__.py
@@ -0,0 +1 @@

75 changes: 75 additions & 0 deletions benchmarks/benchmarks/base.py
@@ -0,0 +1,75 @@
from __future__ import annotations

from typing import TYPE_CHECKING

from bokeh.models import Plot
from bokeh.server.server import Server
from playwright.sync_api import sync_playwright

if TYPE_CHECKING:
    from typing import Callable

    from bokeh.document import Document
    from playwright.sync_api import ConsoleMessage


class Base:
    def __init__(self, catch_console: bool = True):
        self._catch_console = catch_console
        self._port = 5006
        self.render_count = -1
        self._figure_id = None  # Unique ID of the figure to grab the console messages of.

    def _console_callback(self, msg: ConsoleMessage) -> None:
        if self._figure_id is None or len(msg.args) != 4:
            return

        msg, figure_id, count, start_or_end = [arg.json_value() for arg in msg.args]

        if msg == "PlotView._actual_paint" and figure_id == self._figure_id:
            if start_or_end == "start":
                # TODO: need to handle start of render if want to time a single render.
                pass
            elif start_or_end == "end":
                self.render_count += 1
                count = int(count)
                if count != self.render_count:
                    raise RuntimeError(f"Mismatch in render count: {count} != {self.render_count}")

    def playwright_setup(self, bokeh_doc: Callable[[Document], None]) -> None:
        # Playwright context manager needs to span multiple functions,
        # so manually call __enter__ and __exit__ methods.
        self._playwright_context_manager = sync_playwright()
        playwright = self._playwright_context_manager.__enter__()

        self._server = Server({'/': bokeh_doc}, port=self._port)
        self._server.start()

        self._browser = playwright.chromium.launch(headless=True)

        self.page = self._browser.new_page()
        self.page.goto(f"http://localhost:{self._port}/")

        # Assume Bokeh document contains a single figure, and obtain its ID.
        sessions = self._server.get_sessions()
        if len(sessions) != 1:
            raise RuntimeError(f"Expected a single session but have {len(sessions)}")
        doc = sessions[0].document
        # This raises an error if there is more than one figure in the Bokeh document.
        self._figure_id = doc.select_one(dict(type=Plot)).id

        if self._catch_console:
            self.page.on("console", self._console_callback)

    def playwright_teardown(self):
        self._figure_id = None
        if self._catch_console:
            self.page.remove_listener("console", self._console_callback)
        self.render_count = -1
        # Wait a few milliseconds for emitted console messages to be handled before closing
        # browser. May need to increase this if Playwright complains that browser is closed.
        self.page.wait_for_timeout(10)

        self._browser.close()
        self._server.stop()
        self._playwright_context_manager.__exit__(None, None, None)
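
The render bookkeeping in `_console_callback` can be exercised without a browser. The following is a minimal sketch, not part of the commit: `RenderCounter` and the figure ID `"p1001"` are hypothetical, and the message layout (name, figure ID, zero-based count, `"start"` or `"end"`) is an assumption inferred from the callback above.

```python
class RenderCounter:
    """Stand-in for the render bookkeeping in Base._console_callback (no browser needed)."""

    def __init__(self, figure_id: str) -> None:
        self.figure_id = figure_id
        self.render_count = -1  # same initial value as Base.render_count

    def handle(self, args: list) -> None:
        # Ignore messages that do not have the four expected arguments.
        if len(args) != 4:
            return
        msg, figure_id, count, start_or_end = args
        # Ignore messages from other sources or other figures.
        if msg != "PlotView._actual_paint" or figure_id != self.figure_id:
            return
        if start_or_end == "end":
            self.render_count += 1
            if int(count) != self.render_count:
                raise RuntimeError(f"Mismatch in render count: {count} != {self.render_count}")


counter = RenderCounter("p1001")
counter.handle(["PlotView._actual_paint", "p1001", 0, "start"])  # start: not counted
counter.handle(["PlotView._actual_paint", "p1001", 0, "end"])    # first completed render
counter.handle(["unrelated message"])                            # wrong length: ignored
counter.handle(["PlotView._actual_paint", "other", 5, "end"])    # other figure: ignored
counter.handle(["PlotView._actual_paint", "p1001", 1, "end"])    # second completed render
print(counter.render_count)  # 1
```

Starting the counter at -1 means the first completed render brings it to 0, matching the zero-based count assumed to be logged by the browser.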
55 changes: 55 additions & 0 deletions benchmarks/benchmarks/timeseries.py
@@ -0,0 +1,55 @@
from __future__ import annotations

from functools import partial
from typing import TYPE_CHECKING

from bokeh.models import Button, ColumnDataSource
from bokeh.plotting import column, figure
import numpy as np

from .base import Base

if TYPE_CHECKING:
    from bokeh.document import Document


def bkapp(doc: Document, n: int, output_backend: str):
    cds = ColumnDataSource(data=dict(x=[], y=[]))

    p = figure(width=600, height=400, output_backend=output_backend)
    p.line(source=cds, x="x", y="y")

    # Prepare data but do not send it to browser yet.
    x = np.arange(n)
    y = np.random.default_rng(8343).uniform(size=n)

    def python_callback(event):
        # Benchmark times the sending and rendering of this data.
        cds.data = dict(x=x, y=y)

    button = Button(label="run")
    button.on_click(python_callback)

    doc.add_root(column(p, button))


class Timeseries(Base):
    params: tuple[list[int], list[str]] = (
        [1_000, 10_000, 100_000, 1_000_000, 10_000_000],
        ["canvas", "webgl"],
    )
    param_names: tuple[str] = ("n", "output_backend")

    def setup(self, n: int, output_backend: str) -> None:
        bkapp_n = partial(bkapp, n=n, output_backend=output_backend)
        self.playwright_setup(bkapp_n)

    def teardown(self, n: int, output_backend: str) -> None:
        self.playwright_teardown()

    def time_values(self, n: int, output_backend: str) -> None:
        button = self.page.get_by_role("button", name="run")
        start_render_count = self.render_count
        button.click()
        while self.render_count == start_render_count:
            self.page.wait_for_timeout(1)
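
ASV runs a parameterized benchmark once for every combination of the values in `params`. The sketch below is illustrative only and is not how ASV is implemented internally. It enumerates the combinations that `Timeseries.time_values` would be called with, using the same `params` and `param_names` as above.

```python
from itertools import product

params = (
    [1_000, 10_000, 100_000, 1_000_000, 10_000_000],
    ["canvas", "webgl"],
)
param_names = ("n", "output_backend")

# Cartesian product of the parameter lists, labelled with the parameter names.
combinations = [dict(zip(param_names, combo)) for combo in product(*params)]

print(len(combinations))  # 10
print(combinations[0])    # {'n': 1000, 'output_backend': 'canvas'}
```

Each combination gets its own `setup` and `teardown` call, so every timing starts from a fresh Bokeh server and browser page.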