User guide page for where reduction (#1172)

holoviz · Feb 2, 2023 · 481c85f
1 parent e815bb7
commit 481c85f
Showing 3 changed files with 236 additions and 0 deletions.
diff --git a/doc/api.rst b/doc/api.rst
@@ -186,6 +186,7 @@ Definitions
 .. autoclass:: sum
 .. autoclass:: summary
 .. autoclass:: var
+.. autoclass:: where
 
 .. automodule:: datashader.transfer_functions
    :members:
diff --git a/doc/user_guide/index.rst b/doc/user_guide/index.rst
@@ -44,6 +44,9 @@ Contents:
 `11. Geography <Geography.html>`_
  Pointers to using Datashader for geographic and other spatial applications.
 
+`12. Inspection Reductions <Inspection_Reductions.html>`_
+ Using reductions to inspect rather than aggregate data.
+
 .. toctree::
     :hidden:
     :maxdepth: 3
@@ -59,3 +62,4 @@ Contents:
     Extending <Extending>
     Performance <Performance>
     Geography <Geography>
+    Inspection Reductions <Inspection_Reductions>
diff --git a/examples/user_guide/12_Inspection_Reductions.ipynb b/examples/user_guide/12_Inspection_Reductions.ipynb
@@ -0,0 +1,231 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each Datashader `canvas` function call accepts an `agg` argument, a `Reduction` that determines how the values mapped to each pixel (histogram bin) are combined into the result returned to the user. Each `Reduction` falls into one of two categories:\n",
+    "\n",
+    "1. Mathematical combination of data such as the `count` of data points per pixel or the `mean` of a column of the supplied dataset.\n",
+    "2. Selection of data from a column of the supplied dataset, or the index of the corresponding row in the dataset.\n",
+    "\n",
+    "This notebook explains how to use selection reductions."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. `first` and `last` selection reductions\n",
+    "\n",
+    "The simplest selection reduction is the `first` reduction. This returns, for each pixel in the canvas, the value of a particular column in the dataset corresponding to the first data point that maps to that pixel. This is best illustrated with an example.\n",
+    "\n",
+    "First, create a sample dataset:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import datashader as ds\n",
+    "import pandas as pd\n",
+    "\n",
+    "df = pd.DataFrame(dict(\n",
+    "    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],\n",
+    "    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],\n",
+    "    value = [ 9,  8,  7,  6,  2,  3,  4,  5],\n",
+    "    other = [11, 12, 13, 14, 15, 16, 17, 18],\n",
+    "))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There are 8 rows in the dataset with columns for `x` and `y` coordinates as well as a `value` and an `other` column.\n",
+    "\n",
+    "Next create a Datashader `canvas` with a height of 2 pixels and a width of 3 pixels:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas = ds.Canvas(plot_height=2, plot_width=3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Two rows of the dataset map to each canvas pixel with the exception of pixels `[0, 2]` and `[1, 1]` which do not have any rows mapped to them.\n",
+    "\n",
+    "Now call `canvas.points` using a `first` reduction:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas.points(df, 'x', 'y', ds.first('value'))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The returned `xarray.DataArray` is the same shape as the canvas and contains values taken from the `'value'` column corresponding to the first row that maps to each pixel. Pixels which do not have any rows mapped to them contain `NaN` values.\n",
+    "\n",
+    "Here are the results using a `last` selection reduction:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas.points(df, 'x', 'y', ds.last('value'))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. `max` and `min` selection reductions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A `max` selection reduction returns, for each pixel in the canvas, the maximum value of the specified column of all rows that map to that pixel. For example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas.points(df, 'x', 'y', ds.max('value'))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The corresponding `min` selection reduction is:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas.points(df, 'x', 'y', ds.min('value'))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. `where` selection reductions\n",
+    "\n",
+    "A `where` reduction takes two arguments: a `selector` reduction and a `lookup_column` name. The `selector` reduction, such as a `first` or `max`, selects which row of the dataset to return information about for each pixel. The information returned, however, comes from the `lookup_column` rather than from the column used by the `selector`.\n",
+    "\n",
+    "Again this is best illustrated by an example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas.points(df, 'x', 'y', ds.where(ds.max('value'), 'other'))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This returns, for each pixel, the value of the `'other'` column taken from the row that has the maximum `'value'` among the data points that map to that pixel.\n",
+    "\n",
+    "Although it is possible to use a `first` or `last` as a `selector` with a `lookup_column`, such as\n",
+    "\n",
+    "```python\n",
+    "ds.where(ds.first('value'), 'other')\n",
+    "```\n",
+    "this is unnecessary as it is identical to the simpler\n",
+    "```python\n",
+    "ds.first('other')\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. `where` selection reductions returning a row index\n",
+    "\n",
+    "The `lookup_column` argument to `where` is optional. If not specified, `where` defaults to returning the index of the row in the dataset corresponding to the `selector` for each pixel."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas.points(df, 'x', 'y', ds.where(ds.max('value')))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There are 8 rows in the dataframe so row indices returned are in the range 0 to 7. An index of -1 is returned for pixels that do not have any data points mapped to them.\n",
+    "\n",
+    "`first` and `last` can also be used as `where` reduction `selector`s that return row indices, for example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "canvas.points(df, 'x', 'y', ds.where(ds.first('value')))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
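
The four selection behaviours described in the notebook can be cross-checked without Datashader. The following is a minimal pandas/NumPy sketch, not Datashader's implementation. It assumes that each `(y, x)` pair of the sample dataset is itself the pixel index `[row, column]` of the 2x3 canvas (consistent with the notebook's statement that pixels `[0, 2]` and `[1, 1]` are empty). The helper names `selected_rows` and `lookup` are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Same sample dataset as in the notebook.
df = pd.DataFrame(dict(
    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],
    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],
    value = [ 9,  8,  7,  6,  2,  3,  4,  5],
    other = [11, 12, 13, 14, 15, 16, 17, 18],
))

HEIGHT, WIDTH = 2, 3  # canvas shape used in the notebook


def selected_rows(df, selector, column='value'):
    """Hypothetical helper: row index chosen per pixel, -1 where no row maps.

    Assumption: pixel [row, col] is simply (y, x) for this dataset.
    """
    rows = np.full((HEIGHT, WIDTH), -1, dtype=int)
    for (y, x), group in df.groupby(['y', 'x']):
        values = group[column]
        if selector == 'first':
            rows[y, x] = values.index[0]
        elif selector == 'last':
            rows[y, x] = values.index[-1]
        elif selector == 'max':
            rows[y, x] = values.idxmax()
        elif selector == 'min':
            rows[y, x] = values.idxmin()
        else:
            raise ValueError(f'unknown selector: {selector}')
    return rows


def lookup(df, rows, column):
    """Hypothetical helper: values of `column` at the selected rows, NaN where empty."""
    out = np.full(rows.shape, np.nan)
    mask = rows >= 0
    out[mask] = df[column].to_numpy()[rows[mask]]
    return out


max_rows = selected_rows(df, 'max')          # row index of the per-pixel maximum
max_other = lookup(df, max_rows, 'other')    # 'other' looked up at those rows
first_value = lookup(df, selected_rows(df, 'first'), 'value')

print(max_rows)
print(max_other)
print(first_value)
```

Under those assumptions, `selected_rows(df, 'max')` plays the role of `ds.where(ds.max('value'))` and `lookup(df, max_rows, 'other')` the role of `ds.where(ds.max('value'), 'other')`. Separating row selection from column lookup also makes it visible why a `first` selector gives the same result whichever column it is given: the first row mapped to a pixel does not depend on the column's values.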