User guide page for where reduction (#1172)
ianthomas23 authored Feb 2, 2023
1 parent e815bb7 commit 481c85f
Showing 3 changed files with 236 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/api.rst
@@ -186,6 +186,7 @@ Definitions
.. autoclass:: sum
.. autoclass:: summary
.. autoclass:: var
.. autoclass:: where

.. automodule:: datashader.transfer_functions
   :members:
4 changes: 4 additions & 0 deletions doc/user_guide/index.rst
@@ -44,6 +44,9 @@ Contents:
`11. Geography <Geography.html>`_
Pointers to using Datashader for geographic and other spatial applications.

`12. Inspection Reductions <Inspection_Reductions.html>`_
Using reduction to inspect rather than aggregate data.

.. toctree::
   :hidden:
   :maxdepth: 3
@@ -59,3 +62,4 @@ Contents:
   Extending <Extending>
   Performance <Performance>
   Geography <Geography>
   Inspection Reductions <Inspection_Reductions>
231 changes: 231 additions & 0 deletions examples/user_guide/12_Inspection_Reductions.ipynb
@@ -0,0 +1,231 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each Datashader `Canvas` method that renders data, such as `points` or `line`, accepts an `agg` argument: a `Reduction` that combines the values falling in each pixel (histogram bin) into the result returned to the user. Each `Reduction` belongs to one of two categories:\n",
"\n",
"1. Mathematical combination of data, such as the `count` of data points per pixel or the `mean` of a column of the supplied dataset.\n",
"2. Selection of a single data point per pixel, returning either a value from a column of the supplied dataset or the index of the corresponding row in the dataset.\n",
"\n",
"This notebook explains how to use selection reductions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. `first` and `last` selection reductions\n",
"\n",
"The simplest selection reduction is the `first` reduction. This returns, for each pixel in the canvas, the value of a particular column in the dataset corresponding to the first data point that maps to that pixel. This is best illustrated with an example.\n",
"\n",
"First, create a sample dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import datashader as ds\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(dict(\n",
" x = [ 0, 0, 1, 1, 0, 0, 2, 2],\n",
" y = [ 0, 0, 0, 0, 1, 1, 1, 1],\n",
" value = [ 9, 8, 7, 6, 2, 3, 4, 5],\n",
" other = [11, 12, 13, 14, 15, 16, 17, 18],\n",
"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are 8 rows in the dataset with columns for `x` and `y` coordinates as well as a `value` and an `other` column.\n",
"\n",
"Next, create a Datashader `Canvas` with a height of 2 pixels and a width of 3 pixels:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas = ds.Canvas(plot_height=2, plot_width=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
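"Because the canvas `x` and `y` ranges are calculated from the data, each integer coordinate maps to its own pixel. Two rows of the dataset map to each canvas pixel, with the exception of pixels `[0, 2]` and `[1, 1]` (indexed as `[y, x]`), which do not have any rows mapped to them.\n",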
"\n",
"Now call `canvas.points` using a `first` reduction:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas.points(df, 'x', 'y', ds.first('value'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The returned `xarray.DataArray` has the same shape as the canvas and contains, for each pixel, the value from the `'value'` column of the first row that maps to that pixel. Pixels that do not have any rows mapped to them contain `NaN`.\n",
"\n",
"Here is the result using a `last` selection reduction, which returns the value from the last row that maps to each pixel:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas.points(df, 'x', 'y', ds.last('value'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. `max` and `min` selection reductions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `max` selection reduction returns, for each pixel in the canvas, the maximum value of the specified column across all rows that map to that pixel. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas.points(df, 'x', 'y', ds.max('value'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The corresponding `min` selection reduction returns the minimum value instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas.points(df, 'x', 'y', ds.min('value'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. `where` selection reductions\n",
"\n",
"A `where` reduction takes two arguments: a `selector` reduction and a `lookup_column` name. The `selector` reduction, such as `first` or `max`, determines which row of the dataset is chosen for each pixel, but the value returned is taken from the `lookup_column` rather than from the column used by the `selector`.\n",
"\n",
"Again, this is best illustrated with an example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas.points(df, 'x', 'y', ds.where(ds.max('value'), 'other'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This returns, for each pixel, the value of the `'other'` column corresponding to the maximum of the `'value'` column of the data points that map to that pixel.\n",
"\n",
"Although it is possible to use `first` or `last` as a `selector` with a `lookup_column`, such as\n",
"\n",
"```python\n",
"ds.where(ds.first('value'), 'other')\n",
"```\n",
"\n",
"this is unnecessary because, for data without missing values, it is identical to the simpler\n",
"\n",
"```python\n",
"ds.first('other')\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. `where` selection reductions returning a row index\n",
"\n",
"The `lookup_column` argument to `where` is optional. If not specified, `where` defaults to returning the index of the row in the dataset corresponding to the `selector` for each pixel."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas.points(df, 'x', 'y', ds.where(ds.max('value')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are 8 rows in the DataFrame, so the returned row indices are in the range 0 to 7. An index of -1 is returned for pixels that do not have any data points mapped to them.\n",
"\n",
"`first` and `last` can also be used as the `selector` of a `where` reduction that returns row indices, for example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"canvas.points(df, 'x', 'y', ds.where(ds.first('value')))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
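
The notebook cells above are committed without their outputs. As a cross-check, here is a minimal sketch that uses plain pandas, not Datashader, to reproduce what the `first`, `last`, `max` and `min` selection reductions from sections 1 and 2 should return for the sample dataset. It assumes that the integer `x` and `y` coordinates map directly to pixel column and row indices. That holds for this dataset on the auto-ranged 3-wide, 2-high canvas, but it does not hold in general. `reference_reduction` is a hypothetical helper and is not part of the Datashader API. This is also not how Datashader computes reductions internally.

```python
import numpy as np
import pandas as pd

# Same sample dataset as in the notebook above.
df = pd.DataFrame(dict(
    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],
    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],
    value = [ 9,  8,  7,  6,  2,  3,  4,  5],
    other = [11, 12, 13, 14, 15, 16, 17, 18],
))

HEIGHT, WIDTH = 2, 3  # matches ds.Canvas(plot_height=2, plot_width=3)

def reference_reduction(df, column, how):
    """Hypothetical helper: apply a pandas reduction to each pixel's rows.

    Assumes x and y are already integer pixel indices (true for this
    dataset). Empty pixels are left as NaN, as in the notebook's output.
    """
    out = np.full((HEIGHT, WIDTH), np.nan)
    # groupby keeps the original row order within each group, so
    # 'first' and 'last' follow the order of rows in the DataFrame.
    grouped = df.groupby(['y', 'x'])[column].agg(how)
    for (y, x), v in grouped.items():
        out[y, x] = v  # array is indexed [row, column] == [y, x]
    return out

for how in ['first', 'last', 'max', 'min']:
    print(how)
    print(reference_reduction(df, 'value', how))
```

For this dataset, the helper returns `[[9, 7, nan], [2, nan, 4]]` for `first` and `[[9, 7, nan], [3, nan, 5]]` for `max`. The `NaN` entries fall at pixels `[0, 2]` and `[1, 1]`, the two pixels with no rows.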

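The `where` reductions in sections 3 and 4 can be sketched in a similar way. The selector picks one row per pixel using pandas `idxmax`/`idxmin` or `first_valid_index`/`last_valid_index`, and the result is either that row's `lookup_column` value or its row index. This is a minimal sketch under stated assumptions. `reference_where` is a hypothetical helper, not the Datashader API. It assumes the same direct coordinate-to-pixel mapping as the notebook's sample data and a default `RangeIndex`, so that index labels equal row positions. Ties in `max`/`min` resolve to the first matching row here, which may differ from Datashader (this dataset has no ties).

```python
import numpy as np
import pandas as pd

# Same sample dataset as in the notebook above.
df = pd.DataFrame(dict(
    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],
    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],
    value = [ 9,  8,  7,  6,  2,  3,  4,  5],
    other = [11, 12, 13, 14, 15, 16, 17, 18],
))

HEIGHT, WIDTH = 2, 3  # matches ds.Canvas(plot_height=2, plot_width=3)

# Each selector returns the index label of the chosen row within one
# pixel's group; all four skip NaN values.
SELECTORS = {
    'max': lambda col: col.idxmax(),
    'min': lambda col: col.idxmin(),
    'first': lambda col: col.first_valid_index(),
    'last': lambda col: col.last_valid_index(),
}

def reference_where(df, selector, column, lookup_column=None):
    """Hypothetical helper mimicking ds.where(ds.<selector>(column), lookup_column).

    Assumes x and y are already integer pixel indices and that df has a
    default RangeIndex, so index labels are row positions.
    """
    if lookup_column is None:
        # Row indices, with -1 marking pixels that have no data.
        out = np.full((HEIGHT, WIDTH), -1, dtype=np.int64)
    else:
        # Looked-up values, with NaN marking pixels that have no data.
        out = np.full((HEIGHT, WIDTH), np.nan)
    for (y, x), group in df.groupby(['y', 'x']):
        row = SELECTORS[selector](group[column])
        out[y, x] = row if lookup_column is None else df.at[row, lookup_column]
    return out

print(reference_where(df, 'max', 'value', 'other'))  # like section 3
print(reference_where(df, 'max', 'value'))           # like section 4
print(reference_where(df, 'first', 'value'))         # first as selector
```

For this dataset, the helper gives `[[11, 13, nan], [16, nan, 18]]` for the `max`/`'other'` case and `[[0, 2, -1], [5, -1, 7]]` for the row-index case. Calling `reference_where(df, 'first', 'value', 'other')` gives `[[11, 13, nan], [15, nan, 17]]`, which matches taking the first `'other'` value in each pixel. That is why the notebook describes this combination as unnecessary.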
