From 481c85f8df70e252b2f2d91074be066217d9583b Mon Sep 17 00:00:00 2001 From: Ian Thomas Date: Thu, 2 Feb 2023 11:38:28 +0000 Subject: [PATCH] User guide page for where reduction (#1172) --- doc/api.rst | 1 + doc/user_guide/index.rst | 4 + .../user_guide/12_Inspection_Reductions.ipynb | 231 ++++++++++++++++++ 3 files changed, 236 insertions(+) create mode 100644 examples/user_guide/12_Inspection_Reductions.ipynb diff --git a/doc/api.rst b/doc/api.rst index 2f9545462..59a048789 100644 --- a/doc/api.rst +++ b/doc/api.rst @@ -186,6 +186,7 @@ Definitions .. autoclass:: sum .. autoclass:: summary .. autoclass:: var +.. autoclass:: where .. automodule:: datashader.transfer_functions :members: diff --git a/doc/user_guide/index.rst b/doc/user_guide/index.rst index 7b7b76348..9172920a0 100644 --- a/doc/user_guide/index.rst +++ b/doc/user_guide/index.rst @@ -44,6 +44,9 @@ Contents: `11. Geography `_ Pointers to using Datashader for geographic and other spatial applications. +`12. Inspection Reductions `_ + Using reductions to inspect rather than aggregate data. + .. toctree:: :hidden: :maxdepth: 3 @@ -59,3 +62,4 @@ Contents: Extending Performance Geography + Inspection Reductions diff --git a/examples/user_guide/12_Inspection_Reductions.ipynb b/examples/user_guide/12_Inspection_Reductions.ipynb new file mode 100644 index 000000000..8d7bcc48d --- /dev/null +++ b/examples/user_guide/12_Inspection_Reductions.ipynb @@ -0,0 +1,231 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each Datashader `Canvas` glyph method, such as `points` or `line`, accepts an `agg` argument: a `Reduction` that determines how the data points mapped to each pixel (histogram bin) are reduced to the single value returned to the user. Each `Reduction` falls into one of two categories:\n", + "\n", + "1. Mathematical combination of data, such as the `count` of data points per pixel or the `mean` of a column of the supplied dataset.\n", + "2. 
Selection of data from a column of the supplied dataset, or of the index of the corresponding row in the dataset.\n", + "\n", + "This notebook explains how to use selection reductions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. `first` and `last` selection reductions\n", + "\n", + "The simplest selection reduction is the `first` reduction. For each pixel in the canvas, it returns the value of a particular column from the first data point that maps to that pixel. This is best illustrated with an example.\n", + "\n", + "First, create a sample dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import datashader as ds\n", + "import pandas as pd\n", + "\n", + "df = pd.DataFrame(dict(\n", + " x = [ 0, 0, 1, 1, 0, 0, 2, 2],\n", + " y = [ 0, 0, 0, 0, 1, 1, 1, 1],\n", + " value = [ 9, 8, 7, 6, 2, 3, 4, 5],\n", + " other = [11, 12, 13, 14, 15, 16, 17, 18],\n", + "))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are 8 rows in the dataset, with columns for the `x` and `y` coordinates as well as a `value` and an `other` column.\n", + "\n", + "Next, create a Datashader `Canvas` with a height of 2 pixels and a width of 3 pixels:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas = ds.Canvas(plot_height=2, plot_width=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Two rows of the dataset map to each canvas pixel, with the exception of pixels `[0, 2]` and `[1, 1]` (indexed as `[y, x]`), which have no rows mapped to them.\n", + "\n", + "Now call `canvas.points` using a `first` reduction:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas.points(df, 'x', 'y', ds.first('value'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The 
returned `xarray.DataArray` has the same shape as the canvas and contains values taken from the `'value'` column of the first row that maps to each pixel. Pixels that have no rows mapped to them contain `NaN`.\n", + "\n", + "Here are the results using a `last` selection reduction:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas.points(df, 'x', 'y', ds.last('value'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. `max` and `min` selection reductions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A `max` selection reduction returns, for each pixel in the canvas, the maximum value of the specified column over all rows that map to that pixel. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas.points(df, 'x', 'y', ds.max('value'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The corresponding `min` selection reduction gives:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas.points(df, 'x', 'y', ds.min('value'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. `where` selection reductions\n", + "\n", + "A `where` reduction takes two arguments: a `selector` reduction and a `lookup_column` name. The `selector` reduction, such as `first` or `max`, selects which row of the dataset to return information about for each pixel. 
However, the value returned is taken from the `lookup_column` rather than from the column used by the `selector`.\n", + "\n", + "Again, this is best illustrated with an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas.points(df, 'x', 'y', ds.where(ds.max('value'), 'other'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This returns, for each pixel, the value of the `'other'` column from the row that has the maximum `'value'` among the data points that map to that pixel.\n", + "\n", + "Although it is possible to use `first` or `last` as a `selector` together with a `lookup_column`, such as\n", + "\n", + "```python\n", + "ds.where(ds.first('value'), 'other')\n", + "```\n", + "this is unnecessary because it is identical to the simpler\n", + "```python\n", + "ds.first('other')\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. `where` selection reductions returning a row index\n", + "\n", + "The `lookup_column` argument to `where` is optional. If it is not specified, `where` returns, for each pixel, the index of the dataset row chosen by the `selector`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas.points(df, 'x', 'y', ds.where(ds.max('value')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are 8 rows in the DataFrame, so the returned row indices are in the range 0 to 7. 
An index of -1 is returned for pixels that have no data points mapped to them.\n", + "\n", + "`first` and `last` can also be used as the `selector` of a `where` reduction to return row indices, for example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "canvas.points(df, 'x', 'y', ds.where(ds.first('value')))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
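As a cross-check of the `where` semantics described in the notebook above, here is a minimal pure-pandas sketch of what `ds.where(ds.max('value'), 'other')` and `ds.where(ds.max('value'))` compute for the sample dataset. It is not Datashader's implementation. The helper `where_max` is hypothetical, and it assumes that each integer `(x, y)` coordinate maps directly to pixel `[y, x]`. That assumption holds only for this small dataset on a 3 by 2 canvas, not for Datashader's general coordinate-to-pixel mapping.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],
    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],
    value = [ 9,  8,  7,  6,  2,  3,  4,  5],
    other = [11, 12, 13, 14, 15, 16, 17, 18],
))


def where_max(df, selector_col, lookup_col=None, shape=(2, 3)):
    """Hypothetical sketch of ds.where(ds.max(selector_col), lookup_col).

    Assumes integer (x, y) coordinates map directly to pixel [y, x].
    """
    # For each occupied pixel, the row label holding the maximum selector value.
    idx = df.groupby(['y', 'x'])[selector_col].idxmax()
    if lookup_col is None:
        # No lookup column: return row indices, with -1 marking empty pixels.
        out = np.full(shape, -1, dtype=np.int64)
        values = idx.to_numpy()
    else:
        # Lookup column: return its value from the selected row, NaN if empty.
        out = np.full(shape, np.nan)
        values = df.loc[idx.to_numpy(), lookup_col].to_numpy()
    ys = idx.index.get_level_values('y').to_numpy()
    xs = idx.index.get_level_values('x').to_numpy()
    out[ys, xs] = values
    return out


print(where_max(df, 'value', 'other'))  # 'other' from the max-'value' row per pixel
print(where_max(df, 'value'))           # row index of the max-'value' row per pixel
```

For pixel `[0, 0]`, rows 0 and 1 have `value` 9 and 8, so row 0 is selected and its `other` value is 11. Empty pixels are filled with `NaN` or -1, mirroring the conventions the notebook describes.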