Add map_blocks docs.
dcherian committed Mar 19, 2020
1 parent d9029eb commit 6f69955
Showing 2 changed files with 105 additions and 6 deletions.
5 changes: 3 additions & 2 deletions doc/api.rst
@@ -171,6 +171,7 @@ Computation
Dataset.quantile
Dataset.differentiate
Dataset.integrate
Dataset.map_blocks

**Aggregation**:
:py:attr:`~Dataset.all`
@@ -350,6 +351,8 @@ Computation
DataArray.differentiate
DataArray.integrate
DataArray.str
DataArray.map_blocks


**Aggregation**:
:py:attr:`~DataArray.all`
@@ -507,7 +510,6 @@ Dataset methods
Dataset.load
Dataset.chunk
Dataset.unify_chunks
Dataset.map_blocks
Dataset.filter_by_attrs
Dataset.info

@@ -539,7 +541,6 @@ DataArray methods
DataArray.load
DataArray.chunk
DataArray.unify_chunks
DataArray.map_blocks

Coordinates objects
===================
106 changes: 102 additions & 4 deletions doc/dask.rst
@@ -274,12 +274,21 @@ loaded into Dask or not:

.. _dask.automatic-parallelization:

Automatic parallelization
-------------------------
Automatic parallelization with ``apply_ufunc`` and ``map_blocks``
-----------------------------------------------------------------

Almost all of xarray's built-in operations work on Dask arrays. If you want to
use a function that isn't wrapped by xarray, one option is to extract Dask
arrays from xarray objects (``.data``) and use Dask directly.
use a function that isn't wrapped by xarray, and have it applied in parallel on
each block of your xarray object, you have three options:

1. Extract Dask arrays from xarray objects (``.data``) and use Dask directly (see the sketch after this list).
2. Use :py:func:`~xarray.apply_ufunc` to apply functions that consume and return NumPy arrays.
3. Use :py:func:`~xarray.map_blocks`, :py:meth:`Dataset.map_blocks` or :py:meth:`DataArray.map_blocks`
   to apply functions that consume and return xarray objects.
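
As a minimal sketch of the first option: pull out the underlying Dask array,
operate on it with Dask, and wrap the result back up. This is a hypothetical
example (the names ``da``, ``data``, ``doubled`` and ``result`` are not from
this page):

.. code-block:: python

    import dask.array
    import numpy as np
    import xarray as xr

    # a hypothetical chunked DataArray: 3 chunks of length 10 along "time"
    da = xr.DataArray(np.random.rand(30, 180), dims=["time", "x"]).chunk({"time": 10})

    # 1. extract the underlying dask array
    data = da.data
    # 2. operate on it with dask directly, block by block
    doubled = dask.array.map_blocks(lambda block: 2 * block, data)
    # 3. wrap the result back into a DataArray, reusing the original metadata
    result = da.copy(data=doubled)
    result.compute()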


``apply_ufunc``
~~~~~~~~~~~~~~~

Another option is to use xarray's :py:func:`~xarray.apply_ufunc`, which can
automate `embarrassingly parallel
@@ -382,6 +391,95 @@ application.
structure of a problem, unlike the generic speedups offered by
``dask='parallelized'``.
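
As a minimal sketch of the ``dask="parallelized"`` path (the function
``magnitude`` and the inputs ``da_a``/``da_b`` below are hypothetical, not
from this page):

.. code-block:: python

    import numpy as np
    import xarray as xr

    def magnitude(a, b):
        # a NumPy-only function: it never sees xarray metadata
        return np.sqrt(a ** 2 + b ** 2)

    da_a = xr.DataArray(np.random.rand(30, 180), dims=["time", "x"]).chunk({"time": 10})
    da_b = xr.DataArray(np.random.rand(30, 180), dims=["time", "x"]).chunk({"time": 10})

    result = xr.apply_ufunc(
        magnitude,
        da_a,
        da_b,
        dask="parallelized",         # apply blockwise over the dask chunks
        output_dtypes=[da_a.dtype],  # required when dask="parallelized"
    )
    result.compute()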


``map_blocks``
~~~~~~~~~~~~~~

Functions that consume and return xarray objects can be easily applied in parallel using :py:func:`map_blocks`. Your function will receive an xarray Dataset or DataArray subset to one chunk
along each chunked dimension.
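
The examples that follow use ``ds``, the example dataset constructed earlier
on this page. A rough stand-in with the same layout (hypothetical values: 30
time steps in 3 chunks of length 10, and 180x180 points in latitude x
longitude) could be built like this:

.. code-block:: python

    import numpy as np
    import pandas as pd
    import xarray as xr

    ds = xr.Dataset(
        {
            "temperature": (
                ("time", "latitude", "longitude"),
                np.random.rand(30, 180, 180),
            )
        },
        coords={
            "time": pd.date_range("2020-01-01", periods=30),
            "latitude": np.linspace(-90, 90, 180),
            "longitude": np.linspace(0, 359, 180),
        },
    ).chunk({"time": 10})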

.. ipython:: python

    ds.temperature

This DataArray has 3 chunks each with length 10 along the time dimension. A function applied with :py:func:`map_blocks` will receive a DataArray corresponding to a single block of shape 10x180x180
(time x latitude x longitude). The following snippet illustrates how to check the shape of the object
received by the applied function.

.. ipython:: python

    def func(da):
        print(da.sizes)
        return da.time


    mapped = xr.map_blocks(func, ds.temperature)
    mapped

Notice that the :py:func:`map_blocks` call printed
``Frozen({'time': 0, 'latitude': 0, 'longitude': 0})`` to screen.
``func`` received 0-sized blocks! :py:func:`map_blocks` needs to know what the final result
looks like in terms of dimensions, shapes, etc. It does so by running the provided function on 0-shaped
inputs (*automated inference*). This works in many cases, but not all. If automatic inference does not
work for your function, provide the ``template`` kwarg (see below).

In this case, automatic inference worked, so let's check that the result is as expected.

.. ipython:: python

    mapped.compute(scheduler="single-threaded")
    mapped.identical(ds.time)

Note that we use ``.compute(scheduler="single-threaded")``.
This executes the Dask graph in *serial* using a for loop, but allows for printing to screen and other
debugging techniques. We can easily see that our function is receiving blocks of shape 10x180x180 and
that the returned result is identical to ``ds.time``, as expected.


Here is a common example where automated inference will not work.

.. ipython:: python
    :okexcept:

    def func(da):
        print(da.sizes)
        return da.isel(time=[1])


    mapped = xr.map_blocks(func, ds.temperature)

``func`` cannot be run on 0-shaped inputs because it is not possible to extract element 1 along a
dimension of size 0. In this case, we need to tell :py:func:`map_blocks` what the returned result looks
like using the ``template`` kwarg. ``template`` must be an xarray Dataset or DataArray (depending on
what the function returns) with dimensions, shapes, chunk sizes, coordinate variables *and* data
variables that look exactly like the expected result. The variables should be dask-backed, and will
therefore incur little memory cost.

.. ipython:: python

    template = ds.temperature.isel(time=[1, 11, 21])
    mapped = xr.map_blocks(func, ds.temperature, template=template)

Notice that the 0-shaped sizes were not printed to screen. Since ``template`` has been provided,
:py:func:`map_blocks` does not need to infer the result's structure by running ``func`` on 0-shaped inputs.

.. ipython:: python

    mapped.identical(template)

:py:func:`map_blocks` also allows passing ``args`` and ``kwargs`` down to the user function ``func``.
``func`` will be executed as ``func(block_xarray, *args, **kwargs)``, so ``args`` must be a list and
``kwargs`` must be a dictionary.

.. ipython:: python

    def func(obj, a, b=0):
        return obj + a + b


    mapped = ds.map_blocks(func, args=[10], kwargs={"b": 10})
    expected = ds + 10 + 10
    mapped.identical(expected)

Chunking and performance
------------------------

