Expose apply_ufunc as public API and add documentation #1619
@@ -14,6 +14,7 @@ Top-level functions
 .. autosummary::
    :toctree: generated/
 
+   apply_ufunc
    align
    broadcast
    concat
@@ -303,15 +303,21 @@ Datasets support most of the same methods found on data arrays:
     ds.mean(dim='x')
     abs(ds)
 
-Unfortunately, a limitation of the current version of numpy means that we
-cannot override ufuncs for datasets, because datasets cannot be written as
-a single array [1]_. :py:meth:`~xarray.Dataset.apply` works around this
+Unfortunately, we currently do not support NumPy ufuncs for datasets [1]_.
+:py:meth:`~xarray.Dataset.apply` works around this
 limitation, by applying the given function to each variable in the dataset:
 
 .. ipython:: python
 
     ds.apply(np.sin)
 
+You can also use the wrapped functions in the ``xarray.ufuncs`` module:
+
+.. ipython:: python
+
+    import xarray.ufuncs as xu
+    xu.sin(ds)
+
 Datasets also use looping over variables for *broadcasting* in binary
 arithmetic. You can do arithmetic between any ``DataArray`` and a dataset:
 
@@ -329,5 +335,93 @@ Arithmetic between two datasets matches data variables of the same name:
 Similarly to index based alignment, the result has the intersection of all
 matching data variables.
 
-.. [1] In some future version of NumPy, we should be able to override ufuncs for
-       datasets by making use of ``__numpy_ufunc__``.
+.. [1] This was previously due to a limitation of NumPy, but with NumPy 1.13
+       we should be able to support this by leveraging ``__array_ufunc__``
+       (:issue:`1617`).
+
+.. _computation.wrapping-custom:
+
+Wrapping custom computation
+===========================
+
+It doesn't always make sense to do computation directly with xarray objects:
+
+- When working with small arrays (less than ~1e7 elements), applying an

Is the point on speed a distraction? If an array is that small, the absolute difference in speed is still very small, so it really only makes a difference if you're doing those operations in a loop.

Agreed. I had "in the inner loop" in mind when I wrote this, but I see now that that never made it into the text. Let me know if this latest update seems more reasonable to you, or if I'm still pushing too hard on performance considerations.

+  operation with xarray can be significantly slower. Keeping track of labels
+  and ensuring their consistency adds overhead, and xarray's high level label-
+  based APIs remove low-level control over the implementation of operations.
+  Also, xarray's core itself is not especially fast, because it's written in
+  Python rather than a compiled language like C.
+- Even if speed doesn't matter, it can be important to wrap existing code, or
+  to support alternative interfaces that don't use xarray objects.
+
+For these reasons, it is often well-advised to write low-level routines that
+work with NumPy arrays, and to wrap these routines to work with xarray objects.
+However, adding support for labels on both :py:class:`~xarray.Dataset` and
+:py:class:`~xarray.DataArray` can be a bit of a chore.
+
+To make this easier, xarray supplies the :py:func:`~xarray.apply_ufunc` helper
+function, designed for wrapping functions that support broadcasting and
+vectorization on unlabeled arrays in the style of a NumPy
+`universal function <https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html>`_ ("ufunc" for short).
+``apply_ufunc`` takes care of everything needed for an idiomatic xarray wrapper,
+including alignment, broadcasting, looping over ``Dataset`` variables (if
+needed), and merging of coordinates. In fact, many internal xarray
+functions/methods are written using ``apply_ufunc``.
+
+Simple functions that act independently on each value should work without
+any additional arguments:
+
+.. ipython:: python
+
+    squared_error = lambda x, y: (x - y) ** 2
+    arr1 = xr.DataArray([0, 1, 2, 3], dims='x')
+    xr.apply_ufunc(squared_error, arr1, 1)
+
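
As a rough sketch of the label handling described above, the same elementwise function can also be applied to a ``Dataset``, in which case ``apply_ufunc`` loops over the data variables. The dataset ``ds`` and its variable names ``a`` and ``b`` are made up for illustration and are not part of this PR.

```python
import numpy as np
import xarray as xr


def squared_error(x, y):
    # Plain elementwise arithmetic; works on NumPy arrays and scalars.
    return (x - y) ** 2


# Hypothetical dataset with two variables sharing the dimension 'x'.
ds = xr.Dataset({
    'a': ('x', np.array([0.0, 1.0, 2.0, 3.0])),
    'b': ('x', np.array([1.0, 1.0, 1.0, 1.0])),
})

# The function is applied to each data variable in turn; the result is
# again a Dataset with the same variable names and dimensions.
result = xr.apply_ufunc(squared_error, ds, 1)
```

The wrapped function itself never sees the labels; it only receives the underlying arrays.
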
+For using more complex operations that consider some array values collectively,
+it's important to understand the idea of "core dimensions" from NumPy's
+`generalized ufuncs <http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html>`_. Core dimensions are defined as dimensions
+that should *not* be broadcast over. Usually, they correspond to the fundamental
+dimensions over which an operation is defined, e.g., the summed axis in
+``np.sum``. A good clue that core dimensions are needed is the presence of an
+``axis`` argument on the corresponding NumPy function.
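
In plain NumPy terms, the distinction can be sketched as follows (the array contents are arbitrary illustration data). An elementwise function has no core dimensions and keeps the input shape, while ``np.sum`` with an ``axis`` argument consumes that axis as its core dimension and is applied independently along the remaining ones.

```python
import numpy as np

# Two rows of three values each: shape (2, 3).
data = np.arange(6.0).reshape(2, 3)

# No core dimensions: the function acts on each value independently,
# so the output has the same shape as the input.
elementwise = np.sqrt(data)

# One core dimension: the last axis is consumed by the sum, and the
# operation is repeated for each index of the leading axis.
total = np.sum(data, axis=-1)
```

Here ``elementwise`` keeps the shape ``(2, 3)``, whereas ``total`` has shape ``(2,)`` because the core dimension has been reduced away.
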
+
+With ``apply_ufunc``, core dimensions are recognized by name, and then moved to
+the last dimension of any input arguments before applying the given function.
+This means that for functions that accept an ``axis`` argument, you usually need
+to set ``axis=-1``. As an example, here is how we would wrap
+:py:func:`numpy.linalg.norm` to calculate the vector norm:
+
+.. code-block:: python
+
+    def vector_norm(x, dim, ord=None):
+        return xr.apply_ufunc(np.linalg.norm, x,
+                              input_core_dims=[[dim]],
+                              kwargs={'ord': ord, 'axis': -1})
+
+.. ipython:: python
+    :suppress:
+
+    def vector_norm(x, dim, ord=None):
+        return xr.apply_ufunc(np.linalg.norm, x,
+                              input_core_dims=[[dim]],
+                              kwargs={'ord': ord, 'axis': -1})
+
+.. ipython:: python
+
+    vector_norm(arr1, dim='x')
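
The same pattern extends to functions of several arguments. The following is only a sketch under stated assumptions, not part of this PR: ``inner_product`` and ``_inner`` are hypothetical names, and the input arrays are illustration data. It relies on the behavior described above, namely that the named core dimension is moved to the last axis of every input before the function is called, regardless of where it is stored.

```python
import numpy as np
import xarray as xr


def inner_product(a, b, dim):
    # Hypothetical helper: sum of elementwise products along ``dim``.
    def _inner(x, y):
        # By the time this runs, ``dim`` is the last axis of both inputs.
        return (x * y).sum(axis=-1)

    return xr.apply_ufunc(_inner, a, b, input_core_dims=[[dim], [dim]])


a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=('t', 'x'))
# The core dimension 'x' is deliberately stored first here.
b = xr.DataArray(np.ones((3, 2)), dims=('x', 't'))

result = inner_product(a, b, dim='x')
```

The result keeps only the non-core dimension ``t``, since ``x`` was reduced by the wrapped function.
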
+
+Because ``apply_ufunc`` follows a standard convention for ufuncs, it plays
+nicely with tools for building vectorized functions, like
+:func:`numpy.broadcast_arrays` and :func:`numpy.vectorize`. For high performance
+needs, consider using Numba's `vectorize and guvectorize <http://numba.pydata.org/numba-doc/dev/user/vectorize.html>`_.
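
A minimal sketch of the :func:`numpy.vectorize` combination mentioned above, assuming a scalar-only routine that cannot take arrays directly. The function ``clipped_log`` and the input values are hypothetical and chosen only for illustration.

```python
import math

import numpy as np
import xarray as xr


def clipped_log(value):
    # Hypothetical scalar-only routine: math.log does not accept arrays.
    return math.log(value) if value > 1 else 0.0


# np.vectorize turns the scalar function into one that broadcasts over
# arrays; otypes fixes the output dtype explicitly.
vectorized = np.vectorize(clipped_log, otypes=[float])

arr = xr.DataArray([0.5, 1.0, math.e], dims='x')
result = xr.apply_ufunc(vectorized, arr)
```

Note that ``np.vectorize`` is a convenience that loops in Python, so this buys a labeled interface rather than speed.
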
+
+In addition to wrapping functions, ``apply_ufunc`` can automatically parallelize
+many functions when using dask by setting ``dask='parallelized'``. This is
+illustrated in a separate example.
+
+.. TODO: add link!

I'm thinking the recipes would be a good place for this: http://xarray.pydata.org/en/stable/auto_gallery/index.html

Yes, if I can come up with a figure!

+
+:py:func:`~xarray.apply_ufunc` also supports some advanced options for
+controlling alignment of variables and the form of the result. See the
+docstring for full details and more examples.
@@ -75,6 +75,11 @@ Backward Incompatible Changes
 Enhancements
 ~~~~~~~~~~~~
 
+- New helper function :py:func:`~xarray.apply_ufunc` for wrapping functions
+  written to work on NumPy arrays to support labels on xarray objects.
+  ``apply_ufunc`` also supports automatic parallelization for many functions
+  with dask. See :ref:`computation.wrapping-custom` for details.
+

Don't forget:

 - Support for `pathlib.Path` objects added to
   :py:func:`~xarray.open_dataset`, :py:func:`~xarray.open_mfdataset`,
   :py:func:`~xarray.to_netcdf`, and :py:func:`~xarray.save_mfdataset`
@@ -232,7 +237,7 @@ Bug fixes
   The previous behavior unintentionally causing additional tests to be skipped
   (:issue:`1531`). By `Joe Hamman <https://github.com/jhamman>`_.
 
-- Fix pynio backend for upcoming release of pynio with python3 support
+- Fix pynio backend for upcoming release of pynio with python3 support
   (:issue:`1611`). By `Ben Hillman <https://github/brhillman>`_.
 
 .. _whats-new.0.9.6:
currently