Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose apply_ufunc as public API and add documentation #1619

Merged
merged 7 commits into from
Oct 20, 2017

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Oct 10, 2017

apply_ufunc() does not meet all our needs for wrapping unlabeled array routines with xarray (see #1618 for a proposal for apply_raw()), but it should be useful for many advanced users and isn't doing much in its current state as non-public API. So I'd like to make it public for the xarray 0.10 release.

@MaximilianR @rabernat @jhamman Review from any of you would be appreciated here!

TODO: turn the description from GH1517 into an auto-gallery example, and link
to it from `computation.rst` and `dask.rst`.
@shoyer shoyer added this to the 0.10 milestone Oct 10, 2017
Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good. I understand this much better now myself. Ping me again once you have linked to the doc/example/recipe.

written to work on NumPy arrays to support labels on xarray objects.
``apply_ufunc`` also support automatic parallelization for many functions
with dask. See :ref:`computation.wrapping-custom` for details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget:

(:issue:`XXXX`). By `Stephan Hoyer <https://github.com/shoyer>`_. 

many functions when using dask by setting ``dask='parallelized'``. This is
illustrated in a separate example.

.. TODO: add link!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking the recipes would be a good place for this: http://xarray.pydata.org/en/stable/auto_gallery/index.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if I can come up with a figure!

Copy link
Member

@benbovy benbovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Just found two small typos while reading the new doc.

Unfortunately, a limitation of the current version of numpy means that we
cannot override ufuncs for datasets, because datasets cannot be written as
a single array [1]_. :py:meth:`~xarray.Dataset.apply` works around this
Unfortunately, we current do not support NUmPy ufuncs for datasets [1]_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently


squared_error = lambda x, y: (x - y) ** 2
arr1 = xr.DataArray([0, 1, 2, 3], dims='x')
xr.apply_func(squared_error, arr1, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xr.apply_ufunc

@shoyer
Copy link
Member Author

shoyer commented Oct 10, 2017

@benbovy @jhamman thanks for giving this a read. The docs actually build now, and I added a section on "Automatic parallelization" to the dask docs.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc updates look good. Just one small comment.

doc/dask.rst Outdated
Another option is to use xarray's :py:func:`~xarray.apply_ufunc`, which can
automatically parallelize functions written for processing NumPy arrays when
applied to dask arrays. It works similarly to :py:func:`dask.array.map_blocks`
and :py:func:`dask.array.atop`, but without requiring an immediate layer of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean intermediate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, indeed


It doesn't always make sense to do computation directly with xarray objects:

- When working with small arrays (less than ~1e7 elements), applying an
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the point on speed is a distraction? If an array is that small, the absolute difference in speed is still very small, so it really only makes a difference if you're doing those operations in a loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I had "in the inner loop" in mind when I wrote this, but I see now that that never made it into the text. Let me know if this latest update seems more reasonable to you, or if I'm still pushing too hard on performance considerations.

@@ -724,7 +727,7 @@ def apply_ufunc(func, *args, **kwargs):
``apply_ufunc`` to write functions to (very nearly) replicate existing
xarray functionality:

Calculate the vector magnitude of two arguments:
Calculate the vector magnitude of two arguments::
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you write ::, Sphinx formats the block below as code in the generated API docs.

@shoyer
Copy link
Member Author

shoyer commented Oct 19, 2017

Any additional comments? If not, I'll merge this in about 24 hours.

@shoyer shoyer merged commit b3387cb into pydata:master Oct 20, 2017
@shoyer shoyer deleted the apply_ufunc-docs-and-public-api branch October 20, 2017 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants