Consistent naming for xarray's methods that apply functions #1251

shoyer · 2017-02-05T21:27:24Z

We currently have two types of methods that take a function to apply to xarray objects:

pipe (on DataArray and Dataset): apply a function to this entire object (array.pipe(func) -> func(array))
apply (on Dataset and GroupBy): apply a function to each labeled object in this object (e.g., ds.apply(func) -> type(ds)({k: func(v) for k, v in ds.data_vars.items()})).
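
To make the distinction concrete, here is a toy model in plain Python. It is only a sketch: a dict of lists stands in for a Dataset, and the helper names `pipe` and `apply_each` are hypothetical illustrations, not xarray's implementation.

```python
# Toy stand-in for a Dataset: a mapping from variable name to values.
ds = {"a": [1, 2, 3], "b": [10, 20]}


def pipe(obj, func):
    """Call func once on the whole object: pipe(obj, func) -> func(obj)."""
    return func(obj)


def apply_each(obj, func):
    """Call func once per variable and rebuild a container of the same kind."""
    return type(obj)({name: func(values) for name, values in obj.items()})


n_vars = pipe(ds, len)         # func sees the whole container
lengths = apply_each(ds, len)  # func sees each variable in turn
```

With `len` as the function, the first call counts the variables while the second measures each variable separately, which is the whole-object versus per-element split described above.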

And one more method that we want to add but isn't finalized yet -- currently named apply_ufunc:

Apply a function that acts on unlabeled (i.e., numpy) arrays to each array in the object

I'd like to have three distinct names that make it clear what these methods do and how they are different. This has come up a few times recently, e.g., #1130.

One proposal: rename apply to map, and then use apply only for methods that act on unlabeled arrays. This would require a deprecation cycle, but eventually it would let us add .apply methods for handling raw arrays to both Dataset and DataArray. (We could use a separate apply method from apply_ufunc to convert dim arguments to axis and not do automatic broadcasting.)

max-sixty · 2017-02-06T03:01:06Z

Sounds good. It breaks the consistency with pandas' apply, but map is much more logical

shoyer · 2017-02-07T03:08:41Z

Another option is to keep apply as-is for Dataset and GroupBy objects, but add a separate apply_raw method for applying functions that act on "raw" arrays. This would be a little more similar to pandas' apply with raw=True.

We could even do the raw=True keyword argument like pandas, but this is a little awkward because there are some additional arguments on apply_raw that don't make sense on apply (e.g., arguments that specify that some dimensions should be dropped or added).

benbovy · 2017-02-07T09:16:34Z

I would be +1 for apply and apply_raw.

max-sixty · 2019-01-17T22:06:05Z

> One proposal: rename apply to map

Would we accept this? I'd be up for doing the PR to deprecate apply and introduce map. It makes a Dataset more consistent with a standard mapping interface. But it would be inconsistent with pandas and would rename a fairly widely used method.

dcherian · 2019-01-17T22:10:44Z

> One proposal: rename apply to map

-1 for pandas incompatibility.

I would like to rename rolling.reduce to rolling.apply to be consistent with pandas & groupby

max-sixty · 2019-01-17T22:21:06Z

> I would like to rename rolling.reduce to rolling.apply to be consistent with pandas & groupby

+0.5 if map fails

shoyer · 2019-01-21T20:52:07Z

I don't think we should consider ourselves beholden to pandas's bad names, but we should definitely try to preserve backwards compatibility and interpretability for users.

Going back to Python itself:

apply(func, args, kwargs) (from Python 2.x) is equivalent to func(*args, **kwargs)
map() maps a function over each element of an iterable
functools.reduce() applies a binary function repeatedly to convert an iterable into a single element
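
For reference, the three Python idioms can be sketched side by side. The `apply` helper below is a hand-written Python 3 stand-in (an assumption for illustration, since the builtin was removed in Python 3); `map` and `functools.reduce` are the real standard-library functions.

```python
from functools import reduce


def apply(func, args=(), kwargs=None):
    """Python 3 stand-in for the removed Python 2 builtin apply()."""
    return func(*args, **(kwargs or {}))


applied = apply(divmod, (7, 2))                     # same as divmod(7, 2)
mapped = list(map(abs, [-1, 2, -3]))                # abs() on each element
reduced = reduce(lambda x, y: x + y, [1, 2, 3, 4])  # ((1 + 2) + 3) + 4
```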

For xarray, we need:

(1) a method for wrapping functions that work on unlabeled arrays
(2) a method for mapping functions over each element of a Dataset or grouped object
(3) (possibly) a method for wrapping aggregation functions that act on unlabeled arrays

Currently, we call both (1) and (2) apply(), which is pretty confusing, and use reduce() for (3) even though it could potentially be a special case of (1) with a bit of extra magic and is quite unlike functools.reduce. In contrast, pandas calls both (1) and (2) apply() (using raw=True/raw=False to distinguish), and calls (3) aggregate or agg.

So long term, it could make sense to rename the current Dataset.apply()/GroupBy.apply() (case 2) to .map, and also rename .reduce() to the more generic .aggregate().

That said, I'm trying to imagine what the transition process for switching to new behavior for Dataset.apply looks like. We already will re-add dimensions to the output from calling functions in apply(), but at some point we have to do a hard cut-off from passing DataArray objects to the function in apply to passing in a raw array.

I suppose we could do this by adding a raw keyword-only argument to .apply():

If raw=False (current default), we would raise a warning about changing behavior and would pass on DataArray objects to the applied function. Users would be encouraged to use .map() instead.
If raw=True (future default behavior), we would pass in raw numpy/dask arrays to the applied function.
The dim argument might only be supported with raw=True.

We would end up with an extraneous raw argument, which we could remove/deprecate at our leisure.
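
A minimal sketch of that transition, in plain Python rather than xarray: the `Labeled` class and this free-standing `apply` function are hypothetical stand-ins, and the warning text is invented for illustration. It only shows the shape of a keyword-only `raw` switch, not how xarray would implement it.

```python
import warnings


class Labeled:
    """Hypothetical labeled array: dimension names plus raw values."""

    def __init__(self, dims, data):
        self.dims = dims
        self.data = data


def apply(variables, func, *, raw=False):
    """Toy version of the proposed transition; raw is keyword-only."""
    if raw:
        # Proposed future default: func receives the unlabeled values and
        # the dimension names are re-attached to the result.
        return {k: Labeled(v.dims, func(v.data)) for k, v in variables.items()}
    # Current default: warn about the upcoming change, then pass labeled objects.
    warnings.warn(
        "apply() will pass raw arrays in the future; use map() to keep "
        "receiving labeled objects",
        FutureWarning,
        stacklevel=2,
    )
    return {k: func(v) for k, v in variables.items()}


variables = {"t": Labeled(("x",), [3, 1, 2])}
raw_result = apply(variables, sorted, raw=True)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    labeled_result = apply(variables, lambda v: v.dims)
```

Making `raw` keyword-only keeps call sites explicit about which behavior they opt into, which matters while the default is in flux.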

max-sixty · 2019-10-29T19:32:14Z

I put the change for Dataset.apply -> Dataset.map in. Should we do the same for GroupBy?

I think those are probably the two easiest decisions to make (and hopefully will kick off moving this issue forwards)

Edit: the reason I hesitated for GroupBy is that it's not exactly the same: the returned object isn't the same type as the caller (i.e. Dataset.map returns a Dataset, while GroupBy.map would return a Dataset, not a GroupBy)

shoyer · 2019-11-05T01:48:30Z

+@dcherian

@max-sixty thanks for pushing this along!

I think I'm coming to appreciate backwards compatibility as an important consideration more and more these days. It's just really painful to reuse methods for something entirely different.

This makes me lean towards adding separate apply_raw() methods. The name is definitely less memorable than apply, but on the other hand it is also definitely easier to guess the difference between apply/apply_raw than apply/map.

max-sixty · 2019-11-05T21:44:44Z

OK. Does that inform your view on map vs apply?

I more strongly think that apply is a confusing and non-standard term for "run this function on each item in this container", even if it's pandas' name.

I'm keener to offer map as an option than necessarily reusing apply.

What are your thoughts re:

Adding map as the documented approach for run-on-each on Dataset & GroupBy
Adding apply_raw (or similar) as a new function that runs functions on the 'raw' arrays
Keeping apply around for backward-compat, similar to the drop case
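
The first and third points could look roughly like the sketch below. `ToyDataset` is a hypothetical stand-in written for this example, and the choice of `PendingDeprecationWarning` and the message text are assumptions; the point is only that apply can forward to map so existing code keeps working.

```python
import warnings


class ToyDataset:
    """Hypothetical stand-in for a Dataset: variable names mapped to values."""

    def __init__(self, data_vars):
        self.data_vars = dict(data_vars)

    def map(self, func):
        """Documented spelling: run func on each variable."""
        return ToyDataset({k: func(v) for k, v in self.data_vars.items()})

    def apply(self, func):
        """Backward-compatible alias that forwards to map()."""
        warnings.warn(
            "apply is kept for backward compatibility; prefer map",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        return self.map(func)


ds = ToyDataset({"a": [1, 2], "b": [3]})
new_style = ds.map(sum).data_vars
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ds.apply(sum).data_vars
```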

shoyer · 2019-11-05T21:50:01Z

> I more strongly think that apply is a confusing and non-standard term for "run this function on each item in this container", even if it's pandas' name.

This is a fair point. map is definitely the standard name in the context of "map reduce" type operations.

> What are your thoughts re:
>
> Adding map as the documented approach for run-on-each on Dataset & GroupBy
> Adding apply_raw (or similar) as a new function that runs functions on the 'raw' arrays
> Keeping apply around for backward-compat, similar to the drop case

I would support this.

dcherian · 2019-11-05T22:14:49Z

@max-sixty I like your proposal!

max-sixty · 2022-04-27T20:06:24Z

FYI the apply / map change went in a couple of years ago.

We still don't have an apply_raw. I think it probably makes sense to consolidate this with #1618, so I'll close this, even though it has some good discussion.

shoyer mentioned this issue Oct 10, 2017

apply_raw() for a simpler version of apply_ufunc() #1618

Open

shoyer mentioned this issue Jan 17, 2019

Skipping variables in datasets that don't have the core dim #2674

Closed

max-sixty mentioned this issue Oct 29, 2019

Dataset.map #3459

Merged

max-sixty mentioned this issue Feb 21, 2020

Add groupby.pipe? #3782

Closed

dcherian added the API design label Jul 4, 2021

max-sixty closed this as completed Apr 27, 2022
