Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skipping variables in datasets that don't have the core dim #2674

Closed
wants to merge 2 commits into from

Conversation

max-sixty
Copy link
Collaborator

ref #2650 (comment)

This seems an ugly way of accomplishing the goal; any ideas for a better way of doing this?

And stepping back, do others think a) it's helpful to skip variables in a dataset, and b) apply_ufunc should do this?

@shoyer
Copy link
Member

shoyer commented Jan 15, 2019

cc @fujiisoup

The challenge here is that this logic depends on the nature of the function. It only makes skip variables if applying the operation to a scalar returns the scalar. This is true for many but not all reductions, and there are lots of edge cases, e.g.,

  • mean returns the same value, but integers should be cast to floats for consistency
  • sum returns the original element unless it's a scalar NaN (in which case it gets replaced by 0, unless skipna=False
  • count should always return 1, regardless of the original value
  • indexing does always return the same value

So this really needs to be opt-in only, and even then I'm not sure it's worth the trouble. It might be better to explicitly define functions even in the case of axis=() (the empty tuple), and either reuse the reduce() interface or make another (e.g., transform?) for functions that are oriented along a set of axes.

Actually, it isn't documented behavior but reduce() will already correctly handle this cases, too, e.g., for functions like cumsum() (this is used internally):

In [13]: ds = xarray.Dataset({'a': ('x', np.arange(3))})

In [14]: ds.reduce(np.cumsum)
Out[14]:
<xarray.Dataset>
Dimensions:  (x: 3)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 0 1 3

@shoyer shoyer mentioned this pull request Jan 15, 2019
5 tasks
@max-sixty
Copy link
Collaborator Author

Thanks a lot for the comments @shoyer , that's v clarifying

Is there a conceptual overlap between the goals of .reduce and apply_ufunc? I had initially thought that .reduce strictly reduced a dimension, though that's not actually the case given cumsum

@shoyer
Copy link
Member

shoyer commented Jan 17, 2019

Is there a conceptual overlap between the goals of .reduce and apply_ufunc? I had initially thought that .reduce strictly reduced a dimension, though that's not actually the case given cumsum

Yes, I think so. reduce() is a simpler but complementary model to apply_ufunc.

Maybe we should rename reduce() to something like apply_axis_func() or apply_over_dim? I would use the name apply_along_axis but the numpy function of the same name does something different.

@shoyer
Copy link
Member

shoyer commented Jan 17, 2019

It might also be clearer to split this functionality into two methods rather than the single reduce() method, e.g., reduce() and transform() in the model of pandas's groupby methods.

It's probably worth thinking about these APIs more systematically, see #1618, #1251, #1130

@max-sixty
Copy link
Collaborator Author

It's probably worth thinking about these APIs more systematically, see #1618, #1251, #1130

Nice, thanks for finding those

It might also be clearer to split this functionality into two methods rather than the single reduce() method, e.g., reduce() and transform() in the model of pandas's groupby methods.

Is the split needed? I had thought that the function either

  • returned a reduced array, and the output dims should be reduced
  • returned an array with the original dims, and then output dims shouldn't be reduced

@max-sixty
Copy link
Collaborator Author

Is there a next step for a function which:

I'd be happy to give this a go if we agree on the broad design

For names, it's a shame the current apply isn't called map, and this could be apply. I'm not a great fan of apply_raw, since I'm can't yet see why it's any more 'raw'.

@max-sixty max-sixty mentioned this pull request Oct 29, 2019
4 tasks
@max-sixty max-sixty closed this May 13, 2021
@max-sixty max-sixty deleted the apply-ufunc-ds branch May 13, 2021 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants