Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling_exp (nee ewm) #2650

Merged
merged 51 commits into from
Jun 24, 2019
Merged

rolling_exp (nee ewm) #2650

merged 51 commits into from
Jun 24, 2019

Conversation

max-sixty
Copy link
Collaborator

@max-sixty max-sixty commented Jan 4, 2019

  • Duplicate groupby / rolling interface
  • Integrate with apply_ufunc, including orientation and coords
  • Define com / span / alpha interface
  • Implement for Variable & Dataset
  • Docs

xarray/core/ewm.py Outdated Show resolved Hide resolved
@shoyer
Copy link
Member

shoyer commented Jan 6, 2019

I know the name ewm matches pandas, but I find it rather inscrutable if you don't already know the acronym. What about something a little longer, maybe exp_rolling()?

@shoyer
Copy link
Member

shoyer commented Jan 6, 2019

Before we add even an optional dependency on numbagg in xarray, we should probably do a bit of cleanup (e.g., making sure we're happy with its public interface, and putting a release up on pypi)

@max-sixty
Copy link
Collaborator Author

I know the name ewm matches pandas, but I find it rather inscrutable if you don't already know the acronym.

👍

What about something a little longer, maybe exp_rolling()?

Yes, that works.
One alternative is to add a window_type / weighting argument to Rolling - Exponentially weighted is one of many alternatives

I'm fairly balanced between them - others' thoughts?

@fujiisoup
Copy link
Member

I like window_type / weighting argument to Rolling.
But a little concern is whether it matches to the current API, such as __iter__.

BTW, does ewm computes the window mean based on index or coordinate value?

@shoyer
Copy link
Member

shoyer commented Jan 8, 2019

I would lean towards a dedicated method, since there are method specific options. It's pretty awkward to reuse a single interface for that.

@max-sixty
Copy link
Collaborator Author

BTW, does ewm computes the window mean based on index or coordinate value?

Index, currently.

Would be great to have an algo that dealt with coord value, and I think not too difficult

@max-sixty
Copy link
Collaborator Author

Tests seem to be failing on a different issue? https://travis-ci.org/pydata/xarray/jobs/479042667#L7759

@shoyer
Copy link
Member

shoyer commented Jan 13, 2019 via email

@max-sixty
Copy link
Collaborator Author

max-sixty commented Jan 14, 2019

I think this is in a reasonable state for DataArray, excluding docs. Let me know any feedback on the APi

Does anyone have a view on the canonical way to implement these for Dataset, given potentially only a subset of the variables will have the dimension?
Tests fail when naively using apply_ufunc; .reduce looks like it has some functionality for skipping those variables. Or I could do it manually in a couple of lines.

@shoyer
Copy link
Member

shoyer commented Jan 14, 2019

Does anyone have a view on the canonical way to implement these for Dataset, given potentially only a subset of the variables will have the dimension?
Tests fail when naively using apply_ufunc; .reduce looks like it has some functionality for skipping those variables. Or I could do it manually in a couple of lines.

You could probably copy the logic from Dataset.reduce, which simply applies different logic for coordinates (they either get preserved or dropped depending on if they reuse the reduced dimension). That said, if EWM preserves the dimension size/labels you probably don't nee any special logic for coordinates -- see DatasetRolling.reduce as well.

@max-sixty
Copy link
Collaborator Author

That said, if EWM preserves the dimension size/labels you probably don't nee any special logic for coordinates

The error is when applying over a dimension on a dataset where only some of the variables have the dimension; e.g. applying over time on this:

<xarray.Dataset>
Dimensions:  (time: 10, x: 8, y: 2)
Coordinates:
  * x        (x) float64 0.0 0.1429 0.2857 0.4286 0.5714 0.7143 0.8571 1.0
  * time     (time) float64 0.0 0.1111 0.2222 0.3333 ... 0.7778 0.8889 1.0
    c        (y) <U1 'a' 'b'
  * y        (y) int64 0 1
Data variables:
    z1       (y, x) float64 -0.1035 -0.8153 -1.583 ... 1.447 0.7768 -0.2699
    z2       (time, y) float64 0.968 0.7156 -1.64 ... 0.1889 -1.142 0.7172

...rather than any issues applying on coords. I think the solution is easy - only apply on variables where the dimension exists.

If it's helpful to add that functionality directly to apply_ufunc, lmk. I'll focus on getting this working, though

@fujiisoup
Copy link
Member

@max-sixty

The error is when applying over a dimension on a dataset where only some of the variables have the dimension;

I remember I faced the same issue in implementing differentiate, interp, trapz, etc... and manually wrote the same logic for several times.

If it's helpful to add that functionality directly to apply_ufunc, lmk.

I think it would make the code much cleaner at least for these methods.

@max-sixty
Copy link
Collaborator Author

I made an attempt to add the "skip variables without the dimension" to apply_ufunc, but it's much harder than I expected - there are more cases than I expected (e.g. multiple datasets).

I may be missing something - let me know if there's an reasonable approach

Otherwise I'll do the close thing for this PR, and potentially we can have a look at the general solution later

xarray/core/rolling_exp.py Outdated Show resolved Hide resolved
@max-sixty
Copy link
Collaborator Author

This is updated! Could put an "Experimental" label on if we want (or maybe that's implicit).

Let me know any final changes. Will be good to get this merged at last.

@max-sixty
Copy link
Collaborator Author

The gentlest of reminders that I think this is ready to merge (mea culpa for leaving it at 90% for so long)

xarray/core/rolling_exp.py Outdated Show resolved Hide resolved
xarray/core/common.py Outdated Show resolved Hide resolved
@pep8speaks
Copy link

pep8speaks commented Jun 21, 2019

Hello @max-sixty! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-24 03:44:56 UTC

@max-sixty
Copy link
Collaborator Author

Great - updated! Let me know any final comments!

xarray/core/rolling_exp.py Outdated Show resolved Hide resolved
@max-sixty
Copy link
Collaborator Author

Updated! Let me know any final changes!

@shoyer shoyer merged commit cfd8210 into pydata:master Jun 24, 2019
@shoyer
Copy link
Member

shoyer commented Jun 24, 2019

OK, in it goes. Thanks @max-sixty !

@max-sixty
Copy link
Collaborator Author

Thanks for all the help!

@max-sixty max-sixty deleted the ewm branch June 24, 2019 15:26
dcherian added a commit to dcherian/xarray that referenced this pull request Jun 24, 2019
* master: (31 commits)
  Add quantile method to GroupBy (pydata#2828)
  rolling_exp (nee ewm) (pydata#2650)
  Ensure explicitly indexed arrays are preserved (pydata#3027)
  add back dask-dev tests (pydata#3025)
  ENH: keepdims=True for xarray reductions (pydata#3033)
  Revert cmap fix (pydata#3038)
  Add "errors" keyword argument to drop() and drop_dims() (pydata#2994) (pydata#3028)
  More consistency checks (pydata#2859)
  Check types in travis (pydata#3024)
  Update issue templates (pydata#3019)
  Add pytest markers to avoid warnings (pydata#3023)
  Feature/merge errormsg (pydata#2971)
  More support for missing_value. (pydata#2973)
  Use flake8 rather than pycodestyle (pydata#3010)
  Pandas labels deprecation (pydata#3016)
  Pytest capture uses match, not message (pydata#3011)
  dask-dev tests to allowed failures in travis (pydata#3014)
  Fix 'to_masked_array' computing dask arrays twice (pydata#3006)
  str accessor (pydata#2991)
  fix safe_cast_to_index (pydata#3001)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants