Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add var and std to weighted computations #5870

Merged
merged 6 commits into from
Oct 28, 2021

Conversation

cjauvin
Copy link
Contributor

@cjauvin cjauvin commented Oct 15, 2021

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

This follows #2922 to add var, std and sum_of_squares to DataArray.weighted and Dataset.weighted. I would also like to add weighted quantile, eventually.

@pep8speaks
Copy link

pep8speaks commented Oct 15, 2021

Hello @cjauvin! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-15 20:23:32 UTC

@github-actions
Copy link
Contributor

github-actions bot commented Oct 15, 2021

Unit Test Results

         6 files           6 suites   57m 3s ⏱️
16 298 tests 14 562 ✔️ 1 736 💤 0
90 984 runs  82 802 ✔️ 8 182 💤 0

Results for commit 4b7d866.

♻️ This comment has been updated with latest results.

xarray/core/weighted.py Outdated Show resolved Hide resolved
cjauvin and others added 4 commits October 15, 2021 15:21
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>

da = DataArray([1, 2])
weights = DataArray(weights)
result = da.weighted(weights).sum_of_squares()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests look quite similar to each other. Would the tests still be readable if we parametrize the function as well? Something like:

func = "sum_of_squares"  
result = getattr(da.weighted(weights), func)()

Copy link
Contributor Author

@cjauvin cjauvin Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the tests are all very similar, but I didn't really see a way to generalize the set of already existing ones, apart from extending it by following the same pattern (with nan, without, bool and equals weights). I thought of doing something like you suggest, but the potential problem is that it would make them harder to understand, and also I am of the opinion that the tests are a domain where the DRY principle is less important and applicable. That said, I am very open to revisiting the option, if you feel it would be better in this context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there's great value in them being simple and easy to read. I'm just feeling tldr-symptoms with these now but if you don't think it'll improve readability then that's fine.

@mathause
Copy link
Collaborator

Looks good - nice first xarray PR! One thing you could add is "degrees of freedom" for var and std (see statsmodels.stats.weightstats) - but I would also be ok to leave this out until someone requests it.

@huard
Copy link
Contributor

huard commented Oct 25, 2021

Is there an appetite to add a weighted quantile method as well ?

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjauvin let us know if you want to add anything to this PR, else I am happy to merge this as is before the next release.


btw - I wrote these all these not-so-readable tests. They are here to test all kinds of edge cases. I agree it could be good to simplify them but that could be another PR. I see two ways (1) don't care about the result but only if there is a number or NaN at a certain position (2) use an external library to check the results (although am I not sure there is a library that handles all these weighted operations with NaN).

@mathause mathause mentioned this pull request Oct 26, 2021
@cjauvin
Copy link
Contributor Author

cjauvin commented Oct 26, 2021

Thanks for the feedback @mathause! If you are ok with postponing the addition of an extra ddof param for a later PR, as you have suggested, then this PR is indeed complete from my perspective. I will try to propose an implementation for weighted quantiles soon.

@dcherian
Copy link
Contributor

Thanks @cjauvin I see this is your first PR. It's a great one. Welcome to xarray!

@dcherian dcherian changed the title Add variance and std dev to weighted computations Add var and std to weighted computations Oct 28, 2021
@dcherian dcherian merged commit b3b77f5 into pydata:main Oct 28, 2021
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 28, 2021
* upstream/main:
  Add var and std to weighted computations (pydata#5870)
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 29, 2021
* main:
  Add typing_extensions as a required dependency (pydata#5911)
  pydata#5740 follow up: supress xr.ufunc warnings in tests (pydata#5914)
  Avoid accessing slow .data in unstack (pydata#5906)
  Add wradlib to ecosystem in docs (pydata#5915)
  Use .to_numpy() for quantified facetgrids (pydata#5886)
  [test-upstream] fix pd skipna=None (pydata#5899)
  Add var and std to weighted computations (pydata#5870)
  Check for path-like objects rather than Path type, use os.fspath (pydata#5879)
  Handle single `PathLike` objects in `open_mfdataset()` (pydata#5884)
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 29, 2021
* upstream/main:
  Add typing_extensions as a required dependency (pydata#5911)
  pydata#5740 follow up: supress xr.ufunc warnings in tests (pydata#5914)
  Avoid accessing slow .data in unstack (pydata#5906)
  Add wradlib to ecosystem in docs (pydata#5915)
  Use .to_numpy() for quantified facetgrids (pydata#5886)
  [test-upstream] fix pd skipna=None (pydata#5899)
  Add var and std to weighted computations (pydata#5870)
@cjauvin cjauvin mentioned this pull request Dec 10, 2021
4 tasks
@dgilford
Copy link

dgilford commented Jan 4, 2022

Is there an appetite to add a weighted quantile method as well ?

I would be strongly interested in this capability!

@cjauvin
Copy link
Contributor Author

cjauvin commented Jan 4, 2022

@dgilford I have worked on it: #6059, but unfortunately the PR is currently stuck in limbo, for technical reasons.

snowman2 pushed a commit to snowman2/xarray that referenced this pull request Feb 9, 2022
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants