xr.cov() and xr.corr() #4089
Still issues I think
Hello @AndrewWilliams3142! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-25 13:55:29 UTC
The current problem is that we can't use Pandas to fully test As such, I think it maybe just makes sense to test a few low-dimensional cases? E.g.
>>> import numpy as np
>>> import pandas as pd
>>> import xarray as xr
>>> da_a = xr.DataArray(
...     np.random.random((3, 21, 4)),
...     coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
...     dims=("a", "time", "x"),
... )
>>> da_b = xr.DataArray(
...     np.random.random((3, 21, 4)),
...     coords={"time": pd.date_range("2000-01-01", freq="1D", periods=21)},
...     dims=("a", "time", "x"),
... )
>>> xr.cov(da_a, da_b, 'time')
<xarray.DataArray (a: 3, x: 4)>
array([[-0.01824046, 0.00373796, -0.00601642, -0.00108818],
[ 0.00686132, -0.02680119, -0.00639433, -0.00868691],
[-0.00889806, 0.02622817, -0.01022208, -0.00101257]])
Dimensions without coordinates: a, x
>>> xr.cov(da_a, da_b, 'time').sel(a=0,x=0)
<xarray.DataArray ()>
array(-0.01824046)
>>> da_a.sel(a=0,x=0).to_series().cov(da_b.sel(a=0,x=0).to_series())
-0.018240458880158048
So, while it's easy to check that a few individual points from I think it would also make sense to have some test cases where we don't use Pandas at all, but we specify the output manually?
>>> da_a = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
>>> expected = [1, np.nan]
>>> actual = xr.corr(da_a, da_a, dim='time')
>>> assert_allclose(actual, expected)
Does this seem like a good way forward?
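The two testing ideas in this comment (checking individual reduced points against a 1-D reference, and checking a hand-specified NaN case) can be sketched without xarray or Pandas. This is only an illustration under assumptions: `cov_along_axis` and `corr_along_axis` are hypothetical NumPy helpers standing in for `xr.cov` and `xr.corr`, an integer axis stands in for `dim`, and NaNs are assumed to be dropped pairwise.

```python
import numpy as np


def _mask_pairwise(a, b):
    """Set a position to NaN in both arrays if it is NaN in either one."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    valid = ~(np.isnan(a) | np.isnan(b))
    return np.where(valid, a, np.nan), np.where(valid, b, np.nan), valid


def cov_along_axis(a, b, axis, ddof=1):
    """Hypothetical stand-in for xr.cov: covariance along one axis."""
    a, b, valid = _mask_pairwise(a, b)
    n = valid.sum(axis=axis)  # number of valid pairs per reduced point
    da = a - np.nanmean(a, axis=axis, keepdims=True)
    db = b - np.nanmean(b, axis=axis, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.nansum(da * db, axis=axis) / (n - ddof)


def corr_along_axis(a, b, axis):
    """Hypothetical stand-in for xr.corr: Pearson correlation along one axis."""
    a, b, _ = _mask_pairwise(a, b)
    cov = cov_along_axis(a, b, axis, ddof=0)
    var_a = cov_along_axis(a, a, axis, ddof=0)
    var_b = cov_along_axis(b, b, axis, ddof=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / np.sqrt(var_a * var_b)


# Idea 1: compare every reduced point with a 1-D reference computation.
rng = np.random.default_rng(0)
a = rng.random((3, 21, 4))
b = rng.random((3, 21, 4))
full = cov_along_axis(a, b, axis=1)  # reduce over the "time"-like axis
for i in range(a.shape[0]):
    for k in range(a.shape[2]):
        # np.cov uses ddof=1 by default, matching the ddof passed above.
        reference = np.cov(a[i, :, k], b[i, :, k])[0, 1]
        np.testing.assert_allclose(full[i, k], reference)

# Idea 2: specify the expected output by hand, including a NaN case.
x = np.array([[1, 2], [1, np.nan]])
expected = [1, np.nan]
actual = corr_along_axis(x, x, axis=1)
np.testing.assert_allclose(actual, expected)  # NaNs compare equal by default
```

Looping over the reduced points keeps the reference side one-dimensional, which is the part a Pandas series comparison can cover.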
If you want to test individual values without reimplementing the function in the tests (which is what I suspect comparing with the result of If not, you could also check properties of covariance / correlation matrices, e.g. that
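A property-style check along these lines might look like the sketch below. The comment's own example of a property is cut off, so the properties chosen here (symmetry, self-correlation equal to 1, values bounded by ±1, invariance under positive rescaling) are assumptions picked for illustration. `cov1d` and `corr1d` are hypothetical 1-D NumPy helpers, and seeded random draws stand in for what Hypothesis would generate.

```python
import numpy as np


def cov1d(a, b, ddof=0):
    """Hypothetical 1-D covariance helper (no NaN handling)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum((a - a.mean()) * (b - b.mean())) / (a.size - ddof)


def corr1d(a, b):
    """Hypothetical 1-D Pearson correlation built on cov1d."""
    return cov1d(a, b) / np.sqrt(cov1d(a, a) * cov1d(b, b))


rng = np.random.default_rng(42)
for _ in range(100):
    n = int(rng.integers(3, 50))
    a = rng.normal(size=n)
    b = rng.normal(size=n)

    # Covariance is symmetric in its arguments.
    assert np.isclose(cov1d(a, b), cov1d(b, a))
    # Covariance of a series with itself is its variance (ddof=0 on both sides).
    assert np.isclose(cov1d(a, a), a.var())
    # A series is perfectly correlated with itself.
    assert np.isclose(corr1d(a, a), 1.0)
    # Correlation is bounded by -1 and 1 (small tolerance for rounding).
    assert -1.0 - 1e-12 <= corr1d(a, b) <= 1.0 + 1e-12
    # Correlation is unchanged by a positive scale and a shift.
    assert np.isclose(corr1d(3.0 * a + 2.0, b), corr1d(a, b))
```

These checks never compare against a second implementation, so they avoid reimplementing the function under test.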
One problem I came across here is that Current tests implemented are (in pseudocode...):
@keewis I tried reading the Hypothesis docs and got a bit overwhelmed, so I've stuck with example-based tests for now.
Currently
def corr(da_a, da_b, dim=None, ddof=0):
    # forward the caller's dim and ddof instead of hard-coding the defaults
    return _cov_corr(da_a, da_b, dim=dim, ddof=ddof, method="corr")

def cov(da_a, da_b, dim=None, ddof=0):
    return _cov_corr(da_a, da_b, dim=dim, ddof=ddof, method="cov")

def _cov_corr(da_a, da_b, dim=None, ddof=0, method=None):
    # compute cov
    if method == "cov":
        return cov
    # compute corr
    return corr
Maybe you could use
Could you also add a test for the
with raises_regex(TypeError, "Only xr.DataArray is supported"):
    xr.corr(xr.Dataset(), xr.Dataset())
Where do you mean sorry? Isn't this already there in corr()?
if any(not isinstance(arr, (Variable, DataArray)) for arr in [da_a, da_b]):
    raise TypeError(
        "Only xr.DataArray and xr.Variable are supported."
        "Given {}.".format([type(arr) for arr in [da_a, da_b]])
    )
EDIT: Scratch that, I get what you mean :)
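The point of this exchange is that the type check already exists but has no test exercising it. A self-contained sketch of such a test follows, using only the standard library. The `DataArray` and `Dataset` classes and the `corr` function here are hypothetical stand-ins, not the xarray objects, and the error message is assumed to match the regex from the requested test. `assertRaisesRegex` from `unittest` plays the role of `raises_regex`.

```python
import unittest


class DataArray:
    """Hypothetical stand-in for xr.DataArray."""


class Dataset:
    """Hypothetical stand-in for xr.Dataset."""


def corr(da_a, da_b):
    """Stand-in for xr.corr that only performs the input type check."""
    if any(not isinstance(arr, DataArray) for arr in (da_a, da_b)):
        raise TypeError(
            "Only xr.DataArray is supported. "
            "Given {}.".format([type(arr) for arr in (da_a, da_b)])
        )
    return None  # the actual computation is omitted in this sketch


class TestCorrTypeCheck(unittest.TestCase):
    def test_dataset_is_rejected(self):
        # The test passes only if a TypeError with a matching message is raised.
        with self.assertRaisesRegex(TypeError, "Only xr.DataArray is supported"):
            corr(Dataset(), Dataset())

    def test_dataarray_is_accepted(self):
        # Valid inputs must not trigger the type check.
        corr(DataArray(), DataArray())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCorrTypeCheck)
result = unittest.TextTestRunner().run(suite)
```

Matching on the message as well as the exception type guards against an unrelated `TypeError` making the test pass by accident.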
Four more nits ;)
One more thing actually, is there an argument for not defining
If you insist ;)
is indeed marginally faster. As they are already aligned, we don't have to worry about this.
Sweet! On second thought, I might leave it for now... the sun is too nice today. Can always have it as a future PR or something. :)
Awesome @AndrewWilliams3142! Very excited we have this. Thanks for the review @mathause. Hitting merge; any other feedback is welcome and we can iterate.
* upstream/master:
  Improve interp performance (pydata#4069)
  Auto chunk (pydata#4064)
  xr.cov() and xr.corr() (pydata#4089)
  allow multiindex levels in plots (pydata#3938)
  Fix bool weights (pydata#4075)
  fix dangerous default arguments (pydata#4006)
Just a small comment: in the docs (http://xarray.pydata.org/en/latest/generated/xarray.cov.html#xarray.cov) there is a typo: da_a is declared twice; the second should really be da_b.
Thanks. Do you want to put in a PR fixing that?
Well, actually I was thinking that for someone who works on the code on a daily basis, correcting it takes ~30 seconds. For me, I think, it would be quite a bit of overhead for a single character...
@kefirbandi I didn't want to step on your toes, but I'm happy to put in a PR to fix the typo. :)
@AndrewWilliams3142 I see. Thanks.
PR for the xr.cov() and xr.corr() functionality which others have been working on. Most code adapted from @r-beer in PR #3550.
TODO:
Write a reasonable set of tests, maybe not using pandas as a benchmark? (See Function for regressing/correlating multiple fields? #3784 (comment)) Will probably need some help with this
CHECKLIST:
isort -rc . && black . && mypy . && flake8 (something wrong with docs though??)
whats-new.rst for all changes and api.rst for new API