Option to skip tests in weighted()
#4541
Comments
I don't have that much context on
|
The relevant context is that |
Sorry, I completely misunderstood! I thought you were asking about skipping tests as in pytest, hence my confusion. For sure re skipping those checks with dask arrays. |
Sorry if my initial issue was unclear. |
Not at all, my mistake
👍 |
What would happen in this case if a dask array with NaNs is passed? Would this silently influence the results, or would it not matter (in which case I wonder what the check was for)? |
If it leads to incorrect results, I agree. If it leads to a lazy error (even if more confusing), or a result array full of NaNs, then I think it's fine. Not super confident on the latter case, tbc. If we want more control, I would advocate for using a standard kwarg that offers control over the computation — e.g. |
Sounds good. I'll see if I can make some time to test and put up a PR this week. |
I think The relevant test is here: xarray/xarray/tests/test_weighted.py Line 22 in adc55ac
|
The other possibility would be to do something like:
def __init__(..., skipna=False):
    if skipna:
        weights = weights.fillna(0)
We did decide not to do this somewhere in the discussion; I'm not entirely sure anymore why. |
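To make the `fillna(0)` idea concrete, here is a minimal sketch of what it would mean for a weighted mean. The `skipna`-style handling is only the proposal from the comment above, not an existing `weighted()` option, and the toy arrays are made up for illustration. Filling a NaN weight with 0 simply drops that point from the result.

```python
import numpy as np
import xarray as xr

# Toy data; the second weight is missing.
data = xr.DataArray([1.0, 2.0, 3.0], dims="x")
weights = xr.DataArray([1.0, np.nan, 3.0], dims="x")

# Hypothetical skipna-style handling: treat a NaN weight as weight 0.
filled = weights.fillna(0)

# The point with the filled weight no longer contributes:
# (1*1 + 2*0 + 3*3) / (1 + 0 + 3)
result = data.weighted(filled).mean("x")
print(float(result))
```

Whether silently excluding those points is desirable is exactly the open question in the thread; raising on NaN weights keeps the problem visible instead.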
Thanks @mathause , I was wondering how much of a performance trade off I favor this, since it allows slicing before the calculation is triggered: I have a current situation where I do a bunch of operations on a large multi-model dataset. The weights are time and member dependent and I am trying to save each member separately. Having the calculation triggered for the full dataset is problematic and |
The discussion goes back to here: #2922 (comment) (by @dcherian)
Thinking a bit more about this I now favour the In addition, I am also not entirely sure I understand where your issue lies. You eventually have to compute, right? Do you do something between Ah maybe I understand, your data looks like:
And now
My limited speed tests:
import numpy as np
import xarray as xr
# NumPy-backed array
a = xr.DataArray(np.random.randn(1000, 1000, 10, 10))
%timeit a.isnull().any()
%timeit a.fillna(0)
# dask-backed array (chunks of 100 along each dimension)
b = xr.DataArray(np.random.randn(1000, 1000, 10, 10)).chunk(100)
%timeit b.isnull().any()
%timeit b.fillna(0)
|
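One caveat when reading timings like these: on a chunked array, both `isnull().any()` and `fillna(0)` only build a dask graph, so the cost shows up later, when something asks for a concrete value. A small sketch (array sizes shrunk and variable names chosen here for illustration) of where the computation is actually triggered:

```python
import dask.array as da
import numpy as np
import xarray as xr

# Small dask-backed array standing in for large weights.
b = xr.DataArray(np.random.randn(100, 100)).chunk(50)

any_null = b.isnull().any()  # still lazy: a 0-d dask-backed DataArray
filled = b.fillna(0)         # still lazy as well

is_lazy_any = isinstance(any_null.data, da.Array)
is_lazy_filled = isinstance(filled.data, da.Array)

# Using the result in a boolean context (as an `if` check does)
# forces the whole graph to be computed.
has_nan = bool(any_null)
print(is_lazy_any, is_lazy_filled, has_nan)
```

So the expensive part of a NaN check is the conversion to a Python `bool`, which is presumably the step that hurts with large dask weights.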
Ah, sorry! I was thinking of weights as being numpy arrays, not so much dask arrays.
Yeah I think this is the issue.
This would be OK. We could also drop the check and let users deal with it, and also add a warning to the docstring. |
Another option would be to put the check in a |
Uh that sounds great actually. Same functionality, no triggered computation, and no intervention needed from the user. Should I try to implement this? |
Yes that would be great. |
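For readers wondering what a check with "no triggered computation" could look like, here is one possible sketch. It assumes dask's `map_blocks` as the mechanism, which this thread does not confirm, and the helper name `_check_no_nan` is hypothetical. The check is attached to each block and only runs when the weights are eventually computed.

```python
import dask.array as da
import numpy as np


def _check_no_nan(block):
    # Hypothetical helper: runs per block, at compute time only.
    if np.isnan(block).any():
        raise ValueError("weights cannot contain missing values")
    return block


weights = da.from_array(np.array([1.0, np.nan, 3.0, 4.0]), chunks=2)

# Attaching the check only adds a task to the graph; nothing is computed here.
checked = weights.map_blocks(_check_no_nan, dtype=weights.dtype)
still_lazy = isinstance(checked, da.Array)

# The error only appears once the data is actually computed.
raised = False
try:
    checked.compute()
except ValueError:
    raised = True
print(still_lazy, raised)
```

The trade-off is that the error surfaces late, at compute time, rather than when the weighted object is created.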
When working with large dask-array weights, this check triggers computation of the array. This affects xgcm's ability to layer operations lazily.
Would you be open to implementing an option to skip this test, maybe with a warning displayed? Happy to submit a PR.