Boolean indexing with multi-dimensional key arrays #1887
Comments
Since #3206 has been implemented now: |
Just wanted to confirm that boolean indexing is indeed highly relevant, especially for assigning values rather than just selecting them. Here is a use case I encounter very often: I'm working with very sparse data (e.g. a satellite image of some islands surrounded by water), and I want to modify it using Here is how I would achieve this in numpy:
The most important fact here is that Unfortunately, nothing like this is currently possible with xarray. If implemented, it would enable large speedups for operations like spatial interpolation, where we don't want to interpolate the whole image, only the pixels we care about. |
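The NumPy snippet from this comment is not preserved in the thread as shown here. As a rough sketch of the kind of pattern being described (select a few pixels with a 2D boolean mask, transform only those, and write them back), something like the following would work in plain NumPy. The array contents and the `expensive_transform` helper are hypothetical stand-ins, not taken from the original comment.

```python
import numpy as np

# Hypothetical stand-in for a sparse image: mostly "water" (0), a few "land" pixels.
image = np.zeros((4, 5))
image[1, 2] = 3.0
image[2, 3] = 5.0

# 2D boolean mask marking the pixels of interest.
land = image > 0


def expensive_transform(values):
    # Placeholder for a costly per-pixel operation (an assumption for illustration).
    return values * 10.0


# Boolean indexing extracts a 1D array containing only the selected pixels ...
selected = image[land]

# ... and boolean assignment writes the transformed values back in place.
image[land] = expensive_transform(selected)
```

Only the two masked pixels are passed to the transform, which is where the speedup for sparse data would come from.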
I've added the "good first issue" label — at least the first two bullets of the proposal would be relatively simple to implement, given they're mostly syntactic sugar. |
It's worth noting that there is at least one other way boolean indexing could work:
We can't support both with the same syntax, so we have to make a choice here :). See also the discussion about what |
I've been trying to conceptualize why I think the
But I don't do much pointwise indexing — and so maybe we do want to prioritize that |
Here are two reasons why I like the
As a side note: one nice feature of using
To match the semantics of NumPy,
I'm not quite sure this is true -- it's the difference between needing to call |
OK great. To confirm, this is what it would look like:
Context:
In [81]: da = xr.DataArray(np.arange(12).reshape(3,4), dims=list('ab'))
In [82]: da
Out[82]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Dimensions without coordinates: a, b
In [84]: key = da % 3 == 0
In [83]: key
Out[83]:
<xarray.DataArray (a: 3, b: 4)>
array([[ True, False, False, True],
[False, False, True, False],
[False, True, False, False]])
Dimensions without coordinates: a, b
Currently
In [85]: da[key]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-85-7fd83c907cb6> in <module>
----> 1 da[key]
...
~/.asdf/installs/python/3.8.8/lib/python3.8/site-packages/xarray/core/variable.py in _validate_indexers(self, key)
697 )
698 if k.ndim > 1:
--> 699 raise IndexError(
700 "{}-dimensional boolean indexing is "
701 "not supported. ".format(k.ndim)
IndexError: 2-dimensional boolean indexing is not supported.
Current proposal ("
In [86]: da.values[key.values]
Out[86]: array([0, 3, 6, 9]) # But the xarray version
Previous suggestion ("
In [87]: da.where(key)
Out[87]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0., nan, nan, 3.],
[nan, nan, 6., nan],
[nan, 9., nan, nan]])
Dimensions without coordinates: a, b
(small follow up I'll put in another message, for clarity) |
This was a tiny point so it's fine to discard. I had meant that producing the |
Yes, this looks right to me. |
The part about this new proposal that is most annoying is that the |
I'm still working through this. Using this to jot down my notes, no need to respond. One property that seems to be lacking is that if
In [171]: a
Out[171]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [172]: mask
Out[172]: array([ True, False, True])
In [173]: a[mask]
Out[173]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
...as expected, but now let's make a 2D mask...
In [174]: full_mask = np.broadcast_to(mask[:, np.newaxis], (3,4))
In [175]: full_mask
Out[175]:
array([[ True, True, True, True],
[False, False, False, False],
[ True, True, True, True]])
In [176]: a[full_mask]
Out[176]: array([ 0, 1, 2, 3, 8, 9, 10, 11]) # flattened! |
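For comparison, here is a rough NumPy sketch of how a where-with-drop style of masking would treat the same broadcast mask. The `where_drop` helper is hypothetical: it only emulates, for a 2D array, the idea of masking and then dropping rows and columns whose mask is entirely False. It is not xarray's implementation.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])
full_mask = np.broadcast_to(mask[:, np.newaxis], a.shape)


def where_drop(arr, cond):
    """Hypothetical 2D emulation of a where(cond, drop=True) style selection."""
    # Mask out unselected entries (this promotes integers to float).
    masked = np.where(cond, arr, np.nan)
    # Drop rows and columns in which nothing is selected.
    keep_rows = cond.any(axis=1)
    keep_cols = cond.any(axis=0)
    return masked[keep_rows][:, keep_cols]


from_1d = a[mask]                   # plain 1D boolean indexing along the first axis
from_2d = where_drop(a, full_mask)  # broadcast 2D mask, shape preserved
```

Under this emulation the 1D mask and its broadcast 2D version select the same rows and keep the 2D shape, at the cost of a float result.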
IMO, the perfect solution would be masking support.
In [87]: da[key]
Out[87]:
<xarray.DataArray (a: 3, b: 4)>
array([[ 0, <NA>, <NA>, 3],
[<NA>, <NA>, 6, <NA>],
[<NA>, 9, <NA>, <NA>]])
dtype: int
Dimensions without coordinates: a, b
Then we could have something like
In [87]: da[key].stack(new_dim=["a", "b"], dropna=True)
Out[87]:
<xarray.DataArray (newdim: 4)>
array([0, 3, 6, 9])
coords{
"a" (newdim): [0, 0, 1, 2],
"b" (newdim): [0, 3, 2, 1],
}
Dimensions without coordinates: newdim
Here, Also, that would avoid all those unnecessary |
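The stacked result sketched above pairs each selected value with its position along a and b. As a minimal illustration in plain NumPy (not the proposed xarray API), the same information can be recovered with `np.nonzero`, which returns the row and column indices of the True entries in the same row-major order that boolean indexing uses:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
key = a % 3 == 0

# Flattened selection, as in NumPy-style boolean indexing.
values = a[key]

# Indices of the selected elements along each axis, in matching order.
rows, cols = np.nonzero(key)
```

Here `rows` and `cols` play the role of the "a" and "b" coordinates on the stacked dimension.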
This could be useful (potentially we can open a different issue). While someone can call |
Originally from #974
For boolean indexing:
- da[key] where key is a boolean labelled array (with any number of dimensions) is made equivalent to da.where(key.reindex_like(da), drop=True). This matches the existing behavior if key is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that da[key].mean() gives the same result as in NumPy.
- da[key] = value where key is a boolean labelled array can be made equivalent to da = da.where(*align(key.reindex_like(da), value.reindex_like(da))) (that is, the three argument form of where).
- da[key_0, ..., key_n] where all of key_i are boolean arrays gets handled in the usual way. It is an IndexingError to supply multiple labelled keys if any of them are not already aligned with the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write da[key_0 & ... & key_n].
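The first two points can be illustrated with a small NumPy-only sketch. It assumes the same 3x4 array used elsewhere in this thread and stands in `np.where` and `np.nanmean` for the labelled where and NaN-skipping mean. It shows the arithmetic of the proposal, not xarray's behavior.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
key = a % 3 == 0

# NumPy-style boolean indexing flattens, then averages the selected values.
flat_mean = a[key].mean()

# where-style masking keeps the shape; skipping NaNs gives the same mean.
masked = np.where(key, a, np.nan)
masked_mean = np.nanmean(masked)

# Assignment as the three-argument form of where: take `value` where key is
# True and the original data elsewhere.
value = -1
assigned = np.where(key, value, a)
```

Both means come from the same four selected elements, which is why the multi-dimensional masked result and the flattened NumPy result agree on reductions that skip NaNs.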