disallow boolean coordinates? #4892

mathause · 2021-02-11T09:33:17Z

Today I stumbled over a small pitfall, which I think could be avoided:

I am working with arrays that have axes labeled with categorical values and I ended up using True/False as labels for some binary categories:

test = xarray.DataArray(
     numpy.ones((3,2)), 
     dims=["binary","ternary"],
     coords={"ternary":[3,7,9],"binary":[False,True]}
)

now came the big surprise, when I wanted to reduce over selections of the data:

test.sel(ternary=[9,3,7]) # does exactly what I expect and gives me the correctly permuted 3x2 array
test.sel(binary=[True,False]) # does not do what I expect

Instead of using the coordinate values like with the ternary category, it uses the list as boolean mask and hence I get a 3x1 array at the binary=False coordinate.

I assume that this behavior is reasonable in most cases - And I for sure will stop using bools as binary category labels.
That said in the above case the conceptually identical call results in completely different outcome.

My (radical) proposal would be: forbid binary coordinates in general to avoid such confusion.

Curious about your thoughts! Hth,

Marti

Originally posted by @martinitus in #4861

The text was updated successfully, but these errors were encountered:

max-sixty · 2021-03-21T00:35:57Z

It's a good issue!

I would have thought .sel should default to using labels. I recognize that boolean indexing is really helpful — and think we should try and support it more (e.g. #1887).

And I recognize .sel delegates to .isel in some cases — e.g. where there are no indexes on a dimension.

But here, I propose we prioritize the labels above the boolean indexing in .sel. People can still use .isel if they want bool indexing in the method call syntax:

In [4]: test = xr.DataArray(
   ...:      np.ones((3,2)),
   ...:      dims=["ternary","binary"],
   ...:      coords={"ternary":[3,7,9],"binary":[False,True]}
   ...: )

In [5]: test.sel(binary=[True,False])
Out[5]:
<xarray.DataArray (ternary: 3, binary: 1)>
array([[1.],
       [1.],
       [1.]])
Coordinates:
  * ternary  (ternary) int64 3 7 9
  * binary   (binary) bool False

In [6]: test.isel(binary=[True,False])
Out[6]:
<xarray.DataArray (ternary: 3, binary: 1)>
array([[1.],
       [1.],
       [1.]])
Coordinates:
  * ternary  (ternary) int64 3 7 9
  * binary   (binary) bool False

I would also be OK removing the .sel delegation to .isel if it causes complications. IIUC the main use case is a single method call to index labels and dimensions without labels. But that's easy to replace with two method calls, I think?

martinitus · 2021-03-31T10:30:49Z

I don't know the internals of delegation between .sel and .isel. But from the user side I would expect that boolean indexing requires me to use .isel naturally. I mean, I have to provide a boolean mask that fits the shape of the array, i.e. it is naturally index based and should only be used with .isel irrespective of the coordinate types.

While that probably be a breaking change for some people, I think it makes a quite complicated topic slightly easier to document, and figure out intentions in written code.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

disallow boolean coordinates? #4892

disallow boolean coordinates? #4892

mathause commented Feb 11, 2021 •

edited

Loading

max-sixty commented Mar 21, 2021

martinitus commented Mar 31, 2021

disallow boolean coordinates? #4892

disallow boolean coordinates? #4892

Comments

mathause commented Feb 11, 2021 • edited Loading

max-sixty commented Mar 21, 2021

martinitus commented Mar 31, 2021

mathause commented Feb 11, 2021 •

edited

Loading