square-bracket slice a Dataset with a DataArray #2027

Open
crusaderky opened this issue Mar 29, 2018 · 4 comments

@crusaderky
Contributor

crusaderky commented Mar 29, 2018

Given this:

import xarray

ds = xarray.Dataset(
    data_vars={
        'vote': ('pupil', [5, 7, 8]),
        'age': ('pupil', [15, 14, 16])
    },
    coords={
        'pupil': ['Alice', 'Bob', 'Charlie']
    })


<xarray.Dataset>
Dimensions:  (pupil: 3)
Coordinates:
  * pupil    (pupil) <U7 'Alice' 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 5 7 8
    age      (pupil) int64 15 14 16

Why does this work:

ds.age[ds.vote >= 6]

<xarray.DataArray 'age' (pupil: 2)>
array([14, 16])
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'

But this doesn't?

ds[ds.vote >= 6]

KeyError: False

ds.vote >= 6 is a DataArray with dims=('pupil',) and dtype=bool, so I can't see any ambiguity in what I want to achieve.

Workaround:

ds.sel(pupil=ds.vote >= 6)


<xarray.Dataset>
Dimensions:  (pupil: 2)
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 7 8
    age      (pupil) int64 14 16
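
For comparison, here is a minimal sketch of the same selection written positionally with isel(), alongside the sel() workaround above. It assumes a reasonably recent xarray; passing the mask as a plain NumPy array via `.values` is a cautious choice for this sketch, not a requirement stated in this thread.

```python
import xarray

ds = xarray.Dataset(
    data_vars={
        "vote": ("pupil", [5, 7, 8]),
        "age": ("pupil", [15, 14, 16]),
    },
    coords={"pupil": ["Alice", "Bob", "Charlie"]},
)

# Boolean mask along the 'pupil' dimension.
mask = ds.vote >= 6

# Workaround from this comment: name the dimension and pass the mask to sel().
via_sel = ds.sel(pupil=mask)

# Positional variant: the same mask as a plain NumPy boolean array.
via_isel = ds.isel(pupil=mask.values)

print(list(via_isel.pupil.values))
print(via_isel.age.values.tolist())
```

Both calls name the dimension explicitly, which is the part that plain ds[...] leaves out.
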
@shoyer
Member

shoyer commented Mar 29, 2018

I think the short answer to why we don't support this is that with __getitem__ on a Dataset, it's potentially ambiguous which dimension you are slicing along. This is why we require you to name the dimension explicitly using sel().

This might be clearer with integer indexing. We support indexing like ds.vote[np.array([1, 2])] or ds.vote[xarray.DataArray([1, 2], dims='new_dim')] because it's clear what the first dimension of ds.vote is. (Recall that the dimensions of the indexing key only determine how data in the result is arranged, not what is indexed.) But we don't support ds[np.array([1, 2])], because axis-order-dependent indexing on a Dataset is potentially ambiguous.
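
To make the integer-indexing point concrete, here is a small sketch using the Dataset from the top of this issue. The variable names (by_numpy, by_dataarray, by_isel) are illustrative only, and the isel() call is just one way to name the dimension explicitly on a Dataset.

```python
import numpy as np
import xarray

ds = xarray.Dataset(
    data_vars={
        "vote": ("pupil", [5, 7, 8]),
        "age": ("pupil", [15, 14, 16]),
    },
    coords={"pupil": ["Alice", "Bob", "Charlie"]},
)

# A NumPy integer key indexes the first (and only) dimension of ds.vote.
by_numpy = ds.vote[np.array([1, 2])]

# A DataArray key picks the same elements, but the result is laid out
# along the key's own dimension, 'new_dim'.
by_dataarray = ds.vote[xarray.DataArray([1, 2], dims="new_dim")]

# On a Dataset there is no "first dimension", so the dimension is named.
by_isel = ds.isel(pupil=[1, 2])

print(by_numpy.dims)      # ('pupil',)
print(by_dataarray.dims)  # ('new_dim',)
```

The two DataArray results hold the same values; only the dimension they are arranged along differs.
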

However, we could potentially support this as a form of "multi-dimensional boolean indexing" (#1887). Basically, ds[key] where key is a single indexer with boolean dtype could be interpreted as equivalent to ds.where(key, drop=True).
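
As a rough sketch of what that interpretation would give today, the following calls ds.where(key, drop=True) directly; nothing here implements ds[key] itself. The 2-D `grid` array and its mask are made-up data added for illustration, to show that elements inside the retained rows and columns but outside the mask come back as NaN.

```python
import numpy as np
import xarray

ds = xarray.Dataset(
    data_vars={
        "vote": ("pupil", [5, 7, 8]),
        "age": ("pupil", [15, 14, 16]),
    },
    coords={"pupil": ["Alice", "Bob", "Charlie"]},
)

key = ds.vote >= 6

# The suggested reading of ds[key]: mask, then drop labels where key is all False.
masked = ds.where(key, drop=True)
print(list(masked.pupil.values))

# Hypothetical 2-D data: True at (0, 1) and (1, 2) only.
grid = xarray.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))
mask2d = (grid == 1) | (grid == 5)

# Column y=0 is dropped (all False); the two remaining unmasked cells become NaN.
clipped = grid.where(mask2d, drop=True)
print(clipped.shape)
```

One difference from the sel() workaround worth keeping in mind: where() fills masked-out entries with NaN, so integer variables may come back as floats.
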

@stale

stale bot commented Jun 17, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Jun 17, 2020
@crusaderky
Contributor Author

Still relevant

@stale stale bot removed the stale label Jun 17, 2020
@stale

stale bot commented Apr 18, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Apr 18, 2022
@max-sixty max-sixty removed the stale label Apr 18, 2022