[Feature Request] Dataset.loc should accept comparisons with a contained DataArray as key #2689

Closed
jendrikjoe opened this issue Jan 18, 2019 · 5 comments

Comments

@jendrikjoe
Contributor

Hey xarray maintainers,

first of all, I love this repository! Thanks for all your effort.
I have a feature request for the .loc function of the Dataset class.
I think it would be beneficial to allow comparisons with DataArrays as keys for the loc function.
Obviously, this only works for DataArrays that have at least all the dimensions of the indexer.
But e.g. in the weather dataset used as the example in the docs, it would allow one to filter precipitation and temperature by longitude or latitude, or precipitation by temperature.
I have started implementing it because I needed it for another use case.
I would be happy to integrate it into xarray if it is of interest.

Cheers,
Jendrik

@shoyer
Member

shoyer commented Jan 18, 2019

Hi Jendrik, thanks for your interest!

Could you please give an example of what exactly your proposed syntax would look like, with example inputs/outputs?

@jendrikjoe
Contributor Author

Hey Shoyer,

Sure, I am happy to propose one.
Given the input from the xarray example page (http://xarray.pydata.org/en/stable/examples/weather-data.html), I would imagine something like this:

xarr = xarr.loc[xarr['tmin'] > 5]

If the DataArray is one-dimensional, this is straightforward to achieve by altering the _LocIndexer in the following way:

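from xarray.core import utils  # provides is_dict_like


class _LocIndexer(object):
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, key):
        if not utils.is_dict_like(key):
            # key is assumed to be a 1-D boolean DataArray: pick the
            # coordinate labels at which it is True.
            selector = {dim: key[dim][key] for dim in key.dims}
            # Keep only variables that have every dimension of the key.
            keep_vars = []
            for var in self.dataset.data_vars:
                # Built-in all() evaluates the generator; np.all() would not.
                if all(dim in self.dataset[var].dims for dim in key.dims):
                    keep_vars.append(var)
            return self.dataset[keep_vars].sel(selector)
        return self.dataset.sel(**key)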

This does not work for higher dimensions, though, as 2-dimensional boolean indexing is not supported.
It would also drop all other DataArrays that do not share dimensions with the indexer.
There is probably a better place to do this than in the loc function. However, I think it would be great for cases where people need to filter their data by something other than the array dimensions.
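
For illustration, the one-dimensional behaviour described above can be reproduced outside of xarray's internals with a small standalone helper. This is only a sketch under stated assumptions: `select_by_mask` and the toy dataset are hypothetical names that are not part of xarray or of the patch, and the helper uses plain NumPy masks on the coordinate values instead of modifying `_LocIndexer`.

```python
import numpy as np
import xarray as xr


def select_by_mask(ds, mask):
    """Hypothetical helper: filter a Dataset by a 1-D boolean DataArray."""
    if mask.ndim != 1:
        raise ValueError("this sketch only handles 1-D boolean masks")
    # Coordinate labels at which the mask is True.
    selector = {dim: mask[dim].values[mask.values] for dim in mask.dims}
    # Keep only variables that have every dimension of the mask.
    keep_vars = [
        name
        for name in ds.data_vars
        if all(dim in ds[name].dims for dim in mask.dims)
    ]
    return ds[keep_vars].sel(selector)


# Toy data (made up for this example, not the weather dataset from the docs).
ds = xr.Dataset(
    {
        "tmin": ("time", np.array([2.0, 7.0, 4.0, 9.0])),
        "tmax": ("time", np.array([10.0, 15.0, 12.0, 20.0])),
        "elevation": ("station", np.array([100.0, 250.0])),
    },
    coords={"time": [0, 1, 2, 3], "station": ["a", "b"]},
)

result = select_by_mask(ds, ds["tmin"] > 5)
print(result)
```

Here `tmin` and `tmax` are kept and restricted to the time steps where `tmin > 5`, while `elevation` is dropped because it does not have the `time` dimension, which is the side effect mentioned above.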

Cheers,

Jendrik

@shoyer
Member

shoyer commented Jan 21, 2019

Oh, OK, that makes sense now.

Yes, we are definitely interested in supporting multi-dimensional boolean indexing. See #1887 for thoughts on what this could look like.
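
As a point of comparison, a multi-dimensional condition can already be applied with the existing `Dataset.where(..., drop=True)`, although it masks values with NaN rather than indexing them. The sketch below uses made-up toy data and is not meant to represent the design discussed in #1887.

```python
import numpy as np
import xarray as xr

# Toy 2-D data (values and labels are made up for this example).
ds = xr.Dataset(
    {
        "tmin": (
            ("time", "location"),
            np.array([[2.0, 7.0], [4.0, 3.0], [9.0, 8.0]]),
        )
    },
    coords={"time": [0, 1, 2], "location": ["IA", "IN"]},
)

mask = ds["tmin"] > 5

# Entries where the mask is False become NaN; with drop=True, labels along
# each dimension for which the mask is False everywhere are removed.
filtered = ds.where(mask, drop=True)
print(filtered)
```

The result stays rectangular: the time step at which no location exceeds the threshold is dropped, but individual entries that fail the condition remain as NaN instead of being removed.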

@jendrikjoe
Contributor Author

Okay, I will have a look at #1887 first before going forward with this request :)

@shoyer
Member

shoyer commented Jan 22, 2019

I'm going to close this in favor of #1887. Not to reject this approach, but just to keep discussion in one place.

@shoyer shoyer closed this as completed Jan 22, 2019