[Feature Request] Dataset.loc should accept comparisons with a contained DataArray as key #2689

jendrikjoe · 2019-01-18T11:06:50Z

Hey xarray maintainers,

first of all, I love this repository! Thanks for all your effort.
I have a feature request for the .loc function of the Dataset class.
I think it would be beneficial to allow comparisons with DataArrays as keys for the loc function.
Obviously, this only works for DataArrays that contain at least all of the dimensions of the indexer.
But e.g. in the weather dataset used as the example in the docs, it would allow one to filter precipitation and temperature by longitude or latitude, or precipitation by temperature.
I started implementing it because I needed it for another use case.
I would be happy to integrate it into xarray if it is of interest.

Cheers,
Jendrik

shoyer · 2019-01-18T15:55:23Z

Hi Jendrik, thanks for your interest!

Could you please give an example of what exactly your proposed syntax would look like, with example inputs/outputs?

jendrikjoe · 2019-01-21T16:21:29Z

Hey Shoyer,

Sure, I am happy to propose one.
Given the input from the xarray example page (http://xarray.pydata.org/en/stable/examples/weather-data.html), I would imagine something like this:

xarr = xarr.loc[xarr['tmin'] > 5]

If the DataArray is one-dimensional, this is straightforward to achieve by altering the _LocIndexer in the following way:

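class _LocIndexer(object):
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, key):
        if not utils.is_dict_like(key):
            # key is a 1-D boolean DataArray: pick the matching labels
            # along each of its dimensions.
            selector = {dim: key[dim][key] for dim in key.dims}
            # Keep only the variables that contain every dimension of key.
            keep_vars = []
            for var in self.dataset.data_vars:
                # Built-in all() is needed here; np.all() would not
                # iterate the generator.
                if all(dim in self.dataset[var].dims for dim in key.dims):
                    keep_vars.append(var)
            return self.dataset[keep_vars].sel(selector)
        return self.dataset.sel(**key)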

This does not work for higher dimensions though, as 2-dimensional boolean indexing is not supported.
It would also drop all other DataArrays that do not share dimensions with the indexer.
There is probably a better place to do this than in the loc function. However, I think it would be great for cases where people need to filter their data by something other than the array dimensions.
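
To make the one-dimensional case concrete without touching xarray internals, here is a minimal sketch of the same behaviour using only the public API. The toy dataset, its variable names and the helper `filter_by_mask` are hypothetical and only serve as an illustration; they are not part of xarray.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: two variables share the "time" dimension, one does not.
ds = xr.Dataset(
    {
        "tmin": ("time", np.array([2.0, 6.0, 8.0, 4.0])),
        "tmax": ("time", np.array([9.0, 12.0, 15.0, 10.0])),
        "elevation": ("location", np.array([100.0, 250.0])),
    },
    coords={"time": np.arange(4), "location": ["IA", "IN"]},
)


def filter_by_mask(dataset, mask):
    """Select the entries where a 1-D boolean DataArray is True.

    Hypothetical helper, not part of xarray. Variables lacking the mask's
    dimension are dropped, mirroring the behaviour described above.
    """
    if mask.ndim != 1:
        raise ValueError("only 1-D boolean masks are handled in this sketch")
    keep_vars = [
        name
        for name, var in dataset.data_vars.items()
        if all(dim in var.dims for dim in mask.dims)
    ]
    (dim,) = mask.dims
    # Positional boolean indexing along the mask's single dimension.
    return dataset[keep_vars].isel({dim: mask.values})


warm = filter_by_mask(ds, ds["tmin"] > 5)
```

Here `warm` keeps `tmin` and `tmax` at the two time steps where `tmin > 5`, while `elevation` is dropped because it does not have the `time` dimension.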

Cheers,

Jendrik

shoyer · 2019-01-21T17:47:17Z

Oh, OK, that makes sense now.

Yes, we are definitely interested in supporting multi-dimensional boolean indexing. See #1887 for thoughts on what this could look like.
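
For reference, a condition that spans several dimensions can already be applied as a mask with `Dataset.where`, which is not the same thing as true boolean indexing: the result stays rectangular and non-matching entries become NaN. A minimal sketch, assuming a made-up two-dimensional dataset (the variable names and values are hypothetical):

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D toy data; variable and dimension names are made up.
ds = xr.Dataset(
    {
        "tmin": (("time", "location"), np.array([[2.0, 6.0], [8.0, 4.0], [1.0, 3.0]])),
        "prcp": (("time", "location"), np.array([[0.1, 0.0], [0.5, 0.2], [0.3, 0.4]])),
    },
    coords={"time": np.arange(3), "location": ["IA", "IN"]},
)

mask = ds["tmin"] > 5  # 2-D boolean DataArray

# where() fills non-matching entries with NaN; drop=True additionally
# trims the labels along which the mask is False everywhere.
masked = ds.where(mask, drop=True)
```

In this sketch the last time step is removed because no location satisfies the condition there, and the remaining non-matching entries are NaN rather than being removed.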

jendrikjoe · 2019-01-21T21:17:02Z

Okay, I will have a look at #1887 first before going forward with this request :)

shoyer · 2019-01-22T18:12:00Z

I'm going to close this in favor of #1887. Not to reject this approach, but just to keep discussion in one place.

shoyer closed this as completed Jan 22, 2019
