Datasets more robust to non-string keys #2174

max-sixty · 2018-05-23T08:53:36Z

Closes #2172 (Errors on pycharm completion) & #2173 (Formatting error in conjunction with pandas.DataFrame)
Tests added
Tests passed
Fully documented, including whats-new.rst for all changes and api.rst for new API

I don't think this is the most efficient way of doing this, though it does work. Any ideas for a more efficient implementation?

max-sixty · 2018-05-23T09:01:09Z

setup.cfg

@@ -8,6 +8,7 @@ testpaths=xarray/tests
 [flake8]
 max-line-length=79
 ignore=
+  W503


W503 is "line break before binary operator". This check contradicts PEP 8, which now recommends breaking before binary operators.
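To make the rule concrete: W503 fires on code that breaks a line before a binary operator, which is the layout current PEP 8 recommends. The variable names below are borrowed from PEP 8's own example.

```python
gross_wages = 50_000
taxable_interest = 1_200
ira_deduction = 3_000

# Operators lead each continuation line: flake8 reports W503 here,
# while current PEP 8 prefers this style for readability.
income = (gross_wages
          + taxable_interest
          - ira_deduction)
```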

shoyer · 2018-05-23T16:40:49Z

xarray/core/dataarray.py

+        try:
+            is_coord_key = any([
+                isinstance(key, basestring),
+                key in set(self.dims).union(self.coords)


I would hesitate to do this fix -- it means array[2] could pull out a coordinate if 2 is the name of a dimension. More generally, DataArray.__getitem__ is only meant as a convenient short-cut for getting coordinates, which we might even want to deprecate entirely. DataArray.coords.__getitem__ is the reliable way to get a coordinate out.

Instead, I think the better option would be to avoid the ** unpacking below, in the expression self.isel(**self._item_key_to_dict(key)). This would be a little more involved, but I think the right fix would be to make most of these functions accept a positional dictionary argument instead of **kwargs, as suggested in #1231.
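A minimal illustration of the underlying problem in plain Python, using a hypothetical stand-in for isel (not xarray's code). ** unpacking only accepts string keys, so a dimension named with an integer cannot travel through **kwargs. The same mapping passed positionally works.

```python
def isel(indexers=None, **indexers_kwargs):
    """Hypothetical stand-in for Dataset.isel that just echoes its indexers."""
    return indexers if indexers is not None else indexers_kwargs


key = {2: slice(0, 3)}  # a dimension literally named 2 (an int, not a str)

# ** unpacking requires string keys, so Python raises TypeError
# before the body of isel ever runs.
unpack_error = None
try:
    isel(**key)
except TypeError as err:
    unpack_error = err

# Passing the same mapping as a positional dict avoids the restriction.
result = isel(key)
```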

Yes, completely agree, thanks

max-sixty · 2018-05-24T08:45:59Z

I added an initial attempt at accepting a dict, on isel only for now

Let me know ppl's thoughts:

Arg name?
Combine or raise if supplied with kwargs?
Is there an encapsulation leak? Should we have a user API layer which is then translated to a more robust representation (i.e. kwargs to a dict), and those be separate methods? My initial inclination is that the current approach is good and manageable

max-sixty · 2018-05-24T08:47:46Z

DataArray.__getitem__ is only meant as a convenient short-cut for getting coordinates, which we might even want to deprecate entirely

👍

max-sixty · 2018-05-25T06:25:18Z

@shoyer I see you're online - let me know if you have any thoughts re the changes. Am on a burst of open-source work atm

shoyer · 2018-05-25T06:42:28Z

Arg name?

I'd like a convention of matching names to make it clear in docstrings that these are the same thing, e.g.,

something_dict/**something, or
something/**something_kwargs

Probably the last is best since users will never have cause to type the longer argument name ending with _kwargs.

In practice, I suspect these first arguments are almost always going to be used positionally. We might even imagine making them positional-only arguments if PEP 570 is ever accepted. So I don't think it matters too much.

Combine or raise if supplied with kwargs?

Let's raise (like your current implementation) if both are provided. Combining could be error prone and seems unnecessarily complicated.

This is the same logic I implemented in the somewhat misleadingly named utils.combine_pos_and_kw_args() utility function (used by Dataset.reindex()).
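A minimal sketch of that raise-if-both logic, assuming the (pos_kwargs, kw_kwargs, func_name) signature the helper has in this PR. The Mapping check and the error messages are illustrative simplifications, not copied from xarray.

```python
from collections.abc import Mapping


def combine_pos_and_kw_args(pos_kwargs, kw_kwargs, func_name):
    """Return whichever of the dict argument or the **kwargs was supplied.

    Sketch only: raises rather than merging when both are given.
    """
    if pos_kwargs is not None:
        if not isinstance(pos_kwargs, Mapping):
            raise ValueError(
                "the first argument to .%s must be a dictionary" % func_name)
        if kw_kwargs:
            raise ValueError(
                "cannot specify both keyword and positional arguments to .%s"
                % func_name)
        return pos_kwargs
    return kw_kwargs
```

Inside a method this would be called as indexers = combine_pos_and_kw_args(indexers, indexers_kwargs, 'isel'), which matches the call in the isel diff reviewed further down.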

Is there an encapsulation leak? Should we have a user API layer which is then translated to a more robust representation (i.e. kwargs to a dict), and those be separate methods? My initial inclination is that the current approach is good and manageable

I agree -- I think the current version is manageable.

For internal use, we should always use the positional argument, but there's not much harm in leaving **kwargs around.
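A toy sketch of that split, assuming a hypothetical ToyDataset class (none of this is xarray's code). The public isel still accepts **indexers_kwargs for convenience, but __getitem__ builds a dict and passes it positionally, so integer dimension names survive.

```python
class ToyDataset:
    """Hypothetical container that only records which indexers it received."""

    def __init__(self, dims):
        self.dims = tuple(dims)

    def isel(self, indexers=None, **indexers_kwargs):
        # Public API: accept a dict positionally or keywords, but not both.
        if indexers is not None and indexers_kwargs:
            raise ValueError("cannot specify both indexers and indexers_kwargs")
        if indexers is None:
            indexers = indexers_kwargs
        return dict(indexers)

    def _item_key_to_dict(self, key):
        # Map a positional key (or tuple of keys) onto dimension names.
        if not isinstance(key, tuple):
            key = (key,)
        return dict(zip(self.dims, key))

    def __getitem__(self, key):
        # Internal call: positional dict, never ** unpacking.
        return self.isel(self._item_key_to_dict(key))


ds = ToyDataset(dims=[0, "time"])
selected = ds[1, slice(None, 5)]
```

Users can still write ds.isel(time=3), while ds[...] never depends on dimension names being valid Python keywords.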

max-sixty · 2018-05-25T13:24:03Z

sel, isel & reindex are done

sel_points etc are deprecated so we can leave those

Any others on the 'first priority' list?

max-sixty · 2018-05-25T14:53:22Z

AppVeyor failure unrelated

shoyer · 2018-05-25T15:43:34Z

xarray/core/dataset.py

@@ -1404,12 +1406,16 @@ def isel(self, drop=False, **indexers):
        Dataset.sel
        DataArray.isel
        """
+
+        indexers = combine_pos_and_kw_args(indexers, indexers_kwargs, 'isel')
+        assert isinstance(drop, bool)


This should either be dropped for now, or raise TypeError. assert isn't appropriate for external APIs.

shoyer · 2018-05-25T15:45:09Z

xarray/core/dataarray.py

        """Conform this object onto a new set of indexes, filling in
        missing values with NaN.

        Parameters
        ----------
+        **indexers : dict


shouldn't start with **

shoyer · 2018-05-25T15:49:06Z

xarray/core/dataarray.py

@@ -18,7 +18,9 @@
 from .formatting import format_item
 from .options import OPTIONS
 from .pycompat import OrderedDict, basestring, iteritems, range, zip
-from .utils import decode_numpy_dict_values, ensure_us_time_resolution
+from .utils import (
+    combine_pos_and_kw_args, decode_numpy_dict_values,


While we're at it, maybe we should rename combine_pos_and_kw_args to something clearer, maybe either_dict_or_kwargs?

shoyer · 2018-05-25T15:50:31Z

xarray/core/dataarray.py

-            values will be filled in with NaN, and any mis-matched dimension
-            names will simply be ignored.
+        **indexers_kwargs : {dim: indexer, ...}
+            The keyword arguments form of ``indexers``


We should add something like: Only one of indexers or indexers_kwargs may be provided.
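For reference, here is one way the Parameters section could read once the dict argument drops the ** prefix and the note is added. The toy function body is a hypothetical placeholder, and the wording is a suggestion, not xarray's final docstring.

```python
def reindex(indexers=None, **indexers_kwargs):
    """Conform this object onto a new set of indexes.

    Parameters
    ----------
    indexers : dict, optional
        Dictionary with keys given by dimension names and values given by
        arrays of coordinate tick labels.
        Only one of indexers or indexers_kwargs may be provided.
    **indexers_kwargs : {dim: indexer, ...}, optional
        The keyword arguments form of ``indexers``.
        Only one of indexers or indexers_kwargs may be provided.
    """
    # Placeholder body: just resolve which form of indexers was supplied.
    if indexers is not None and indexers_kwargs:
        raise ValueError("cannot specify both indexers and indexers_kwargs")
    return indexers if indexers is not None else indexers_kwargs
```

Only the keyword form carries the ** prefix in the docstring, because only it is collected via **.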

shoyer · 2018-05-25T15:59:50Z

Looking through Dataset/DataArray methods that support **kwargs, I think this may be the full list:

shoyer · 2018-05-25T16:28:52Z

But I think just supporting this for isel/sel/reindex for now would be enough to be valuable. Note that the dict form is currently accepted by Dataset.reindex() but not DataArray.reindex().

shoyer

This looks great to me!

shoyer · 2018-05-26T23:59:24Z

xarray/core/dataset.py

@@ -1444,6 +1452,15 @@ def sel(self, method=None, tolerance=None, drop=False, **indexers):

        Parameters
        ----------
+        ----------


shoyer · 2018-05-27T00:00:56Z

xarray/core/utils.py

@@ -183,7 +183,7 @@ def is_full_slice(value):
    return isinstance(value, slice) and value == slice(None)


-def combine_pos_and_kw_args(pos_kwargs, kw_kwargs, func_name):
+def either_dict_or_kwargs(pos_kwargs, kw_kwargs, func_name):


We don't need to test every use, but we should at least add a unit test for this helper function -- I don't think we have any test coverage currently.
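Here is a sketch of what such a test could cover, written as a plain function so it runs under pytest or on its own. The helper is restated in simplified form (an assumption, not the merged code) so the block is self-contained.

```python
from collections.abc import Mapping


def either_dict_or_kwargs(pos_kwargs, kw_kwargs, func_name):
    # Simplified restatement of the helper under test.
    if pos_kwargs is not None:
        if not isinstance(pos_kwargs, Mapping):
            raise ValueError("the first argument to .%s must be a dictionary"
                             % func_name)
        if kw_kwargs:
            raise ValueError("cannot specify both keyword and positional "
                             "arguments to .%s" % func_name)
        return pos_kwargs
    return kw_kwargs


def test_either_dict_or_kwargs():
    # The dict form is returned when it is the only thing supplied.
    assert either_dict_or_kwargs({"a": 1}, {}, "foo") == {"a": 1}
    # Non-string keys work through the dict form.
    assert either_dict_or_kwargs({2: 1}, {}, "foo") == {2: 1}
    # The keyword form is returned when no dict is given.
    assert either_dict_or_kwargs(None, {"a": 1}, "foo") == {"a": 1}
    # Supplying both, or a non-dict first argument, raises.
    for bad_args in [({"a": 1}, {"b": 2}), ([("a", 1)], {})]:
        try:
            either_dict_or_kwargs(*bad_args, "foo")
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError for %r" % (bad_args,))


test_either_dict_or_kwargs()
```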

shoyer · 2018-05-27T20:48:40Z

thanks @maxim-lian !

max-sixty · 2018-05-28T01:43:57Z

thanks for working through the feedback, as ever, @shoyer !
It's like free coding classes

max-sixty added 2 commits May 23, 2018 04:49

ds more robust to non-str keys

b4c7c87

formatting

fc3f729

max-sixty commented May 23, 2018


time.dayofyear needs cover in dataarray getitem

e29273b

shoyer reviewed May 23, 2018


trial of indexer_dict

c15bcd0

shoyer added the needs work label May 25, 2018

max-sixty added 3 commits May 25, 2018 06:57

feedback from stephan

b765c3c

a few more methods

e441614

reindex added

4612ac0

max-sixty force-pushed the ds-number-keys branch from 8bca6fd to 4612ac0 May 25, 2018 13:18

shoyer reviewed May 25, 2018


max-sixty added 7 commits May 26, 2018 00:46

rename to either_dict_or_kwargs

b301b17

remove assert check

3edfda6

docstring

2ad7425

more docstring

7052520

optional goes last

8f01942

last docstring

186d3e3

what's new

8c75d1f

max-sixty mentioned this pull request May 26, 2018

Allow all dims-as-kwargs methods to take a dict instead #2188

Closed

7 tasks

shoyer reviewed May 27, 2018


artefact

2656bea

test either_dict_or_kwargs

e2241df

shoyer merged commit 8470500 into pydata:master May 27, 2018

max-sixty deleted the ds-number-keys branch May 28, 2018 01:44

fujiisoup mentioned this pull request Aug 17, 2018

More support of non-string dimension names #2373

Merged

1 task

max-sixty mentioned this pull request Apr 23, 2020

Formatting error in conjunction with pandas.DataFrame #2173

Closed
