BUG: loc dropping levels when df has only one row #38150

phofl · 2020-11-29T16:01:18Z

closes BUG: inconsisten multi-level indexing when levels are dropped #10521
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Originally this case was dispatched to xs, which dispatched to _get_loc_level, which raised if a slice is not slice(None, None) and went back to our original function. If the DataFrame contains only one row, _get_loc_level does not raise and hence we drop levels.

I added len(self.obj) for performance reasons because I am not sure whether the any check for slices is faster than suppressing the IndexingError in general.
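The control-flow change described above can be sketched with a toy model. This is not the pandas implementation: `raising_fast_path`, `non_raising_fast_path`, `general_path`, and the local `IndexingError` class are hypothetical stand-ins, and the returned strings only label which path produced the result. The sketch shows why `suppress` alone cannot help when the fast path returns without raising.

```python
from contextlib import suppress


class IndexingError(Exception):
    """Toy stand-in for the IndexingError named in the diff."""


def raising_fast_path(tup):
    # Models the case where the xs-based lookup raises,
    # so the caller falls back to the general path.
    raise IndexingError("cannot reduce dimension")


def non_raising_fast_path(tup):
    # Models the single-row case: no error, but levels get dropped.
    return "levels dropped"


def general_path(tup):
    # Models the general lookup that keeps all index levels.
    return "levels kept"


def getitem_before(tup, fast_path):
    # Old flow: always try the fast path, fall back only if it raises.
    with suppress(IndexingError):
        return fast_path(tup)
    return general_path(tup)


def getitem_after(tup, n_rows, fast_path):
    # Guarded flow from the first revision of this PR: skip the fast path
    # for single-row objects whose key contains a slice.
    if n_rows > 1 or not any(isinstance(x, slice) for x in tup):
        with suppress(IndexingError):
            return fast_path(tup)
    return general_path(tup)


key = ("x", slice(None), "z")
print(getitem_before(key, non_raising_fast_path))    # levels dropped
print(getitem_after(key, 1, non_raising_fast_path))  # levels kept
print(getitem_after(key, 2, raising_fast_path))      # levels kept
```

In the toy model the guard changes the outcome only for the single-row key that contains a slice; every other combination still goes through the fast path first.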

pandas/tests/indexing/multiindex/test_loc.py

jbrockmendel · 2020-11-30T18:46:49Z

If the DataFrame contains only one row, _get_loc_level does not raise and hence we drop levels.

Is the _get_loc_level behavior correct?

phofl · 2020-11-30T22:07:06Z

From reading the code I think this is the intended behavior, but not quite sure.

jreback · 2020-12-02T02:23:05Z

pandas/core/indexing.py

@@ -838,8 +838,10 @@ def _getitem_nested_tuple(self, tup: Tuple):
            if self.name != "loc":
                # This should never be reached, but lets be explicit about it
                raise ValueError("Too many indices")
-            with suppress(IndexingError):
-                return self._handle_lowerdim_multi_index_axis0(tup)
+            if len(self.obj) > 1 or not any(isinstance(x, slice) for x in tup):


hmm, can we just handle this properly in _handle_lowerdim_multi_index_axis0?

We could raise an IndexingError in there. We cannot call xs.

    axis = self.axis or 0
    return self._getitem_axis(tup, axis=axis)

Would this be preferable to the if condition outside?

why is the len(self.obj) needed here? that smells a bit

Performance reasons, not necessary from a technical standpoint. Will remove it

ok pls add a comment then

jbrockmendel · 2020-12-03T21:34:21Z

It looks like jreback and I had different suggestions, but the theme to both is "can this be handled any earlier?" have you ruled out the lower-level options?

phofl · 2020-12-03T22:13:01Z

Sorry, tough week at work. We cannot call xs, because this would return wrong results. Do you mean earlier, before reaching this point? Because I understood the suggestions as doing this deeper down in the call stack.

jbrockmendel · 2020-12-07T18:51:51Z

Do you mean earlier, before reaching this point? Because I understood the suggestions as doing this deeper down in the call stack.

Deeper in the call stack is ideal, yes.

jbrockmendel · 2020-12-22T21:45:01Z

@phofl are you of the opinion this is the best available approach? You're pretty well versed in this part of the code and I'm inclined to trust your opinion.

phofl · 2020-12-22T21:58:12Z

Forgot that one unfortunately, had 3 pretty busy weeks at work, but will be better now.

I am not 100% sure, but I tried this out a bit locally. We have to do this if condition somewhere before dispatching to xs. _get_label itself is not really an option; it would be too ugly. Doing it in _handle_lowerdim_multi_index_axis0 is possible, but would also be quite ugly, because the if condition would have to be handled the same way as the cases that raise TypeError or InvalidIndexError. I also think it would add unnecessary complexity for the other case where _handle_lowerdim_multi_index_axis0 is called.

jreback · 2020-12-23T16:14:59Z

pandas/core/indexing.py

@@ -838,8 +838,10 @@ def _getitem_nested_tuple(self, tup: Tuple):
            if self.name != "loc":
                # This should never be reached, but lets be explicit about it
                raise ValueError("Too many indices")
-            with suppress(IndexingError):
-                return self._handle_lowerdim_multi_index_axis0(tup)
+            if len(self.obj) > 1 or not any(isinstance(x, slice) for x in tup):


why is the len(self.obj) needed here? that smells a bit

phofl · 2020-12-24T10:36:12Z

Was a bit hasty. The problem only occurs for objects with only one row and slices in the indexer. Objects with more than one row have to go in there, no matter whether the indexer has slices.

jreback · 2020-12-24T20:34:39Z

Was a bit hasty. The problem only occurs for objects with only one row and slices in the indexer. Objects with more than one row have to go in there, no matter whether the indexer has slices.

this is very strange then, these must be hitting a different path than other types, no? maybe something different in the indexing engine itself? it's pretty odd to special case like this.

jreback

comment and can you merge master

jreback · 2020-12-29T17:18:46Z

pandas/core/indexing.py

@@ -838,8 +838,10 @@ def _getitem_nested_tuple(self, tup: Tuple):
            if self.name != "loc":
                # This should never be reached, but lets be explicit about it
                raise ValueError("Too many indices")
-            with suppress(IndexingError):
-                return self._handle_lowerdim_multi_index_axis0(tup)
+            if len(self.obj) > 1 or not any(isinstance(x, slice) for x in tup):


ok pls add a comment then

phofl · 2020-12-29T20:36:53Z

Looked into this again. The issue is with Series: the dimension of the MultiIndex should be reduced there (see #6022).

jreback · 2020-12-29T23:33:49Z

cc @jbrockmendel if any comments.

jbrockmendel · 2020-12-30T03:14:05Z

pandas/core/indexing.py

@@ -842,8 +842,14 @@ def _getitem_nested_tuple(self, tup: Tuple):
            if self.name != "loc":
                # This should never be reached, but lets be explicit about it
                raise ValueError("Too many indices")
-            with suppress(IndexingError):
-                return self._handle_lowerdim_multi_index_axis0(tup)
+            if isinstance(self.obj, ABCSeries) or not any(


self.ndim == 1 is faster than this isinstance check (41 ns vs 297 ns, so not a huge deal either way)

Is better to read too :)
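A micro-benchmark along the lines of the timing quoted above could look like the following sketch. It does not use pandas: `ToyABCSeries`, `ToySeries`, and `ToyFrame` are hypothetical classes, and the metaclass is only an assumption about how an ABC-style marker check might be implemented (an attribute lookup inside `__instancecheck__`). Absolute numbers will differ from the 41 ns and 297 ns mentioned in the review.

```python
import timeit


class _TypMeta(type):
    """Toy metaclass: isinstance() is answered by looking at a `_typ` tag."""

    def __instancecheck__(cls, inst):
        return getattr(inst, "_typ", None) == "series"


class ToyABCSeries(metaclass=_TypMeta):
    """Hypothetical stand-in for an ABC-style Series marker class."""


class ToySeries:
    _typ = "series"
    ndim = 1


class ToyFrame:
    _typ = "dataframe"
    ndim = 2


def is_1d_isinstance(obj):
    # Goes through the metaclass hook on every call.
    return isinstance(obj, ToyABCSeries)


def is_1d_ndim(obj):
    # Plain attribute lookup plus an integer comparison.
    return obj.ndim == 1


ser, frame = ToySeries(), ToyFrame()

t_isinstance = timeit.timeit(lambda: is_1d_isinstance(ser), number=100_000)
t_ndim = timeit.timeit(lambda: is_1d_ndim(ser), number=100_000)
print(f"isinstance: {t_isinstance:.4f}s  ndim: {t_ndim:.4f}s")
```

Both predicates agree on the toy objects, so the choice between them comes down to speed and readability, which matches the tone of the exchange above.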

jbrockmendel · 2020-12-30T03:20:04Z

pandas/tests/indexing/multiindex/test_loc.py

+    ser = Series([0], index=mi)
+    result = ser.loc["x", :, "z"]
+    expected = Series([0], index=Index(["y"], name="b"))
+    tm.assert_series_equal(result, expected)


big picture, why isn't ser_expected = frame_expected["d"]?

Series drops levels of MultiIndex (for example #6022), while DataFrame should keep them. The current behavior is that DataFrame keeps them at all times except when it has one row.

is #38150 (comment) saying that you plan to change the Series behavior to match DataFrame?

No, the comment was meant to clarify why we have to check for ndim==1
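The Series case from the test excerpt in this thread can be reproduced with a short script. The construction of `mi` is an assumption, since the excerpt does not show it; the level names `a`, `b`, `c` are chosen so that the result matches the `Index(["y"], name="b")` expected by the test. The DataFrame result is printed without an expected value because it is the behavior under discussion here and may differ between pandas versions.

```python
import pandas as pd

# Assumed setup: a three-level MultiIndex with a single row.
mi = pd.MultiIndex.from_arrays([["x"], ["y"], ["z"]], names=["a", "b", "c"])
ser = pd.Series([0], index=mi)
df = pd.DataFrame({"d": [0]}, index=mi)

# Series: the scalar-indexed levels "a" and "c" are dropped.
ser_result = ser.loc["x", :, "z"]
print(list(ser_result.index.names))

# DataFrame: print the remaining level names for the installed version.
df_result = df.loc["x", :, "z"]
print(list(df_result.index.names))
```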

jreback · 2020-12-30T13:50:16Z

thanks @phofl

jbrockmendel · 2021-06-30T02:05:57Z

pandas/core/indexing.py

@@ -842,8 +842,12 @@ def _getitem_nested_tuple(self, tup: Tuple):
            if self.name != "loc":
                # This should never be reached, but lets be explicit about it
                raise ValueError("Too many indices")
-            with suppress(IndexingError):
-                return self._handle_lowerdim_multi_index_axis0(tup)
+            if self.ndim == 1 or not any(isinstance(x, slice) for x in tup):


@phofl the more i look at this the weirder it seems to be treating Series/DataFrame differently. Are you clear on if there's a reason for this vs just historical coincidence?

@jbrockmendel

sorry for the slow reply. I am pretty sure that this is for historical reasons, but would have to take a closer look.

I am currently on vacation until mid July. Can get back to you afterwards, will also have more time for pandas then again.

no worries, enjoy your vacation

phofl added 4 commits November 29, 2020 15:20

Fix dropping of levels in multiindex

bef2819

Add test and whatsnew

7408464

Add note

da86c93

Merge branches 'master' and 'master' of https://github.com/pandas-dev…

4d9272d

…/pandas into 10521
Conflicts: doc/source/whatsnew/v1.2.0.rst

phofl added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Nov 29, 2020

jbrockmendel reviewed Nov 30, 2020

Simplify test

3f54db9

jreback requested changes Dec 2, 2020

phofl added 2 commits December 22, 2020 22:59

Merge branch 'master' of https://github.com/pandas-dev/pandas into 10521

d49e512

Conflicts: doc/source/whatsnew/v1.2.0.rst

Move whatsnew

7a1c22a

jreback requested changes Dec 23, 2020

Remove len

8517274

jreback added this to the 1.3 milestone Dec 24, 2020

jreback approved these changes Dec 24, 2020

Revert

ee106a9

jreback requested changes Dec 29, 2020

phofl added 3 commits December 29, 2020 21:33

Add test and fix bug

693038f

Merge branch 'master' of https://github.com/pandas-dev/pandas into 10521

e87865c

Add comment

68987cd

jreback approved these changes Dec 29, 2020

jbrockmendel reviewed Dec 30, 2020

Change validation

04bfc9b

jreback merged commit bbd0f66 into pandas-dev:master Dec 30, 2020

phofl deleted the 10521 branch December 30, 2020 13:51

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

BUG: loc dropping levels when df has only one row (pandas-dev#38150)

d6d8483

jbrockmendel reviewed Jun 30, 2021
