BUG: loc dropping levels when df has only one row #38150
…/pandas into 10521 Conflicts: doc/source/whatsnew/v1.2.0.rst
Is the _get_loc_level behavior correct?
From reading the code I think this is the intended behavior, but I'm not quite sure.
pandas/core/indexing.py
@@ -838,8 +838,10 @@ def _getitem_nested_tuple(self, tup: Tuple):
if self.name != "loc":
    # This should never be reached, but lets be explicit about it
    raise ValueError("Too many indices")
with suppress(IndexingError):
    return self._handle_lowerdim_multi_index_axis0(tup)
if len(self.obj) > 1 or not any(isinstance(x, slice) for x in tup):
hmm can we just handle this properly in _handle_lowerdim_multi_index_axis0?
We could raise an IndexingError in there. We cannot call xs.
axis = self.axis or 0
return self._getitem_axis(tup, axis=axis)
Would this be preferable to the if condition outside?
why is the len(self.obj) needed here? that smells a bit
Performance reasons, not necessary from a technical standpoint. Will remove it
ok pls add a comment then
Done
It looks like jreback and I had different suggestions, but the theme of both is "can this be handled any earlier?" Have you ruled out the lower-level options?
Sorry, tough week at work. We cannot call xs, because this would return wrong results. Do you mean earlier, before reaching this point? Because I understood the suggestions as doing this deeper down in the call stack.
Deeper in the call stack is ideal, yes.
@phofl are you of the opinion this is the best available approach? You're pretty well versed in this part of the code and I'm inclined to trust your opinion.
Forgot that one unfortunately, had 3 pretty busy weeks at work, but it will be better now. I am not 100% sure, but I tried this out a bit locally. We have to do this if condition somewhere before dispatching to xs,
Conflicts: doc/source/whatsnew/v1.2.0.rst
why is the len(self.obj) needed here? that smells a bit
Was a bit hasty. The problem only occurs for objects with only one row and slices in the indexer. Objects with more than one row have to go in there, no matter whether the indexer has slices.
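For reference, a minimal sketch of what the slice check in the diff inspects. `KeyCapture` and `has_slice` are hypothetical helpers written for this illustration, not pandas code; `KeyCapture` only echoes the key that Python passes to `__getitem__`.

```python
class KeyCapture:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


def has_slice(tup):
    # Same expression as the slice check in the condition under review.
    return any(isinstance(x, slice) for x in tup)


capture = KeyCapture()

# A bare ":" inside the brackets arrives as slice(None, None, None).
with_slice = capture["x", :, "z"]
scalars_only = capture["x", "y", "z"]

print(with_slice, has_slice(with_slice))
print(scalars_only, has_slice(scalars_only))
```

So an indexer such as `loc["x", :, "z"]` counts as "having slices" for the purposes of this condition, while a purely scalar tuple does not.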
this is very strange then, these must be hitting a different path than other types, no? maybe something different in the indexing engine itself? it's pretty odd to special case like this.
comment and can you merge master
ok pls add a comment then
Looked into this again, the issue is with Series, the dimension of the MultiIndex should be reduced there, #6022
cc @jbrockmendel if any comments.
pandas/core/indexing.py
@@ -842,8 +842,14 @@ def _getitem_nested_tuple(self, tup: Tuple):
if self.name != "loc":
    # This should never be reached, but lets be explicit about it
    raise ValueError("Too many indices")
with suppress(IndexingError):
    return self._handle_lowerdim_multi_index_axis0(tup)
if isinstance(self.obj, ABCSeries) or not any(
self.ndim == 1 is faster than this isinstance check (41 ns vs 297 ns, so not a huge deal either way)
Is better to read too :)
ser = Series([0], index=mi)
result = ser.loc["x", :, "z"]
expected = Series([0], index=Index(["y"], name="b"))
tm.assert_series_equal(result, expected)
big picture, why isn't ser_expected = frame_expected["d"]?
Series drops levels of MultiIndex (for example #6022), while DataFrame should keep them. The current behavior is that DataFrame keeps them at all times except when it has one row.
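A small sketch to try this locally. The construction of `mi` is an assumption (the test excerpt above does not show it); the level names `a`, `b`, `c` are chosen so that the Series case matches the `Index(["y"], name="b")` expectation in the test. The DataFrame result is deliberately only printed, since which levels survive there is exactly what this PR changes and may differ between pandas versions.

```python
import pandas as pd

# Assumed setup: a single-row MultiIndex with three named levels.
mi = pd.MultiIndex.from_arrays([["x"], ["y"], ["z"]], names=["a", "b", "c"])

# Series: the levels indexed by the scalars "x" and "z" are dropped,
# only the sliced level "b" remains.
ser = pd.Series([0], index=mi)
ser_result = ser.loc["x", :, "z"]
print(list(ser_result.index.names))

# One-row DataFrame: inspect which levels remain on your pandas version.
df = pd.DataFrame({"d": [0]}, index=mi)
df_result = df.loc["x", :, "z"]
print(list(df_result.index.names))
```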
is #38150 (comment) saying that you plan to change the Series behavior to match DataFrame?
No, the comment was meant to clarify why we have to check for ndim==1
thanks @phofl
@@ -842,8 +842,12 @@ def _getitem_nested_tuple(self, tup: Tuple):
if self.name != "loc":
    # This should never be reached, but lets be explicit about it
    raise ValueError("Too many indices")
with suppress(IndexingError):
    return self._handle_lowerdim_multi_index_axis0(tup)
if self.ndim == 1 or not any(isinstance(x, slice) for x in tup):
@phofl the more i look at this the weirder it seems to be treating Series/DataFrame differently. Are you clear on if there's a reason for this vs just historical coincidence?
sorry for the slow reply. I am pretty sure that this is for historical reasons, but would have to take a closer look.
I am currently on vacation until mid July. Can get back to you afterwards, will also have more time for pandas then again.
no worries, enjoy your vacation
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Originally this case was dispatched to xs, which dispatched to _get_loc_level, which raised if a slice is not slice(None, None) and went back to our original function. If the DataFrame contains only one row, _get_loc_level does not raise and hence we drop levels.
I added len(self.obj) for performance reasons because I am not sure if the any check for slices is faster than suppressing the IndexingError in general.
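The control flow discussed in this thread can be sketched with a toy model. Everything here is an assumption made for illustration: `IndexingError`, `fast_path`, `general_path` and `getitem_nested_tuple` are stand-ins, not the pandas implementations, and the toy fast path simply refuses every indexer that contains a slice. The sketch only shows how a guard combined with `suppress(IndexingError)` falls through to the general path.

```python
from contextlib import suppress


class IndexingError(Exception):
    """Toy stand-in for the pandas exception of the same name."""


def fast_path(tup):
    # Hypothetical stand-in for _handle_lowerdim_multi_index_axis0.
    # Assumption for this toy: it rejects any indexer containing a slice.
    if any(isinstance(x, slice) for x in tup):
        raise IndexingError("not handled on the fast path")
    return ("fast", tup)


def general_path(tup):
    # Hypothetical stand-in for the code that runs after the guarded block.
    return ("general", tup)


def getitem_nested_tuple(tup, ndim):
    # Guard modelled on the last revision of the diff: 1-D objects always
    # try the fast path, 2-D objects only when the indexer has no slices.
    if ndim == 1 or not any(isinstance(x, slice) for x in tup):
        with suppress(IndexingError):
            return fast_path(tup)
    # Reached when the guard is False or the fast path raised.
    return general_path(tup)


print(getitem_nested_tuple(("x", "z"), ndim=2)[0])
print(getitem_nested_tuple(("x", slice(None), "z"), ndim=2)[0])
print(getitem_nested_tuple(("x", slice(None), "z"), ndim=1)[0])
```

With the guard in place, a 2-D object with a slice in the indexer never enters the fast path, so no exception has to be raised and suppressed for that case.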