BUG: loc dropping levels when df has only one row #38150

Merged Dec 30, 2020 (13 commits)
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.3.0.rst
@@ -234,7 +234,7 @@ Indexing
 - Bug in :meth:`CategoricalIndex.get_indexer` failing to raise ``InvalidIndexError`` when non-unique (:issue:`38372`)
 - Bug in inserting many new columns into a :class:`DataFrame` causing incorrect subsequent indexing behavior (:issue:`38380`)
 - Bug in :meth:`DataFrame.iloc.__setitem__` and :meth:`DataFrame.loc.__setitem__` with mixed dtypes when setting with a dictionary value (:issue:`38335`)
--
+- Bug in :meth:`DataFrame.loc` dropping levels of :class:`MultiIndex` when :class:`DataFrame` used as input has only one row (:issue:`10521`)
 -

 Missing
10 changes: 8 additions & 2 deletions pandas/core/indexing.py
@@ -842,8 +842,14 @@ def _getitem_nested_tuple(self, tup: Tuple):
             if self.name != "loc":
                 # This should never be reached, but lets be explicit about it
                 raise ValueError("Too many indices")
-            with suppress(IndexingError):
-                return self._handle_lowerdim_multi_index_axis0(tup)
+            if isinstance(self.obj, ABCSeries) or not any(
Member

self.ndim == 1 is faster than this isinstance check (41 ns vs 297 ns, so not a huge deal either way)

Member Author

It's easier to read, too :)

+                isinstance(x, slice) for x in tup
+            ):
+                # GH#10521 Series should reduce MultiIndex dimensions instead of
+                # DataFrame, IndexingError is not raised when slice(None,None,None)
+                # with one row.
+                with suppress(IndexingError):
+                    return self._handle_lowerdim_multi_index_axis0(tup)

             # this is a series with a multi-index specified a tuple of
             # selectors
14 changes: 14 additions & 0 deletions pandas/tests/indexing/multiindex/test_loc.py
@@ -695,3 +695,17 @@ def test_loc_getitem_index_differently_ordered_slice_none():
         columns=["a", "b"],
     )
     tm.assert_frame_equal(result, expected)
+
+
+def test_loc_getitem_drops_levels_for_one_row_dataframe():
+    # GH#10521
+    mi = MultiIndex.from_arrays([["x"], ["y"], ["z"]], names=["a", "b", "c"])
+    df = DataFrame({"d": [0]}, index=mi)
+    expected = df.copy()
+    result = df.loc["x", :, "z"]
+    tm.assert_frame_equal(result, expected)
+
+    ser = Series([0], index=mi)
+    result = ser.loc["x", :, "z"]
+    expected = Series([0], index=Index(["y"], name="b"))
+    tm.assert_series_equal(result, expected)
Member

Big picture, why isn't ser_expected = frame_expected["d"]?

Member Author

A Series drops levels of the MultiIndex (see, for example, #6022), while a DataFrame should keep them. Currently, a DataFrame keeps them in every case except when it has only one row.

Member

Is #38150 (comment) saying that you plan to change the Series behavior to match DataFrame?

Member Author

No, the comment was meant to clarify why we have to check for ndim == 1.
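
For readers following the thread, the guard under discussion can be sketched in isolation. This is only an illustration: `should_try_level_reduction` is a hypothetical helper, not a pandas function, and it uses the `ndim == 1` form suggested in review instead of the `isinstance` check shown in the diff. It shows which combinations of object dimensionality and key would attempt the level-reducing lookup.

```python
from typing import Any, Tuple


def should_try_level_reduction(ndim: int, tup: Tuple[Any, ...]) -> bool:
    # Hypothetical helper (not part of pandas): mirrors the condition added
    # in the diff. A 1-dimensional object (a Series) always tries the
    # level-reducing path; a 2-dimensional object (a DataFrame) tries it
    # only when the key contains no slice.
    return ndim == 1 or not any(isinstance(x, slice) for x in tup)


# Same shape of key as ``obj.loc["x", :, "z"]`` in the new test.
key_with_slice = ("x", slice(None), "z")
key_scalars_only = ("x", "y", "z")

series_case = should_try_level_reduction(1, key_with_slice)
frame_slice_case = should_try_level_reduction(2, key_with_slice)
frame_scalar_case = should_try_level_reduction(2, key_scalars_only)

print(series_case, frame_slice_case, frame_scalar_case)
```

Under these assumptions, only the DataFrame-with-slice case skips the level-reducing path, which corresponds to the one-row DataFrame keeping all of its MultiIndex levels in the new test.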