BUG: head and tail not dropping groups with nan
phofl committed Dec 29, 2021
1 parent 2f915b3 commit a51cd10
Showing 3 changed files with 30 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -894,6 +894,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.nth` failing on ``axis=1`` (:issue:`43926`)
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not respecting right bound on centered datetime-like windows, if the index contains duplicates (:issue:`3944`)
- Bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` when using a :class:`pandas.api.indexers.BaseIndexer` subclass that returned unequal start and end arrays would segfault instead of raising a ``ValueError`` (:issue:`44470`)
- Bug in :meth:`GroupBy.head` and :meth:`GroupBy.tail` not dropping groups with ``NaN`` when ``dropna=True`` (:issue:`45089`)
- Fixed bug in :meth:`GroupBy.__iter__` after selecting a subset of columns in a :class:`GroupBy` object, which returned all columns instead of the chosen subset (:issue:`#44821`)
- Bug in :meth:`Groupby.rolling` when non-monotonic data passed, fails to correctly raise ``ValueError`` (:issue:`43909`)
- Fixed bug where grouping by a :class:`Series` that has a categorical data type and length unequal to the axis of grouping raised ``ValueError`` (:issue:`44179`)
3 changes: 3 additions & 0 deletions pandas/core/groupby/groupby.py
@@ -3580,6 +3580,9 @@ def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT:
        Series or DataFrame
            Filtered _selected_obj.
        """
        ids = self.grouper.group_info[0]
        mask = mask & (ids != -1)

        if self.axis == 0:
            return self._selected_obj[mask]
        else:
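The masking step added to `_mask_selected_obj` can be illustrated outside pandas with plain NumPy arrays. This is only a sketch: the `ids` and `mask` values below are made up for illustration, and it assumes, as the `ids != -1` comparison in the change suggests, that rows whose group key contains `NaN` carry the group id `-1` when `dropna=True`.

```python
import numpy as np

# Hypothetical group ids for five rows; -1 marks rows whose key contains NaN
# (assumed convention, inferred from the `ids != -1` check in the change).
ids = np.array([0, -1, 1, -1, 1])

# Hypothetical positional mask, e.g. the rows a head(1)-style selection
# would pick before NaN-keyed rows are excluded.
mask = np.array([True, True, True, False, False])

# Same expression as in the change: keep a row only if it was selected
# AND it belongs to a real group.
filtered = mask & (ids != -1)

print(filtered)
```

Row 1 is selected by the positional mask but has id `-1`, so the combined mask drops it.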
26 changes: 26 additions & 0 deletions pandas/tests/groupby/test_nth.py
@@ -809,3 +809,29 @@ def test_nth_slices_with_column_axis(
    }[method](start, stop)
    expected = DataFrame([expected_values], columns=expected_columns)
    tm.assert_frame_equal(result, expected)


def test_head_tail_dropna_true():
    # GH#45089
    df = DataFrame(
        [["a", "z"], ["b", np.nan], ["c", np.nan], ["c", np.nan]], columns=["X", "Y"]
    )
    expected = DataFrame([["a", "z"]], columns=["X", "Y"])

    result = df.groupby(["X", "Y"]).head(n=1)
    tm.assert_frame_equal(result, expected)

    result = df.groupby(["X", "Y"]).tail(n=1)
    tm.assert_frame_equal(result, expected)


def test_head_tail_dropna_false():
    # GH#45089
    df = DataFrame([["a", "z"], ["b", np.nan], ["c", np.nan]], columns=["X", "Y"])
    expected = DataFrame([["a", "z"], ["b", np.nan], ["c", np.nan]], columns=["X", "Y"])

    result = df.groupby(["X", "Y"], dropna=False).head(n=1)
    tm.assert_frame_equal(result, expected)

    result = df.groupby(["X", "Y"], dropna=False).tail(n=1)
    tm.assert_frame_equal(result, expected)
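The behavior these tests pin down can also be tried interactively. The following is a minimal sketch that reuses the four-row frame from `test_head_tail_dropna_true`; it assumes a pandas release that includes this fix (the whatsnew entry targets 1.4.0), and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["a", "z"], ["b", np.nan], ["c", np.nan], ["c", np.nan]],
    columns=["X", "Y"],
)

# Default dropna=True: rows whose key contains NaN belong to no group,
# so head/tail leave them out once the fix is in place.
head_dropped = df.groupby(["X", "Y"]).head(n=1)
tail_dropped = df.groupby(["X", "Y"]).tail(n=1)

# dropna=False: NaN keys form their own groups, so their rows are kept.
head_kept = df.groupby(["X", "Y"], dropna=False).head(n=1)
tail_kept = df.groupby(["X", "Y"], dropna=False).tail(n=1)

# Print the original row labels that survive each selection.
for name, frame in [
    ("head, dropna=True", head_dropped),
    ("tail, dropna=True", tail_dropped),
    ("head, dropna=False", head_kept),
    ("tail, dropna=False", tail_kept),
]:
    print(name, frame.index.tolist())
```

Comparing row labels rather than whole frames keeps the check independent of how `NaN` values compare to each other.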
