-
-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add overload for DataFrameGroupBy.groupby("size") return Series #739
Add overload for DataFrameGroupBy.groupby("size") return Series #739
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. You wrote:
although I saw the SeriesGroupBy.agg is a redeclaration rather than assignment, I am happy to switch if necessary.
Actually, this was an oversight from a previous PR 10 months ago. I think I'd rather use assignment in SeriesGroupBy
, so can you make that change there so we are consistent? You don't have to add a test for that.
tests/test_frame.py
Outdated
# GH 736 | ||
check(assert_type(df1.groupby(by="col1").aggregate("size"), pd.Series), pd.Series) | ||
check(assert_type(df1.groupby(by="col1").agg("size"), pd.Series), pd.Series) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move these tests to the function test_types_groupby_agg()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, what about the other aggregates/transforms in test_types_groupby
?
pandas-stubs/tests/test_frame.py
Lines 870 to 881 in 6fe90bb
df1: pd.DataFrame = df.groupby(by="col1").agg("sum") | |
df2: pd.DataFrame = df.groupby(level="ind").aggregate("sum") | |
df3: pd.DataFrame = df.groupby(by="col1", sort=False, as_index=True).transform( | |
lambda x: x.max() | |
) | |
df4: pd.DataFrame = df.groupby(by=["col1", "col2"]).count() | |
df5: pd.DataFrame = df.groupby(by=["col1", "col2"]).filter(lambda x: x["col1"] > 0) | |
df6: pd.DataFrame = df.groupby(by=["col1", "col2"]).nunique() | |
df7: pd.DataFrame = df.groupby(by="col1").apply(sum) | |
df8: pd.DataFrame = df.groupby("col1").transform("sum") | |
s1: pd.Series = df.set_index("col1")["col2"] | |
s2: pd.Series = s1.groupby("col1").transform("sum") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, you could add extra tests in another PR, but let's leave this PR to focus on just the one issue. Thanks.
@@ -159,6 +159,10 @@ class DataFrameGroupBy(GroupBy, Generic[ByT]): | |||
def apply( # pyright: ignore[reportOverlappingOverload] | |||
self, func: Callable[[Iterable], float], *args, **kwargs | |||
) -> DataFrame: ... | |||
# error: overload 1 overlaps overload 2 as "size" is in AggFuncTypeFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to say # error: overload 1 overlaps overload 2 because of different return types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ljmc-github
Fairly straightforward, I had to add ignores for mypy and pyright since
"size"
is included inAggFuncTypeFrame
'sstr
.I made as few changes as possible leaving
agg = aggregate
which seems to work, although I saw theSeriesGroupBy.agg
is a redeclaration rather than assignment, I am happy to switch if necessary.pandas-stubs/pandas-stubs/core/groupby/generic.pyi
Lines 53 to 60 in 6fe90bb
assert_type()
to assert the type of any return value