Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add overload for DataFrameGroupBy.groupby("size") return Series #739

Merged

Conversation

ljmc-github
Copy link
Contributor

Fairly straightforward, I had to add ignores for mypy and pyright since "size" is included in AggFuncTypeFrame's str.

I made as few changes as possible leaving agg = aggregate which seems to work, although I saw the SeriesGroupBy.agg is a redeclaration rather than assignment, I am happy to switch if necessary.

@overload
def aggregate(self, func: list[AggFuncTypeBase], *args, **kwargs) -> DataFrame: ...
@overload
def aggregate(self, func: AggFuncTypeBase, *args, **kwargs) -> Series: ...
@overload
def agg(self, func: list[AggFuncTypeBase], *args, **kwargs) -> DataFrame: ...
@overload
def agg(self, func: AggFuncTypeBase, *args, **kwargs) -> Series: ...

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. You wrote:

although I saw the SeriesGroupBy.agg is a redeclaration rather than assignment, I am happy to switch if necessary.

Actually, this was an oversight from a previous PR 10 months ago. I think I'd rather use assignment in SeriesGroupBy, so can you make that change there so we are consistent? You don't have to add a test for that.

# GH 736
check(assert_type(df1.groupby(by="col1").aggregate("size"), pd.Series), pd.Series)
check(assert_type(df1.groupby(by="col1").agg("size"), pd.Series), pd.Series)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move these tests to the function test_types_groupby_agg() ?

Copy link
Contributor Author

@ljmc-github ljmc-github Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, what about the other aggregates/transforms in test_types_groupby ?

df1: pd.DataFrame = df.groupby(by="col1").agg("sum")
df2: pd.DataFrame = df.groupby(level="ind").aggregate("sum")
df3: pd.DataFrame = df.groupby(by="col1", sort=False, as_index=True).transform(
lambda x: x.max()
)
df4: pd.DataFrame = df.groupby(by=["col1", "col2"]).count()
df5: pd.DataFrame = df.groupby(by=["col1", "col2"]).filter(lambda x: x["col1"] > 0)
df6: pd.DataFrame = df.groupby(by=["col1", "col2"]).nunique()
df7: pd.DataFrame = df.groupby(by="col1").apply(sum)
df8: pd.DataFrame = df.groupby("col1").transform("sum")
s1: pd.Series = df.set_index("col1")["col2"]
s2: pd.Series = s1.groupby("col1").transform("sum")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, you could add extra tests in another PR, but let's leave this PR to focus on just the one issue. Thanks.

@@ -159,6 +159,10 @@ class DataFrameGroupBy(GroupBy, Generic[ByT]):
def apply( # pyright: ignore[reportOverlappingOverload]
self, func: Callable[[Iterable], float], *args, **kwargs
) -> DataFrame: ...
# error: overload 1 overlaps overload 2 as "size" is in AggFuncTypeFrame
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to say # error: overload 1 overlaps overload 2 because of different return types

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ljmc-github

@Dr-Irv Dr-Irv merged commit d23c4bb into pandas-dev:main Jul 3, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrameGroupBy.aggregate can return Series with argument "size", typed as only returning DataFrame
2 participants