Add overload for DataFrameGroupBy.groupby("size") return Series #739

ljmc-github · 2023-07-02T16:35:10Z

Fairly straightforward. I had to add ignores for mypy and pyright because "size" is already covered by the str in AggFuncTypeFrame, so the new overload overlaps the existing one.
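A minimal sketch of why the ignores are needed, using hypothetical names (FrameGroupBySketch, stand-in Series/DataFrame classes and a simplified AggFunc alias) rather than the real pandas-stubs declarations. Because Literal["size"] is also a str, overload 1 overlaps overload 2 with an incompatible return type, so the narrow overload has to come first and carries a suppression comment. The exact mypy/pyright error codes vary by checker version, so a bare `# type: ignore` is assumed here.

```python
from typing import List, Literal, Union, overload


# Hypothetical stand-ins for pandas.Series and pandas.DataFrame, so the sketch
# runs without pandas; only the shape of the overloads matters here.
class Series: ...


class DataFrame: ...


# Simplified, hypothetical stand-in for AggFuncTypeFrame: a str or a list of str.
AggFunc = Union[str, List[str]]


class FrameGroupBySketch:
    # The narrow overload must come first: type checkers try overloads in order,
    # and "size" would otherwise match the general str overload below.
    # error: overload 1 overlaps overload 2 because of different return types
    @overload
    def aggregate(self, func: Literal["size"]) -> Series: ...  # type: ignore

    @overload
    def aggregate(self, func: AggFunc) -> DataFrame: ...

    def aggregate(self, func: AggFunc) -> Union[Series, DataFrame]:
        # Runtime body for the sketch only: "size" yields one value per group.
        return Series() if func == "size" else DataFrame()
```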

I made as few changes as possible and left agg = aggregate in place, which seems to work. I noticed that SeriesGroupBy.agg is a redeclaration rather than an assignment; I am happy to switch it if necessary.

pandas-stubs/pandas-stubs/core/groupby/generic.pyi

Lines 53 to 60 in 6fe90bb

    
    @overload
    def aggregate(self, func: list[AggFuncTypeBase], *args, **kwargs) -> DataFrame: ...
    @overload
    def aggregate(self, func: AggFuncTypeBase, *args, **kwargs) -> Series: ...
    @overload
    def agg(self, func: list[AggFuncTypeBase], *args, **kwargs) -> DataFrame: ...
    @overload
    def agg(self, func: AggFuncTypeBase, *args, **kwargs) -> Series: ...
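These SeriesGroupBy overloads mirror a runtime rule: a list of functions produces a DataFrame (one column per function), while a single function produces a Series. A quick check against pandas itself, assuming a recent pandas install; the data are made up for illustration.

```python
import pandas as pd

# Illustrative data: three values grouped by their index labels "a" and "b".
s = pd.Series([1, 2, 3], index=["a", "a", "b"])
grouped = s.groupby(level=0)

single = grouped.agg("sum")   # single function -> Series, one value per group
multi = grouped.agg(["sum"])  # list of functions -> DataFrame, one column per function

print(type(single).__name__, single.tolist())     # Series [3, 3]
print(type(multi).__name__, list(multi.columns))  # DataFrame ['sum']
```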

Closes #736 (DataFrameGroupBy.aggregate can return Series with argument "size", typed as only returning DataFrame)
Tests added: Please use assert_type() to assert the type of any return value
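For reference, the runtime behaviour behind #736 can be reproduced with a small frame (illustrative data, assuming pandas with the default as_index=True): a string reducer such as "sum" returns a DataFrame, but "size" returns a Series holding the row count of each group.

```python
import pandas as pd

# Illustrative frame: group "a" has two rows, group "b" has one.
df1 = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 3]})
grouped = df1.groupby(by="col1")

sums = grouped.agg("sum")          # DataFrame: one column per aggregated column
sizes = grouped.aggregate("size")  # Series: one row count per group

print(type(sums).__name__)   # DataFrame
print(type(sizes).__name__)  # Series
print(sizes.tolist())        # [2, 1]
```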

Dr-Irv

Thanks for the PR. You wrote:

> although I saw the SeriesGroupBy.agg is a redeclaration rather than assignment, I am happy to switch if necessary.

Actually, this was an oversight from a previous PR 10 months ago. I think I'd rather use assignment in SeriesGroupBy, so can you make that change there so we are consistent? You don't have to add a test for that.
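A minimal sketch of why assignment keeps things consistent, using hypothetical stand-in classes rather than the real stubs: with agg = aggregate, agg is the very same object as aggregate, so it inherits every overload and cannot drift out of sync, whereas a redeclaration has to repeat each @overload by hand. In a .pyi stub only the @overload lines would appear; the runtime body exists just so the sketch runs.

```python
from typing import List, Union, overload


# Hypothetical stand-ins for pandas.Series and pandas.DataFrame.
class Series: ...


class DataFrame: ...


class SeriesGroupBySketch:
    @overload
    def aggregate(self, func: List[str]) -> DataFrame: ...

    @overload
    def aggregate(self, func: str) -> Series: ...

    def aggregate(self, func: Union[str, List[str]]) -> Union[Series, DataFrame]:
        # Runtime body for the sketch only: a list yields a DataFrame.
        return DataFrame() if isinstance(func, list) else Series()

    # Assignment: agg is the same function object, so type checkers see
    # exactly the same overloads for agg as for aggregate.
    agg = aggregate
```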

Dr-Irv · 2023-07-03T12:00:51Z

tests/test_frame.py

+    # GH 736
+    check(assert_type(df1.groupby(by="col1").aggregate("size"), pd.Series), pd.Series)
+    check(assert_type(df1.groupby(by="col1").agg("size"), pd.Series), pd.Series)
+


Can you move these tests to the function test_types_groupby_agg()?

ljmc-github: Done. What about the other aggregates/transforms in test_types_groupby?

pandas-stubs/tests/test_frame.py

Lines 870 to 881 in 6fe90bb

    df1: pd.DataFrame = df.groupby(by="col1").agg("sum")
    df2: pd.DataFrame = df.groupby(level="ind").aggregate("sum")
    df3: pd.DataFrame = df.groupby(by="col1", sort=False, as_index=True).transform(
        lambda x: x.max()
    )
    df4: pd.DataFrame = df.groupby(by=["col1", "col2"]).count()
    df5: pd.DataFrame = df.groupby(by=["col1", "col2"]).filter(lambda x: x["col1"] > 0)
    df6: pd.DataFrame = df.groupby(by=["col1", "col2"]).nunique()
    df7: pd.DataFrame = df.groupby(by="col1").apply(sum)
    df8: pd.DataFrame = df.groupby("col1").transform("sum")
    s1: pd.Series = df.set_index("col1")["col2"]
    s2: pd.Series = s1.groupby("col1").transform("sum")

Dr-Irv: If you want, you could add extra tests in another PR, but let's keep this PR focused on just the one issue. Thanks.
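The PR template asks for assert_type() rather than the annotated assignments used in the older tests quoted above. A sketch of the difference, assuming pandas is installed; check() here is a hypothetical, simplified stand-in for the test-suite helper of the same name and only verifies the runtime class.

```python
from typing import TypeVar

import pandas as pd

try:
    from typing import assert_type  # Python 3.11+
except ImportError:  # older Pythons: same runtime behaviour, returns the value unchanged
    def assert_type(val, typ):
        return val

T = TypeVar("T")


def check(actual: T, klass: type) -> T:
    # Hypothetical, simplified stand-in for the tests' check() helper:
    # it verifies only the runtime class of the value.
    assert isinstance(actual, klass), f"expected {klass}, got {type(actual)}"
    return actual


df1 = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 3]})

# Static side: with pandas-stubs installed, a type checker fails this line unless
# the inferred type is exactly pd.Series. Runtime side: check() confirms the object.
sizes = check(assert_type(df1.groupby(by="col1").agg("size"), pd.Series), pd.Series)

# An annotated assignment only requires the inferred type to be assignable to
# the annotation (an inferred Any, for example, passes silently), so it is a
# weaker test of the stubs than assert_type.
totals: pd.DataFrame = df1.groupby(by="col1").agg("sum")
```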

Dr-Irv · 2023-07-03T12:01:26Z

pandas-stubs/core/groupby/generic.pyi

@@ -159,6 +159,10 @@ class DataFrameGroupBy(GroupBy, Generic[ByT]):
    def apply(  # pyright: ignore[reportOverlappingOverload]
        self, func: Callable[[Iterable], float], *args, **kwargs
    ) -> DataFrame: ...
+    # error: overload 1 overlaps overload 2 as "size" is in AggFuncTypeFrame


Better to say # error: overload 1 overlaps overload 2 because of different return types

Dr-Irv

Thanks @ljmc-github

Add overload for DataFrameGroupBy.groupby("size") return Series

cc18bc8

Dr-Irv requested changes Jul 3, 2023


ljmc-github added 3 commits July 3, 2023 17:11

Switch SeriesGroupBy.agg to assignment

73764b1

Move tests to test_types_groupby_agg

b3144d9

Change error comment to different return types

bc05174

Dr-Irv approved these changes Jul 3, 2023


Dr-Irv merged commit d23c4bb into pandas-dev:main Jul 3, 2023
13 checks passed
