BUG: Applying function on column of Groupby object with as_index=False does not select column #5764

jorisvandenbossche · 2013-12-23T13:53:32Z

>>> df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
>>> df
   A  B
0  1  2
1  1  4
2  5  6
[3 rows x 2 columns]

Selecting a column of the GroupBy object still returns all columns:

>>> g = df.groupby('A', as_index=False)['B']
>>> g.get_group(1)
   A  B
0  1  2
1  1  4
[2 rows x 2 columns]
>>> g = df.groupby('A', as_index=False)
>>> g.get_group(1)
   A  B
0  1  2
1  1  4
[2 rows x 2 columns]
>>> g.get_group(1)['B']
0    2
1    4
Name: B, dtype: int64

So a function passed to apply is applied to all columns, not just the selected one:

>>> df.groupby('A', as_index=False)['B'].apply(lambda x: x.cumsum())
   A  B
0  1  2
1  2  6
2  5  6
[3 rows x 2 columns]

With as_index=True it works as expected:

>>> g = df.groupby('A')
>>> g.get_group(1)
   A  B
0  1  2
1  1  4
[2 rows x 2 columns]

>>> g = df.groupby('A')['B']
>>> g.get_group(1)
0    2
1    4
Name: B, dtype: int64

>>> df.groupby('A')['B'].apply(lambda x: x.cumsum())
0    2
1    6
2    6
dtype: int64

A more elaborate example where this turned up:

>>> s="""L1  L2  L3
... X   1   200
... X   2   100
... Z   1   15
... X   3   200
... Z   2   10
... Y   1   1
... Z   3   20
... Y   2   10
... Y   3   100"""
>>> 
>>> df = pd.read_csv(StringIO(s), sep="\s+")
>>> df.groupby("L1")["L3"].apply(lambda x: x.order().cumsum()/x.sum())
L1   
X   1    0.200000
    0    0.600000
    3    1.000000
Y   5    0.009009
    7    0.099099
    8    1.000000
Z   4    0.222222
    2    0.555556
    6    1.000000
dtype: float64

But if I don't want the X, Y, Z in the index:

>>> df.groupby("L1", as_index=False)["L3"].apply(lambda x: x.order().cumsum()/x.sum())

returns an error, because x is then a DataFrame rather than a Series.
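For readers on a recent pandas, here is a minimal sketch of one way to get the same cumulative shares without X, Y, Z ending up in the index. It is not the fix for this issue, only a workaround under assumptions: `Series.order()` is gone in current releases, so the sort is done up front with `sort_values`, and the variable names `ordered`, `grouped` and `share` are made up for illustration.

```python
from io import StringIO

import pandas as pd

s = """L1  L2  L3
X   1   200
X   2   100
Z   1   15
X   3   200
Z   2   10
Y   1   1
Z   3   20
Y   2   10
Y   3   100"""

df = pd.read_csv(StringIO(s), sep=r"\s+")

# Sort by group and then by L3; the original row labels are carried along.
ordered = df.sort_values(["L1", "L3"])
grouped = ordered.groupby("L1")["L3"]

# Running total divided by the group total. Both pieces are transforms,
# so the result is indexed by row label only, with no L1 level.
share = grouped.cumsum() / grouped.transform("sum")
print(share)
```

Because neither `cumsum` nor `transform` is an aggregation, the group key never enters the result index, so `as_index` is not needed here at all.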

jreback · 2014-03-22T21:21:57Z

On current master this looks ok, @jorisvandenbossche, after @hayd's and @TomAugspurger's recent changes. Yes?

In [9]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [10]: g = df.groupby('A', as_index=False)['B']

In [11]: g.get_group(1)
Out[11]: 
0    2
1    4
Name: B, dtype: int64

In [12]: g = df.groupby('A', as_index=False)

In [13]: g.get_group(1)
Out[13]: 
   A  B
0  1  2
1  1  4

[2 rows x 2 columns]

In [14]: g.get_group(1)['B']
Out[14]: 
0    2
1    4
Name: B, dtype: int64

In [15]: df.groupby('A', as_index=False)['B'].apply(lambda x: x.cumsum())
Out[15]: 
0    2
1    6
2    6
dtype: int64

hayd · 2014-03-22T21:32:32Z

We should probably add some tests before closing. (I'm sure this came up recently on SO too.)
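A rough sketch of what such a test could look like, using the example from the original report. This is an assumption about the shape of the test, not the one that was eventually added to pandas; the function name is hypothetical, and only the `get_group` part is checked because that output is the one shown as fixed above.

```python
import pandas as pd
import pandas.testing as tm


def test_get_group_after_column_selection():
    # Hypothetical regression test for the get_group part of this report.
    df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])

    result = df.groupby("A", as_index=False)["B"].get_group(1)

    # Only column B of the rows where A == 1 should come back.
    expected = pd.Series([2, 4], name="B")
    tm.assert_series_equal(result, expected)


test_get_group_after_column_selection()
```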

jreback · 2014-03-22T21:46:14Z

yep

jreback · 2014-05-01T14:23:05Z

This is now better in current master after #7000; [7] is going to be addressed in #5755.

In [3]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

In [4]: df
Out[4]: 
   A  B
0  1  2
1  1  4
2  5  6

[3 rows x 2 columns]

In [5]: g = df.groupby('A', as_index=False)['B']

In [6]: g.get_group(1)
Out[6]: 
0    2
1    4
Name: B, dtype: int64

In [7]: g = df.groupby('A', as_index=False)

In [8]: g.get_group(1)
Out[8]: 
   A  B
0  1  2
1  1  4

[2 rows x 2 columns]

In [9]: g.get_group(1)['B']
Out[9]: 
0    2
1    4
Name: B, dtype: int64

In [10]: df.groupby('A', as_index=False)['B'].apply(lambda x: x.cumsum())
Out[10]: 
0    2
1    6
2    6
dtype: int64

jbrockmendel · 2019-12-12T00:30:47Z

@jorisvandenbossche the behavior on master for get_group looks right to me now, but the cumsum looks sketchy. Can you confirm?
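For comparison while checking the cumsum case, a small sketch that computes the same per-group running total without going through apply. It assumes a recent pandas release; whether the apply result above should match it exactly is the open question here, so treat this as a reference point rather than a statement of the intended behavior.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])

# Built-in cumulative sum on the selected column; the result is indexed like df.
direct = df.groupby("A", as_index=False)["B"].cumsum()

# The same computation spelled through transform().
via_transform = df.groupby("A", as_index=False)["B"].transform("cumsum")

print(direct.tolist())
print(via_transform.tolist())
```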

jreback added API Design labels Mar 22, 2014

jreback added this to the 0.14.0 milestone Mar 22, 2014

jreback modified the milestones: 0.14.1, 0.14.0 May 1, 2014

hayd mentioned this issue May 1, 2014

Consistency with groupby as_index #5755 (closed)

jreback modified the milestones: 0.15.0, 0.14.1 May 1, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

chris-b1 mentioned this issue Feb 27, 2018

Cythonized GroupBy pct_change #19919 (merged)

datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018

mroeschke added the good first issue and Needs Tests labels and removed the API Design and Bug labels Sep 29, 2019

jbrockmendel added the Apply label Oct 16, 2019

mroeschke mentioned this issue Dec 30, 2019

TST: Regression testing for fixed issues #30554 (merged)

simonjayhawkins modified the milestones: Someday, 1.0 Dec 30, 2019

TomAugspurger closed this as completed in #30554 Dec 31, 2019
