BUG: groupby().tranforms return ValueError #40102

yllgl · 2021-02-27T08:25:49Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Here's the code:

In [9]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
df = pd.DataFrame({'A': s,'F': 'foo'})
df.loc[1,'F']=np.nan
df

Out[9]:
	A	F
0	1.0	foo
1	3.0	NaN
2	5.0	foo
3	NaN	foo
4	6.0	foo
5	8.0	foo

In [10]:
df['A']=df.groupby(['F'])['A'].transform(lambda x: x.fillna(x.mean()))
Out[10]:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-111-5c30febb854f> in <module>
----> 1 df['A']=df.groupby(['F'])['A'].transform(lambda x: x.fillna(x.mean()))

d:\python36\lib\site-packages\pandas\core\groupby\generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    492         if not isinstance(func, str):
    493             return self._transform_general(
--> 494                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    495             )
    496 

d:\python36\lib\site-packages\pandas\core\groupby\generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    560 
    561         result.name = self._selected_obj.name
--> 562         result.index = self._selected_obj.index
    563         return result
    564 

d:\python36\lib\site-packages\pandas\core\generic.py in __setattr__(self, name, value)
   5152         try:
   5153             object.__getattribute__(self, name)
-> 5154             return object.__setattr__(self, name, value)
   5155         except AttributeError:
   5156             pass

pandas\_libs\properties.pyx in pandas._libs.properties.AxisProperty.__set__()

d:\python36\lib\site-packages\pandas\core\series.py in _set_axis(self, axis, labels, fastpath)
    422         if not fastpath:
    423             # The ensure_index call above ensures we have an Index object
--> 424             self._mgr.set_axis(axis, labels)
    425 
    426     # ndarray compatibility

d:\python36\lib\site-packages\pandas\core\internals\managers.py in set_axis(self, axis, new_labels)
    225         if new_len != old_len:
    226             raise ValueError(
--> 227                 f"Length mismatch: Expected axis has {old_len} elements, new "
    228                 f"values have {new_len} elements"
    229             )

ValueError: Length mismatch: Expected axis has 5 elements, new values have 6 elements
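The traceback reports 5 elements against 6. The minimal sketch below suggests where the 5 comes from. It runs on any pandas that has the `dropna` argument, which means 1.1 or later. By default (`dropna=True`), groupby leaves out rows whose key is NaN, so only 5 of the 6 rows belong to any group. The 1.1.5 transform path then appears to fail when it reattaches the original 6-row index. The sizes are the only thing the sketch checks; it does not reproduce the 1.1.5 internals.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report: six rows, one NaN key in column F.
df = pd.DataFrame({"A": [1, 3, 5, np.nan, 6, 8], "F": "foo"})
df.loc[1, "F"] = np.nan

# Default groupby (dropna=True) excludes the row whose key is NaN.
sizes_default = df.groupby("F")["A"].size()
print(sizes_default.sum(), len(df))  # rows that belong to a group vs rows in the frame

# dropna=False keeps NaN as a group key of its own, so every row is covered.
sizes_keep = df.groupby("F", dropna=False)["A"].size()
print(sizes_keep.sum(), len(df))
```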

Problem description

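If I convert column F to `str`, everything works. The missing key becomes the literal string 'nan', as row 1 of the output below shows, so groupby treats it as an ordinary label.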

In [11]:
df['F'] = df['F'].astype('str')
df['A']=df.groupby(['F'])['A'].transform(lambda x: x.fillna(x.mean()))
df
Out[11]:
	A	F
0	1.0	foo
1	3.0	nan
2	5.0	foo
3	5.0	foo
4	6.0	foo
5	8.0	foo
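The `astype('str')` workaround succeeds because the NaN key turns into a regular string label. The minimal sketch below makes the same idea explicit without relying on how NaN is converted to a string. It assumes a placeholder label that cannot collide with real values; `MISSING` is a hypothetical name chosen for illustration. Column F itself is left untouched, and only the grouping keys are filled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, np.nan, 6, 8], "F": "foo"})
df.loc[1, "F"] = np.nan

# Hypothetical placeholder label; any string that cannot clash with a real key works.
MISSING = "<missing>"

# Group on a copy of the key with NaN replaced, so every row belongs to a group
# and transform() returns one value per input row.
keys = df["F"].fillna(MISSING)
df["A"] = df.groupby(keys)["A"].transform(lambda x: x.fillna(x.mean()))
print(df["A"].tolist())  # [1.0, 3.0, 5.0, 5.0, 6.0, 8.0]
```

Row 3 is filled with 5.0, the mean of the other "foo" values (1, 5, 6, 8). This matches Out[11] above.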

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : b5958ee
python : 3.6.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.5
numpy : 1.17.2
pytz : 2018.9
dateutil : 2.7.5
pip : 19.3.1
setuptools : 41.4.0
Cython : 0.29.3
pytest : None
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.4
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.2.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.0.2
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.11.2
xlrd : None
xlwt : None
numba : 0.42.0



MarcoGorelli · 2021-02-27T09:33:09Z

Hi @yllgl - you probably want dropna=False, i.e.

df.groupby(['F'], dropna=False)['A'].transform(lambda x: x.fillna(x.mean()))

does that work for you?

yllgl · 2021-02-27T09:38:09Z

> Hi @yllgl - you probably want dropna=False, i.e.
> df.groupby(['F'], dropna=False)['A'].transform(lambda x: x.fillna(x.mean()))
> does that work for you?

A new error occurs:

In [12]:
df['A']=df.groupby(['F'],dropna=False)['A'].transform(lambda x: x.fillna(x.mean()))
Out[12]:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-120-b63e4d7a1ecf> in <module>
----> 1 df['A']=df.groupby(['F'],dropna=False)['A'].transform(lambda x: x.fillna(x.mean()))

d:\python36\lib\site-packages\pandas\core\groupby\generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    492         if not isinstance(func, str):
    493             return self._transform_general(
--> 494                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    495             )
    496 

d:\python36\lib\site-packages\pandas\core\groupby\generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    541 
    542             indexer = self._get_index(name)
--> 543             ser = klass(res, indexer)
    544             results.append(ser)
    545 

d:\python36\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    312                     if len(index) != len(data):
    313                         raise ValueError(
--> 314                             f"Length of passed values is {len(data)}, "
    315                             f"index implies {len(index)}."
    316                         )

ValueError: Length of passed values is 1, index implies 0.

MarcoGorelli · 2021-02-27T09:40:14Z

Works for me:

>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([1, 3, 5, np.nan, 6, 8])
>>> df = pd.DataFrame({'A': s,'F': 'foo'})
>>> df.loc[1,'F']=np.nan
>>> df.groupby(['F'],dropna=False)['A'].transform(lambda x: x.fillna(x.mean()))
0    1.0
1    3.0
2    5.0
3    5.0
4    6.0
5    8.0
Name: A, dtype: float64

You're on an old version of pandas, though. Can you try updating?

Indeed, I can reproduce the bug on v1.1.5. I'll do a git bisect.
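The minimal sketch below turns the reproduction from this thread into a self-contained regression check. It assumes a pandas release that includes the fix (1.2.0 or later). The test name is hypothetical and is not necessarily the test added upstream. It only asserts the output shown in the "works for me" session above.

```python
import numpy as np
import pandas as pd


def test_transform_keeps_nan_group():
    # Hypothetical test name; mirrors the reproduction from this issue.
    df = pd.DataFrame({"A": [1, 3, 5, np.nan, 6, 8], "F": "foo"})
    df.loc[1, "F"] = np.nan

    result = df.groupby(["F"], dropna=False)["A"].transform(lambda x: x.fillna(x.mean()))

    # The NaN-keyed row keeps its own value (3.0); row 3 gets the "foo" mean (5.0).
    expected = pd.Series([1.0, 3.0, 5.0, 5.0, 6.0, 8.0], name="A")
    pd.testing.assert_series_equal(result, expected)


test_transform_keeps_nan_group()
```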

MarcoGorelli · 2021-02-27T10:24:23Z

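OK, this was fixed in #36842. (The bisect was run with "bad" meaning "fixed", so the "first bad commit" reported below is the commit that fixed the bug.)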

f6f3dd3e77278c9932105664a94aaca5c1422880 is the first bad commit
commit f6f3dd3e77278c9932105664a94aaca5c1422880
Author: patrick <61934744+phofl@users.noreply.github.com>
Date:   Wed Nov 4 03:59:02 2020 +0100

    BUG: Groupy dropped nan groups from result when grouping over single column (#36842)

 doc/source/whatsnew/v1.2.0.rst              |  1 +
 pandas/_libs/lib.pyx                        | 29 ++++++++++++++++++-----------
 pandas/core/groupby/ops.py                  |  9 +++------
 pandas/core/sorting.py                      | 11 +++++++++--
 pandas/tests/groupby/test_groupby.py        |  7 +++++++
 pandas/tests/groupby/test_groupby_dropna.py | 20 +++++++++++++++++++-
 pandas/tests/window/test_rolling.py         | 15 +++++++++++++++
 7 files changed, 72 insertions(+), 20 deletions(-)
bisect run success

yllgl added the Bug and Needs Triage labels Feb 27, 2021

yllgl changed the title from "BUG: groupb().tranforms return ValueError" to "BUG: groupby().tranforms return ValueError" Feb 27, 2021

MarcoGorelli added the Closing Candidate label and removed the Bug and Needs Triage labels Feb 27, 2021

MarcoGorelli closed this as completed Feb 27, 2021
