DataFrame.groupby().mean() incorrect result #22487

tinchoroman · 2018-08-23T15:21:40Z

Does anybody know why I get different results when I apply the same operation to the same DataFrame, once directly and once through groupby?
With groupby, it returns a negative value even though all the values are positive.

from pandas import DataFrame
df = DataFrame({"user":["A", "A", "A", "A", "A"],
                            "connections":[18446744073699999744, 4970, 4749, 4719, 4704]})

df.mean()

connections 3.689349e+18
dtype: float64

df.groupby("user")["connections"].mean()

user
A -1906546.0
Name: connections, dtype: float64

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


WillAyd · 2018-08-23T15:31:14Z

Can you try on master? It looks like an int overflow somewhere. If it is still present, investigation and PRs are always welcome.
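The suspected overflow can be checked by hand with NumPy alone. The sketch below is not the pandas code path; it only assumes that the uint64 column is reinterpreted as int64 somewhere before the mean is taken, which happens to reproduce the reported number exactly.

```python
import numpy as np

# The values from the report; the first one only fits in uint64.
values = np.array([18446744073699999744, 4970, 4749, 4719, 4704], dtype=np.uint64)

# An unchecked cast to int64 wraps the large value around modulo 2**64.
wrapped = values.astype(np.int64)

print(wrapped[0])      # -9551872
print(wrapped.mean())  # -1906546.0, the value groupby returned
```

The first element becomes 18446744073699999744 - 2**64 = -9551872, and (-9551872 + 4970 + 4749 + 4719 + 4704) / 5 = -1906546.0.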

tinchoroman · 2018-08-23T15:53:21Z

Hi WillAyd, thanks for your prompt response. This is the first time I have posted an issue. Could you please explain a little further what exactly you mean by "try on master"? Thanks in advance!

tinchoroman · 2018-08-23T17:15:57Z

I've upgraded to the latest version and the problem still persists. Following the line of investigation that WillAyd suggested, the same example with float numbers worked fine.

df = DataFrame({"user":["A", "A", "A", "A", "A"],
           "connections":[18446744073699999744.0, 4970.0, 4749.0, 4719.0, 4704.0]})

df.mean()
connections    3.689349e+18
dtype: float64

df.groupby("user")["connections"].mean()
user
A    3.689349e+18
Name: connections, dtype: float64

df.mean()[0] == df.groupby("user")["connections"].mean()[0]
True

…#22487) When integer arrays contained integers that were outside the range of int64, the conversion would overflow. Instead, only allow safe casting, and if a safe cast cannot be done, cast to float64.
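The idea in that commit message can be sketched as follows. This is a minimal illustration, not the code merged in #22653: `to_int64_or_float64` is a hypothetical helper name, and it assumes the "safe cast" check maps onto NumPy's `np.can_cast(..., casting="safe")`.

```python
import numpy as np

def to_int64_or_float64(arr):
    """Hypothetical helper: cast to int64 only when that is safe, else to float64."""
    if np.can_cast(arr.dtype, np.int64, casting="safe"):
        return arr.astype(np.int64)
    # uint64 cannot be safely represented as int64, so fall back to float64.
    return arr.astype(np.float64)

values = np.array([18446744073699999744, 4970, 4749, 4719, 4704], dtype=np.uint64)
converted = to_int64_or_float64(values)
safe_mean = converted.mean()

# Smaller integer dtypes still take the int64 path.
small = to_int64_or_float64(np.array([1, 2, 3], dtype=np.int32))

print(converted.dtype, safe_mean)
print(small.dtype)
```

With this fallback the mean is positive and agrees with `df.mean()` to the precision printed in the report (about 3.689349e+18), at the cost of float64 rounding for very large integers.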

WillAyd added Bug Groupby labels Aug 23, 2018

troels mentioned this issue Sep 9, 2018

BUG SeriesGroupBy.mean() overflowed on some integer arrays (#22487) #22653

Merged

4 tasks

jreback added this to the 0.24.0 milestone Sep 18, 2018

WillAyd closed this as completed in #22653 Sep 18, 2018
