BUG SeriesGroupBy.mean() overflowed on some integer array (#22487)
When integer arrays contained integers that were outside
the range of int64, the conversion to int64 would overflow.
Instead, only allow safe casting, and if a safe cast cannot
be done, cast to float64.
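
As an illustration of the overflow described above, here is a minimal NumPy sketch. It is not part of the commit. It assumes the values arrive as a `uint64` array (the dtype NumPy uses for non-negative integers too large for int64), and the variable names are illustrative only.

```python
import numpy as np

# Same data as the regression test in this commit, stored as uint64.
values = np.array(
    [4970, 4749, 4719, 4704, 18446744073699999744], dtype="uint64"
)

# astype() defaults to casting="unsafe", so the value above the int64
# range silently wraps around to a negative number.
wrapped = values.astype("int64", copy=False)

# Any mean computed from the wrapped values is meaningless.
wrapped_mean = wrapped.mean()
```

In this sketch the last element wraps to `18446744073699999744 - 2**64`, a negative number, so the mean computed from the wrapped array comes out negative as well.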
troels committed Sep 11, 2018
1 parent 0976e12 commit 5340c0b
Showing 3 changed files with 16 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -754,6 +754,7 @@ Groupby/Resample/Rolling
- Bug in :meth:`Resampler.apply` when passing postiional arguments to applied func (:issue:`14615`).
- Bug in :meth:`Series.resample` when passing ``numpy.timedelta64`` to `loffset` kwarg (:issue:`7687`).
- Bug in :meth:`Resampler.asfreq` when frequency of ``TimedeltaIndex`` is a subperiod of a new frequency (:issue:`13022`).
+- Bug in :meth:`SeriesGroupBy.mean` when values were integral but could not fit inside of int64, overflowing instead. (:issue:`22487`)

Sparse
^^^^^^
7 changes: 6 additions & 1 deletion pandas/core/groupby/ops.py
@@ -471,7 +471,12 @@ def _cython_operation(self, kind, values, how, axis, min_count=-1,
             if (values == iNaT).any():
                 values = ensure_float64(values)
             else:
-                values = values.astype('int64', copy=False)
+                try:
+                    values = values.astype('int64', copy=False, casting='safe')
+                except TypeError:
+                    # At least one of the integers were outside the range of
+                    # int64. Convert to float64 instead.
+                    values = values.astype('float64', copy=False)
         elif is_numeric and not is_complex_dtype(values):
             values = ensure_float64(values)
         else:
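
The fallback added in this hunk can be sketched outside pandas roughly as follows. The helper name `to_int64_or_float64` is hypothetical and does not exist in pandas. One caveat worth hedging: as far as NumPy's casting rules go, `casting='safe'` is decided from the dtypes alone and does not inspect the values, so in this sketch any `uint64` array takes the float64 branch, even when every value would fit in int64.

```python
import numpy as np


def to_int64_or_float64(values):
    """Hypothetical helper mirroring the try/except added in this commit."""
    try:
        # Succeeds only when the source dtype always fits in int64
        # (for example int32 or uint8).
        return values.astype("int64", copy=False, casting="safe")
    except TypeError:
        # NumPy refuses the cast under the 'safe' rule (for example
        # uint64 -> int64), so fall back to float64.
        return values.astype("float64", copy=False)


small_ints = to_int64_or_float64(np.array([1, 2, 3], dtype="int32"))
big_uints = to_int64_or_float64(
    np.array([4970, 18446744073699999744], dtype="uint64")
)
```

Falling back to float64 trades exactness for range: values above 2**53 are no longer guaranteed to be represented exactly, but they no longer wrap around.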
9 changes: 9 additions & 0 deletions pandas/tests/groupby/test_function.py
@@ -1125,3 +1125,12 @@ def h(df, arg3):
     expected = pd.Series([4, 8, 12], index=pd.Int64Index([1, 2, 3]))

     tm.assert_series_equal(result, expected)
+
+
+def test_groupby_mean_no_overflow():
+    # Regression test for (#22487)
+    df = pd.DataFrame({
+        "user": ["A", "A", "A", "A", "A"],
+        "connections": [4970, 4749, 4719, 4704, 18446744073699999744]
+    })
+    assert df.groupby('user')['connections'].mean()['A'] == 3689348814740003840
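
The expected value in this test is the float64 result, not the mathematically exact mean. A small pure-Python sketch of that arithmetic is shown here. It assumes a plain left-to-right float64 accumulation, which may not match the order pandas uses internally, and the variable names are illustrative.

```python
from fractions import Fraction

connections = [4970, 4749, 4719, 4704, 18446744073699999744]

# Exact mean, using Python's arbitrary-precision integers.
exact_mean = Fraction(sum(connections), len(connections))

# float64 mean with a simple left-to-right accumulation
# (an assumed order, chosen only for illustration).
total = 0.0
for value in connections:
    total += float(value)
float_mean = total / len(connections)

# Absolute error introduced by float64 rounding.
rounding_error = abs(Fraction(float_mean) - exact_mean)
```

Under these assumptions `float_mean` equals the `3689348814740003840` asserted in the test, while the exact mean is `18446744073700018886 / 5`. The difference is below the float64 spacing at that magnitude (512).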
