BUG SeriesGroupBy.mean() overflowed on some integer arrays (#22487) #22653
Hello @troels! Thanks for submitting the PR.
Codecov Report
@@ Coverage Diff @@
## master #22653 +/- ##
==========================================
+ Coverage 92.17% 92.17% +<.01%
==========================================
Files 169 169
Lines 50708 50711 +3
==========================================
+ Hits 46740 46743 +3
Misses 3968 3968
pandas/core/groupby/ops.py
Outdated
@@ -471,7 +471,12 @@ def _cython_operation(self, kind, values, how, axis, min_count=-1,
         if (values == iNaT).any():
             values = ensure_float64(values)
         else:
-            values = values.astype('int64', copy=False)
+            try:
+                values = values.astype('int64', copy=False, casting='safe')
Hmm, well, this certainly works, though I'm wondering if there's a more comprehensive way we should be handling this (e.g. adding the casting="safe" call to our algos templates).
cc @jreback and @jbrockmendel for any input
I suppose this is something that occurs in many places, since e.g. int and uint64 values cannot always be cast to int64, and assuming they can just because is_integer_dtype is satisfied is wrong.
A few places in the code seem to have the same problem, e.g.:
pandas/core/algorithms.py:427
pandas/core/groupby/generic.py:1168
But as far as I can see, there isn't a general solution to this except for fixing the code in the respective places.
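To make the failure mode concrete, here is a minimal NumPy-only sketch (the array contents are made up for illustration and are not taken from the issue). It shows that the default cast silently wraps a large uint64 value, while casting='safe' refuses the conversion:

```python
import numpy as np

# Illustrative data: 2**63 fits in uint64 but not in int64.
values = np.array([2**63, 1], dtype="uint64")

# astype defaults to casting='unsafe', so the value wraps around silently.
wrapped = values.astype("int64", copy=False)

# casting='safe' rejects uint64 -> int64 because not every value is representable.
try:
    values.astype("int64", copy=False, casting="safe")
    safe_cast_ok = True
except TypeError:
    safe_cast_ok = False

print(wrapped[0], safe_cast_ok)
```

Note that the safe cast is decided from the dtypes alone, so it is rejected even for uint64 arrays whose actual values would fit in int64.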
Can you take this code, make a function out of it, and put it where we define ensure_float64?
@jreback I've created a function in dtypes/common.py now.
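The function itself is not shown in this thread. A rough sketch of what such a helper could look like, based only on the try/safe-cast pattern in the diff and the fallback described in the commit message, is given below. The name `ensure_int64_or_float64_sketch` is hypothetical and the body is an assumption, not the code that was merged:

```python
import numpy as np


def ensure_int64_or_float64_sketch(arr):
    """Hypothetical helper: safe-cast to int64, else fall back to float64."""
    arr = np.asarray(arr)
    try:
        # Succeeds only when every value of the source dtype fits in int64.
        return arr.astype("int64", copy=False, casting="safe")
    except TypeError:
        # E.g. uint64 input: use float64 rather than wrapping around.
        return arr.astype("float64", copy=False)


small = ensure_int64_or_float64_sketch(np.array([1, 2, 3], dtype="int32"))
large = ensure_int64_or_float64_sketch(np.array([2**63, 1], dtype="uint64"))
print(small.dtype, large.dtype)
```

The float64 fallback trades exactness for range: integers above 2**53 are no longer represented exactly, but they keep their sign and magnitude instead of overflowing.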
6796bf4 to 5340c0b
…#22487) When integer arrays contained integers that were outside the range of int64, the conversion would overflow. Instead, only allow safe casting, and if a safe cast cannot be done, cast to float64.
5340c0b to 6e8045b
Thanks @troels - very nice change!
When integer arrays contained integers that were outside
the range of int64, the conversion would overflow.
Instead, only allow safe casting, and if a safe cast cannot
be done, cast to float64.
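As a rough illustration of why this matters for a mean, the following sketch compares the two conversions in plain NumPy. It does not go through the groupby code path, and the values are made up for illustration:

```python
import numpy as np

# Illustrative data: both values exceed the int64 maximum.
values = np.array([2**63, 2**63], dtype="uint64")

# Old behaviour (unsafe cast): the values wrap to negative numbers before averaging.
wrapped_mean = values.astype("int64").mean()

# New behaviour (float64 fallback): the mean keeps the right sign and magnitude.
float_mean = values.astype("float64").mean()

print(wrapped_mean, float_mean)
```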
git diff upstream/master -u -- "*.py" | flake8 --diff