-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix to_timedelta
np.int32
casting bug with NumPy 2
#57984
Conversation
pandas/_libs/tslibs/conversion.pyx
Outdated
# If ts is an integer then the fractional component will always be | ||
# zero. It helps to set this explicitly following changes to type | ||
# promotion behavior in NEP 50 (GH 56996). | ||
frac = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we can do this outright since pandas will still need to support numpy < 2 behavior. Maybe ts
should be cast to int
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @mroeschke—I thought my fix would preserve NumPy < 2 behavior too, but maybe I am missing something subtle. I switched to your suggested approach. Let me know if that looks OK.
# type promotion changes in NEP 50. If this is not done, for example, | ||
# np.int32 values for ts can lead to np.int32 values for frac, which can | ||
# raise unexpected overflow errors downstream (GH 56996). | ||
if isinstance(ts, np.integer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not just declare frac
to be int64 if that is what is required? Generally we are casting too much in this function. I am also surprised that frac - base
doesn't follow the C implicit conversion rules, since base
is int64 at this point; I feel like simplifying this and using explicit types would help across the board
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frac
is mainly needed for when ts
is a floating point value, in which case it also needs to be a floating point value, so we cannot declare it to be a 64-bit integer up front.
In thinking about the floating point case, it occurs to me that NEP 50 also brings slight answer changes in that context as well. E.g. with NumPy < 2:
>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.004147200')
and with NumPy >= 2:
>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.005046272')
Again this is due to changes in type promotion rules for frac = ts - base
. Previously np.float32(3.2) - 3
would return a 64-bit float, but with NumPy >= 2 it returns a 32-bit float.
Is that something we would also like to address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what would help this function is giving frac
a type, and maybe making another variable if we need branching. Mixing Python objects into mostly "typed" Cython code like this is tough to follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks and sorry for the delay in getting back. I pushed an update in 6297494—let me know if that is along the lines of what you were thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it seems like declaring frac_float64
to be a float64_t
type changes answers in the 32-bit build. Do you have any suggestions for how to address that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work if you just use the float
data type? Also can you get rid of the isinstance
check with the current design?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried and unfortunately it seems like it does not—I think float
locks it to always be a float32 value in Cython.
I have not taken the time to fully understand it yet—I'm admittedly a bit confused—but in my testing in a Docker container the only approach that seems to work on 32-bit platforms is the one where the result of ts - base
is assigned to a variable with an undeclared type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am somewhat surprised there is no build warnings to hint us at the issue. @lithomas1 is there something else we need to do to make build warnings visible in CI?
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
This PR proposes a fix for #56996. I'd be happy to make adjustments as needed and eventually add a what's new entry to the appropriate file.
to_timedelta
raises unexpectedOutOfBoundsTimedelta
error with development version of NumPy #56996doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.