Fix `to_timedelta` `np.int32` casting bug with NumPy 2 #57984

spencerkclark · 2024-03-24T14:49:42Z

This PR proposes a fix for #56996. I'd be happy to make adjustments as needed and eventually add a what's new entry to the appropriate file.

closes BUG: to_timedelta raises unexpected OutOfBoundsTimedelta error with development version of NumPy #56996
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

mroeschke · 2024-03-25T17:42:56Z

pandas/_libs/tslibs/conversion.pyx

+        # If ts is an integer then the fractional component will always be
+        # zero. It helps to set this explicitly following changes to type
+        # promotion behavior in NEP 50 (GH 56996).
+        frac = 0


I'm not sure if we can do this outright since pandas will still need to support numpy < 2 behavior. Maybe ts should be cast to int instead?

Thanks for taking a look @mroeschke—I thought my fix would preserve NumPy < 2 behavior too, but maybe I am missing something subtle. I switched to your suggested approach. Let me know if that looks OK.
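A minimal Python sketch of the cast-to-int approach. It assumes the relevant logic reduces to splitting `ts` into `base` and `frac`. The helper `split_base_frac` and the constant `NS_PER_DAY` are hypothetical and are not pandas API. Converting a NumPy integer scalar to a Python `int` before computing `frac = ts - base` makes `frac` a Python `int`, whose arithmetic has arbitrary precision under both the legacy and the NEP 50 promotion rules. Without the cast, NumPy 2 keeps `np.int32(5) - 5` as an `np.int32`. Multiplying that value by a Python integer too large for int32 is then expected to raise an `OverflowError`.

```python
import numpy as np

# Hypothetical constant: nanoseconds per day (the "D" unit).
NS_PER_DAY = 86_400 * 10**9


def split_base_frac(ts):
    """Hypothetical helper: split ts into integer and fractional parts."""
    if isinstance(ts, np.integer):
        # Python ints have arbitrary precision, so frac below can never be
        # an np.int32 that overflows when scaled by a large multiplier.
        ts = int(ts)
    base = int(ts)  # truncates toward zero for floating point ts
    frac = ts - base  # exactly 0 (a Python int) for integer ts
    return base, frac


base, frac = split_base_frac(np.int32(5))
print(type(frac).__name__, base * NS_PER_DAY + frac * NS_PER_DAY)  # int 432000000000000
```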

WillAyd · 2024-03-26T22:17:17Z

pandas/_libs/tslibs/conversion.pyx

+    # type promotion changes in NEP 50. If this is not done, for example,
+    # np.int32 values for ts can lead to np.int32 values for frac, which can
+    # raise unexpected overflow errors downstream (GH 56996).
+    if isinstance(ts, np.integer):


Can we not just declare frac to be int64 if that is what is required? Generally we are casting too much in this function. I am also surprised that frac - base doesn't follow the C implicit conversion rules, since base is int64 at this point; I feel like simplifying this and using explicit types would help across the board.

frac is mainly needed for when ts is a floating point value, in which case it also needs to be a floating point value, so we cannot declare it to be a 64-bit integer up front.

In thinking about the floating point case, it occurs to me that NEP 50 also brings slight changes in answers there. E.g. with NumPy < 2:

>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.004147200')

and with NumPy >= 2:

>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.005046272')

Again this is due to changes in type promotion rules for frac = ts - base. Previously np.float32(3.2) - 3 would return a 64-bit float, but with NumPy >= 2 it returns a 32-bit float.

Is that something we would also like to address?
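The precision effect can be reproduced outside pandas. This sketch does not call `pd.to_timedelta`. For illustration only, it assumes the fractional day is converted to nanoseconds with a single multiplication, which is not necessarily how pandas computes it, so the numbers will not match the `Timedelta` outputs above. Both precisions are forced explicitly, so the script behaves the same whichever promotion rules the installed NumPy uses. For this input the fractional part is identical in float32 and float64, because the subtraction is exact. The difference comes from carrying out the later scaling in float32 arithmetic.

```python
import numpy as np

NS_PER_DAY = 86_400 * 10**9  # nanoseconds per day

ts = np.float32(3.2)
base = int(ts)  # 3

# Emulate each promotion outcome explicitly, independent of the installed NumPy:
frac32 = ts - np.float32(base)              # float32, like ts - base under NumPy >= 2
frac64 = np.float64(ts) - np.float64(base)  # float64, like ts - base under NumPy < 2

# Scaling to nanoseconds keeps each operand's precision.
ns32 = frac32 * np.float32(NS_PER_DAY)
ns64 = frac64 * np.float64(NS_PER_DAY)

print(frac32.dtype, frac64.dtype, float(frac32) == float(frac64))  # float32 float64 True
print(int(ns32), int(ns64))
```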

I think what would help this function is giving frac a type, and maybe making another variable if we need branching. Mixing Python objects into mostly "typed" Cython code like this is tough to follow.

Thanks and sorry for the delay in getting back. I pushed an update in 6297494—let me know if that is along the lines of what you were thinking.
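The suggestion to give `frac` a definite type, with a separate variable per branch, can be illustrated with a Python approximation. This is not the Cython code in 6297494, which is not shown here. In the sketch, the integer branch has no fractional part at all, and the float branch pins its working precision to float64, roughly what a `float64_t` declaration does in Cython. The function name `cast_to_ns` and the single-multiplier conversion are assumptions. The sketch does not attempt to address the 32-bit build behavior mentioned in this thread.

```python
import numpy as np

NS_PER_DAY = 86_400 * 10**9  # nanoseconds per day


def cast_to_ns(ts, multiplier=NS_PER_DAY):
    """Hypothetical: convert ts (in units of `multiplier` ns) to integer ns."""
    if isinstance(ts, (int, np.integer)):
        # Integer branch: there is no fractional part, so no frac variable.
        return int(ts) * multiplier
    # Float branch: fix the working precision to float64 instead of
    # inheriting whatever dtype promotion would give ts - base.
    ts_f64 = np.float64(ts)
    base = int(ts_f64)  # truncates toward zero
    frac_f64 = ts_f64 - np.float64(base)
    return base * multiplier + int(np.round(frac_f64 * np.float64(multiplier)))


print(cast_to_ns(np.int32(5)), cast_to_ns(np.float32(3.2)))
```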

Ah, it seems like declaring frac_float64 to be a float64_t type changes answers in the 32-bit build. Do you have any suggestions for how to address that?

Does it work if you just use the float data type? Also can you get rid of the isinstance check with the current design?

I tried and unfortunately it seems like it does not—I think float locks it to always be a float32 value in Cython.

I have not taken the time to fully understand it yet—I'm admittedly a bit confused—but in my testing in a Docker container the only approach that seems to work on 32-bit platforms is the one where the result of ts - base is assigned to a variable with an undeclared type.

I am somewhat surprised there are no build warnings to hint us at the issue. @lithomas1 is there something else we need to do to make build warnings visible in CI?

github-actions · 2024-05-17T00:05:45Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2024-05-31T18:45:26Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

Propose fix for pandas-dev#56996

a087d40

spencerkclark requested a review from MarcoGorelli as a code owner March 24, 2024 14:49

mroeschke reviewed Mar 25, 2024


Fix by casting ts to an integer instead

187b02f

WillAyd requested changes Mar 26, 2024


spencerkclark added 2 commits April 14, 2024 07:44

Use explicitly declared types with branching

6297494

Merge branch 'main' into fix-56996

84f2135

spencerkclark mentioned this pull request May 2, 2024

⚠️ Nightly upstream-dev CI failed ⚠️ pydata/xarray#8844

Closed

github-actions bot added the Stale label May 17, 2024

mroeschke closed this May 31, 2024

spencerkclark mentioned this pull request Sep 18, 2024

Refactor datetime and timedelta encoding for increased robustness pydata/xarray#9498

Draft

3 tasks


spencerkclark commented Mar 24, 2024

mroeschke Mar 25, 2024

spencerkclark Mar 25, 2024

WillAyd Mar 26, 2024

spencerkclark Mar 28, 2024

WillAyd Apr 1, 2024

spencerkclark Apr 14, 2024

spencerkclark Apr 14, 2024

WillAyd Apr 14, 2024

spencerkclark Apr 16, 2024

WillAyd Apr 16, 2024

github-actions bot commented May 17, 2024

mroeschke commented May 31, 2024
