Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to_timedelta np.int32 casting bug with NumPy 2 #57984

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,14 @@ cdef int64_t cast_from_unit(
f"cannot convert input {ts} with the unit '{unit}'"
) from err

# In the event that ts itself is already an integer type, cast it to a
# 64-bit integer to ensure that frac will also be a 64-bit integer after
# type promotion changes in NEP 50. If this is not done, for example,
# np.int32 values for ts can lead to np.int32 values for frac, which can
# raise unexpected overflow errors downstream (GH 56996).
if isinstance(ts, np.integer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not just declare frac to be int64 if that is what is required? Generally we are casting too much in this function. I am also surprised that frac - base doesn't follow the C implicit conversion rules, since base is int64 at this point; I feel like simplifying this and using explicit types would help across the board

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frac is mainly needed for when ts is a floating point value, in which case it also needs to be a floating point value, so we cannot declare it to be a 64-bit integer up front.

In thinking about the floating point case, it occurs to me that NEP 50 also brings slight answer changes in that context as well. E.g. with NumPy < 2:

>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.004147200')

and with NumPy >= 2:

>>> pd.to_timedelta(np.float32(3.2), "D")
Timedelta('3 days 04:48:00.005046272')

Again this is due to changes in type promotion rules for frac = ts - base. Previously np.float32(3.2) - 3 would return a 64-bit float, but with NumPy >= 2 it returns a 32-bit float.

Is that something we would also like to address?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what would help this function is giving frac a type, and maybe making another variable if we need branching. Mixing Python objects into mostly "typed" Cython code like this is tough to follow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and sorry for the delay in getting back. I pushed an update in 6297494—let me know if that is along the lines of what you were thinking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it seems like declaring frac_float64 to be a float64_t type changes answers in the 32-bit build. Do you have any suggestions for how to address that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work if you just use the float data type? Also can you get rid of the isinstance check with the current design?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried and unfortunately it seems like it does not—I think float locks it to always be a float32 value in Cython.

I have not taken the time to fully understand it yet—I'm admittedly a bit confused—but in my testing in a Docker container the only approach that seems to work on 32-bit platforms is the one where the result of ts - base is assigned to a variable with an undeclared type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat surprised there is no build warnings to hint us at the issue. @lithomas1 is there something else we need to do to make build warnings visible in CI?

ts = <int64_t>ts

frac = ts - base
if p:
frac = round(frac, p)
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/tools/test_to_timedelta.py
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,13 @@ def test_to_timedelta_fraction(self):
expected = pd.Timedelta("0 days 00:19:59.999999998")
assert result == expected

@pytest.mark.parametrize("dtype", [np.int32, np.uint32])
def test_to_timedelta_scalar_integer(self, dtype):
# GH 56996
result = to_timedelta(dtype(1), "D")
expected = pd.Timedelta("1 days 00:00:00")
assert result == expected


def test_from_numeric_arrow_dtype(any_numeric_ea_dtype):
# GH 52425
Expand Down