Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorder tests in maybe_downcast_numeric #55825

Merged
merged 7 commits into from
Nov 22, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -357,18 +357,20 @@ def trans(x):
# if we don't have any elements, just astype it
return trans(result).astype(dtype)

# do a test on the first element, if it fails then we are done
r = result.ravel()
arr = np.array([r[0]])

if isna(arr).any():
# if we have any nulls, then we are done
if isna(r[0]):
# do a test on the first element, if it fails then we are done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test on the first element goes back to dc73315#diff-fb8a9c322624b0777f3ff7e3ef8320d746b15a2a0b80b7cab3dfbe2e12e06daa in core.common. It appears to me it is no longer necessary.

Copy link
Contributor Author

@MichaelTiemannOSC MichaelTiemannOSC Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I constructed a simple test case and I think I understand what the earlier code was trying to do, which was to ravel only the first element of result. Consider:

import numpy as	np
import pandas as pd

aa = np.array([1.0, np.nan, 2.0])
bb = np.array([aa, aa])

print(aa, pd.isna(aa))
# [ 1. nan  2.] [False  True False]
print(bb, pd.isna(bb))
#  [[ 1. nan  2.] [ 1. nan  2.]] [[False  True False] [False  True False]]
print(pd.isna(bb).any())
# True

The fact that pd.isna(bb).any() sees the nulls inside the sub-arrays means we don't need to ravel the whole thing to look for nulls. The reason to ravel, instead, is to look at the type of the elements, and we only need to ravel the first element to look into that. (We have already tested that there are elements to ravel.)
But...decimal.Decimal doesn't have a ravel, which explains perhaps why the original code tried to wrap just the first element back into an array to ravel that.

return result

elif not isinstance(r[0], (np.integer, np.floating, int, float, bool)):
# a comparable, e.g. a Decimal may slip in here
return result

if isna(r).any():
# if we have any nulls, then we are done
return result
MichaelTiemannOSC marked this conversation as resolved.
Show resolved Hide resolved

if (
issubclass(result.dtype.type, (np.object_, np.number))
and notna(result).all()
Expand Down
Loading