Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Check before calling DTA._from_sequence instead of catching #29589

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions pandas/core/apply.py
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,8 @@ def apply_raw(self):
if "Function does not reduce" not in str(err):
# catch only ValueError raised intentionally in libreduction
raise
# We expect np.apply_along_axis to give a two-dimensional result, or
# also raise.
result = np.apply_along_axis(self.f, self.axis, self.values)

# TODO: mixed type case
Expand Down
10 changes: 5 additions & 5 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -795,13 +795,13 @@ def _try_cast(self, result, obj, numeric_only: bool = False):
# Ensure we localize to UTC first before converting
# to the target timezone
arr = extract_array(obj)
try:
if not is_datetime64tz_dtype(result.dtype):
# arr may have already been cast back to tzaware in _try_cast
result = arr._from_sequence(result, dtype="datetime64[ns, UTC]")
result = result.astype(dtype)
except TypeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you can end up here with a different dtype that is not convertible to datetime64 (there is no guarantee that your groupby function or aggregation returns the same dtype as the original column). In such a case, we should not raise the error, but rather fall back to the original result as done now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the relevant functions are all dtype-preserving (well, e.g. sometimes int can be cast to float, but not relevant for datetime64). Is there a case that im forgetting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "the relevant functions" ? I think this gets used for user defined functions as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of the calls that get here go through our cython functions, but you're right that we can also get here from _python_agg_general. But in the case of user defined functions that may not be dtype-preserving, we shouldn't be doing this casting at all, regardless of whether we pre-check vs catch

# _try_cast was called at a point where the result
# was already tz-aware
pass
else:
result = arr

elif is_extension_array_dtype(dtype):
# The function can return something of any type, so check
# if the type is compatible with the calling EA.
Expand Down