DEPR: __array__ for tz-aware Series/Index #24596
This deprecates the current behavior when converting tz-aware Series or Index to an ndarray. Previously, we converted to M8[ns], throwing away the timezone information. In the future, we will return an object-dtype array filled with Timestamps, each of which has the correct tz.

```python
In [1]: import pandas as pd; import numpy as np

In [2]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [3]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.
        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3
Out[3]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')
```

xref pandas-dev#23569
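For reference, the two explicit spellings named in the warning can be sketched as below. This is a minimal illustration rather than code from this PR; the variable names `as_objects` and `as_naive` are made up for the example, and it assumes a pandas/NumPy combination that accepts an explicit `dtype` in `np.asarray` for tz-aware data.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))

# Opt in to the future behavior: an object-dtype ndarray of
# pandas.Timestamp objects, each keeping its timezone.
as_objects = np.asarray(ser, dtype=object)

# Keep the old behavior: a datetime64 ndarray holding the UTC
# wall times, with the timezone information dropped.
as_naive = np.asarray(ser, dtype="datetime64[ns]")

print(as_objects.dtype, as_objects[0])
print(as_naive.dtype, as_naive[0])
```

Passing the dtype explicitly gives the same result before and after the deprecation is enforced, so callers that care about one representation do not depend on the default.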
@@ -1020,7 +1020,7 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
 # datetime64tz is assumed to be naive which should
 # be localized to the timezone.
 is_dt_string = is_string_dtype(value)
-value = to_datetime(value, errors=errors)
+value = to_datetime(value, errors=errors).array
Need to look at this closer. `maybe_cast_to_datetime` seems in need of an overhaul (along with all of `sanitize_array`), but this at least avoids the warning.
elif np.array(value).ndim == 2:
# hasattr first, to avoid coercing to ndarray without reason.
# But we may be relying on the ndarray coercion to check ndim.
# Why not just convert to an ndarray earlier on if needed?
Hoping to clean up the type on `value` a bit to avoid this.
can you add a TODO for any section that we should change later
xrefing #15750 as I think it's related to the eventual end goal.
haven't looked in detail
Codecov Report
@@             Coverage Diff             @@
##           master   #24596       +/-   ##
===========================================
- Coverage   92.38%   43.05%   -49.34%
===========================================
  Files         166      166
  Lines       52478    52514       +36
===========================================
- Hits        48483    22609    -25874
- Misses       3995    29905    +25910
Codecov Report
@@            Coverage Diff             @@
##           master   #24596      +/-   ##
==========================================
- Coverage   92.37%   92.37%   -0.01%
==========================================
  Files         166      166
  Lines       52396    52415      +19
==========================================
+ Hits        48403    48420      +17
- Misses       3993     3995       +2
@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
# find a better solution
hashed = hash((tuple(categories), ordered))
return hashed

if is_datetime64tz_dtype(categories.dtype):
can you `categories.to_numpy()` always?
Hmm, possibly. We'll still need the special case for datetime64tz_dtype to pass `dtype=_NS_DTYPE`, since `Index[datetime64[ns, tz]].to_numpy()` returns an ndarray of Timestamp objects.
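To illustrate the point about `to_numpy()`, here is a small sketch (not code from this PR) comparing the default conversion of a tz-aware Index with one that requests a datetime64 dtype. The string `"datetime64[ns]"` stands in for the internal `_NS_DTYPE` constant mentioned above, which is an assumption made for the example.

```python
import pandas as pd

idx = pd.date_range("2000", periods=2, tz="CET")

# Default conversion: object-dtype ndarray of tz-aware Timestamps.
default_values = idx.to_numpy()

# Requesting a datetime64 dtype yields the underlying UTC instants,
# which is the form a hashing routine on raw values would want.
ns_values = idx.to_numpy(dtype="datetime64[ns]")

print(default_values.dtype, ns_values.dtype)
```

This is why a blanket `categories.to_numpy()` is not enough on its own for the datetime64tz case: the dtype has to be requested explicitly to avoid the object-dtype result.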
maybe add a TODO here, this is kind of special casing
I can reproduce the failures with older numpys locally. Debugging that now.
Turns out it was bottleneck. @jreback, see line 147 in 19f715c as it stands right now.
I assume we don't want to pass datetimetz to bottleneck, since the actual operation should be done on the same values (i8 or M8[ns]).
right, should exclude anything that matches `needs_i8_conversion`
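A rough sketch of the kind of gate being discussed follows. It is not the code from this PR: `stored_as_i8` is a hypothetical stand-in, built from public `pandas.api.types` helpers, for the internal `needs_i8_conversion` check, and `may_use_bottleneck` is an illustrative name for the fast-path decision.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_timedelta64_dtype


def stored_as_i8(dtype) -> bool:
    """Hypothetical stand-in for pandas' internal needs_i8_conversion."""
    return (
        is_datetime64_any_dtype(dtype)      # naive and tz-aware datetimes
        or is_timedelta64_dtype(dtype)      # timedeltas
        or isinstance(dtype, pd.PeriodDtype)
    )


def may_use_bottleneck(dtype) -> bool:
    """Illustrative gate: keep datetime-like data off the fast path."""
    return not stored_as_i8(dtype)


print(may_use_bottleneck(np.dtype("float64")))
print(may_use_bottleneck(pd.DatetimeTZDtype(tz="CET")))
```

The idea is only that datetime-like dtypes, including datetimetz, are handled by the regular code path on their i8 or M8[ns] values instead of being handed to bottleneck.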
349f818 updates with
@@ -1447,8 +1447,18 @@ def quantile(self, qs, interpolation='linear', axis=0):
-------
Block
"""
values = self.get_values()
values, _ = self._try_coerce_args(values, values)
if self.is_datetimetz:
this is getting super messy
Agreed. A proper fix is updating `_try_coerce_args` / `get_values`, which I think @jbrockmendel is working on. But this is necessary now to avoid the warning / conversion to object dtype.
None of the branches I have in progress would help here.
Allowing for DatetimeArray to be reshaped to (1, nrows) would.
add a TODO here
assert not array_equivalent(
DatetimeIndex([0, np.nan], tz='CET'), DatetimeIndex(
[0, np.nan], tz='US/Eastern'))
with catch_warnings():
what warning are you catching here?
`array_equivalent` calls `__array__`, so the new deprecation warning comes through.
We don't care about the warning here (the test doesn't care whether they're objects or datetimes), so we just ignore the warning.
I tightened up the filter.
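As an illustration of what a tightened filter can look like, the sketch below ignores only a `FutureWarning` whose message starts with a given prefix and lets every other warning through. It uses the standard library only; `legacy_conversion` is a hypothetical function standing in for the call that triggers the deprecation, and this is not the filter used in the test itself.

```python
import warnings


def legacy_conversion():
    # Hypothetical stand-in for a call that emits the deprecation warning.
    warnings.warn(
        "Converting timezone-aware DatetimeArray to timezone-naive ndarray",
        FutureWarning,
    )
    return "converted"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Narrow filter: match on both category and message prefix, so that
    # unrelated FutureWarnings are still reported.
    warnings.filterwarnings(
        "ignore",
        message="Converting timezone-aware",
        category=FutureWarning,
    )
    result = legacy_conversion()
    warnings.warn("some unrelated deprecation", FutureWarning)

print(result, [str(w.message) for w in caught])
```

Matching on the message keeps the test from silently hiding other deprecations that `array_equivalent` might raise later.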
All green. I think things are decent here. I didn't add a TODO in
going to merge, but would like to add a couple of TODOs where may need followups.
np.asarray(ser, dtype='datetime64[ns]')

*Future Behavior*
we usually call this current
thanks!
This didn't make it onto #6581, should we enforce it for 1.0?
I don't have a strong opinion.
@jbrockmendel I'll put up a PR enforcing this, just so we have #6581 cleared.
This deprecates the current behavior when converting tz-aware Series
or Index to an ndarray. Previously, we converted to M8[ns], throwing
away the timezone information. In the future, we will return an
object-dtype array filled with Timestamps, each of which has the correct
tz.
xref #23569
closes #15750