DEPR: __array__ for tz-aware Series/Index #24596

Merged Jan 5, 2019 · 7 commits
Changes from 4 commits
64 changes: 63 additions & 1 deletion doc/source/whatsnew/v0.24.0.rst
@@ -1227,7 +1227,7 @@ Deprecations
.. _whatsnew_0240.deprecations.datetimelike_int_ops:

Integer Addition/Subtraction with Datetimes and Timedeltas is Deprecated
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the past, users could—in some cases—add or subtract integers or integer-dtype
arrays from :class:`Timestamp`, :class:`DatetimeIndex` and :class:`TimedeltaIndex`.
@@ -1265,6 +1265,68 @@ the object's ``freq`` attribute (:issue:`21939`, :issue:`23878`).
dti = pd.date_range('2001-01-01', periods=2, freq='7D')
dti + pd.Index([1 * dti.freq, 2 * dti.freq])


.. _whatsnew_0240.deprecations.tz_aware_array:

Converting Timezone-Aware Series and Index to NumPy Arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The conversion from a :class:`Series` or :class:`Index` with timezone-aware
datetime data will change to preserve timezones by default (:issue:`23569`).

NumPy doesn't have a dedicated dtype for timezone-aware datetimes.
In the past, converting a :class:`Series` or :class:`DatetimeIndex` with
timezone-aware datetimes would convert to a NumPy array by

1. converting the tz-aware data to UTC
2. dropping the timezone-info
3. returning a :class:`numpy.ndarray` with ``datetime64[ns]`` dtype
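
The three steps above are roughly equivalent to the following sketch (illustrative only, not part of this diff):

    import pandas as pd

    ser = pd.Series(pd.date_range('2000', periods=2, tz='CET'))

    # steps 1-2: convert to UTC, then drop the timezone info
    naive = ser.dt.tz_convert('UTC').dt.tz_localize(None)

    # step 3: a plain datetime64[ns] ndarray
    naive.to_numpy()
    # array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
    #       dtype='datetime64[ns]')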

Future versions of pandas will preserve the timezone information by returning an
object-dtype NumPy array where each value is a :class:`Timestamp` with the correct
timezone attached.

.. ipython:: python

ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
ser

The default behavior remains the same, but now issues a warning:

.. code-block:: python

In [8]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive
ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray
with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.

To accept the future behavior, pass 'dtype=object'.
To keep the old behavior, pass 'dtype="datetime64[ns]"'.
#!/bin/python3
Out[8]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
dtype='datetime64[ns]')

The old or new behavior can be obtained by specifying the ``dtype``:

.. ipython:: python

# Old behavior
np.asarray(ser, dtype='datetime64[ns]')

# New behavior
np.asarray(ser, dtype=object)


Or by using :meth:`Series.to_numpy`:

.. ipython:: python

ser.to_numpy()
ser.to_numpy(dtype="datetime64[ns]")

All the above applies to a :class:`DatetimeIndex` with tz-aware values as well.
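
A minimal sketch of the :class:`DatetimeIndex` case (illustrative, not part of this diff, using the same 0.24 API as above):

    import numpy as np
    import pandas as pd

    idx = pd.date_range('2000', periods=2, tz='CET')

    np.asarray(idx)                          # warns; tz-naive datetime64[ns] in UTC
    np.asarray(idx, dtype=object)            # future behavior: tz-aware Timestamps
    np.asarray(idx, dtype='datetime64[ns]')  # old behavior, no warning
    idx.to_numpy()                           # object ndarray of tz-aware Timestamps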

.. _whatsnew_0240.prior_deprecations:

Removal of prior version deprecations/changes
2 changes: 1 addition & 1 deletion pandas/core/arrays/datetimes.py
@@ -524,7 +524,7 @@ def _resolution(self):
# Array-Like / EA-Interface Methods

def __array__(self, dtype=None):
if is_object_dtype(dtype):
if is_object_dtype(dtype) or (dtype is None and self.tz):
return np.array(list(self), dtype=object)
elif is_int64_dtype(dtype):
return self.asi8
2 changes: 1 addition & 1 deletion pandas/core/dtypes/cast.py
@@ -1020,7 +1020,7 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
# datetime64tz is assumed to be naive which should
# be localized to the timezone.
is_dt_string = is_string_dtype(value)
value = to_datetime(value, errors=errors)
value = to_datetime(value, errors=errors).array
Contributor Author:
Need to look at this closer. maybe_cast_to_datetime seems in need of an overhaul (along with all of sanitize_array) but this at least avoids the warning.
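
Illustrative sketch of why grabbing .array helps here (0.24-era API): to_datetime returns an Index, and its .array is the backing DatetimeArray, so the tz-aware values are not coerced through np.asarray, which is what now raises the FutureWarning:

    import pandas as pd

    idx = pd.to_datetime(['2000-01-01', '2000-01-02'], utc=True)
    idx.dtype  # datetime64[ns, UTC]
    idx.array  # DatetimeArray -- has tz_localize/tz_convert, no ndarray coercion needed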

if is_dt_string:
# Strings here are naive, so directly localize
value = value.tz_localize(dtype.tz)
6 changes: 6 additions & 0 deletions pandas/core/dtypes/dtypes.py
@@ -403,6 +403,7 @@ def _hash_categories(categories, ordered=True):
from pandas.core.util.hashing import (
hash_array, _combine_hash_arrays, hash_tuples
)
from pandas.core.dtypes.common import is_datetime64tz_dtype, _NS_DTYPE

if len(categories) and isinstance(categories[0], tuple):
# assumes if any individual category is a tuple, then all our. ATM
@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
# find a better solution
hashed = hash((tuple(categories), ordered))
return hashed

if is_datetime64tz_dtype(categories.dtype):
Contributor:
can you categories.to_numpy() always?

Contributor Author:
Hmm, possibly. We'll still need the special case for datetime64tz_dtype to pass dtype=_NS_DTYPE, since Index[datetime64[ns, tz]].to_numpy() returns an ndarray of Timestamp objects.

Contributor:
maybe add a TODO here, this is kind of special casing
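
A sketch of the special case being discussed (illustrative, 0.24-era API): to_numpy() on tz-aware datetime categories yields object-dtype Timestamps, so the hashing path first casts to datetime64[ns] (_NS_DTYPE), as in the code below:

    import numpy as np
    import pandas as pd

    categories = pd.date_range('2000', periods=2, tz='US/Central')

    categories.to_numpy().dtype                  # object -- Timestamp elements
    naive = categories.astype('datetime64[ns]')  # tz dropped, as the diff does
    np.asarray(naive).dtype                      # datetime64[ns], no warning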

# Avoid future warning.
categories = categories.astype(_NS_DTYPE)

cat_array = hash_array(np.asarray(categories), categorize=False)
if ordered:
cat_array = np.vstack([
5 changes: 2 additions & 3 deletions pandas/core/groupby/groupby.py
@@ -1271,8 +1271,8 @@ def f(self, **kwargs):
def first_compat(x, axis=0):

def first(x):
x = x.to_numpy()

x = np.asarray(x)
x = x[notna(x)]
if len(x) == 0:
return np.nan
@@ -1286,8 +1286,7 @@ def first(x):
def last_compat(x, axis=0):

def last(x):

x = np.asarray(x)
x = x.to_numpy()
x = x[notna(x)]
if len(x) == 0:
return np.nan
16 changes: 15 additions & 1 deletion pandas/core/indexes/datetimes.py
@@ -339,6 +339,21 @@ def _simple_new(cls, values, name=None, freq=None, tz=None, dtype=None):

# --------------------------------------------------------------------

def __array__(self, dtype=None):
if (dtype is None and isinstance(self._data, DatetimeArray)
and getattr(self.dtype, 'tz', None)):
msg = (
"Converting timezone-aware DatetimeArray to timezone-naive "
"ndarray with 'datetime64[ns]' dtype. In the future, this "
"will return an ndarray with 'object' dtype where each "
"element is a 'pandas.Timestamp' with the correct 'tz'.\n\t"
"To accept the future behavior, pass 'dtype=object'.\n\t"
"To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
)
warnings.warn(msg, FutureWarning, stacklevel=3)
dtype = 'M8[ns]'
return np.asarray(self._data, dtype=dtype)

@property
def dtype(self):
return self._eadata.dtype
@@ -1114,7 +1129,6 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):

strftime = ea_passthrough(DatetimeArray.strftime)
_has_same_tz = ea_passthrough(DatetimeArray._has_same_tz)
__array__ = ea_passthrough(DatetimeArray.__array__)

@property
def offset(self):
7 changes: 6 additions & 1 deletion pandas/core/indexing.py
@@ -581,7 +581,12 @@ def can_do_equal_len():
setter(item, v)

# we have an equal len ndarray/convertible to our labels
elif np.array(value).ndim == 2:
# hasattr first, to avoid coercing to ndarray without reason.
# But we may be relying on the ndarray coercion to check ndim.
# Why not just convert to an ndarray earlier on if needed?
Contributor Author:
Hoping to clean up the type on value a bit to avoid this.

Contributor:
can you add a TODO for any section that we should change later

elif ((hasattr(value, 'ndim') and value.ndim == 2)
or (not hasattr(value, 'ndim') and
np.array(value).ndim) == 2):

# note that this coerces the dtype if we are mixed
# GH 7551
20 changes: 18 additions & 2 deletions pandas/core/internals/blocks.py
@@ -1447,8 +1447,18 @@ def quantile(self, qs, interpolation='linear', axis=0):
-------
Block
"""
values = self.get_values()
values, _ = self._try_coerce_args(values, values)
if self.is_datetimetz:
Contributor:
this is getting super messy

Contributor Author:
Agreed. A proper fix is updating _try_coerce_args / get_values, which I think @jbrockmendel is working on. But this is necessary now to avoid the warning / conversion to object dtype.

Member:
None of the branches I have in progress would help here.

Allowing for DatetimeArray to be reshaped to (1, nrows) would.

Contributor:
add a TODO here
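
For context, a minimal sketch of the i8 path used in the hunk below (illustrative, assuming 0.24-era internals):

    import pandas as pd
    from pandas.core.arrays import DatetimeArray

    arr = DatetimeArray(pd.date_range('2000', periods=3, tz='UTC'))

    arr.asi8.dtype     # int64 -- UTC nanoseconds since the epoch
    arr.asi8[None, :]  # shape (1, 3): the manual reshape for 2-D blocks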

# We need to operate on i8 values for datetimetz
# but `Block.get_values()` returns an ndarray of objects
# right now.
values = self.values.asi8

# Usual shape inconsistencies for ExtensionBlocks
if self.ndim > 1:
values = values[None, :]
else:
values = self.get_values()
values, _ = self._try_coerce_args(values, values)

is_empty = values.shape[axis] == 0
orig_scalar = not is_list_like(qs)
@@ -2330,6 +2340,12 @@ def get_values(self, dtype=None):
values = values.reshape(1, -1)
return values

def to_dense(self):
# we request M8[ns] dtype here, even though it discards tzinfo,
# as lots of code (e.g. anything using values_from_object)
# expects that behavior.
return np.asarray(self.values, dtype=_NS_DTYPE)

def _slice(self, slicer):
""" return a slice of my values """
if isinstance(slicer, tuple):
6 changes: 4 additions & 2 deletions pandas/core/internals/construction.py
@@ -34,6 +34,7 @@
from pandas.core.indexes import base as ibase
from pandas.core.internals import (
create_block_manager_from_arrays, create_block_manager_from_blocks)
from pandas.core.internals.arrays import extract_array

# ---------------------------------------------------------------------
# BlockManager Interface
@@ -539,7 +540,6 @@ def sanitize_array(data, index, dtype=None, copy=False,
Sanitize input data to an ndarray, copy if specified, coerce to the
dtype if specified.
"""

if dtype is not None:
dtype = pandas_dtype(dtype)

Expand All @@ -552,8 +552,10 @@ def sanitize_array(data, index, dtype=None, copy=False,
else:
data = data.copy()

data = extract_array(data, extract_numpy=True)

# GH#846
if isinstance(data, (np.ndarray, Index, ABCSeries)):
if isinstance(data, np.ndarray):

if dtype is not None:
subarr = np.array(data, copy=False)
7 changes: 5 additions & 2 deletions pandas/core/reshape/tile.py
@@ -8,7 +8,7 @@
from pandas._libs.lib import infer_dtype

from pandas.core.dtypes.common import (
ensure_int64, is_categorical_dtype, is_datetime64_dtype,
_NS_DTYPE, ensure_int64, is_categorical_dtype, is_datetime64_dtype,
is_datetime64tz_dtype, is_datetime_or_timedelta_dtype, is_integer,
is_scalar, is_timedelta64_dtype)
from pandas.core.dtypes.missing import isna
@@ -226,7 +226,10 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
raise ValueError('Overlapping IntervalIndex is not accepted.')

else:
bins = np.asarray(bins)
if is_datetime64tz_dtype(bins):
bins = np.asarray(bins, dtype=_NS_DTYPE)
else:
bins = np.asarray(bins)
bins = _convert_bin_to_numeric_type(bins, dtype)
if (np.diff(bins) < 0).any():
raise ValueError('bins must increase monotonically.')
18 changes: 16 additions & 2 deletions pandas/core/series.py
@@ -21,7 +21,8 @@
is_extension_array_dtype, is_extension_type, is_hashable, is_integer,
is_iterator, is_list_like, is_scalar, is_string_like, is_timedelta64_dtype)
from pandas.core.dtypes.generic import (
ABCDataFrame, ABCDatetimeIndex, ABCSeries, ABCSparseArray, ABCSparseSeries)
ABCDataFrame, ABCDatetimeArray, ABCDatetimeIndex, ABCSeries,
ABCSparseArray, ABCSparseSeries)
from pandas.core.dtypes.missing import (
isna, na_value_for_dtype, notna, remove_na_arraylike)

@@ -665,7 +666,20 @@ def __array__(self, result=None):
"""
The array interface, return my values.
"""
return self.get_values()
# TODO: change the keyword name from result to dtype?
if (result is None and isinstance(self.array, ABCDatetimeArray)
and getattr(self.dtype, 'tz', None)):
msg = (
"Converting timezone-aware DatetimeArray to timezone-naive "
"ndarray with 'datetime64[ns]' dtype. In the future, this "
"will return an ndarray with 'object' dtype where each "
"element is a 'pandas.Timestamp' with the correct 'tz'.\n\t"
"To accept the future behavior, pass 'dtype=object'.\n\t"
"To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
)
warnings.warn(msg, FutureWarning, stacklevel=3)
result = 'M8[ns]'
return np.asarray(self.array, result)

def __array_wrap__(self, result, context=None):
"""
4 changes: 2 additions & 2 deletions pandas/tests/arrays/test_datetimelike.py
@@ -262,11 +262,11 @@ def test_array(self, tz_naive_fixture):
arr = DatetimeArray(dti)

expected = dti.asi8.view('M8[ns]')
result = np.array(arr)
result = np.array(arr, dtype='M8[ns]')
tm.assert_numpy_array_equal(result, expected)

# check that we are not making copies when setting copy=False
result = np.array(arr, copy=False)
result = np.array(arr, dtype='M8[ns]', copy=False)
assert result.base is expected.base
assert result.base is not None

33 changes: 33 additions & 0 deletions pandas/tests/arrays/test_datetimes.py
@@ -178,6 +178,39 @@ def test_fillna_preserves_tz(self, method):
assert arr[2] is pd.NaT
assert dti[2] == pd.Timestamp('2000-01-03', tz='US/Central')

def test_array_interface_tz(self):
tz = "US/Central"
data = DatetimeArray(pd.date_range('2017', periods=2, tz=tz))
result = np.asarray(data)

expected = np.array([pd.Timestamp('2017-01-01T00:00:00', tz=tz),
pd.Timestamp('2017-01-02T00:00:00', tz=tz)],
dtype=object)
tm.assert_numpy_array_equal(result, expected)

result = np.asarray(data, dtype=object)
tm.assert_numpy_array_equal(result, expected)

result = np.asarray(data, dtype='M8[ns]')

expected = np.array(['2017-01-01T06:00:00',
'2017-01-02T06:00:00'], dtype="M8[ns]")
tm.assert_numpy_array_equal(result, expected)

def test_array_interface(self):
data = DatetimeArray(pd.date_range('2017', periods=2))
expected = np.array(['2017-01-01T00:00:00', '2017-01-02T00:00:00'],
dtype='datetime64[ns]')

result = np.asarray(data)
tm.assert_numpy_array_equal(result, expected)

result = np.asarray(data, dtype=object)
expected = np.array([pd.Timestamp('2017-01-01T00:00:00'),
pd.Timestamp('2017-01-02T00:00:00')],
dtype=object)
tm.assert_numpy_array_equal(result, expected)


class TestSequenceToDT64NS(object):

25 changes: 14 additions & 11 deletions pandas/tests/dtypes/test_missing.py
@@ -278,17 +278,20 @@ def test_array_equivalent():
TimedeltaIndex([0, np.nan]))
assert not array_equivalent(
TimedeltaIndex([0, np.nan]), TimedeltaIndex([1, np.nan]))
assert array_equivalent(DatetimeIndex([0, np.nan], tz='US/Eastern'),
DatetimeIndex([0, np.nan], tz='US/Eastern'))
assert not array_equivalent(
DatetimeIndex([0, np.nan], tz='US/Eastern'), DatetimeIndex(
[1, np.nan], tz='US/Eastern'))
assert not array_equivalent(
DatetimeIndex([0, np.nan]), DatetimeIndex(
[0, np.nan], tz='US/Eastern'))
assert not array_equivalent(
DatetimeIndex([0, np.nan], tz='CET'), DatetimeIndex(
[0, np.nan], tz='US/Eastern'))
with catch_warnings():
Contributor:
what warning are you catching here?

Contributor Author:
array_equivalent calls __array__, so the new deprecation warning comes through.

We don't care about the warning here (the test doesn't care whether they're objects or datetimes), so we just ignore the warning.

Contributor Author:
I tightened up the filter.
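
One way to scope the filter more tightly than a blanket simplefilter("ignore") (a hypothetical sketch; the exact filter used in the later commit is not shown in this diff):

    from warnings import catch_warnings, filterwarnings

    with catch_warnings():
        # ignore only the tz-aware -> tz-naive conversion warning, matched by
        # message prefix, leaving unrelated warnings visible (hypothetical)
        filterwarnings('ignore', message='Converting timezone-aware',
                       category=FutureWarning)
        ...  # calls that coerce tz-aware data through __array__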

simplefilter("ignore")
assert array_equivalent(DatetimeIndex([0, np.nan], tz='US/Eastern'),
DatetimeIndex([0, np.nan], tz='US/Eastern'))
assert not array_equivalent(
DatetimeIndex([0, np.nan], tz='US/Eastern'), DatetimeIndex(
[1, np.nan], tz='US/Eastern'))
assert not array_equivalent(
DatetimeIndex([0, np.nan]), DatetimeIndex(
[0, np.nan], tz='US/Eastern'))
assert not array_equivalent(
DatetimeIndex([0, np.nan], tz='CET'), DatetimeIndex(
[0, np.nan], tz='US/Eastern'))

assert not array_equivalent(
DatetimeIndex([0, np.nan]), TimedeltaIndex([0, np.nan]))

Expand Down
Loading