DEPR: __array__ for tz-aware Series/Index #24596

Merged · 7 commits · Jan 5, 2019
3 changes: 3 additions & 0 deletions doc/source/api/series.rst
@@ -26,6 +26,7 @@ Attributes
.. autosummary::
:toctree: generated/

Series.array
Series.values
Series.dtype
Series.ftype
@@ -58,10 +59,12 @@ Conversion
Series.convert_objects
Series.copy
Series.bool
Series.to_numpy
Series.to_period
Series.to_timestamp
Series.to_list
Series.get_values
Series.__array__

Indexing, iteration
-------------------
70 changes: 69 additions & 1 deletion doc/source/whatsnew/v0.24.0.rst
@@ -1227,7 +1227,7 @@ Deprecations
.. _whatsnew_0240.deprecations.datetimelike_int_ops:

Integer Addition/Subtraction with Datetimes and Timedeltas is Deprecated
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the past, users could—in some cases—add or subtract integers or integer-dtype
arrays from :class:`Timestamp`, :class:`DatetimeIndex` and :class:`TimedeltaIndex`.
@@ -1265,6 +1265,74 @@ the object's ``freq`` attribute (:issue:`21939`, :issue:`23878`).
dti = pd.date_range('2001-01-01', periods=2, freq='7D')
dti + pd.Index([1 * dti.freq, 2 * dti.freq])


.. _whatsnew_0240.deprecations.tz_aware_array:

Converting Timezone-Aware Series and Index to NumPy Arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The conversion of a :class:`Series` or :class:`Index` with timezone-aware
datetime data to a NumPy array will change to preserve timezones by default (:issue:`23569`).

NumPy doesn't have a dedicated dtype for timezone-aware datetimes.
In the past, a :class:`Series` or :class:`DatetimeIndex` with
timezone-aware datetimes was converted to a NumPy array by

1. converting the tz-aware data to UTC
2. dropping the timezone info
3. returning a :class:`numpy.ndarray` with ``datetime64[ns]`` dtype

Future versions of pandas will preserve the timezone information by returning an
object-dtype NumPy array where each value is a :class:`Timestamp` with the correct
timezone attached:

.. ipython:: python

ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
ser

The default behavior remains the same, but now issues a warning:

.. code-block:: python

In [8]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive
ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray
with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.

To accept the future behavior, pass 'dtype=object'.
To keep the old behavior, pass 'dtype="datetime64[ns]"'.
#!/bin/python3
Out[8]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
dtype='datetime64[ns]')

The previous or future behavior can be obtained, without any warnings, by specifying
the ``dtype``

*Previous Behavior*

.. ipython:: python

np.asarray(ser, dtype='datetime64[ns]')

*Future Behavior*
Contributor: we usually call this current


.. ipython:: python

# New behavior
np.asarray(ser, dtype=object)


Or by using :meth:`Series.to_numpy`

.. ipython:: python

ser.to_numpy()
ser.to_numpy(dtype="datetime64[ns]")

All the above applies to a :class:`DatetimeIndex` with tz-aware values as well.
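
As a minimal sketch (assuming the 0.24 behavior described above), the same ``dtype`` options work on a timezone-aware index:

import numpy as np
import pandas as pd

idx = pd.date_range('2000', periods=2, tz="CET")

# Future behavior: object-dtype ndarray of tz-aware Timestamps, no warning
np.asarray(idx, dtype=object)

# Previous behavior: convert to UTC and drop the timezone, no warning
np.asarray(idx, dtype="datetime64[ns]")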

.. _whatsnew_0240.prior_deprecations:

Removal of prior version deprecations/changes
2 changes: 1 addition & 1 deletion pandas/core/arrays/datetimes.py
@@ -524,7 +524,7 @@ def _resolution(self):
# Array-Like / EA-Interface Methods

def __array__(self, dtype=None):
if is_object_dtype(dtype):
if is_object_dtype(dtype) or (dtype is None and self.tz):
return np.array(list(self), dtype=object)
elif is_int64_dtype(dtype):
return self.asi8
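
A minimal sketch of the effect of this one-line change (illustrative only; the ``.array`` accessor is used here just to get a ``DatetimeArray``):

import numpy as np
import pandas as pd

arr = pd.Series(pd.date_range('2000', periods=2, tz="CET")).array  # DatetimeArray

# With `dtype is None and self.tz`, conversion now defaults to object dtype,
# so each element stays a tz-aware Timestamp.
np.asarray(arr)

# An explicit datetime64[ns] request still returns timezone-naive UTC values.
np.asarray(arr, dtype="datetime64[ns]")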
2 changes: 1 addition & 1 deletion pandas/core/dtypes/cast.py
@@ -1020,7 +1020,7 @@ def maybe_cast_to_datetime(value, dtype, errors='raise'):
# datetime64tz is assumed to be naive which should
# be localized to the timezone.
is_dt_string = is_string_dtype(value)
value = to_datetime(value, errors=errors)
value = to_datetime(value, errors=errors).array
Contributor Author: Need to look at this closer. maybe_cast_to_datetime seems in need of an overhaul (along with all of sanitize_array), but this at least avoids the warning.

if is_dt_string:
# Strings here are naive, so directly localize
value = value.tz_localize(dtype.tz)
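
A rough sketch of the intent here (an assumption based on the author's comment above): working with the backing ``DatetimeArray`` instead of the ``DatetimeIndex`` means later ``np.asarray`` calls hit ``DatetimeArray.__array__``, which does not emit the new ``FutureWarning``.

import pandas as pd

idx = pd.to_datetime(["2000-01-01", "2000-01-02"])  # naive DatetimeIndex
arr = idx.array                                     # backing DatetimeArray
arr = arr.tz_localize("CET")                        # localization works the same on the array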
6 changes: 6 additions & 0 deletions pandas/core/dtypes/dtypes.py
@@ -403,6 +403,7 @@ def _hash_categories(categories, ordered=True):
from pandas.core.util.hashing import (
hash_array, _combine_hash_arrays, hash_tuples
)
from pandas.core.dtypes.common import is_datetime64tz_dtype, _NS_DTYPE

if len(categories) and isinstance(categories[0], tuple):
# assumes if any individual category is a tuple, then all our. ATM
@@ -420,6 +421,11 @@ def _hash_categories(categories, ordered=True):
# find a better solution
hashed = hash((tuple(categories), ordered))
return hashed

if is_datetime64tz_dtype(categories.dtype):
Contributor: can you use categories.to_numpy() here always?

Contributor Author: Hmm, possibly. We'll still need the special case for datetime64tz_dtype to pass dtype=_NS_DTYPE, since Index[datetime64[ns, tz]].to_numpy() returns an ndarray of Timestamp objects.

Contributor: maybe add a TODO here, this is kind of special casing

# Avoid future warning.
categories = categories.astype(_NS_DTYPE)

cat_array = hash_array(np.asarray(categories), categorize=False)
if ordered:
cat_array = np.vstack([
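
A minimal sketch of what the special case avoids (illustrative only; ``_hash_categories`` itself is private):

import numpy as np
import pandas as pd

categories = pd.date_range('2000', periods=2, tz="UTC")

# Without the astype above, np.asarray(categories) would emit the FutureWarning.
naive = np.asarray(categories, dtype="datetime64[ns]")  # warning-free, old behavior
naive.dtype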
5 changes: 2 additions & 3 deletions pandas/core/groupby/groupby.py
@@ -1271,8 +1271,8 @@ def f(self, **kwargs):
def first_compat(x, axis=0):

def first(x):
x = x.to_numpy()

x = np.asarray(x)
x = x[notna(x)]
if len(x) == 0:
return np.nan
@@ -1286,8 +1286,7 @@ def first(x):
def last_compat(x, axis=0):

def last(x):

x = np.asarray(x)
x = x.to_numpy()
x = x[notna(x)]
if len(x) == 0:
return np.nan
16 changes: 15 additions & 1 deletion pandas/core/indexes/datetimes.py
@@ -339,6 +339,21 @@ def _simple_new(cls, values, name=None, freq=None, tz=None, dtype=None):

# --------------------------------------------------------------------

def __array__(self, dtype=None):
if (dtype is None and isinstance(self._data, DatetimeArray)
and getattr(self.dtype, 'tz', None)):
msg = (
"Converting timezone-aware DatetimeArray to timezone-naive "
"ndarray with 'datetime64[ns]' dtype. In the future, this "
"will return an ndarray with 'object' dtype where each "
"element is a 'pandas.Timestamp' with the correct 'tz'.\n\t"
"To accept the future behavior, pass 'dtype=object'.\n\t"
"To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
)
warnings.warn(msg, FutureWarning, stacklevel=3)
dtype = 'M8[ns]'
return np.asarray(self._data, dtype=dtype)

@property
def dtype(self):
return self._eadata.dtype
@@ -1114,7 +1129,6 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):

strftime = ea_passthrough(DatetimeArray.strftime)
_has_same_tz = ea_passthrough(DatetimeArray._has_same_tz)
__array__ = ea_passthrough(DatetimeArray.__array__)

@property
def offset(self):
7 changes: 6 additions & 1 deletion pandas/core/indexing.py
@@ -581,7 +581,12 @@ def can_do_equal_len():
setter(item, v)

# we have an equal len ndarray/convertible to our labels
elif np.array(value).ndim == 2:
# hasattr first, to avoid coercing to ndarray without reason.
# But we may be relying on the ndarray coercion to check ndim.
# Why not just convert to an ndarray earlier on if needed?
Contributor Author: Hoping to clean up the type on value a bit to avoid this.

Contributor: can you add a TODO for any section that we should change later?

elif ((hasattr(value, 'ndim') and value.ndim == 2)
or (not hasattr(value, 'ndim') and
np.array(value).ndim) == 2):

# note that this coerces the dtype if we are mixed
# GH 7551
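
A minimal sketch of the intent of the ``hasattr`` check (illustrative): asking a tz-aware object for ``ndim`` directly avoids coercing it to an ndarray, which would trigger the new warning just to learn the dimensionality.

import numpy as np
import pandas as pd

value = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

# Prefer the attribute; fall back to coercion only for plain sequences.
ndim = value.ndim if hasattr(value, 'ndim') else np.array(value).ndim
ndim  # 1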
26 changes: 20 additions & 6 deletions pandas/core/internals/blocks.py
@@ -1447,8 +1447,20 @@ def quantile(self, qs, interpolation='linear', axis=0):
-------
Block
"""
values = self.get_values()
values, _ = self._try_coerce_args(values, values)
if self.is_datetimetz:
Contributor: this is getting super messy

Contributor Author: Agreed. A proper fix is updating _try_coerce_args / get_values, which I think @jbrockmendel is working on. But this is necessary now to avoid the warning / conversion to object dtype.

Member: None of the branches I have in progress would help here. Allowing for DatetimeArray to be reshaped to (1, nrows) would.

Contributor: add a TODO here

# TODO: cleanup this special case.
# We need to operate on i8 values for datetimetz
# but `Block.get_values()` returns an ndarray of objects
# right now. We need an API for "values to do numeric-like ops on"
values = self.values.asi8

# TODO: NonConsolidatableMixin shape
# Usual shape inconsistencies for ExtensionBlocks
if self.ndim > 1:
values = values[None, :]
else:
values = self.get_values()
values, _ = self._try_coerce_args(values, values)

is_empty = values.shape[axis] == 0
orig_scalar = not is_list_like(qs)
@@ -2055,10 +2067,6 @@ def _na_value(self):
def fill_value(self):
return tslibs.iNaT

def to_dense(self):
# TODO(DatetimeBlock): remove
return np.asarray(self.values)

def get_values(self, dtype=None):
"""
return object dtype as boxed values, such as Timestamps/Timedelta
@@ -2330,6 +2338,12 @@ def get_values(self, dtype=None):
values = values.reshape(1, -1)
return values

def to_dense(self):
# we request M8[ns] dtype here, even though it discards tzinfo,
# as lots of code (e.g. anything using values_from_object)
# expects that behavior.
return np.asarray(self.values, dtype=_NS_DTYPE)

def _slice(self, slicer):
""" return a slice of my values """
if isinstance(slicer, tuple):
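
A minimal sketch of the two ideas above (illustrative; ``asi8`` and the ``_NS_DTYPE`` conversion are shown on a ``DatetimeArray`` obtained via ``.array``):

import numpy as np
import pandas as pd

arr = pd.Series(pd.date_range('2000', periods=2, tz="CET")).array  # DatetimeArray

# quantile-style numeric ops can work on the int64 (UTC nanosecond) view
arr.asi8

# to_dense-style conversion: request datetime64[ns] explicitly, discarding the tzinfo
np.asarray(arr, dtype="datetime64[ns]")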
6 changes: 4 additions & 2 deletions pandas/core/internals/construction.py
@@ -34,6 +34,7 @@
from pandas.core.indexes import base as ibase
from pandas.core.internals import (
create_block_manager_from_arrays, create_block_manager_from_blocks)
from pandas.core.internals.arrays import extract_array

# ---------------------------------------------------------------------
# BlockManager Interface
@@ -539,7 +540,6 @@ def sanitize_array(data, index, dtype=None, copy=False,
Sanitize input data to an ndarray, copy if specified, coerce to the
dtype if specified.
"""

if dtype is not None:
dtype = pandas_dtype(dtype)

@@ -552,8 +552,10 @@
else:
data = data.copy()

data = extract_array(data, extract_numpy=True)

# GH#846
if isinstance(data, (np.ndarray, Index, ABCSeries)):
if isinstance(data, np.ndarray):

if dtype is not None:
subarr = np.array(data, copy=False)
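
A rough sketch of what ``extract_array(..., extract_numpy=True)`` returns (the import path is the one used in this PR; in later pandas versions the helper lives in pandas.core.construction):

import pandas as pd
from pandas.core.internals.arrays import extract_array  # path as of this PR

# A plain numeric Series unwraps all the way to an ndarray...
extract_array(pd.Series([1, 2, 3]), extract_numpy=True)

# ...while tz-aware data stays a DatetimeArray, so the np.ndarray branch below
# is skipped and no conversion warning is triggered.
extract_array(pd.Series(pd.date_range('2000', periods=2, tz="CET")),
              extract_numpy=True)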
4 changes: 3 additions & 1 deletion pandas/core/nanops.py
@@ -144,7 +144,9 @@ def f(values, axis=None, skipna=True, **kwds):

def _bn_ok_dtype(dt, name):
# Bottleneck chokes on datetime64
if (not is_object_dtype(dt) and not is_datetime_or_timedelta_dtype(dt)):
if (not is_object_dtype(dt) and
not (is_datetime_or_timedelta_dtype(dt) or
is_datetime64tz_dtype(dt))):

# GH 15507
# bottleneck does not properly upcast during the sum
7 changes: 5 additions & 2 deletions pandas/core/reshape/tile.py
@@ -8,7 +8,7 @@
from pandas._libs.lib import infer_dtype

from pandas.core.dtypes.common import (
ensure_int64, is_categorical_dtype, is_datetime64_dtype,
_NS_DTYPE, ensure_int64, is_categorical_dtype, is_datetime64_dtype,
is_datetime64tz_dtype, is_datetime_or_timedelta_dtype, is_integer,
is_scalar, is_timedelta64_dtype)
from pandas.core.dtypes.missing import isna
@@ -226,7 +226,10 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
raise ValueError('Overlapping IntervalIndex is not accepted.')

else:
bins = np.asarray(bins)
if is_datetime64tz_dtype(bins):
bins = np.asarray(bins, dtype=_NS_DTYPE)
else:
bins = np.asarray(bins)
bins = _convert_bin_to_numeric_type(bins, dtype)
if (np.diff(bins) < 0).any():
raise ValueError('bins must increase monotonically.')
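
A hedged sketch of the case this branch handles (assuming ``pd.cut`` accepts tz-aware input here, as the ``is_datetime64tz_dtype`` check implies):

import pandas as pd

x = pd.Series(pd.date_range('2000-01-01', periods=3, tz="UTC"))
bins = pd.date_range('1999-12-31', periods=4, freq='D', tz="UTC")

# The bins are now converted with dtype='datetime64[ns]', so building the
# intervals no longer emits the tz-aware FutureWarning.
pd.cut(x, bins)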
66 changes: 61 additions & 5 deletions pandas/core/series.py
@@ -21,7 +21,8 @@
is_extension_array_dtype, is_extension_type, is_hashable, is_integer,
is_iterator, is_list_like, is_scalar, is_string_like, is_timedelta64_dtype)
from pandas.core.dtypes.generic import (
ABCDataFrame, ABCDatetimeIndex, ABCSeries, ABCSparseArray, ABCSparseSeries)
ABCDataFrame, ABCDatetimeArray, ABCDatetimeIndex, ABCSeries,
ABCSparseArray, ABCSparseSeries)
from pandas.core.dtypes.missing import (
isna, na_value_for_dtype, notna, remove_na_arraylike)

@@ -661,11 +662,66 @@ def view(self, dtype=None):
# ----------------------------------------------------------------------
# NDArray Compat

def __array__(self, result=None):
def __array__(self, dtype=None):
"""
The array interface, return my values.
"""
return self.get_values()
Return the values as a NumPy array.

Users should not call this directly. Rather, it is invoked by
:func:`numpy.array` and :func:`numpy.asarray`.

Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to use for the resulting NumPy array. By default,
the dtype is inferred from the data.

Returns
-------
numpy.ndarray
The values in the series converted to a :class:`numpy.ndarray`
with the specified `dtype`.

See Also
--------
pandas.array : Create a new array from data.
Series.array : Zero-copy view to the array backing the Series.
Series.to_numpy : Series method for similar behavior.

Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> np.asarray(ser)
array([1, 2, 3])

For timezone-aware data, the timezones may be retained with
``dtype='object'``

>>> tzser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> np.asarray(tzser, dtype="object")
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET', freq='D'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET', freq='D')],
dtype=object)

Or the values may be localized to UTC and the tzinfo discarded with
``dtype='datetime64[ns]'``

>>> np.asarray(tzser, dtype="datetime64[ns]") # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', ...],
dtype='datetime64[ns]')
"""
if (dtype is None and isinstance(self.array, ABCDatetimeArray)
and getattr(self.dtype, 'tz', None)):
msg = (
"Converting timezone-aware DatetimeArray to timezone-naive "
"ndarray with 'datetime64[ns]' dtype. In the future, this "
"will return an ndarray with 'object' dtype where each "
"element is a 'pandas.Timestamp' with the correct 'tz'.\n\t"
"To accept the future behavior, pass 'dtype=object'.\n\t"
"To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
)
warnings.warn(msg, FutureWarning, stacklevel=3)
dtype = 'M8[ns]'
return np.asarray(self.array, dtype)
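
One way code that cannot pass a ``dtype`` can keep the old behavior without the noise (a sketch using only the standard library; not part of the diff):

import warnings

import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    values = np.asarray(ser)  # old datetime64[ns] behavior, warning silenced

values.dtype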

def __array_wrap__(self, result, context=None):
"""