Add default repr for EAs #23601

Merged
merged 54 commits into from
Dec 4, 2018
Changes from 15 commits
Commits
0fdbfd3
wip
TomAugspurger Nov 9, 2018
ace62aa
Deprecate formatting_values
TomAugspurger Nov 9, 2018
6e76b51
test for warning
TomAugspurger Nov 9, 2018
fef04e6
compat
TomAugspurger Nov 9, 2018
1885a97
na formatter
TomAugspurger Nov 9, 2018
ecfcd72
clean
TomAugspurger Nov 9, 2018
4e0d91f
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 9, 2018
37638cc
wip
TomAugspurger Nov 9, 2018
6e64b7b
more cleanup
TomAugspurger Nov 9, 2018
193747e
update docs, type
TomAugspurger Nov 9, 2018
5a2e1e4
format
TomAugspurger Nov 9, 2018
1635b73
try this
TomAugspurger Nov 9, 2018
e2b1941
updates
TomAugspurger Nov 9, 2018
48e55cc
fixup interval
TomAugspurger Nov 10, 2018
d8e7ba4
py2 compat
TomAugspurger Nov 10, 2018
b312fe4
revert interval
TomAugspurger Nov 10, 2018
445736d
unicode, bytes
TomAugspurger Nov 10, 2018
60e0d02
isort
TomAugspurger Nov 10, 2018
5b07906
py3 fixup
TomAugspurger Nov 10, 2018
ff0c998
fixup
TomAugspurger Nov 10, 2018
2fd3d5d
unicode
TomAugspurger Nov 10, 2018
5d8d2fc
unicode
TomAugspurger Nov 10, 2018
baee6b2
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 10, 2018
4d343ea
unicode
TomAugspurger Nov 10, 2018
5b291d5
lint
TomAugspurger Nov 10, 2018
1b93bf0
update repr tests
TomAugspurger Nov 11, 2018
708dd75
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 12, 2018
0f4083e
remove periodarray
TomAugspurger Nov 12, 2018
9116930
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 12, 2018
ebadf6f
FutureWarning -> DeprecationWarning
TomAugspurger Nov 12, 2018
e5f6976
wip
TomAugspurger Nov 12, 2018
221cee9
use repr
TomAugspurger Nov 12, 2018
439f2f8
fixup! use repr
TomAugspurger Nov 12, 2018
2364546
fixup! fixup! use repr
TomAugspurger Nov 12, 2018
62b1e2f
remove bytes
TomAugspurger Nov 12, 2018
a926dca
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 14, 2018
fc4279d
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 15, 2018
27db397
simplify formatter
TomAugspurger Nov 15, 2018
5c253a4
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 19, 2018
ef390fc
Updates: misc
TomAugspurger Nov 19, 2018
2b5fe25
BUG: Fixed SparseArray formatter
TomAugspurger Nov 19, 2018
d84cc02
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 20, 2018
d9df6bf
correct boxing
TomAugspurger Nov 20, 2018
a35399e
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 20, 2018
740f9e5
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 28, 2018
e7cc2ac
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 28, 2018
c79ba0b
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 2, 2018
3825aeb
Use Array formatter in PeriodIndex
TomAugspurger Dec 2, 2018
2a60c15
Use repr / str
TomAugspurger Dec 2, 2018
bccf40d
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
a7ef104
Update for review
TomAugspurger Dec 3, 2018
a3b1c92
REF: removed trailing_comma argument
TomAugspurger Dec 3, 2018
e080023
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
6ad113b
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -857,6 +857,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- :meth:`DataFrame.stack` no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (:issue:`23077`).
- :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`).
- Bug when grouping :meth:`Dataframe.groupby()` and aggregating on ``ExtensionArray`` it was not returning the actual ``ExtensionArray`` dtype (:issue:`23227`).
- A default repr for ExtensionArrays is now provided (:issue:`23601`).

.. _whatsnew_0240.api.incompatibilities:

@@ -966,6 +967,7 @@ Deprecations
- The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :meth:`ExtensionArray._formatting_values` is deprecated. Use `ExtensionArray._formatter` instead. (:issue:`23601`)

.. _whatsnew_0240.deprecations.datetimelike_int_ops:

@@ -1118,6 +1120,7 @@ Datetimelike
- Bug in rounding methods of :class:`DatetimeIndex` (:meth:`~DatetimeIndex.round`, :meth:`~DatetimeIndex.ceil`, :meth:`~DatetimeIndex.floor`) and :class:`Timestamp` (:meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, :meth:`~Timestamp.floor`) could give rise to loss of precision (:issue:`22591`)
- Bug in :func:`to_datetime` with an :class:`Index` argument that would drop the ``name`` from the result (:issue:`21697`)
- Bug in :class:`PeriodIndex` where adding or subtracting a :class:`timedelta` or :class:`Tick` object produced incorrect results (:issue:`22988`)
- Bug in the :class:`Series` repr with Period data missing a space before the data (:issue:`23601`)
- Bug in :func:`date_range` when decrementing a start date to a past end date by a negative frequency (:issue:`23270`)
- Bug in :meth:`Series.min` which would return ``NaN`` instead of ``NaT`` when called on a series of ``NaT`` (:issue:`23282`)
- Bug in :func:`DataFrame.combine` with datetimelike values raising a TypeError (:issue:`23079`)
54 changes: 53 additions & 1 deletion pandas/core/arrays/base.py
@@ -49,6 +49,13 @@ class ExtensionArray(object):

* _formatting_values

A default repr displaying the type, (truncated) data, length,
and dtype is provided. It can be customized or replaced
by overriding:

* _formatter
* __repr__

Some methods require casting the ExtensionArray to an ndarray of Python
objects with ``self.astype(object)``, which may be expensive. When
performance is a concern, we highly recommend overriding the following
@@ -653,15 +660,60 @@ def copy(self, deep=False):
raise AbstractMethodError(self)

# ------------------------------------------------------------------------
# Block-related methods
# Printing
# ------------------------------------------------------------------------

def __repr__(self):
from pandas.io.formats.printing import format_object_summary

template = (
'<{class_name}>\n'
'{data}\n'
'Length: {length}, dtype: {dtype}'
Contributor:
do we need to define `__unicode__`?
we do this in Base for all pandas objects

Contributor:
you are writing new code here but this should be consistent as well (it's ok to change that too)
but to have a completely different impl is odd

Contributor Author:
Fixed... I left the implementation in repr, and then encoded / decoded as needed in `__unicode__` and `__bytes__` if that's OK.

)
# the short repr has no trailing newline, while the truncated
# repr does. So we include a newline in our template, and strip
# any trailing newlines from format_object_summary
data = format_object_summary(self, self._formatter(), name=False,
trailing_comma=False).rstrip('\n')
name = self.__class__.__name__
return template.format(class_name=name, data=data,
length=len(self),
dtype=self.dtype)
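
To see what the template above produces, here is a minimal standalone sketch that mimics it without pandas. `ToyArray` and its bracketed join are hypothetical stand-ins. The real code delegates to `format_object_summary`, which also truncates long data. Only the template string is taken from the diff.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray subclass."""

    dtype = "toy"

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def _formatter(self, formatter=None):
        # Same fallback as the default in the diff: use str per element.
        return getattr(formatter, "formatter", None) or str

    def __repr__(self):
        template = (
            "<{class_name}>\n"
            "{data}\n"
            "Length: {length}, dtype: {dtype}"
        )
        fmt = self._formatter()
        # Assumption: a plain bracketed join stands in for
        # format_object_summary, which additionally truncates long data.
        data = "[" + ", ".join(fmt(x) for x in self._values) + "]"
        return template.format(
            class_name=type(self).__name__,
            data=data,
            length=len(self),
            dtype=self.dtype,
        )


toy_repr = repr(ToyArray([1, 2, 3]))
print(toy_repr)
```

The output has three parts: the class name in angle brackets, the formatted data, and a footer line with length and dtype.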

def _formatter(self, formatter=None):
# type: (Optional[ExtensionArrayFormatter]) -> Callable[[Any], str]
"""Formatting function for scalar values.

This is used in the default '__repr__'. The formatting function
receives instances of your scalar type.

Parameters
----------
formatter: GenericArrayFormatter, optional
The formatter this array is being rendered with. The formatter
may have a `.formatter` method already defined. By default, this
will be used if a `formatter` is passed, otherwise the formatter
is ``str``.

Returns
-------
Callable[[Any], str]
A callable that gets instances of the scalar type and
returns a string.
"""
return getattr(formatter, 'formatter', None) or str
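
The one-line fallback above can be exercised in isolation. The sketch below is an illustration, not pandas code: `FakeArrayFormatter` is a hypothetical stand-in for the formatter object pandas passes in, and `resolve_formatter` is a free-function copy of the method body.

```python
class FakeArrayFormatter:
    """Hypothetical stand-in for the formatter object passed by pandas."""

    def __init__(self, formatter=None):
        self.formatter = formatter


def resolve_formatter(formatter=None):
    # Mirrors the one-liner in the diff: prefer formatter.formatter when it
    # is set and truthy, otherwise fall back to str.
    return getattr(formatter, "formatter", None) or str


no_formatter = resolve_formatter()
unset_attribute = resolve_formatter(FakeArrayFormatter())
custom = resolve_formatter(FakeArrayFormatter(lambda x: "<{}>".format(x)))

print(no_formatter(1.5))
print(unset_attribute(1.5))
print(custom(1.5))
```

Both the "no formatter" and the "formatter with nothing set" cases resolve to `str`, so an array author only needs to override `_formatter` when `str` is not a suitable per-element rendering.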

def _formatting_values(self):
# type: () -> np.ndarray
# At the moment, this has to be an array since we use result.dtype
"""An array of values to be printed in, e.g. the Series repr"""
return np.array(self)

# ------------------------------------------------------------------------
# Reshaping
# ------------------------------------------------------------------------

@classmethod
def _concat_same_type(cls, to_concat):
# type: (Sequence[ExtensionArray]) -> ExtensionArray
9 changes: 6 additions & 3 deletions pandas/core/arrays/categorical.py
@@ -499,6 +499,10 @@ def _constructor(self):
def _from_sequence(cls, scalars, dtype=None, copy=False):
return Categorical(scalars, dtype=dtype)

def _formatter(self, formatter):
# backwards compat with old printing.
return None

def copy(self):
""" Copy constructor. """
return self._constructor(values=self._codes.copy(),
@@ -1986,6 +1990,8 @@ def __unicode__(self):

return result

__repr__ = __unicode__

def _maybe_coerce_indexer(self, indexer):
""" return an indexer coerced to the codes dtype """
if isinstance(indexer, np.ndarray) and indexer.dtype.kind == 'i':
@@ -2342,9 +2348,6 @@ def _concat_same_type(self, to_concat):

return _concat_categorical(to_concat)

def _formatting_values(self):
return self

def isin(self, values):
"""
Check whether `values` are contained in Categorical.
37 changes: 10 additions & 27 deletions pandas/core/arrays/integer.py
@@ -6,7 +6,7 @@

from pandas._libs import lib
from pandas.util._decorators import cache_readonly
from pandas.compat import u, range, string_types
from pandas.compat import range, string_types
from pandas.compat import set_function_name

from pandas.core import nanops
@@ -24,9 +24,6 @@
from pandas.core.dtypes.dtypes import register_extension_dtype
from pandas.core.dtypes.missing import isna, notna

from pandas.io.formats.printing import (
format_object_summary, format_object_attrs, default_pprint)


class _IntegerDtype(ExtensionDtype):
"""
@@ -267,6 +264,15 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
def _from_factorized(cls, values, original):
return integer_array(values, dtype=original.dtype)

def _formatter(self, formatter=None):
if formatter is None:
def fmt(x):
if isna(x):
return 'NaN'
return str(x)
return fmt
return formatter.formatter
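
The missing-value handling in this `_formatter` can be tried outside pandas. In the sketch below, `is_missing` is an assumed, simplified replacement for pandas' `isna` (it only recognizes `None` and float NaN), and `make_formatter` is a free-function copy of the method.

```python
import math


def is_missing(x):
    # Assumption: a simplified stand-in for pandas' isna, covering only
    # None and float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def make_formatter(formatter=None):
    if formatter is None:
        def fmt(x):
            # Render every missing value with the same 'NaN' label.
            if is_missing(x):
                return "NaN"
            return str(x)
        return fmt
    # As in the diff, a passed formatter's attribute is returned unchanged,
    # so it may be None.
    return formatter.formatter


fmt = make_formatter()
rendered = [fmt(x) for x in [1, 2, float("nan"), None]]
print(rendered)
```

Integers go through `str`, while missing entries share one label regardless of how the missing value is represented internally.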

def __getitem__(self, item):
if is_integer(item):
if self._mask[item]:
@@ -300,10 +306,6 @@ def __iter__(self):
else:
yield self._data[i]

def _formatting_values(self):
# type: () -> np.ndarray
return self._coerce_to_ndarray()

def take(self, indexer, allow_fill=False, fill_value=None):
from pandas.api.extensions import take

@@ -353,25 +355,6 @@ def __setitem__(self, key, value):
def __len__(self):
return len(self._data)

def __repr__(self):
"""
Return a string representation for this object.

Invoked by unicode(df) in py2 only. Yields a Unicode String in both
py2/py3.
"""
klass = self.__class__.__name__
data = format_object_summary(self, default_pprint, False)
attrs = format_object_attrs(self)
space = " "

prepr = (u(",%s") %
space).join(u("%s=%s") % (k, v) for k, v in attrs)

res = u("%s(%s%s)") % (klass, data, prepr)

return res

@property
def nbytes(self):
return self._data.nbytes + self._mask.nbytes
3 changes: 0 additions & 3 deletions pandas/core/arrays/interval.py
@@ -689,9 +689,6 @@ def copy(self, deep=False):
# TODO: Could skip verify_integrity here.
return type(self).from_arrays(left, right, closed=closed)

def _formatting_values(self):
return np.asarray(self)

def isna(self):
return isna(self.left)

11 changes: 4 additions & 7 deletions pandas/core/arrays/period.py
@@ -330,13 +330,10 @@ def start_time(self):
def end_time(self):
return self.to_timestamp(how='end')

def __repr__(self):
return '<{}>\n{}\nLength: {}, dtype: {}'.format(
self.__class__.__name__,
[str(s) for s in self],
len(self),
self.dtype
)
def _formatter(self, formatter=None):
if formatter:
return formatter.formatter or str
return "'{}'".format
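
The new `_formatter` returns the bound method `"'{}'".format` as the per-element callable. A small sketch of the two branches, using a hypothetical `FakeArrayFormatter` in place of the pandas formatter object and plain strings in place of `Period` scalars:

```python
class FakeArrayFormatter:
    """Hypothetical stand-in for the formatter object passed by pandas."""

    def __init__(self, formatter=None):
        self.formatter = formatter


def period_formatter(formatter=None):
    if formatter:
        # Called with a formatter object: use its callable, or plain str.
        return formatter.formatter or str
    # Called with no formatter (as the default __repr__ does): wrap each
    # element in single quotes.
    return "'{}'".format


quoted = period_formatter()("2018-01")
plain = period_formatter(FakeArrayFormatter())("2018-01")
print(quoted, plain)
```

Under these assumptions, elements are quoted when no formatter is supplied and left unquoted when one is.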

def __setitem__(
self,
10 changes: 10 additions & 0 deletions pandas/core/arrays/sparse.py
@@ -1661,6 +1661,16 @@ def __unicode__(self):
fill=printing.pprint_thing(self.fill_value),
index=printing.pprint_thing(self.sp_index))

def _formatter(self, formatter=None):
if formatter is None:
def fmt(x):
if isna(x) and isinstance(x, float):
return 'NaN'
return str(x)

return fmt
return formatter.formatter


SparseArray._add_arithmetic_ops()
SparseArray._add_comparison_ops()
16 changes: 14 additions & 2 deletions pandas/core/internals/blocks.py
@@ -66,7 +66,7 @@
import pandas.core.missing as missing
from pandas.core.base import PandasObject

from pandas.core.arrays import Categorical
from pandas.core.arrays import Categorical, ExtensionArray

from pandas.core.indexes.datetimes import DatetimeIndex
from pandas.core.indexes.timedeltas import TimedeltaIndex
@@ -1951,7 +1951,19 @@ def _slice(self, slicer):
return self.values[slicer]

def formatting_values(self):
return self.values._formatting_values()
# Deprecating the ability to override _formatting_values.
# Do the warning here, it's only used in pandas, since we
# have to check if the subclass overrode it.
fv = getattr(type(self.values), '_formatting_values', None)
if fv and fv != ExtensionArray._formatting_values:
msg = (
"'ExtensionArray._formatting_values' is deprecated. "
"Specify 'ExtensionArray._formatter' instead."
)
warnings.warn(msg, FutureWarning, stacklevel=10)
return self.values._formatting_values()

return self.values
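
The override check above warns only when a subclass supplies its own `_formatting_values`. A self-contained sketch of that pattern follows, assuming Python 3, where a method looked up on the class is a plain function and can be compared directly. `Base`, `Legacy`, and `Modern` are hypothetical classes, and `stacklevel=2` is an illustrative choice rather than the value used in the diff.

```python
import warnings


class Base:
    def _formatting_values(self):
        return list(self.data)


class Legacy(Base):
    """Hypothetical subclass that still overrides the deprecated hook."""

    def __init__(self, data):
        self.data = data

    def _formatting_values(self):
        return [str(x) for x in self.data]


class Modern(Base):
    """Hypothetical subclass that does not override the hook."""

    def __init__(self, data):
        self.data = data


def formatting_values(values):
    # Compare the function found on the concrete type with the base class
    # implementation; they differ only when a subclass overrode it.
    fv = getattr(type(values), "_formatting_values", None)
    if fv and fv != Base._formatting_values:
        warnings.warn(
            "'_formatting_values' is deprecated. Specify '_formatter' instead.",
            FutureWarning,
            stacklevel=2,
        )
        return values._formatting_values()
    return values


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = formatting_values(Legacy([1, 2]))
    modern_result = formatting_values(Modern([1, 2]))

print(legacy_result, len(caught))
```

Only the `Legacy` call emits a warning. The `Modern` instance is returned untouched, which corresponds to the new code path that hands the array itself to the formatter.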

def concat_same_type(self, to_concat, placement=None):
"""
67 changes: 23 additions & 44 deletions pandas/io/formats/format.py
@@ -16,11 +16,12 @@
from pandas.compat import StringIO, lzip, map, u, zip

from pandas.core.dtypes.common import (
is_categorical_dtype, is_datetime64_dtype, is_datetimetz, is_float,
is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype,
is_list_like, is_numeric_dtype, is_period_arraylike, is_scalar,
is_categorical_dtype, is_datetime64_dtype, is_datetimetz,
is_extension_array_dtype, is_float, is_float_dtype, is_integer,
is_integer_dtype, is_list_like, is_numeric_dtype, is_scalar,
is_timedelta64_dtype)
from pandas.core.dtypes.generic import ABCMultiIndex, ABCSparseArray
from pandas.core.dtypes.generic import (
ABCIndexClass, ABCMultiIndex, ABCSeries, ABCSparseArray)
from pandas.core.dtypes.missing import isna, notna

from pandas import compat
@@ -29,7 +30,6 @@
from pandas.core.config import get_option, set_option
from pandas.core.index import Index, ensure_index
from pandas.core.indexes.datetimes import DatetimeIndex
from pandas.core.indexes.period import PeriodIndex

from pandas.io.common import _expand_user, _stringify_path
from pandas.io.formats.printing import adjoin, justify, pprint_thing
@@ -849,22 +849,18 @@ def _get_column_name_list(self):
def format_array(values, formatter, float_format=None, na_rep='NaN',
digits=None, space=None, justify='right', decimal='.'):

if is_categorical_dtype(values):
fmt_klass = CategoricalArrayFormatter
elif is_interval_dtype(values):
fmt_klass = IntervalArrayFormatter
if is_datetime64_dtype(values.dtype):
fmt_klass = Datetime64Formatter
elif is_timedelta64_dtype(values.dtype):
fmt_klass = Timedelta64Formatter
elif is_extension_array_dtype(values.dtype):
fmt_klass = ExtensionArrayFormatter
elif is_float_dtype(values.dtype):
fmt_klass = FloatArrayFormatter
elif is_period_arraylike(values):
fmt_klass = PeriodArrayFormatter
elif is_integer_dtype(values.dtype):
fmt_klass = IntArrayFormatter
elif is_datetimetz(values):
fmt_klass = Datetime64TZFormatter
elif is_datetime64_dtype(values.dtype):
fmt_klass = Datetime64Formatter
elif is_timedelta64_dtype(values.dtype):
fmt_klass = Timedelta64Formatter
else:
fmt_klass = GenericArrayFormatter

@@ -1126,39 +1122,22 @@ def _format_strings(self):
return fmt_values.tolist()


class IntervalArrayFormatter(GenericArrayFormatter):

def __init__(self, values, *args, **kwargs):
GenericArrayFormatter.__init__(self, values, *args, **kwargs)

def _format_strings(self):
formatter = self.formatter or str
fmt_values = np.array([formatter(x) for x in self.values])
return fmt_values


class PeriodArrayFormatter(IntArrayFormatter):

class ExtensionArrayFormatter(GenericArrayFormatter):
def _format_strings(self):
from pandas.core.indexes.period import IncompatibleFrequency
try:
values = PeriodIndex(self.values).to_native_types()
except IncompatibleFrequency:
# periods may contains different freq
values = Index(self.values, dtype='object').to_native_types()

formatter = self.formatter or (lambda x: '{x}'.format(x=x))
fmt_values = [formatter(x) for x in values]
return fmt_values

values = self.values
if isinstance(values, (ABCIndexClass, ABCSeries)):
values = values._values

class CategoricalArrayFormatter(GenericArrayFormatter):
formatter = values._formatter(self)

def __init__(self, values, *args, **kwargs):
GenericArrayFormatter.__init__(self, values, *args, **kwargs)
if is_categorical_dtype(values.dtype):
# Categorical is special for now, so that we can preserve tzinfo
Contributor:
do we need a TODO here? this is until DatetimeArray is fully pushed?

Contributor Author (TomAugspurger, Dec 4, 2018):
That depends on whether we're willing to change __array__ for datetime-backed series / index (right now . I'm writing up an issue now to discuss that specific point.)

Contributor Author:
#23569 (comment) for that.

array = values.get_values()
else:
array = np.asarray(values)

def _format_strings(self):
fmt_values = format_array(self.values.get_values(), self.formatter,
fmt_values = format_array(array,
Member:
@TomAugspurger: I'm struggling to resolve some formatting issues. What is the reason for calling format_array here? As far as I can tell, it is looping back round to create a GenericArrayFormatter instance with a formatter specified to pick up the display options.

Member:
I guess, to be more succinct, why is super()._format_strings() not used?

Member:
I am not that familiar with this code, but from a quick look: calling super()._format_strings() would be different, as this would call GenericArrayFormatter._format_strings, while the generic format_array can still result in using custom formatters like Datetime64(TZ)Formatter or Timedelta64Formatter, depending on what the values of the underlying EA are.

Member:
Although most of those custom Formatter classes don't do much special if formatter is specified.

Eg Datetime64Formatter has this in _format_strings:

if self.formatter is not None and callable(self.formatter):
    return [self.formatter(x) for x in values]

Member:
so if ExtensionArrayFormatter is not inheriting from GenericArrayFormatter but calling format_array to dispatch to another ...ArrayFormatter class, why wouldn't the logic in ExtensionArrayFormatter be in format_array?

formatter,
float_format=self.float_format,
na_rep=self.na_rep, digits=self.digits,
space=self.space, justify=self.justify)