API: dispatch to EA.astype #22343
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -647,7 +647,16 @@ def conv(r, dtype): | |
|
||
def astype_nansafe(arr, dtype, copy=True): | ||
""" return a view if copy is False, but | ||
need to be very careful as the result shape could change! """ | ||
need to be very careful as the result shape could change! | ||
|
||
Parameters | ||
---------- | ||
arr : ndarray | ||
dtype : np.dtype | ||
copy : bool, default True | ||
Review comment: Can you specify here what happens with False? (It will always do a view, except for object, even if the itemsize is incompatible.) |
||
If False, a view will be attempted but may fail, if | ||
e.g. the itemsizes don't align. | ||
""" | ||
|
||
# dispatch on extension dtype if needed | ||
if is_extension_array_dtype(dtype): | ||
|
@@ -733,8 +742,10 @@ def astype_nansafe(arr, dtype, copy=True): | |
FutureWarning, stacklevel=5) | ||
dtype = np.dtype(dtype.name + "[ns]") | ||
|
||
if copy: | ||
if copy or is_object_dtype(arr) or is_object_dtype(dtype): | ||
# Explicit copy, or required since NumPy can't view from / to object. | ||
return arr.astype(dtype, copy=True) | ||
|
||
return arr.view(dtype) | ||
|
||
|
||
|
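The `copy=False` caveats in the docstring above can be illustrated with plain NumPy. This is a minimal sketch, not part of the PR: `try_view` is a hypothetical helper introduced only for this example, and it simply reports whether NumPy accepts a reinterpreting view.

```python
import numpy as np


def try_view(arr, dtype):
    """Hypothetical helper: return arr.view(dtype), or None if NumPy refuses."""
    try:
        return arr.view(dtype)
    except (TypeError, ValueError):
        return None


# Same itemsize (8 bytes -> 8 bytes): the view keeps the shape, but the
# values are reinterpreted bit patterns, not converted numbers.
floats = np.array([1.0, 2.0], dtype="float64")
same_size = try_view(floats, "int64")

# Smaller itemsize (8 bytes -> 4 bytes): the view succeeds, but the length
# doubles, which is the "result shape could change" warning.
ints = np.array([1, 2], dtype="int64")
halved = try_view(ints, "int32")

# Larger itemsize with a byte count that does not divide evenly
# (3 * 4 bytes -> 8 bytes): NumPy raises, so the helper returns None.
odd = np.array([1, 2, 3], dtype="int32")
misaligned = try_view(odd, "int64")

# Object arrays hold references and cannot be viewed as another dtype,
# so a copying astype is required, as in the new `if copy or ...` branch.
objs = np.array([1, 2], dtype=object)
viewed_obj = try_view(objs, "int64")
copied_obj = objs.astype("int64", copy=True)
```

The object-dtype case is the one the added `is_object_dtype` checks guard against: without them, `copy=False` would fall through to `arr.view(dtype)` and fail.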
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -637,22 +637,25 @@ def _astype(self, dtype, copy=False, errors='raise', values=None, | |
# force the copy here | ||
if values is None: | ||
|
||
if issubclass(dtype.type, | ||
(compat.text_type, compat.string_types)): | ||
if self.is_extension: | ||
Review comment: @TomAugspurger are you really really sure this is needed?
Review comment: Needed for #22325. Sparse has special semantics for In general though, it seems like EAs should have a say in how they're astyped, rather than always going through
Review comment: We already do this, is my point.
Review comment: Not on master? That hits https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals/blocks.py#L652, which converts to an ndarray before ever calling the extension array's astype.
Review comment: At least, my new tests fail on master without these changes. I'm not sure if / how IntegerArray is being handled differently.
Review comment: Do you mean the current code in master or this PR?
Review comment: This PR. There is a ton of code in astype to dispatch to extension types already.
Review comment: Can you point to this "ton of code"? I don't see any other dispatch to EAs.
Review comment: It's a 2 line change, this extra if condition. Which of those two lines is the convoluted one?
Review comment: Hmm, I could swear I add an |
||
values = self.values.astype(dtype) | ||
else: | ||
if issubclass(dtype.type, | ||
(compat.text_type, compat.string_types)): | ||
|
||
# use native type formatting for datetime/tz/timedelta | ||
if self.is_datelike: | ||
values = self.to_native_types() | ||
# use native type formatting for datetime/tz/timedelta | ||
if self.is_datelike: | ||
values = self.to_native_types() | ||
|
||
# astype formatting | ||
else: | ||
values = self.get_values() | ||
# astype formatting | ||
else: | ||
values = self.get_values() | ||
|
||
else: | ||
values = self.get_values(dtype=dtype) | ||
else: | ||
values = self.get_values(dtype=dtype) | ||
|
||
# _astype_nansafe works fine with 1-d only | ||
values = astype_nansafe(values.ravel(), dtype, copy=True) | ||
# _astype_nansafe works fine with 1-d only | ||
values = astype_nansafe(values.ravel(), dtype, copy=True) | ||
|
||
# TODO(extension) | ||
# should we make this attribute? | ||
|
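The dispatch added in this hunk can be sketched without pandas. The example below is an illustration under stated assumptions: `ToyDtype`, `ToyArray`, and `block_astype` are hypothetical names invented here, and the real `Block._astype` handles many more cases (errors, string formatting, reshaping).

```python
class ToyDtype:
    """Hypothetical parametrized dtype; `unit` stands in for any parameter."""

    def __init__(self, unit):
        self.unit = unit


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray (not a pandas class)."""

    def __init__(self, data, unit="m"):
        self.data = list(data)
        self.dtype = ToyDtype(unit)

    def astype(self, dtype):
        # The array itself decides what a cast within its dtype family means.
        if isinstance(dtype, ToyDtype):
            return ToyArray(self.data, unit=dtype.unit)
        # Anything else is treated as a plain callable such as float or str.
        return [dtype(x) for x in self.data]


def block_astype(values, dtype):
    """Toy version of the `if self.is_extension` branch."""
    if isinstance(values, ToyArray):
        # Extension-backed values: hand the request to the array untouched.
        return values.astype(dtype)
    # Plain values: generic element-wise conversion.
    return [dtype(x) for x in values]


converted = block_astype(ToyArray([1, 2]), ToyDtype("km"))
plain = block_astype([1, 2], float)
```

The point of the branch is that the generic path knows nothing about `ToyDtype`; only the array's own `astype` can interpret it, which is the argument made in the review thread for letting EAs "have a say".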
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,17 @@ class DecimalDtype(ExtensionDtype): | |
name = 'decimal' | ||
na_value = decimal.Decimal('NaN') | ||
|
||
def __init__(self, context=None): | ||
Review comment: Seemed like a good idea to have a parameterized ExtensionDtype in the tests folder. For |
||
self.context = context or decimal.getcontext() | ||
|
||
def __eq__(self, other): | ||
Review comment: Our default eq is dangerous for parametrized dtypes, depending on whether the parameters appear in the name :/ |
||
if isinstance(other, type(self)): | ||
return self.context == other.context | ||
return super(DecimalDtype, self).__eq__(other) | ||
|
||
def __repr__(self): | ||
return 'DecimalDtype(context={})'.format(self.context) | ||
|
||
@classmethod | ||
def construct_array_type(cls): | ||
"""Return the array type associated with this dtype | ||
|
@@ -35,13 +46,12 @@ def construct_from_string(cls, string): | |
|
||
|
||
class DecimalArray(ExtensionArray, ExtensionScalarOpsMixin): | ||
dtype = DecimalDtype() | ||
|
||
def __init__(self, values, dtype=None, copy=False): | ||
def __init__(self, values, dtype=None, copy=False, context=None): | ||
for val in values: | ||
if not isinstance(val, self.dtype.type): | ||
if not isinstance(val, decimal.Decimal): | ||
raise TypeError("All values must be of type " + | ||
str(self.dtype.type)) | ||
str(decimal.Decimal)) | ||
values = np.asarray(values, dtype=object) | ||
|
||
self._data = values | ||
|
@@ -51,6 +61,11 @@ def __init__(self, values, dtype=None, copy=False): | |
# those aliases are currently not working due to assumptions | ||
# in internal code (GH-20735) | ||
# self._values = self.values = self.data | ||
self._dtype = DecimalDtype(context) | ||
|
||
@property | ||
def dtype(self): | ||
return self._dtype | ||
|
||
@classmethod | ||
def _from_sequence(cls, scalars, dtype=None, copy=False): | ||
|
@@ -82,6 +97,11 @@ def copy(self, deep=False): | |
return type(self)(self._data.copy()) | ||
return type(self)(self) | ||
|
||
def astype(self, dtype, copy=True): | ||
if isinstance(dtype, type(self.dtype)): | ||
return type(self)(self._data, context=dtype.context) | ||
return super(DecimalArray, self).astype(dtype, copy) | ||
|
||
def __setitem__(self, key, value): | ||
if pd.api.types.is_list_like(value): | ||
value = [decimal.Decimal(v) for v in value] | ||
|
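The concern about equality for parametrized dtypes, raised in the comment on `__eq__` above, can be shown with two toy classes. This is a rough sketch only: `NameEqDtype` and `ParamEqDtype` are hypothetical, and `NameEqDtype` merely approximates a name-based default; it is not pandas' actual `ExtensionDtype.__eq__`.

```python
class NameEqDtype:
    """Hypothetical dtype whose equality only looks at the dtype name."""

    name = "decimal"

    def __init__(self, prec):
        self.prec = prec

    def __eq__(self, other):
        # The parameter is not part of the name, so it is ignored here.
        return isinstance(other, NameEqDtype) and other.name == self.name

    def __hash__(self):
        return hash(self.name)


class ParamEqDtype:
    """Hypothetical dtype that includes its parameter in equality."""

    name = "decimal"

    def __init__(self, prec):
        self.prec = prec

    def __eq__(self, other):
        if isinstance(other, ParamEqDtype):
            return self.prec == other.prec
        return NotImplemented

    def __hash__(self):
        # Keep hash consistent with the parameter-aware equality.
        return hash((self.name, self.prec))


# Name-based equality cannot tell the two parametrizations apart,
# so a cast between them could be mistaken for a no-op.
name_based_equal = NameEqDtype(5) == NameEqDtype(28)

# Parameter-aware equality distinguishes them.
param_based_equal = ParamEqDtype(5) == ParamEqDtype(28)
same_params_equal = ParamEqDtype(5) == ParamEqDtype(5)
```

This is why the test `DecimalDtype` overrides `__eq__` to compare contexts before falling back to the inherited comparison.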
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -205,6 +205,27 @@ def test_dataframe_constructor_with_dtype(): | |
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
@pytest.mark.parametrize("frame", [True, False]) | ||
Review comment: Why don't we add a test in base which raises NotImplementedError so that authors are forced to cover this?
Review comment: But is it useful to force EA authors to do it? This is basically checking the pandas dispatch (which we can do here), not the actual EA.astype implementation.
Review comment: Or how about something like
@pytest.mark.parametrize('typ, check', [
    ('category', 'is_categorical_dtype'),
    ...
])
def test_astype_category(self, data):
    assert check(data.astype(dtype))
Would that make sense as a base test? I think our default implementation would need to be updated to not fail that. My main concern is that it would be difficult to override (e.g. skip) just some of the dtypes, so maybe we would have to write those as separate tests?
Review comment: I'm mildly concerned that EA authors will not correctly handle extension types. Our base implementation currently fails to handle them. Although we document it as
Review comment: I would leave such a test until we better figure out how to handle those cross EA/non-EA astype calls (the discussion we were having above in this PR).
Yeah, probably yes in any case (since EA authors that provide multiple dtypes already do that, like IntegerArray).
Review comment: I would add an issue to make this test, not blocking this PR on it. |
||
def test_astype_dispatches(frame): | ||
# This is a dtype-specific test that ensures Series[decimal].astype | ||
# gets all the way through to ExtensionArray.astype | ||
# Designing a reliable smoke test that works for arbitrary data types | ||
# is difficult. | ||
data = pd.Series(DecimalArray([decimal.Decimal(2)]), name='a') | ||
ctx = decimal.Context() | ||
ctx.prec = 5 | ||
|
||
if frame: | ||
data = data.to_frame() | ||
|
||
result = data.astype(DecimalDtype(ctx)) | ||
|
||
if frame: | ||
result = result['a'] | ||
|
||
assert result.dtype.context.prec == ctx.prec | ||
|
||
|
||
class TestArithmeticOps(BaseDecimal, base.BaseArithmeticOpsTests): | ||
|
||
def check_opname(self, s, op_name, other, exc=None): | ||
|
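`test_astype_dispatches` above checks that the target dtype's context precision survives the cast. As background, here is a small standard-library sketch of what a `decimal.Context` with `prec = 5` changes; the variable names are illustrative and the default precision of 28 assumes `decimal.DefaultContext` has not been modified.

```python
import decimal

# A fresh Context copies its settings from DefaultContext
# (28 significant digits unless DefaultContext was changed).
default_ctx = decimal.Context()

# Same construction as in the test: start from the defaults, narrow prec.
narrow_ctx = decimal.Context()
narrow_ctx.prec = 5

value = decimal.Decimal("1.234567")

# Context.plus applies the context's precision and rounding to the operand.
wide = default_ctx.plus(value)
narrow = narrow_ctx.plus(value)
```

Because two contexts can differ only in such settings, carrying the context on the dtype gives the test an observable parameter to assert on after `astype`.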
Review comment: This was necessary so that still raises. Previously, it did which I don't think we want. This was only tested at the Series[IntegerArray].astype level, which never called EA.astype.
Review comment: Can you add a test for this?