WIP: decorator for ops boilerplate #24282
Hello @jbrockmendel! Thanks for submitting the PR.
pandas/core/arrays/categorical.py
Outdated
# Note: using unpack_and_defer here doesn't break any tests, but the
# behavior here is idiosyncratic enough that I'm not confident enough
# to change it.
# @ops.unpack_and_defer
@TomAugspurger are you the right person to ask about Categorical methods? In particular, why does this only return NotImplemented for ABCSeries and not for ABCDataFrame and ABCIndexClass?
No idea. Returning NotImplemented seems reasonable, as long as it ends up back here with the unboxed values.
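For anyone following along, the deferral discussed here relies on Python's reflected-operator protocol: when the left operand returns NotImplemented, Python asks the right operand to handle the operation. A minimal pure-Python sketch, where `ToyArray` and `ToyContainer` are hypothetical stand-ins (not pandas classes) for an array and a Series-like wrapper:

```python
class ToyArray:
    """Hypothetical stand-in for an extension array (not a pandas class)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if isinstance(other, ToyContainer):
            # Defer: Python will next try ToyContainer.__eq__(other, self).
            return NotImplemented
        if isinstance(other, ToyArray):
            other = other.values
        # Elementwise comparison against a list-like of equal length.
        return [x == y for x, y in zip(self.values, other)]


class ToyContainer:
    """Hypothetical stand-in for a Series-like wrapper around ToyArray."""

    def __init__(self, array):
        self.array = array

    def __eq__(self, other):
        if isinstance(other, ToyContainer):
            other = other.array
        # Unbox, dispatch back to the array, then re-wrap the result.
        return ToyContainer(ToyArray(self.array == other))


arr = ToyArray(["a", "b", 2, "a"])
box = ToyContainer(arr)

# ToyArray.__eq__ returns NotImplemented, so ToyContainer.__eq__ handles it
# and ends up back in ToyArray.__eq__ with the unboxed values.
result = arr == box
```

In this toy version the comparison ends up back in `ToyArray.__eq__` with the unboxed values, and the container decides how to wrap the result.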
`Categorical == DataFrame` is weird both before and after this PR:
data = ["a", "b", 2, "a"]
cat = pd.Categorical(data)
idx = pd.CategoricalIndex(cat)
ser = pd.Series(cat)
df = pd.DataFrame(cat)
master
>>> cat == df
array([[ True, False, False,  True],
       [False,  True, False, False],
       [False, False,  True, False],
       [ True, False, False,  True]])
PR
>>> cat == df
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 2127, in f
    other = _align_method_FRAME(self, other, axis=None)
  File "pandas/core/ops.py", line 2021, in _align_method_FRAME
    right = to_series(right)
  File "pandas/core/ops.py", line 1983, in to_series
    given_len=len(right)))
ValueError: Unable to coerce to Series, length must be 1: given 4
If we transpose then we get all-True, the only difference being that in the PR the result is a DataFrame instead of an ndarray.
@@ -36,6 +36,8 @@
def _make_comparison_op(cls, op):
    # TODO: share code with indexes.base version? Main difference is that
    # the block for MultiIndex was removed here.

    # @ops.unpack_and_defer
Using this here leads to recursion errors, related to the fact that this only returns NotImplemented for ABCDataFrame. I think this will be easier to resolve after the change to composition.
@@ -573,6 +564,8 @@ def _maybe_mask_result(self, result, mask, other, op_name):

    @classmethod
    def _create_arithmetic_method(cls, op):

        # @ops.unpack_and_defer
@jreback using the decorator here breaks a bunch of tests, specifically ones operating with DataFrame. Why does this return NotImplemented for Series/Index but not DataFrame?
Below, why use `other = other.item()` instead of `item_from_zerodim`?
pandas/core/arrays/period.py
Outdated
@@ -51,6 +51,7 @@ def _period_array_cmp(cls, op):
    opname = '__{name}__'.format(name=op.__name__)
    nat_result = True if opname == '__ne__' else False

    # @ops.unwrap_and_defer
@TomAugspurger any idea why this doesn't return NotImplemented for DataFrame?
Not sure... I vaguely remember a broadcasting error, but that may have been user error.
@@ -1650,13 +1651,11 @@ def sparse_unary_method(self):

    @classmethod
    def _create_arithmetic_method(cls, op):

        @ops.unpack_and_defer
Both here and below this is technically a change since in the status quo this doesn't defer to DataFrame. Also doesn't call item_from_zerodim ATM.
if len(other) != len(self) and not is_timedelta64_dtype(other):
    # Exclude timedelta64 here so we correctly raise TypeError
    # for that instead of ValueError
    raise ValueError("Cannot multiply with unequal lengths")
This is a small change (reflected in a test below). `tdarr * tdarr[:-1]` checks for length mismatch before checking for dtype compat, so it now raises ValueError instead of TypeError.
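To make the ordering point concrete, here is a toy sketch of how the order of validation decides which exception surfaces when both problems are present. The `multiply` helper and the string `right_dtype` tag are hypothetical, for illustration only; this is not the pandas implementation.

```python
def multiply(left, right, right_dtype, length_first):
    """Toy multiply; `right_dtype` is a plain string tag, not a real dtype."""

    def check_length():
        if len(left) != len(right):
            raise ValueError("Cannot multiply with unequal lengths")

    def check_dtype():
        # Assumption for the sketch: timedelta64 operands are not allowed.
        if right_dtype == "timedelta64":
            raise TypeError("cannot multiply by a timedelta64 operand")

    checks = [check_length, check_dtype]
    if not length_first:
        checks.reverse()
    # The first failing check wins, so the order is user-visible behavior.
    for check in checks:
        check()
    return [x * y for x, y in zip(left, right)]
```

With `length_first=True` a mismatched-length timedelta64 operand raises ValueError; with `length_first=False` the same call raises TypeError.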
@@ -63,6 +63,8 @@ def _try_get_item(x):


def _make_comparison_op(op, cls):

    # @ops.unpack_and_defer
`Index.__eq__(Series)` doesn't return NotImplemented, kind of inconvenient special case
seems like a good idea.
Codecov Report
@@ Coverage Diff @@
## master #24282 +/- ##
==========================================
+ Coverage 92.22% 92.23% +<.01%
==========================================
Files 162 162
Lines 51828 51785 -43
==========================================
- Hits 47798 47763 -35
+ Misses 4030 4022 -8
Codecov Report
@@ Coverage Diff @@
## master #24282 +/- ##
==========================================
+ Coverage 92.22% 92.23% +0.01%
==========================================
Files 162 162
Lines 51824 51774 -50
==========================================
- Hits 47795 47756 -39
+ Misses 4029 4018 -11
After a bunch more poking at this, I think we should hold off on applying this decorator for the methods where it would change behavior, in particular for Categorical and PeriodArray comparisons. Implementing tests for these as an afterthought runs the risk of being half-baked.
@jreback thoughts here? I'm increasingly leaning towards being even stricter: reverting usages that change behavior (basically everything outside of timedeltas) and incrementally re-implementing those in follow-ups with appropriate tests (i.e. things like ops with zero-dim arrays).
Not sure exactly what you mean here. With the decorator?
The decorator is in fairly good shape; the question is where we want to actually use it. AFAICT every place the decorator is used ATM implies some non-zero change to the behavior of the affected function. For e.g. But pretty much all the cases in between make not-quite-trivial changes to the methods' behaviors with no accompanying tests. The decision is where to draw the line on those, and my current thought is to be relatively strict for now.
Can you highlight this change via an example(s)?
IntNA
Sparse
Also, thoughts on collecting categorical and sparse arithmetic/comparison tests in tests.arithmetic? I need to take a look at how to improve coverage for these without combinatorial explosion.
When comparing with a generator, we want the behavior on master, right? That matches NumPy's behavior anyway.
As an alternative, you could add an
That's not a bad idea. Regardless, this can wait until the next test-parametrization pass.
If that's the case, then we probably need to revert that part of the decorator anyway, right? It isn't obvious to me that is the desired behavior. Is there a reason why we would unroll generators for arithmetic ops but not for comparisons?
I'm not sure either. You could ask on the numpy mailing list to see if that
behavior is deliberate. I know that NumPy 1.16 is deprecating passing
generators to things like `np.concatenate`.
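For context, "unrolling" a generator here just means materializing it into a list before the operation sees it. A minimal sketch, assuming a hypothetical `maybe_unroll` helper (the name and placement are illustrative, not taken from the PR):

```python
import types


def maybe_unroll(other):
    """Hypothetical helper: materialize a generator, pass anything else through."""
    if isinstance(other, types.GeneratorType):
        # A generator can only be consumed once, so capture its items now.
        return list(other)
    return other


gen = (x for x in range(3))
unrolled = maybe_unroll(gen)

# After unrolling, the original generator is exhausted.
leftover = list(gen)
```

Whether comparisons should do this at all is exactly the open question in this thread; the sketch only shows what the unrolling step amounts to.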
…On Wed, Dec 19, 2018 at 5:01 PM jbrockmendel ***@***.***> wrote:
When comparing with a generator, we want the behavior on master, right?
That matches NumPy's behavior anyway.
If that's the case, then we probably need to revert that part of the
decorator anyway right?
It isn't obvious to me that is the desired behavior. Is there a reason why
would do unroll generators for arithmetic ops but not for comparisons?
OK. It seems like this PR is going to be in a holding pattern for a while. I'll make a separate PR to implement the changes+tests for Categorical if you're OK with those.
@TomAugspurger are you on board with the changes to Categorical comparison behavior? (easiest to look at the new test)
IIUC the change in https://github.com/pandas-dev/pandas/pull/24282/files#diff-c859b3060cdd05f4d0f693b0a4b71fc5R130 is to raise on
I still think this is worth doing, but I'm not comfortable with introducing lots of small changes in behavior without accompanying tests. Closing now, will try a new approach after the RC.
git diff upstream/master -u -- "*.py" | flake8 --diff
There are a handful of places where this decorator is applied but commented-out. This is the "WIP" part of things.
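A rough sketch of the kind of boilerplate such a decorator could factor out, pieced together from the review comments in this thread (defer to higher-level containers, unpack zero-dim values). The name `unpack_and_defer` comes from the PR; the body, `DEFER_TO`, `ToyFrame`, `ZeroDim`, and `ToyArray` are assumptions made for illustration and are not the PR's implementation.

```python
import functools


class ToyFrame:
    """Hypothetical container that should handle the op itself."""


class ZeroDim:
    """Hypothetical stand-in for a zero-dimensional array holding one item."""

    def __init__(self, item):
        self.item = item


# Assumption: the types whose own methods should take over the operation.
DEFER_TO = (ToyFrame,)


def unpack_and_defer(method):
    """Wrap a binary dunder method with the shared pre-processing."""

    @functools.wraps(method)
    def wrapper(self, other):
        if isinstance(other, DEFER_TO):
            # Let Python fall back to the other operand's reflected method.
            return NotImplemented
        if isinstance(other, ZeroDim):
            # Unpack the single item so the method sees a plain scalar.
            other = other.item
        return method(self, other)

    return wrapper


class ToyArray:
    def __init__(self, values):
        self.values = list(values)

    @unpack_and_defer
    def __add__(self, other):
        return ToyArray(x + other for x in self.values)
```

The commented-out usages in the diff correspond to methods where adding this pre-processing would change existing behavior.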