WIP: decorator for ops boilerplate #24282

jbrockmendel · 2018-12-14T19:13:16Z

closes CLN: use float64_t consistently instead of double, double_t #23583
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

There are a handful of places where this decorator is applied but commented-out. This is the "WIP" part of things.

pep8speaks · 2018-12-14T19:13:31Z

Hello @jbrockmendel! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/arrays/categorical.py !
There are no PEP8 issues in the file pandas/core/arrays/datetimelike.py !
There are no PEP8 issues in the file pandas/core/arrays/datetimes.py !
There are no PEP8 issues in the file pandas/core/arrays/integer.py !
There are no PEP8 issues in the file pandas/core/arrays/period.py !
There are no PEP8 issues in the file pandas/core/arrays/sparse.py !
There are no PEP8 issues in the file pandas/core/arrays/timedeltas.py !
There are no PEP8 issues in the file pandas/core/indexes/base.py !
There are no PEP8 issues in the file pandas/core/indexes/range.py !
There are no PEP8 issues in the file pandas/core/ops.py !
There are no PEP8 issues in the file pandas/tests/arithmetic/test_timedelta64.py !

jbrockmendel · 2018-12-14T19:14:00Z

pandas/core/arrays/categorical.py

+    # Note: using unpack_and_defer here doesn't break any tests, but the
+    #  behavior here is idiosyncratic enough that I'm not confident enough
+    #  to change it.
+    # @ops.unpack_and_defer


@TomAugspurger are you the right person to ask about Categorical methods? In particular, why does this only return NotImplemented for ABCSeries and not for ABCDataFrame and ABCIndexClass?

No idea. Returning NotImplemented seems reasonable, as long as it ends up back here with the unboxed values.
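For readers following along, the round trip described above can be sketched in plain Python. This is only an illustration of the deferral pattern, not the PR's code: `Array` and `Frame` are hypothetical stand-ins for an extension array and a DataFrame, and this `unpack_and_defer` is a simplified guess at the idea that only handles the "defer" half.

```python
import functools


class Frame:
    """Hypothetical stand-in for a higher-level container such as DataFrame."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Unbox our own values, then hand them back to the array's logic.
        if isinstance(other, Array):
            return Frame(other._compare(self.values))
        return NotImplemented


def unpack_and_defer(method):
    """Simplified sketch of a deferring decorator (an assumption, not pandas code)."""

    @functools.wraps(method)
    def wrapper(self, other):
        if isinstance(other, Frame):
            # Python will now try the reflected method on `other`.
            return NotImplemented
        return method(self, other)

    return wrapper


class Array:
    """Hypothetical stand-in for an extension array."""

    def __init__(self, values):
        self.values = list(values)

    def _compare(self, other_values):
        if len(other_values) != len(self.values):
            raise ValueError("Lengths must match")
        return [a == b for a, b in zip(self.values, other_values)]

    @unpack_and_defer
    def __eq__(self, other):
        if isinstance(other, Array):
            return self._compare(other.values)
        # Anything else is treated as a scalar.
        return [a == other for a in self.values]


result = Array([1, 2, 3]) == Frame([1, 0, 3])
print(type(result).__name__, result.values)
```

Because `Array.__eq__` returns `NotImplemented` for a `Frame`, Python falls back to `Frame.__eq__`, which unboxes its values and calls back into the array's comparison. The container therefore decides the result type, while the elementwise logic stays in one place.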

Categorical == DataFrame is weird both before and after

data = ["a", "b", 2, "a"]
cat = pd.Categorical(data)
idx = pd.CategoricalIndex(cat)
ser = pd.Series(cat)
df = pd.DataFrame(cat)

master

>>> cat == df
array([[ True, False, False,  True],
       [False,  True, False, False],
       [False, False,  True, False],
       [ True, False, False,  True]])

PR

>>> cat == df
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 2127, in f
    other = _align_method_FRAME(self, other, axis=None)
  File "pandas/core/ops.py", line 2021, in _align_method_FRAME
    right = to_series(right)
  File "pandas/core/ops.py", line 1983, in to_series
    given_len=len(right)))
ValueError: Unable to coerce to Series, length must be 1: given 4

If we transpose then we get all-True, the only difference being that in the PR the result is a DataFrame instead of an ndarray.

jbrockmendel · 2018-12-14T19:15:01Z

pandas/core/arrays/datetimelike.py

@@ -36,6 +36,8 @@
 def _make_comparison_op(cls, op):
    # TODO: share code with indexes.base version?  Main difference is that
    # the block for MultiIndex was removed here.
+
+    # @ops.unpack_and_defer


Using this here leads to recursion errors, related to the fact that this only returns NotImplemented for ABCDataFrame. I think this will be easier to resolve after the change to composition.

pandas/core/arrays/datetimes.py

jbrockmendel · 2018-12-14T19:16:53Z

pandas/core/arrays/integer.py

@@ -573,6 +564,8 @@ def _maybe_mask_result(self, result, mask, other, op_name):

    @classmethod
    def _create_arithmetic_method(cls, op):
+
+        # @ops.unpack_and_defer


@jreback using the decorator here breaks a bunch of tests, specifically ones operating with DataFrame. Why does this return NotImplemented for Series/Index but not DataFrame?

Below, why use other = other.item() instead of item_from_zerodim?

jbrockmendel · 2018-12-14T19:18:22Z

pandas/core/arrays/period.py

@@ -51,6 +51,7 @@ def _period_array_cmp(cls, op):
    opname = '__{name}__'.format(name=op.__name__)
    nat_result = True if opname == '__ne__' else False

+    # @ops.unwrap_and_defer


@TomAugspurger any idea why this doesn't return NotImplemented for DataFrame?

Not sure... I vaguely remember a broadcasting error, but that may have been user error.

jbrockmendel · 2018-12-14T19:19:15Z

pandas/core/arrays/sparse.py

@@ -1650,13 +1651,11 @@ def sparse_unary_method(self):

    @classmethod
    def _create_arithmetic_method(cls, op):
+
+        @ops.unpack_and_defer


Both here and below this is technically a change since in the status quo this doesn't defer to DataFrame. Also doesn't call item_from_zerodim ATM.

jbrockmendel · 2018-12-14T19:20:26Z

pandas/core/arrays/timedeltas.py

-        if len(other) != len(self) and not is_timedelta64_dtype(other):
-            # Exclude timedelta64 here so we correctly raise TypeError
-            #  for that instead of ValueError
-            raise ValueError("Cannot multiply with unequal lengths")


This is a small change (reflected in a test below). tdarr * tdarr[:-1] checks for length mismatch before checking for dtype compat, so now raises ValueError instead of TypeError.
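The effect of check ordering can be shown with a small stand-alone sketch. It uses `datetime.timedelta` lists rather than pandas arrays, and `multiply` with its `check_length_first` flag is a hypothetical helper invented for illustration. The only point is that whichever validation runs first determines which exception the caller sees.

```python
from datetime import timedelta


def multiply(values, other, check_length_first):
    """Multiply timedeltas elementwise (hypothetical helper, not pandas code)."""

    def check_length():
        if len(other) != len(values):
            raise ValueError("Cannot multiply with unequal lengths")

    def check_dtype():
        if any(isinstance(x, timedelta) for x in other):
            raise TypeError("Cannot multiply timedelta by timedelta")

    checks = [check_length, check_dtype]
    if not check_length_first:
        checks.reverse()
    for check in checks:
        check()
    return [v * x for v, x in zip(values, other)]


tds = [timedelta(days=1), timedelta(days=2), timedelta(days=3)]

for length_first in (True, False):
    try:
        multiply(tds, tds[:-1], check_length_first=length_first)
    except (ValueError, TypeError) as err:
        print(length_first, type(err).__name__)
```

With the length check first, the mismatched timedelta operand raises `ValueError`. With the dtype check first, the same call raises `TypeError`.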

jbrockmendel · 2018-12-14T19:22:15Z

pandas/core/indexes/base.py

@@ -63,6 +63,8 @@ def _try_get_item(x):


 def _make_comparison_op(op, cls):
+
+    # @ops.unpack_and_defer


Index.__eq__(Series) doesn't return NotImplemented, which is a kind of inconvenient special case.

jreback

seems like a good idea.

codecov · 2018-12-14T20:04:06Z

Codecov Report

Merging #24282 into master will increase coverage by <.01%.
The diff coverage is 88.67%.

@@            Coverage Diff             @@
##           master   #24282      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +<.01%     
==========================================
  Files         162      162              
  Lines       51828    51785      -43     
==========================================
- Hits        47798    47763      -35     
+ Misses       4030     4022       -8

Flag	Coverage Δ
#multiple	`90.63% <88.67%> (ø)`	⬆️
#single	`43.06% <58.49%> (+0.06%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/datetimelike.py	`96.44% <ø> (ø)`	⬆️
pandas/core/arrays/period.py	`98.5% <ø> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`98.23% <ø> (-0.01%)`	⬇️
pandas/core/arrays/categorical.py	`95.31% <ø> (ø)`	⬆️
pandas/core/arrays/integer.py	`95.78% <100%> (+0.24%)`	⬆️
pandas/core/arrays/sparse.py	`92.03% <100%> (-0.06%)`	⬇️
pandas/core/indexes/base.py	`96.16% <100%> (-0.12%)`	⬇️
pandas/core/indexes/range.py	`97.29% <100%> (-0.04%)`	⬇️
pandas/core/arrays/timedeltas.py	`87.65% <80.95%> (+0.49%)`	⬆️
pandas/core/ops.py	`94.51% <90.9%> (+0.25%)`	⬆️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 040f06f...2f929af.

codecov · 2018-12-14T20:04:07Z

Codecov Report

Merging #24282 into master will increase coverage by 0.01%.
The diff coverage is 95.91%.

@@            Coverage Diff             @@
##           master   #24282      +/-   ##
==========================================
+ Coverage   92.22%   92.23%   +0.01%     
==========================================
  Files         162      162              
  Lines       51824    51774      -50     
==========================================
- Hits        47795    47756      -39     
+ Misses       4029     4018      -11

Flag	Coverage Δ
#multiple	`90.64% <95.91%> (+0.01%)`	⬆️
#single	`43.07% <71.42%> (+0.06%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/datetimelike.py	`96.44% <ø> (ø)`	⬆️
pandas/core/arrays/period.py	`98.48% <ø> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`98.23% <ø> (ø)`	⬆️
pandas/core/arrays/timedeltas.py	`88.41% <100%> (+1.25%)`	⬆️
pandas/core/arrays/integer.py	`95.78% <100%> (+0.24%)`	⬆️
pandas/core/arrays/categorical.py	`95.3% <100%> (-0.01%)`	⬇️
pandas/core/arrays/sparse.py	`92.03% <100%> (-0.06%)`	⬇️
pandas/core/indexes/base.py	`96.16% <100%> (-0.12%)`	⬇️
pandas/core/indexes/range.py	`97.29% <100%> (-0.04%)`	⬇️
pandas/core/ops.py	`94.51% <90.9%> (+0.25%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7b0fa8e...4d48100.

jbrockmendel · 2018-12-15T00:01:42Z

After a bunch more poking at this, I think we should hold off on applying this decorator for the methods where it would change behavior, in particular for Categorical and PeriodArray comparisons. Tests for these would be an afterthought here and would run the risk of being half-baked.

jbrockmendel · 2018-12-18T20:27:29Z

@jreback thoughts here? I'm increasingly leaning towards being even stricter, reverting usages that change behavior (basically everything outside of timedeltas) and incrementally re-implementing those in follow-ups with appropriate tests (i.e. things like ops with zero-dim arrays)

jreback · 2018-12-18T22:37:07Z

@jreback thoughts here? I'm increasingly leaning towards being even stricter, reverting usages that change behavior (basically everything outside of timedeltas) and incrementally re-implementing those in follow-ups with appropriate tests (i.e. things like ops with zero-dim arrays)

not sure exactly what you mean here. with the decorator?

jbrockmendel · 2018-12-18T23:09:17Z

not sure exactly what you mean here. with the decorator?

The decorator is in fairly good shape; the question is where we want to actually use it. AFAICT every place the decorator is used ATM implies some non-zero change to the behavior of the affected function. For e.g. TimedeltaArray.__mul__, div, etc. the change is really small: it just handles iterators more precisely. For Categorical comparisons, on the other hand, the change is pretty big (with accompanying tests).

But pretty much all the cases in between make not-quite-trivial changes to the methods' behaviors with no accompanying tests. The decision is where to draw the line on those, and my current thought is to be relatively strict for now.

jreback · 2018-12-18T23:23:42Z

But pretty much all the cases in between make not-quite-trivial changes to the methods' behaviors with no accompanying tests. The decision is where to draw the line on those, and my current thought is to be relatively strict for now.

Can you highlight this change via some example(s)?

jbrockmendel · 2018-12-19T00:14:14Z

Can you highlight this change via some example(s)?

IntNA

arr = np.arange(5)
arr_na = pd.core.arrays.integer_array(arr)

other = (x for x in arr_na)
result = arr_na == other

# PR
>>> result
array([ True,  True,  True,  True,  True])

# master
>>> result
array([False, False, False, False, False])

Sparse


arr = np.arange(5)
df = pd.DataFrame(arr)
arr_sparse = pd.core.arrays.SparseArray(arr)

# PR
>>> arr_sparse == np.array(4)
[False, False, False, False, True]
Fill: False
IntIndex
Indices: array([1, 2, 3, 4], dtype=int32)

# master
>>> arr_sparse == np.array(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/arrays/sparse.py", line 1712, in cmp_method
    if len(self) != len(other):
TypeError: len() of unsized object

# PR
>>> arr_sparse + df.T
   0  1  2  3  4
0  0  2  4  6  8

# master
>>> arr_sparse + df.T
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/arrays/sparse.py", line 1684, in sparse_arithmetic_method
    self=len(self), other=len(other))))
AssertionError: length mismatch: 5 vs. 1

jbrockmendel · 2018-12-19T18:05:12Z

Also, thoughts on collecting categorical and sparse arithmetic/comparison tests in tests.arithmetic? I need to take a look at how to improve coverage for these without combinatorial explosion.

TomAugspurger · 2018-12-19T19:07:21Z

When comparing with a generator, we want the behavior on master, right? That matches NumPy's behavior anyway.

TomAugspurger · 2018-12-19T19:09:23Z

Also, thoughts on collecting categorical and sparse arithmetic/comparison tests in tests.arithmetic?

As an alternative, you could add an arithmetic pytest marker to the classes those tests are defined on. A dev might want "run all the sparse tests" or "run all the arithmetic tests". I'd favor just leaving them as is for now out of status quo bias.

jbrockmendel · 2018-12-19T19:53:30Z

As an alternative, you could add an arithmetic pytest marker to the classes those tests are defined on.

That's not a bad idea. Regardless, this can wait until the next test-parametrization pass.

jbrockmendel · 2018-12-19T23:01:30Z

When comparing with a generator, we want the behavior on master, right? That matches NumPy's behavior anyway.

If that's the case, then we probably need to revert that part of the decorator anyway right?

It isn't obvious to me that this is the desired behavior. Is there a reason why we would unroll generators for arithmetic ops but not for comparisons?
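To make the two behaviors under discussion concrete, here is a plain-Python analogue. It calls neither pandas nor NumPy, and both function names are hypothetical. `compare_opaque` treats the generator as a single object, which mirrors the all-False result reported for master above. `compare_unrolled` materializes the generator first, which mirrors the all-True result reported for the PR.

```python
from collections.abc import Iterator


def compare_opaque(values, other):
    """Compare each element against `other` as one opaque object."""
    return [v == other for v in values]


def compare_unrolled(values, other):
    """Unroll an iterator operand, then compare elementwise.

    `other` is assumed to be a list or an iterator (illustrative only).
    """
    if isinstance(other, Iterator):
        other = list(other)  # note: this consumes the generator
    if len(other) != len(values):
        raise ValueError("Lengths must match")
    return [v == o for v, o in zip(values, other)]


values = [0, 1, 2, 3, 4]
print(compare_opaque(values, (x for x in values)))
print(compare_unrolled(values, (x for x in values)))
```

One cost of unrolling is visible in the sketch: the generator is consumed by the comparison, so the caller cannot reuse it afterwards.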

TomAugspurger · 2018-12-20T14:29:18Z

I'm not sure either. You could ask on the numpy mailing list to see if that behavior is deliberate. I know that NumPy 1.16 is deprecating passing generators to things like `np.concatenate`.

jbrockmendel · 2018-12-20T17:42:14Z

You could ask on the numpy mailing list to see if that behavior is deliberate.

OK. It seems like this PR is going to be in a holding pattern for a while. I'll make a separate PR to implement the changes+tests for Categorical if you're OK with those.

jbrockmendel · 2019-01-03T00:13:35Z

@TomAugspurger are you on board with the changes to Categorical comparison behavior? (easiest to look at the new test)

TomAugspurger · 2019-01-03T19:29:49Z

IIUC the change in https://github.com/pandas-dev/pandas/pull/24282/files#diff-c859b3060cdd05f4d0f693b0a4b71fc5R130 is to raise on cat == df, making the behavior consistent with numpy? If so, then +1

jbrockmendel · 2019-01-05T03:56:11Z

I still think this is worth doing, but I'm not comfortable with introducing lots of small changes in behavior without accompanying tests.

Closing now, will try a new approach after the RC.

jbrockmendel added 4 commits December 13, 2018 21:34

implement unpack_and_defer

c73f714

fix errors checked in test_td64arr_mul_too_short_raises

b33f456

Merge branch 'master' of https://github.com/pandas-dev/pandas into bo…

f516968

…ilerplate

enable for Integer comparisons

2f929af

jbrockmendel mentioned this pull request Dec 14, 2018

CLN: standardize different freq message #24283

Merged

fixup remove unused imports

29212f2

jreback added the Clean and Internals (related to non-user accessible pandas implementation) labels Dec 14, 2018

jbrockmendel added 5 commits December 14, 2018 14:05

enable for categorical, revert non-central, tests for categorical wit…

fc0d413

…h DataFrame

Merge branch 'master' of https://github.com/pandas-dev/pandas into bo…

84cedf7

…ilerplate

revert not-yet-ready

8ec63eb

Merge branch 'master' of https://github.com/pandas-dev/pandas into bo…

30992b2

…ilerplate

use decorator in periodarray comparisons, update test messages

5522b6c

jbrockmendel added 2 commits December 14, 2018 16:16

revert application

5b81600

isort fixup

4d48100

jbrockmendel mentioned this pull request Dec 17, 2018

EA ops alignment with DataFrame #24301

Closed

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Jan 4, 2019

Have Categorical ops defer to dataframe; broken off of pandas-dev#24282

005d35d

jbrockmendel closed this Jan 5, 2019

jbrockmendel deleted the boilerplate branch January 5, 2019 03:56

jreback pushed a commit that referenced this pull request Jan 5, 2019

Have Categorical ops defer to DataFrame; broken off of #24282 (#24630)

dc703ce

jbrockmendel mentioned this pull request Jan 5, 2019

Make DTA/TDA/PA return NotImplemented on comparisons #24643

Merged

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Have Categorical ops defer to DataFrame; broken off of pandas-dev#24282…

1219147

… (pandas-dev#24630)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Have Categorical ops defer to DataFrame; broken off of pandas-dev#24282…

152b3f8

… (pandas-dev#24630)

jbrockmendel mentioned this pull request Mar 3, 2019

REF/CLN: ops boilerplate #23853 #24846

Closed
