dispatch scalar DataFrame ops to Series #22163

jbrockmendel · 2018-08-02T01:36:24Z

Many issues closed; will track them down and update. Will also need whatsnew.

closes #18874
closes #20088
closes #15697
closes #13128
closes #8554
closes #8932
closes #21610
closes #22005
closes #22047
closes #22242

This will be less verbose after #22068 implements ops.dispatch_to_series.

This still only dispatches a subset of ops. #22019 dispatches another (disjoint) subset. After that comes another easy-ish case where alignment is known. Saved for last are the cases with ambiguous alignment, which are currently handled in an ad-hoc best-guess way.

pep8speaks · 2018-08-02T01:36:30Z

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 10, 2018 at 14:11 Hours UTC

jbrockmendel · 2018-08-02T06:37:07Z

Uses too many kludges, should wait for fixes in series and index comparisons. Closing.

jbrockmendel · 2018-08-03T05:28:00Z

Re-opening after scaling back an unreasonably ambitious py2/py3 compat goal. In particular consider:

df = pd.Series(['bar', 'bar'], name='foo').to_frame()
df < 0
df['foo'] < 0

In PY3 ATM this gives:

>>> df < 0
    foo
0  True
1  True
>>> df['foo'] < 0
[...]
TypeError: '<' not supported between instances of 'str' and 'int'

And in PY2:

>>> df < 0
     foo
0  False
1  False
>>> df['foo'] < 0
0    False
1    False
Name: foo, dtype: bool

Making the PY2/PY3 behavior identical is not feasible, but we can (and this PR does) ensure that the DataFrame and Series behaviors match. In PY2 this is unchanged; in PY3 the DataFrame comparison now correctly raises.
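The consistency goal described above can be illustrated without pandas. The following is a minimal pure-Python sketch, assuming a frame modelled as a dict of column lists; `column_op`, `frame_op` and `raises_type_error` are hypothetical names for illustration, not pandas internals. Because the frame-level operation dispatches to the column-level one, both either succeed or fail in the same way.

```python
import operator


def column_op(column, op, scalar):
    """Apply a binary operator between every element of one column and a scalar."""
    return [op(value, scalar) for value in column]


def frame_op(frame, op, scalar):
    """Dispatch a scalar operation column by column, reusing column_op."""
    return {name: column_op(column, op, scalar) for name, column in frame.items()}


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError, else False."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


numeric = {"a": [1, -2], "b": [-3.5, 4.0]}
strings = {"foo": ["bar", "bar"]}

# Numeric columns: the frame result is exactly the per-column results.
numeric_result = frame_op(numeric, operator.lt, 0)

# String column vs. int: on Python 3 the column op raises TypeError,
# and so does the frame op, because it goes through the same code path.
column_raises = raises_type_error(column_op, strings["foo"], operator.lt, 0)
frame_raises = raises_type_error(frame_op, strings, operator.lt, 0)
```

The point of the design is that there is only one place where the element-level comparison happens, so the frame cannot silently return a result that the column would reject.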

codecov · 2018-08-03T05:28:15Z

Codecov Report

Merging #22163 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22163      +/-   ##
==========================================
- Coverage   92.08%   92.05%   -0.04%     
==========================================
  Files         169      169              
  Lines       50694    50700       +6     
==========================================
- Hits        46682    46672      -10     
- Misses       4012     4028      +16

Flag	Coverage Δ
#multiple	`90.46% <100%> (-0.04%)`	⬇️
#single	`42.26% <80%> (-0.08%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/ops.py	`96.71% <100%> (+0.14%)`	⬆️
pandas/core/frame.py	`97.26% <100%> (ø)`	⬆️
pandas/core/internals/blocks.py	`93.83% <0%> (-0.81%)`	⬇️
pandas/core/dtypes/missing.py	`92.98% <0%> (-0.59%)`	⬇️
pandas/util/testing.py	`85.69% <0%> (-0.21%)`	⬇️
pandas/core/generic.py	`96.42% <0%> (-0.05%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cc3ab4a...9b62135.

jbrockmendel · 2018-08-07T19:13:33Z

Updated OP with issues this closes

jreback

I would like to have GH references in comments on the tests where appropriate. Please also do the whatsnew. It's pretty important that we nail down and match the closed issues with the code.

jreback · 2018-08-08T10:10:38Z

pandas/core/frame.py

@@ -4949,6 +4949,14 @@ def _combine_match_columns(self, other, func, level=None, try_cast=True):
        return self._constructor(new_data)

    def _combine_const(self, other, func, errors='raise', try_cast=True):
+        if lib.is_scalar(other) or np.ndim(other) == 0:


It is pretty annoying that we have to do this. I would make an explicit function, maybe is_any_scalar, as we have these types of checks all over. Please make an issue for this.
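As a rough illustration of what such a helper might look like, here is a stdlib-only sketch. It is an assumption-laden stand-in: the set of "plain" scalar types below is much smaller than what `lib.is_scalar` recognises, the `ndim` attribute check only imitates `np.ndim(other) == 0`, and `ZeroDim` is a toy class standing in for a zero-dimensional array.

```python
import datetime
import numbers


def is_any_scalar(obj):
    """Hypothetical helper: True for plain scalars and for zero-dimensional array-likes."""
    plain = (numbers.Number, str, bytes, datetime.date, datetime.timedelta, type(None))
    if isinstance(obj, plain):
        return True
    # Stand-in for ``np.ndim(obj) == 0``: anything advertising ndim == 0.
    return getattr(obj, "ndim", None) == 0


class ZeroDim:
    """Toy stand-in for a 0-d array: wraps one value but is not a plain scalar."""

    ndim = 0

    def __init__(self, value):
        self.value = value
```

Folding both conditions into one named function would let call sites such as `_combine_const` express the check once instead of repeating the two-part test.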

jreback · 2018-08-08T10:11:18Z

pandas/core/ops.py

@@ -1327,6 +1327,10 @@ def wrapper(self, other, axis=None):

        res_name = get_op_result_name(self, other)

+        if isinstance(other, list):


Why isn't is_list_like (maybe after some other comparisons) enough here?

ATM the isinstance(other, list) check is done below the isinstance(other, (np.ndarray, pd.Index)) check. Wrapping lists earlier lets us send lists through that same ndarray/Index block. Ideally the catch-all else: block can be reduced to scalars only, but we're not there yet.
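The control flow described in this reply can be sketched in plain Python. This is a toy model under stated assumptions: `compare` is a hypothetical function, a `tuple` stands in for the ndarray/Index case, and the length check is only an example of validation that a shared branch could perform.

```python
import operator


def compare(column, other, op):
    """Toy comparison wrapper: lists are normalised up front, then share the sequence branch."""
    if isinstance(other, list):
        # Wrap early so lists take the same path as other fixed-length sequences.
        other = tuple(other)

    if isinstance(other, tuple):
        # Shared branch (stands in for the ndarray/Index block): validate, then go elementwise.
        if len(other) != len(column):
            raise ValueError("Lengths must match to compare")
        return [op(left, right) for left, right in zip(column, other)]

    # Catch-all branch: ideally only scalars end up here.
    return [op(left, other) for left in column]
```

Normalising first means the list case no longer needs its own branch further down, which shrinks what the catch-all has to handle.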

jreback · 2018-08-08T10:11:44Z

pandas/core/ops.py

@@ -1706,7 +1708,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
            if fill_value is not None:
                self = self.fillna(fill_value)

-            return self._combine_const(other, na_op, try_cast=True)
+            pass_op = op if lib.is_scalar(other) else na_op


you are checking for a scalar here and above?

It's kind of annoying. If lib.is_scalar(other) then we will be dispatching to the Series op, in which case we want to pass the "raw" op (e.g. operator.add) and not the wrapped op na_op.

This PR handles only scalars since that is a relatively easy case. A few PRs down the road we'll have all these ops dispatch to series, at which point this won't be necessary.
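The raw-versus-wrapped distinction can be modelled in a few lines of plain Python. Everything here is a hypothetical illustration: `make_na_op`, `column_arith` and `frame_arith` are invented names, the NaN short-circuit is only a stand-in for whatever the real `na_op` wrapper does, and frames are dicts of lists.

```python
import math
import operator


def make_na_op(op):
    """Wrap a raw operator so that NaN operands short-circuit to NaN."""
    def na_op(left, right):
        if any(isinstance(v, float) and math.isnan(v) for v in (left, right)):
            return math.nan
        return op(left, right)
    return na_op


def column_arith(column, op, scalar):
    """Column-level op: applies its own NaN handling, so it expects the raw operator."""
    wrapped = make_na_op(op)
    return [wrapped(value, scalar) for value in column]


def frame_arith(frame, op, other):
    """Frame-level op: dispatch scalars to the column op, otherwise use the wrapped op."""
    if isinstance(other, (int, float)):
        # Scalar: hand the raw op to the column-level code, which wraps it itself.
        return {name: column_arith(col, op, other) for name, col in frame.items()}
    # Non-scalar (here a dict of same-length columns): this path wraps the op itself.
    na_op = make_na_op(op)
    return {name: [na_op(a, b) for a, b in zip(col, other[name])]
            for name, col in frame.items()}
```

Passing the already-wrapped op into the column-level code would apply the wrapping twice, which is why the scalar path selects the raw operator.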

jreback · 2018-08-08T10:12:46Z

pandas/tests/frame/test_indexing.py

@@ -273,6 +273,8 @@ def test_getitem_boolean(self):
        # test df[df > 0]
        for df in [self.tsframe, self.mixed_frame,
                   self.mixed_float, self.mixed_int]:
+            if compat.PY3 and df is self.mixed_frame:
+                continue


Let's split the mixed_frame case out into another function (even though that duplicates some code). Bonus: can parametrize this test.

> bonus can parametrize this test.

I don't think tsframe, mixed_frame, mixed_float, mixed_int are available in the namespace.

these need to be made fixtures. this becomes so much easier.

I agree, am starting to implement this in the test_arithmetic sequence of PRs. Will update this test when that lands.

jreback · 2018-08-08T10:13:54Z

pandas/tests/indexes/timedeltas/test_arithmetic.py

-        tm.assert_frame_equal(actual, dfn)
-        actual = df1 - NA
-        tm.assert_frame_equal(actual, dfn)
+        with pytest.raises(TypeError):


why is this raising? this is a big change if you don't allow nan to act as NaT in ops

This is the current behavior for Series and Index.

this needs a subsection in the whatsnew then, marked as an api change.

jreback · 2018-08-08T10:14:09Z

pandas/tests/internals/test_internals.py

-        expected = op(s, value).dtypes
-        assert_series_equal(result, expected)
+
+        invalid = {(operator.pow, '<M8[ns]'),


pull this out and parametrize

This test already has two layers of parametrization; it isn't clear how to pull this out without making it more verbose+repetitive. Let me give this some thought and circle back.

jbrockmendel · 2018-08-08T18:47:44Z

Phew. Just added GH references to tests and a ton of Whats New.

jreback · 2018-08-09T10:36:42Z

needs a rebase and some comments.

jreback · 2018-08-10T10:29:19Z

needs rebase again

jreback

lgtm.

jreback · 2018-08-14T10:43:23Z

pandas/tests/frame/test_indexing.py

@@ -273,6 +273,8 @@ def test_getitem_boolean(self):
        # test df[df > 0]
        for df in [self.tsframe, self.mixed_frame,
                   self.mixed_float, self.mixed_int]:
+            if compat.PY3 and df is self.mixed_frame:
+                continue


jreback · 2018-08-14T10:49:25Z

thanks @jbrockmendel

nice squashings!

dispatch scalar DataFrame ops to Series

7681092

jbrockmendel added 2 commits August 1, 2018 18:39

flake8 fixup

b226abf

kludge-fix indexing errors

3fd46bc

jbrockmendel closed this Aug 2, 2018

jbrockmendel added 2 commits August 2, 2018 15:27

Merge branch 'master' of https://github.com/pandas-dev/pandas into df…

0dff3a1

…bugs

scale back py2/py3 compat goals

caf2da0

jbrockmendel reopened this Aug 3, 2018

jbrockmendel added 4 commits August 3, 2018 12:44

Merge branch 'master' of https://github.com/pandas-dev/pandas into df…

6ff8705

…bugs

update error message

0513e0b

try to fix test_expressions failure going down the wrong path

3a7b782

dummy commit to force CI

6636565

gfyoung added the Refactor (Internal refactoring of code) and Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Aug 7, 2018

gfyoung requested a review from jreback August 7, 2018 17:03

jbrockmendel added 2 commits August 7, 2018 11:37

Merge branch 'master' of https://github.com/pandas-dev/pandas into df…

edc3792

…bugs

post-merge cleanup

3c65f93

jbrockmendel mentioned this pull request Aug 7, 2018

Collect datetime64 and PeriodDtype arithmetic tests #22237

Merged

jreback added this to the 0.24.0 milestone Aug 8, 2018

jreback requested changes Aug 8, 2018

This was referenced Aug 8, 2018

Problem in comparisons for DataFrame with pd.NaT #22242

Closed

ENH: Implement is_any_scalar #22248

Closed

jbrockmendel added 3 commits August 8, 2018 10:33

Merge branch 'master' of https://github.com/pandas-dev/pandas into df…

2cdc58b

…bugs

mark tests with GH Issues

4703db7

whatsnew? everything is new

c090713

edit test for appveyor compat

d683cb3

Merge branch 'master' of https://github.com/pandas-dev/pandas into df…

8c64cf6

…bugs

jbrockmendel mentioned this pull request Aug 9, 2018

Continue Collecting Arithmetic Tests #22267

Merged

jbrockmendel added 2 commits August 9, 2018 12:45

API Changes section for DataFrame[timedelta64] - np.nan

dbdea1a

un-xfail

f1edec4

jbrockmendel mentioned this pull request Aug 10, 2018

implement masked_arith_op to de-duplicate ops code #22182

Merged

1 task

Merge branch 'master' of https://github.com/pandas-dev/pandas into df…

9b62135

…bugs

jbrockmendel mentioned this pull request Aug 13, 2018

[PERF] use numexpr in dispatch_to_series #22284

Merged

jreback approved these changes Aug 14, 2018

jreback mentioned this pull request Aug 14, 2018

DataFrame[datetime64].__sub__ non-nano datetime64 fails #18874

Closed

jreback merged commit f7f266c into pandas-dev:master Aug 14, 2018

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Aug 16, 2018

un-xfail tests fixed by pandas-dev#22163

05b2c50

This was referenced Aug 16, 2018

DataFrame vs Series vs Index arithmetic Roundup #18824

Closed

core: fix DatetimeBlock operated with timedelta #22007

Closed

core: try coerce result back to DatetimeBlock #22008

Closed

Fix arithmetic errors with timedelta64 dtypes #22390

Merged

jbrockmendel mentioned this pull request Aug 31, 2018

Timestamp comparison inconsistency #22017

Closed

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

dispatch scalar DataFrame ops to Series (pandas-dev#22163)

9ee7594

This was referenced Oct 23, 2018

Comparing DataFrame with columns with mixed types to a scalar should fail #20876

Closed

Unexpected exception on column with NaT #17559

Closed

BUG: datetimelike subtract incorrect when broadcasting #12437

Closed

mroeschke mentioned this pull request Jan 13, 2019

Inconsistent handling of NaN in Timedelta comparison #24726

Closed

jbrockmendel deleted the dfbugs branch April 5, 2020 17:40
