De-duplicate dispatch code, remove unreachable branches #22068

jbrockmendel · 2018-07-26T16:47:34Z

There are a couple of DataFrame methods that operate column-wise, dispatching to Series implementations. This de-duplicates that code by implementing ops.dispatch_to_series. Importantly, this function is going to be used a few more times as we move towards getting rid of BlockManager.eval and Block.eval, so putting it in place now makes for a more focused diff later.
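For readers new to the pattern, here is a minimal sketch of column-wise dispatch built only on public pandas API. The helper name `dispatch_to_series_sketch` is hypothetical. It is not the signature or body of the internal `ops.dispatch_to_series` added in this PR, and it handles only two cases on the right: a scalar, or a DataFrame that is already aligned with the left.

```python
import operator

import pandas as pd


def dispatch_to_series_sketch(left, right, func):
    """Hypothetical helper: evaluate func(left, right) one column at a time.

    Each column is handed to the Series implementation of `func`, and the
    results are reassembled with left's index and columns.
    """
    ncols = left.shape[1]
    if isinstance(right, pd.DataFrame):
        # Assumption: the caller has already aligned the two frames.
        if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
            raise ValueError("left and right must be aligned")
        # Positional access (iloc) keeps duplicate column labels safe.
        new_data = {i: func(left.iloc[:, i], right.iloc[:, i]) for i in range(ncols)}
    else:
        # Assumption: anything else is a scalar broadcast to every column.
        new_data = {i: func(left.iloc[:, i], right) for i in range(ncols)}

    result = pd.DataFrame(new_data, index=left.index)
    result.columns = left.columns
    return result


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(dispatch_to_series_sketch(df, 1, operator.add))
print(dispatch_to_series_sketch(df, df, operator.eq))
```

Routing every column through the Series method is what lets several DataFrame methods share one loop. The trade-off is that a per-column loop can be slower than a single 2-D operation, which is what the benchmarks later in this thread ran into.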

Also removes a couple of no-longer-reachable cases in comparison ops.

passes `git diff upstream/master -u -- "*.py" | flake8 --diff`


codecov · 2018-07-26T18:25:12Z

Codecov Report

Merging #22068 into master will increase coverage by <.01%.
The diff coverage is 95%.

@@            Coverage Diff             @@
##           master   #22068      +/-   ##
==========================================
+ Coverage   92.06%   92.06%   +<.01%     
==========================================
  Files         170      170              
  Lines       50720    50715       -5     
==========================================
- Hits        46694    46691       -3     
+ Misses       4026     4024       -2

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.47% <95%> (ø)` | ⬆️ |
| #single | `42.3% <20%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/frame.py | `97.24% <100%> (-0.02%)` | ⬇️ |
| pandas/core/ops.py | `96.46% <92.3%> (+0.31%)` | ⬆️ |

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 647f3f0...c36e672.

jreback · 2018-07-28T13:08:03Z

pandas/core/ops.py

@@ -1114,6 +1114,7 @@ def na_op(x, y):
                result[mask] = op(x[mask], com.values_from_object(y[mask]))
            else:
                assert isinstance(x, np.ndarray)
+                assert lib.is_scalar(y)


Probably should just import this at the top.
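The hunk above shows only a fragment of `na_op`. As a rough illustration of the masked pattern it suggests, the sketch applies the op only at non-null positions and otherwise requires a scalar right operand, which is the invariant the new assertion documents. `masked_op` is a hypothetical, simplified helper and not the pandas source. Always returning an object-dtype array is an assumption made to keep it short.

```python
import operator

import numpy as np
import pandas as pd
from pandas.api.types import is_scalar


def masked_op(x, y, op):
    """Hypothetical sketch: apply `op` only where the inputs are non-null.

    Positions with a missing value on either side are left as NaN.
    """
    x = np.asarray(x, dtype=object)
    result = np.full(len(x), np.nan, dtype=object)
    if isinstance(y, np.ndarray):
        # Array case: both operands must be non-null at a position.
        mask = pd.notna(x) & pd.notna(y)
        result[mask] = op(x[mask], y[mask])
    else:
        # Same invariant as the assertion added in the diff.
        assert is_scalar(y)
        mask = pd.notna(x)
        result[mask] = op(x[mask], y)
    return result


x = np.array(["a", None, "c"], dtype=object)
print(masked_op(x, "!", operator.add))
```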

jreback · 2018-07-28T13:09:50Z

pandas/core/ops.py

-               not is_scalar(other))):
+              (is_extension_array_dtype(other) and not is_scalar(other))):
+            # Note: the `not is_scalar(other)` condition rules out
+            # e.g. other == "category"


The issue here is that we need to be able to distinguish between a comparison against a dtype and a real element-wise comparison, e.g.
`df.dtypes == 'category'` (or `df.dtypes == 'Int8'`).

This is pretty thorny: e.g. how do you know when a scalar string in a comparison op should be converted to an actual dtype for the comparison?

Yah. At one point there was a check `is_categorical_dtype(y) and not is_scalar(y)`, and it took me a while to figure out that the `is_scalar` part was there specifically to avoid letting "category" through, so I've gotten in the habit of adding this comment for future readers.
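To make the point concrete: a dtype string such as "category" resolves to a registered extension dtype, so a dtype check alone lets it through. `is_scalar` is what rejects it. The sketch uses the public `pandas.api.types` functions. `looks_like_ea_operand` is a hypothetical name that mirrors the condition in the diff.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_scalar


def looks_like_ea_operand(other):
    # Hypothetical helper mirroring the condition in the diff:
    # an extension-array dtype that is not a scalar.
    return is_extension_array_dtype(other) and not is_scalar(other)


cat = pd.Categorical(["a", "b"])

# The string resolves to CategoricalDtype, so the dtype check alone accepts it...
print(is_extension_array_dtype("category"))   # True
# ...but it is also a scalar, so the combined check rejects it.
print(looks_like_ea_operand("category"))      # False
# An actual Categorical passes both halves of the check.
print(looks_like_ea_operand(cat))             # True
```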

jreback · 2018-07-28T13:22:16Z

pls rebase


jreback · 2018-08-01T22:24:07Z

Looks fine. Can you run the asv DataFrame ops benchmarks (or a subset of them) to ensure that perf is not materially different? And rebase on master (if it's not already).

jbrockmendel · 2018-08-01T22:34:04Z

> and rebase on master (if it's not already)

Sure

> can you run an asv dataframe ops (or a subset of them) to ensure that perf is not materially different

Will run, but there's nothing here that should affect perf.


jbrockmendel · 2018-08-02T01:13:42Z

Looks like frame-frame comparisons are non-trivially slower because numexpr is being used less effectively. Specifically, on master the dispatch to Series happens inside numexpr, while in this PR it happens outside numexpr.

asv continuous -f 1.1 -E virtualenv master HEAD -b dataframe -b DataFrame -b ops
[...]
       before           after         ratio
     [9c118668]       [94f168a8]
+      5.69±0.1ms         153±20ms    26.86  binary_ops.Ops.time_frame_comparison(True, 'default')
+      4.03±0.2ms         97.5±3ms    24.19  binary_ops.Ops.time_frame_comparison(True, 1)
+      17.8±0.2ms       27.3±0.3ms     1.54  join_merge.Join.time_join_dataframe_index_multi(False)
+      14.5±0.1ms         19.5±6ms     1.34  join_merge.Merge.time_merge_dataframe_integer_2key(True)
-        70.7±2ms         54.3±4ms     0.77  binary_ops.Ops.time_frame_comparison(False, 1)
-        72.7±4ms         51.0±2ms     0.70  binary_ops.Ops.time_frame_comparison(False, 'default')

For the time being I'll revert that part of the PR.
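To get a feel for the kind of cost involved, the sketch below compares one 2-D comparison over the whole frame with one Series comparison per column. It uses plain NumPy and pandas, not numexpr, and it is not the asv benchmark itself. The frame shape and repeat count are arbitrary assumptions, and timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary wide frames; the shape is chosen only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1_000, 200)))
other = pd.DataFrame(rng.standard_normal((1_000, 200)))


def whole_frame():
    # A single vectorized comparison over the underlying 2-D arrays.
    return pd.DataFrame(df.to_numpy() > other.to_numpy(), index=df.index, columns=df.columns)


def column_wise():
    # One Series comparison per column, then reassembly into a DataFrame.
    return pd.DataFrame({col: df[col] > other[col] for col in df.columns})


# Both approaches produce the same values; only the cost differs.
assert np.array_equal(whole_frame().to_numpy(), column_wise().to_numpy())
print("2-D block: ", timeit.timeit(whole_frame, number=10))
print("per column:", timeit.timeit(column_wise, number=10))
```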

jbrockmendel · 2018-08-02T01:55:22Z

After reverting the numexpr change:

taskset 5 asv continuous -f 1.1 -E virtualenv master HEAD -b time_frame_comparison
[...]
       before           after         ratio
     [9c118668]       [c36e6729]
-        66.6±3ms         48.7±2ms     0.73  binary_ops.Ops.time_frame_comparison(False, 'default')
-        71.5±5ms         49.1±2ms     0.69  binary_ops.Ops.time_frame_comparison(False, 1)

taskset 5 asv continuous -f 1.1 -E virtualenv master HEAD -b time_frame_comparison
[...]
       before           after         ratio
     [9c118668]       [c36e6729]
-        66.1±2ms         49.9±2ms     0.76  binary_ops.Ops.time_frame_comparison(False, 'default')

taskset 5 asv continuous -f 1.1 -E virtualenv master HEAD -b time_frame_comparison
[...]
       before           after         ratio
     [9c118668]       [c36e6729]
-        70.8±3ms         48.0±2ms     0.68  binary_ops.Ops.time_frame_comparison(False, 'default')
-        77.5±6ms         50.4±4ms     0.65  binary_ops.Ops.time_frame_comparison(False, 1)
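How to read these tables: the ratio column is after/before. With `-f 1.1`, asv flags benchmarks whose ratio moves beyond that factor, marking slowdowns with `+` and speedups with `-`. The simplified sketch below recomputes a few rows from the rounded timings shown in the first table. `classify` is a hypothetical helper, and it ignores the statistical checks asv also applies. Small differences from the printed ratios come from asv using unrounded timings.

```python
# Rounded timings copied from the first asv table above (milliseconds).
rows = {
    "time_frame_comparison(True, 'default')": (5.69, 153.0),
    "time_frame_comparison(True, 1)": (4.03, 97.5),
    "time_frame_comparison(False, 'default')": (72.7, 51.0),
}


def classify(before, after, factor=1.1):
    """Hypothetical helper: a simplified version of asv's factor threshold."""
    ratio = after / before
    if ratio > factor:
        return ratio, "+"   # slower by more than the factor
    if ratio < 1 / factor:
        return ratio, "-"   # faster by more than the factor
    return ratio, " "       # within the threshold band


for name, (before, after) in rows.items():
    ratio, mark = classify(before, after)
    print(f"{mark} {ratio:6.2f}  {name}")
```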

jreback · 2018-08-08T10:16:38Z

See, good thing we look at perf even when it shouldn't affect things :-D

jreback · 2018-08-08T10:18:04Z

thanks.


jbrockmendel added 5 commits July 25, 2018 21:24

fix trailing whitespace

a13c161

implement dispatch_to_series, remove a layer of closure

c8948bc

revert incorrect change; remove no-longer-reachable cases

2f25223

Merge branch 'master' of https://github.com/pandas-dev/pandas into di…

04213ce

…spatch

Comment and assertion

8527bd2

jbrockmendel mentioned this pull request Jul 26, 2018

REF: move range-generation functions to EA mixin classes #22016

Merged

gfyoung added the Refactor (Internal refactoring of code) and Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Jul 27, 2018

gfyoung requested a review from jreback July 27, 2018 04:35

jreback requested changes Jul 28, 2018


jbrockmendel added 2 commits July 28, 2018 09:08

Merge branch 'master' of https://github.com/pandas-dev/pandas into di…

92df36d

…spatch

use imported is_scalar

2ae4b2f

This was referenced Jul 28, 2018

[Bug] Fix various DatetimeIndex comparison bugs #22074

Merged

[CLN] Dispatch (some) Frame ops to Series, avoiding _data.eval #22019

Merged

Centralize m8[ns] Arithmetic Tests #22118

Merged

jreback added this to the 0.24.0 milestone Aug 1, 2018

jbrockmendel added 2 commits August 1, 2018 15:34

Merge branch 'master' of https://github.com/pandas-dev/pandas into di…

94f168a

…spatch

dummy commit to force CI

f5bb3b6

jbrockmendel mentioned this pull request Aug 2, 2018

dispatch scalar DataFrame ops to Series #22163

Merged

revert change that hurt perf

c36e672

jreback approved these changes Aug 8, 2018


jreback merged commit 81f386c into pandas-dev:master Aug 8, 2018

jbrockmendel deleted the dispatch branch August 8, 2018 15:50

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

De-duplicate dispatch code, remove unreachable branches (pandas-dev#2…

b18909b

…2068)
