De-duplicate dispatch code, remove unreachable branches #22068
Codecov Report
@@ Coverage Diff @@
## master #22068 +/- ##
==========================================
+ Coverage 92.06% 92.06% +<.01%
==========================================
Files 170 170
Lines 50720 50715 -5
==========================================
- Hits 46694 46691 -3
+ Misses 4026 4024 -2
pandas/core/ops.py
Outdated
@@ -1114,6 +1114,7 @@ def na_op(x, y):
result[mask] = op(x[mask], com.values_from_object(y[mask]))
else:
assert isinstance(x, np.ndarray)
assert lib.is_scalar(y)
prob should just import at the top
not is_scalar(other))):
(is_extension_array_dtype(other) and not is_scalar(other))):
# Note: the `not is_scalar(other)` condition rules out
# e.g. other == "category"
the issue here is we need to be able to distinguish between an actual dtype comparison and a real comparison, e.g. `df.dtypes == 'category'` (or `df.dtypes == 'Int8'`).
this is pretty thorny, e.g. how do you know when to convert a scalar string in a comparison op to an actual dtype for comparisons?
Yah. A while ago there was a check `is_categorical_dtype(y) and not is_scalar(y)` and it took me a while to figure out that the `is_scalar` part was specifically to avoid letting "category" through, so I've gotten in the habit of adding this comment for future readers.
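To make the point about "category" concrete, here is a minimal sketch using the public `pandas.api.types` helpers. The helper name `looks_like_extension_array` is hypothetical and only mirrors the guarded condition in the diff; it is not the code in `pandas/core/ops.py`.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_scalar


def looks_like_extension_array(other):
    """Hypothetical helper: the dtype check guarded by `not is_scalar`."""
    return is_extension_array_dtype(other) and not is_scalar(other)


# The string "category" is a valid dtype alias, so the dtype check alone
# accepts it even though it is just a scalar string.
dtype_alias = "category"
alias_passes_dtype_check = is_extension_array_dtype(dtype_alias)
alias_is_scalar = is_scalar(dtype_alias)

# An actual Categorical is array-like, so it passes the guarded check.
real_values = pd.Categorical(["a", "b", "a"])

alias_is_array = looks_like_extension_array(dtype_alias)
values_is_array = looks_like_extension_array(real_values)
```

Without the `is_scalar` guard, a comparison against the plain string "category" would be routed down the extension-array path.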
pls rebase
looks fine. can you run the asv dataframe ops benchmarks (or a subset of them) to ensure that perf is not materially different? and rebase on master (if it's not already)
Sure
Will run, but there's nothing here that should affect anything.
Looks like frame-frame comparisons are non-trivially slower due to numexpr being used less effectively. Specifically, ATM the dispatch-to-Series is done inside numexpr, while in the PR the dispatch is done outside numexpr.
For the time being I'll revert that part of the PR.
After reverting the numexpr thing:
|
see, good thing we look at perf even when it shouldn't affect things :-D
thanks.
There are a couple of `DataFrame` methods that operate column-wise, dispatching to `Series` implementations. This de-duplicates that code by implementing `ops.dispatch_to_series`. Importantly, this function is going to be used a few more times as we move towards getting rid of `BlockManager.eval` and `Block.eval`, so putting it in place now makes for a more focused diff later.

Also removes a couple of no-longer-reachable cases in comparison ops.
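As a rough illustration of the column-wise dispatch idea described above, the sketch below applies an operator to each column as a `Series` and reassembles the result. The function name `dispatch_columnwise` is hypothetical, and the sketch assumes the right-hand side is either a scalar or a `DataFrame` with identical index and columns; it is not the implementation of `ops.dispatch_to_series` in this PR.

```python
import operator

import pandas as pd


def dispatch_columnwise(left, right, op):
    """Hypothetical sketch: evaluate `op` one column at a time via Series ops."""
    ncols = len(left.columns)
    if isinstance(right, pd.DataFrame):
        # Assumption: `right` is already aligned with `left`.
        new_data = {i: op(left.iloc[:, i], right.iloc[:, i]) for i in range(ncols)}
    else:
        # Assumption: anything else is treated as a scalar.
        new_data = {i: op(left.iloc[:, i], right) for i in range(ncols)}

    # Positional keys avoid problems with duplicate column labels;
    # the original labels are restored afterwards.
    result = pd.DataFrame(new_data, index=left.index)
    result.columns = left.columns
    return result


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
gt_result = dispatch_columnwise(df, 2, operator.gt)
add_result = dispatch_columnwise(df, df, operator.add)
```

Keeping the per-column loop in one helper is what lets each `DataFrame` method reuse it instead of repeating the same pattern.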
git diff upstream/master -u -- "*.py" | flake8 --diff