-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dispatch scalar DataFrame ops to Series #22163
Conversation
Hello @jbrockmendel! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 10, 2018 at 14:11 Hours UTC |
Uses too many kludges, should wait for fixes in series and index comparisons. Closing. |
Re-opening after scaling back an unreasonably ambitious py2/py3 compat goal. In particular consider:
In PY3 ATM this gives:
And in PY2:
Making the PY2/PY3 behavior identical is not feasible, but we can (and this PR does) ensure that the DataFrame/Series behavior matches. In PY2 this is unchanged, in PY3 the DataFrame comparison now correctly raises. |
Codecov Report
@@ Coverage Diff @@
## master #22163 +/- ##
==========================================
- Coverage 92.08% 92.05% -0.04%
==========================================
Files 169 169
Lines 50694 50700 +6
==========================================
- Hits 46682 46672 -10
- Misses 4012 4028 +16
Continue to review full report at Codecov.
|
Updated OP with issues this closes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would like to have gh comments on the tests where appropriate. pls also do the whatsnew. its pretty important that we nail down and match the closed issues with the code.
@@ -4949,6 +4949,14 @@ def _combine_match_columns(self, other, func, level=None, try_cast=True): | |||
return self._constructor(new_data) | |||
|
|||
def _combine_const(self, other, func, errors='raise', try_cast=True): | |||
if lib.is_scalar(other) or np.ndim(other) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is is pretty annoything that we have to do this, I would make an explict function maybe is_any_scalar
I think as we have these types of checks all over. pls make an issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1327,6 +1327,10 @@ def wrapper(self, other, axis=None): | |||
|
|||
res_name = get_op_result_name(self, other) | |||
|
|||
if isinstance(other, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is is_list_like
(maybe after some other comparisons) enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM the isinstance(other, list)
check is done below the isinstance(other, (np.ndarray, pd.Index))
check. Wrapping lists earlier let us send lists through that same ndarray/Index block. Ideally the catchall else:
block can be reduced to only-scalars, but we're not there yet.
@@ -1706,7 +1708,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None): | |||
if fill_value is not None: | |||
self = self.fillna(fill_value) | |||
|
|||
return self._combine_const(other, na_op, try_cast=True) | |||
pass_op = op if lib.is_scalar(other) else na_op |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are checking for a scalar here and above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of annoying. If lib.is_scalar(other)
then we will be dispatching to the Series
op, in which case we want to pass the "raw" op (e.g. operator.add
) and not the wrapped op na_op
.
This PR handles only scalars since that is a relatively easy case. A few PRs down the road we'll have all these ops dispatch to series, at which point this won't be necessary.
@@ -273,6 +273,8 @@ def test_getitem_boolean(self): | |||
# test df[df > 0] | |||
for df in [self.tsframe, self.mixed_frame, | |||
self.mixed_float, self.mixed_int]: | |||
if compat.PY3 and df is self.mixed_frame: | |||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's strip out the mixed_frame to another function (even though that duplicates some code), bonus can parametrize this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bonus can parametrize this test.
I don't think tsframe
, mixed_frame
, mixed_float
, mixed_int
are available in the namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these need to be made fixtures. this becomes so much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, am starting to implement this in the test_arithmetic sequence of PRs. Will update this test when that lands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k thanks
tm.assert_frame_equal(actual, dfn) | ||
actual = df1 - NA | ||
tm.assert_frame_equal(actual, dfn) | ||
with pytest.raises(TypeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this raising? this is a big change if you don't allow nan to act as NaT in ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the current behavior for Series and Index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs a subsection in the whatsnew then, marked as an api change.
expected = op(s, value).dtypes | ||
assert_series_equal(result, expected) | ||
|
||
invalid = {(operator.pow, '<M8[ns]'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull this out and parametrize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test already has two layers of parametrization; it isn't clear how to pull this out without making it more verbose+repetitive. Let me give this some thought and circle back.
Phew. Just added GH references to tests and a ton of Whats New. |
needs a rebase and some comments. |
needs rebase again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
@@ -273,6 +273,8 @@ def test_getitem_boolean(self): | |||
# test df[df > 0] | |||
for df in [self.tsframe, self.mixed_frame, | |||
self.mixed_float, self.mixed_int]: | |||
if compat.PY3 and df is self.mixed_frame: | |||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k thanks
thanks @jbrockmendel nice squashings! |
Many issues closed; will track them down and update. Will also need whatsnew.
closes #18874
closes #20088
closes #15697
closes #13128
closes #8554
closes #8932
closes #21610
closes #22005
closes #22047
closes #22242
This will be less verbose after #22068 implements
ops.dispatch_to_series
.This still only dispatches a subset of ops. #22019 dispatches another (disjoint) subset. After that is another easy-ish case where alignment is known. Saved for last are cases with ambiguous alignment that is currently done in an ad-hoc best-guess way.