-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
standardize signature for Index reductions, implement nanmean for datetime64 dtypes #24293
Conversation
Hello @jbrockmendel! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 28, 2018 at 23:37 Hours UTC |
possibly closes #21583 |
test_nanops is a PITA in to troubleshoot. code de-duplication at the expense of clarity may need re-thinking at some point. |
Codecov Report
@@ Coverage Diff @@
## master #24293 +/- ##
==========================================
+ Coverage 92.22% 92.22% +<.01%
==========================================
Files 162 162
Lines 51824 51842 +18
==========================================
+ Hits 47795 47813 +18
Misses 4029 4029
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24293 +/- ##
==========================================
- Coverage 92.32% 92.31% -0.01%
==========================================
Files 166 166
Lines 52298 52320 +22
==========================================
+ Hits 48285 48301 +16
- Misses 4013 4019 +6
Continue to review full report at Codecov.
|
needs. parameterization pass |
@@ -286,7 +286,7 @@ def min(self, axis=None, *args, **kwargs): | |||
except ValueError: | |||
return self._na_value | |||
|
|||
def argmin(self, axis=None, *args, **kwargs): | |||
def argmin(self, axis=None, skipna=True, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can these inherit the doc-string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look. Might also get rid of *args
while at it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm the docstrings here and for the corresponding methods in base.IndexOpsMixin are kind of clunky. This may merit a separate look, @datapythonista ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know well what's the class hierarchy and whether makes sense to inherit. But we'll have to add Parameters
, Returns
and Examples
section here if this is public.
@@ -297,12 +297,14 @@ def _minmax(self, meth): | |||
|
|||
return self._start + self._step * no_steps | |||
|
|||
def min(self): | |||
def min(self, axis=None, skipna=True): | |||
"""The minimum value of the RangeIndex""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same. why don't these inherit the doc-string
pandas/core/nanops.py
Outdated
@@ -212,6 +213,10 @@ def _get_values(values, skipna, fill_value=None, fill_value_typ=None, | |||
mask = isna(values) | |||
|
|||
dtype = values.dtype | |||
if is_datetime64tz_dtype(orig_values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very very odd place to do this as no other dtype specific things are here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semi-agree. Most of nanops
is numpy dtype/ndarray specific, so making datetime64tz work is klunky. That's why I thought it made more sense to implement these directly in DTA in #23890. But conditional on implementing this here in nanops, this check is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is this is in the wrong place. nanops already handles dtypes just not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific as to where you want this? The place where dtype
is currently defined is one line above this, so doing this anywhere else seems weird to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this should go in _view_if_needed (or maybe just remove that function and inline it here is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't yet. view_if_needed is called ~25 lines below this. It isn't obvious to me that it is OK to move it up to here. If it is, then sure, its a one-liner that can be inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes move it, as I said above you have dtype inference in 2 places for no reason.
pandas/core/nanops.py
Outdated
@@ -212,6 +213,10 @@ def _get_values(values, skipna, fill_value=None, fill_value_typ=None, | |||
mask = isna(values) | |||
|
|||
dtype = values.dtype | |||
if is_datetime64tz_dtype(orig_values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is this is in the wrong place. nanops already handles dtypes just not here.
If everyone is happy with this, I can get started on a follow-up that will trim reductions off of #24024. |
pandas/core/base.py
Outdated
@@ -951,22 +957,36 @@ def max(self): | |||
>>> idx.max() | |||
('b', 2) | |||
""" | |||
nv.validate_minmax_axis(axis) | |||
return nanops.nanmax(self.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't skipna passed? Is that a followup item?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had considered this a follow-up item, but am now leaning towards fixing these now. Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This turns out to be more involved than expected, requires some changes in Series._reduce and IndexOpsMixin._reduce. Neither of which is all that invasive, but changing them entails changing/fixing/testing behavior for other reductions I hadn't planned on doing in this PR.
Currently going through to get that all done here. I'm also OK with doing this in two phases so as to take things out of #24024 sooner rather than later.
Updated to pass skipna in the appropriate places in Index and DatetimeIndexOpsMixin methods. Updated a handful of tests, but I'm not confident we got thorough coverage |
Travis fail is hypothesis |
rebased |
""" | ||
Return the maximum value of the Index. | ||
|
||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we have similar things in pandas/core/base, these should change as well.
pandas/core/indexes/datetimelike.py
Outdated
# quick check | ||
if len(i8) and self.is_monotonic: | ||
if i8[-1] != iNaT: | ||
return self._box_func(i8[-1]) | ||
|
||
if not len(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif
# quick check | ||
if len(i8) and self.is_monotonic: | ||
if i8[-1] != iNaT: | ||
return self._box_func(i8[-1]) | ||
|
||
if not len(self): | ||
return self._na_value | ||
|
||
if self.hasnans: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/core/nanops.py
Outdated
@@ -212,6 +213,10 @@ def _get_values(values, skipna, fill_value=None, fill_value_typ=None, | |||
mask = isna(values) | |||
|
|||
dtype = values.dtype | |||
if is_datetime64tz_dtype(orig_values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you do this?
@@ -558,6 +599,8 @@ def test_minmax_nat_series(self, nat_ser): | |||
# GH#23282 | |||
assert nat_ser.min() is pd.NaT | |||
assert nat_ser.max() is pd.NaT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can parameterize over skipna in a lot of the code you added to tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just implemented test_reductions, still need to move over the reduction tests from tests.indexes and tests.frames. After they're all in one place is when to really focus on parametrization.
pandas/core/nanops.py
Outdated
@@ -212,6 +213,10 @@ def _get_values(values, skipna, fill_value=None, fill_value_typ=None, | |||
mask = isna(values) | |||
|
|||
dtype = values.dtype | |||
if is_datetime64tz_dtype(orig_values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes move it, as I said above you have dtype inference in 2 places for no reason.
@@ -3493,6 +3494,9 @@ def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None, | |||
# dispatch to ExtensionArray interface | |||
if isinstance(delegate, ExtensionArray): | |||
return delegate._reduce(name, skipna=skipna, **kwds) | |||
elif is_datetime64_dtype(delegate): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it isn't clear to me that it would make a difference
Just pushed, collected all of that at the top of _get_values. |
pandas/core/base.py
Outdated
Parameters | ||
---------- | ||
axis : {None} | ||
Dummy argument for consistency with Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure we use this terminology here
axis : {0)
Axis for the function to be applied on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @jbrockmendel is listing the set of possible values, which we do use. In this case, that set is just the single value None
, so it looks strange.
I think
axis : int, optional
For compatibility with NumPy. Only 0 or None are allowed.
is clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger verbiage would be good
axis : {None} | ||
Dummy argument for consistency with Series | ||
skipna : bool, default True | ||
|
||
See Also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be consisten about the See Also, e.g. make sure in all, and add a refernce to the Series.min function as well (as appropriate)
|
||
i8 = self.asi8 | ||
try: | ||
# quick check | ||
if len(i8) and self.is_monotonic: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these quick checks make sense for Index as they are immutable, but may not make much sense here (but i guess can evaluate later)
code looks good. if you can update the doc-strings then would be good. |
I think docstring comments have been addressed; bears double-checking |
thanks @jbrockmendel I think we need a whatsnew for this #24265, if you can add in next PR |
@datapythonista @jorisvandenbossche @TomAugspurger its worth investigating if we can somehow share doc-strings with EA, Index, Series for some operations by templating. |
Same basic goal as #23890, but focusing on nanops (a module I'm not all that familiar with) first, with the array methods later, instead of vice-versa.
I think the only user-facing change (i.e. need for whatsnew) is that
Series[datetime64].mean()
now works.I need to see if there are any more tests from #23890 or #24024 that can be copied here.
git diff upstream/master -u -- "*.py" | flake8 --diff