-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: use float64_t consistently instead of double, double_t #23583
Conversation
…ad of python NaT lookups
Hello @jbrockmendel! Thanks for submitting the PR.
|
Closes #23371 |
Codecov Report
@@ Coverage Diff @@
## master #23583 +/- ##
=======================================
Coverage 92.25% 92.25%
=======================================
Files 161 161
Lines 51277 51277
=======================================
Hits 47305 47305
Misses 3972 3972
Continue to review full report at Codecov.
|
@cython.boundscheck(False) | ||
@cython.wraparound(False) | ||
cpdef numeric kth_smallest(numeric[:] a, Py_ssize_t k) nogil: | ||
def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think will have a big perf slowdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its never used in cython and nogil
isnt allowed for def
functions. There is a with nogil
block just below this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls test i once changed this (and tried to remove) and was all negative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indistinguishable:
master:
In [3]: arr = np.arange(10000, dtype=np.int64)
In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 165.65 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.42 µs per loop
In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop
In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.21 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.2 µs per loop
PR:
In [3]: arr = np.arange(10000, dtype=np.int64)
In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 12.06 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.71 µs per loop
In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 9.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.6 µs per loop
In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.95 µs per loop
Similar for other dtypes
pandas/_libs/algos.pyx
Outdated
@@ -197,9 +195,10 @@ def groupsort_indexer(ndarray[int64_t] index, Py_ssize_t ngroups): | |||
return result, counts | |||
|
|||
|
|||
# TODO: redundant with groupby.kth_smallest_c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not actually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. The typing is more specific in the groupby version, but the code itself is nearly identical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to remove and you will see why it’s here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll get rid of the comment, but am kind of surprised: you're usually Holy Crusader against redundant code, and the body of this function is very nearly copy/pasted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is i don’t think it’s easy to remove
you are more than welcome to try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you're right. I removed the comment.
thanks. does isort work for .pyx? if so can you create an issue to enable it? |
did you need to remove anything from setup.cfg? |
nope, actually mangles cimports pretty badly
no |
* upstream/master: BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524) BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544) API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561) CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583) DOC: Fix Order of parameters in docstrings (pandas-dev#23611) TST: Unskip some Categorical Tests (pandas-dev#23613) TST: Fix integer ops comparison test (pandas-dev#23619) DOC: Fixes to docstring to add validation to CI (pandas-dev#23560) DOC: Remove incorrect periods at the end of parameter types (pandas-dev#23600) MAINT: tm.assert_raises_regex --> pytest.raises (pandas-dev#23592) DOC: Updating Series.resample and DataFrame.resample docstrings (pandas-dev#23197)
…fixed * upstream/master: DOC: Enhancing pivot / reshape docs (pandas-dev#21038) TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620) BUILD: Simplifying contributor dependencies (pandas-dev#23522) BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539) BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524) BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544) API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561) CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583) DOC: Fix Order of parameters in docstrings (pandas-dev#23611) TST: Unskip some Categorical Tests (pandas-dev#23613) TST: Fix integer ops comparison test (pandas-dev#23619)
remove some commented-out or otherwise unused code
disable boundscheck/wraparound in a couple places in tslib where it is safe
add
const
modifiers to some of the memoryview functions so they don't raise if we ever pass read-only arrays to themstandardized NPY_NAT as always being the
cdef int64_t
version andiNaT
as being the python-namespace versionremove duplicated NaN/nan constants
remove non-standard imports of np.nan in some test files