CLN: use float64_t consistently instead of double, double_t #23583

jbrockmendel · 2018-11-08T22:03:03Z

remove some commented-out or otherwise unused code

disable boundscheck/wraparound in a couple places in tslib where it is safe

add const modifiers to some of the memoryview functions so they don't raise if we ever pass read-only arrays to them

standardized NPY_NAT as always being the cdef int64_t version and iNaT as being the python-namespace version

remove duplicated NaN/nan constants

remove non-standard imports of np.nan in some test files

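As an illustration of two of the points above, here is a minimal NumPy-only sketch; it is not code from this PR. The first part shows what a read-only array is: reading works, writing raises. A non-const Cython memoryview argument requests a writable buffer, so it rejects such arrays even when the function only reads them, and `const` lifts that restriction. The second part shows that NaT is stored as the minimum int64 value, which is the value both `NPY_NAT` and `iNaT` refer to. The function `total` and all variable names are hypothetical.

```python
import numpy as np

# --- Read-only input ---------------------------------------------------
values = np.array([1.0, 2.0, 3.0], dtype=np.float64)
values.setflags(write=False)  # mark the underlying buffer read-only


def total(arr):
    """Hypothetical function that only reads its input."""
    return float(arr.sum())


read_result = total(values)  # reading a read-only array is fine

try:
    values[0] = 10.0  # writing is what a read-only buffer forbids
    write_error = None
except ValueError as exc:
    write_error = exc

# --- NaT sentinel ------------------------------------------------------
# Reinterpret a datetime64[ns] NaT as int64 to see the stored sentinel.
nat_as_int = np.array(["NaT"], dtype="datetime64[ns]").view(np.int64)[0]
int64_min = np.iinfo(np.int64).min

print(read_result, type(write_error).__name__, nat_as_int == int64_min)
```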

pep8speaks · 2018-11-08T22:03:09Z

Hello @jbrockmendel! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/tests/arrays/sparse/test_array.py !
There are no PEP8 issues in the file pandas/tests/frame/test_operators.py !
There are no PEP8 issues in the file pandas/tests/frame/test_repr_info.py !
There are no PEP8 issues in the file pandas/tests/frame/test_timeseries.py !
There are no PEP8 issues in the file pandas/tests/frame/test_to_csv.py !
There are no PEP8 issues in the file pandas/tests/groupby/aggregate/test_cython.py !
There are no PEP8 issues in the file pandas/tests/series/test_operators.py !

jbrockmendel · 2018-11-08T23:39:50Z

Closes #23371

codecov · 2018-11-09T00:32:58Z

Codecov Report

Merging #23583 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23583   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files         161      161           
  Lines       51277    51277           
=======================================
  Hits        47305    47305           
  Misses       3972     3972

Flag	Coverage Δ
#multiple	`90.63% <ø> (ø)`	⬆️
#single	`42.32% <ø> (ø)`	⬆️
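The headline percentage in the coverage diff above can be checked from the hit and miss counts. This is a small arithmetic sketch using only the numbers in the report; the variable names are illustrative.

```python
# Figures copied from the coverage diff in the report.
hits = 47305
misses = 3972

lines = hits + misses                        # total measured lines
coverage_pct = round(100 * hits / lines, 2)  # percentage of lines hit

print(f"{hits}/{lines} lines hit = {coverage_pct}%")
```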

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8ed92ef...23344b2.

jreback · 2018-11-09T13:47:58Z

pandas/_libs/algos.pyx

 @cython.boundscheck(False)
 @cython.wraparound(False)
-cpdef numeric kth_smallest(numeric[:] a, Py_ssize_t k) nogil:
+def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric:


I think this will have a big perf slowdown.

It's never used in Cython, and nogil isn't allowed for def functions. There is a `with nogil` block just below this.

Please test. I once changed this (and tried to remove it) and the results were all negative.

Indistinguishable:

master:

In [3]: arr = np.arange(10000, dtype=np.int64)

In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 165.65 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.42 µs per loop

In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop

In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.21 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.2 µs per loop

PR:

In [3]: arr = np.arange(10000, dtype=np.int64)

In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 12.06 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.71 µs per loop

In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 9.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.6 µs per loop

In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.95 µs per loop

Similar for other dtypes
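For anyone who wants to repeat this kind of before/after comparison outside IPython, here is a rough sketch using the standard-library `timeit` module. It assumes `np.partition` as a stand-in for a k-th smallest selection and does not call the pandas function being discussed. The helper names `kth_smallest_np` and `best_time_us` are hypothetical.

```python
import timeit

import numpy as np


def kth_smallest_np(values, k):
    """Return the element that would sit at index k if values were sorted."""
    # np.partition returns a partitioned copy, so the input is not modified.
    return np.partition(values, k)[k]


def best_time_us(func, *args, repeat=3, number=1000):
    """Best-of-`repeat` average time per call, in microseconds."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


arr = np.arange(10000, dtype=np.int64)
result = kth_smallest_np(arr, 4)
elapsed = best_time_us(kth_smallest_np, arr, 4)
print(f"kth_smallest_np(arr, 4) = {result}; best of 3: {elapsed:.2f} µs per call")
```

Running the same harness on both builds and comparing the best-of-3 figures is one way to judge whether a difference is within noise, as in the timings above.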

jreback · 2018-11-09T13:48:17Z

pandas/_libs/algos.pyx

@@ -197,9 +195,10 @@ def groupsort_indexer(ndarray[int64_t] index, Py_ssize_t ngroups):
    return result, counts


+# TODO: redundant with groupby.kth_smallest_c


It's not, actually.

OK. The typing is more specific in the groupby version, but the code itself is nearly identical.

Try to remove it and you will see why it's here.

I'll get rid of the comment, but am kind of surprised: you're usually Holy Crusader against redundant code, and the body of this function is very nearly copy/pasted.

My point is that I don't think it's easy to remove.
You are more than welcome to try.

No you're right. I removed the comment.

jreback · 2018-11-11T15:08:36Z

thanks. does isort work for .pyx? if so can you create an issue to enable it?

jreback · 2018-11-11T15:09:29Z

did you need to remove anything from setup.cfg?

jbrockmendel · 2018-11-11T17:10:11Z

> does isort work for .pyx?

nope, actually mangles cimports pretty badly

> did you need to remove anything from setup.cfg?

no


jbrockmendel added 16 commits November 6, 2018 19:17

use float64_t instead of double

fa38001

make memoryview arguments const where needed; use C NAT lookups instead of python NaT lookups

af27870

Merge branch 'master' of https://github.com/pandas-dev/pandas into cln1

83ca237

remove boundschecks

367969e

Merge branch 'master' of https://github.com/pandas-dev/pandas into cln1

16d75a3

de-duplicate using checknull_with_nat

4f4d1cf

remove non-standard imports of np.nan

d1511f7

revert not-worth it NAT, remove extraneous nan

c9ef170

Merge branch 'master' of https://github.com/pandas-dev/pandas into cln1

1445830

standardize iNaT-->NPY_NAT

d46d516

comment cleanup

f2f4b8d

Merge branch 'master' of https://github.com/pandas-dev/pandas into cln1

e7adaf2

remove unncessary cpdef

03ae9cf

Merge branch 'master' of https://github.com/pandas-dev/pandas into cln1

9f51831

delete unused or commented-out

33fc12d

remove unused

8bc71b8

whitespace fixup

5e50897

jreback requested changes Nov 9, 2018


jbrockmendel added 2 commits November 10, 2018 08:26

Merge branch 'master' of https://github.com/pandas-dev/pandas into cln1

52ede45

remove unliked comment

23344b2

gfyoung added the Internals (related to non-user accessible pandas implementation) and Clean labels Nov 11, 2018

jreback added this to the 0.24.0 milestone Nov 11, 2018

jreback approved these changes Nov 11, 2018


jreback merged commit 4c63f3e into pandas-dev:master Nov 11, 2018

jbrockmendel deleted the cln1 branch November 11, 2018 17:10

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)

ff8130b

This was referenced Nov 16, 2018

CLN: commented-out code in _libs/sparse.pyx #23371

Closed

BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows #23688

Merged

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)

f4f56bb

jbrockmendel mentioned this pull request Dec 14, 2018

WIP: decorator for ops boilerplate #24282

Closed


Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)

ad77427

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)

d8bf597
