Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: use float64_t consistently instead of double, double_t #23583

Merged
merged 19 commits into from
Nov 11, 2018

Conversation

jbrockmendel
Copy link
Member

remove some commented-out or otherwise unused code

disable boundscheck/wraparound in a couple places in tslib where it is safe

add const modifiers to some of the memoryview functions so they don't raise if we ever pass read-only arrays to them

standardized NPY_NAT as always being the cdef int64_t version and iNaT as being the python-namespace version

remove duplicated NaN/nan constants

remove non-standard imports of np.nan in some test files

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@jbrockmendel
Copy link
Member Author

Closes #23371

@codecov
Copy link

codecov bot commented Nov 9, 2018

Codecov Report

Merging #23583 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23583   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files         161      161           
  Lines       51277    51277           
=======================================
  Hits        47305    47305           
  Misses       3972     3972
Flag Coverage Δ
#multiple 90.63% <ø> (ø) ⬆️
#single 42.32% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ed92ef...23344b2. Read the comment docs.

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef numeric kth_smallest(numeric[:] a, Py_ssize_t k) nogil:
def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think will have a big perf slowdown

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its never used in cython and nogil isnt allowed for def functions. There is a with nogil block just below this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls test i once changed this (and tried to remove) and was all negative

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indistinguishable:

master:

In [3]: arr = np.arange(10000, dtype=np.int64)

In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 165.65 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.42 µs per loop

In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop

In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 4.21 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.2 µs per loop

PR:

In [3]: arr = np.arange(10000, dtype=np.int64)

In [4]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 12.06 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.71 µs per loop

In [5]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 9.48 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.6 µs per loop

In [6]: %timeit pd._libs.algos.kth_smallest(arr, 4)
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.95 µs per loop

Similar for other dtypes

@@ -197,9 +195,10 @@ def groupsort_indexer(ndarray[int64_t] index, Py_ssize_t ngroups):
return result, counts


# TODO: redundant with groupby.kth_smallest_c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not actually

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. The typing is more specific in the groupby version, but the code itself is nearly identical

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to remove and you will see why it’s here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get rid of the comment, but am kind of surprised: you're usually Holy Crusader against redundant code, and the body of this function is very nearly copy/pasted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is i don’t think it’s easy to remove
you are more than welcome to try

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you're right. I removed the comment.

@gfyoung gfyoung added Internals Related to non-user accessible pandas implementation Clean labels Nov 11, 2018
@jreback jreback added this to the 0.24.0 milestone Nov 11, 2018
@jreback jreback merged commit 4c63f3e into pandas-dev:master Nov 11, 2018
@jreback
Copy link
Contributor

jreback commented Nov 11, 2018

thanks. does isort work for .pyx? if so can you create an issue to enable it?

@jreback
Copy link
Contributor

jreback commented Nov 11, 2018

did you need to remove anything from setup.cfg?

@jbrockmendel
Copy link
Member Author

does isort work for .pyx?

nope, actually mangles cimports pretty badly

did you need to remove anything from setup.cfg?

no

@jbrockmendel jbrockmendel deleted the cln1 branch November 11, 2018 17:10
thoo added a commit to thoo/pandas that referenced this pull request Nov 11, 2018
* upstream/master:
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
  DOC: Fixes to docstring to add validation to CI (pandas-dev#23560)
  DOC: Remove incorrect periods at the end of parameter types (pandas-dev#23600)
  MAINT: tm.assert_raises_regex --> pytest.raises (pandas-dev#23592)
  DOC: Updating Series.resample and DataFrame.resample docstrings (pandas-dev#23197)
thoo added a commit to thoo/pandas that referenced this pull request Nov 12, 2018
…fixed

* upstream/master:
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants