PERF: vectorize _interp_limit #16592

TomAugspurger · 2017-06-03T17:46:55Z

Here are the timings vs. master

       before           after         ratio
     [ff0d1f4d]       [6b1552f1]
-      12.3±0.3ms      2.28±0.04ms     0.18  frame_methods.Interpolate.time_interpolate_some_good_infer
-      12.6±0.7ms      1.15±0.01ms     0.09  frame_methods.Interpolate.time_interpolate_some_good
-           1.30s         53.8±1ms     0.04  frame_methods.Interpolate.time_interpolate

I still need to time vs 0.20.1 (will post those timings later, have to run for a bit now). I may have one more spot to optimize.

TomAugspurger · 2017-06-03T17:55:35Z

Also, the master and 0.20.1 versions scale poorly with limit, but that's not benchmarked anywhere. Here's some timings

limit / version	0.20.1	master	interp-perf
None	42.2 ms	1.37 s	43.8 ms
50	1.45 s	1.33 s	66.2 ms
500	1.40 s	1.36 s	62.5 ms

So we're slightly slower than 0.20.1 for the default, but much faster with any limit (notice the units).

I'll add a benchmark for limit and direction. Anyone know if there's a way to run new benchmarks against old commits in asv?

TomAugspurger · 2017-06-03T18:01:22Z

Just finished vs. 0.20.1, and they're essentially the same (I think the first one is my branch, 0.20.1 is second).

· Running 6 total benchmarks (2 commits * 1 environments * 3 benchmarks)
[  0.00%] · For pandas commit hash b2bbd1bf:
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...........................................................................................
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 16.67%] ··· Running frame_methods.Interpolate.time_interpolate                                                                                  44.8±0.3ms
[ 33.33%] ··· Running frame_methods.Interpolate.time_interpolate_some_good                                                                       1.09±0.03ms
[ 50.00%] ··· Running frame_methods.Interpolate.time_interpolate_some_good_infer                                                                 2.41±0.01ms
[ 50.00%] · For pandas commit hash 20baf972:
[ 50.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 66.67%] ··· Running frame_methods.Interpolate.time_interpolate                                                                                  46.7±0.9ms
[ 83.33%] ··· Running frame_methods.Interpolate.time_interpolate_some_good                                                                       1.08±0.02ms
[100.00%] ··· Running frame_methods.Interpolate.time_interpolate_some_good_infer                                                                 2.44±0.07ms
BENCHMARKS NOT SIGNIFICANTLY CHANGED.

jreback · 2017-06-03T18:17:40Z

pandas/core/missing.py

+def _interp_limit0(invalid, fw_limit, bw_limit):
+    "Get idx of values that won't be filled b/c they exceed the limits."
+    for x in np.where(invalid)[0]:
+        if invalid[max(0, x - fw_limit):x + bw_limit + 1].all():


should rewrite this as well

TomAugspurger · 2017-06-03T18:20:16Z

Oh sorry that's not used anymore. Just left it around for testing that my new implementation was the same. I'll remove it.

…

On Jun 3, 2017, at 13:17, Jeff Reback ***@***.***> wrote: @jreback commented on this pull request. In pandas/core/missing.py: > @@ -630,3 +632,65 @@ def fill_zeros(result, x, y, name, fill): result = result.reshape(shape) return result + + +def _interp_limit0(invalid, fw_limit, bw_limit): + "Get idx of values that won't be filled b/c they exceed the limits." + for x in np.where(invalid)[0]: + if invalid[max(0, x - fw_limit):x + bw_limit + 1].all(): should rewrite this as well — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

codecov · 2017-06-03T18:39:05Z

Codecov Report

Merging #16592 into master will increase coverage by 0.17%.
The diff coverage is 90.9%.

@@            Coverage Diff             @@
##           master   #16592      +/-   ##
==========================================
+ Coverage   90.75%   90.92%   +0.17%     
==========================================
  Files         161      161              
  Lines       51097    49265    -1832     
==========================================
- Hits        46372    44795    -1577     
+ Misses       4725     4470     -255

Flag	Coverage Δ
#multiple	`88.68% <90.9%> (+0.09%)`	⬆️
#single	`40.22% <9.09%> (+0.05%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/missing.py	`84.54% <90.9%> (+0.27%)`	⬆️
pandas/core/computation/expressions.py	`0% <0%> (-90.48%)`	⬇️
pandas/io/msgpack/_version.py	`44.65% <0%> (-0.22%)`	⬇️
pandas/io/formats/style.py	`100% <0%> (+4.07%)`	⬆️
pandas/core/util/hashing.py	`98.34% <0%> (+5.37%)`	⬆️
pandas/io/json/json.py	`100% <0%> (+9.63%)`	⬆️
pandas/util/testing.py	`100% <0%> (+19.01%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31e67d5...b2bbd1b. Read the comment docs.

codecov · 2017-06-03T18:39:05Z

Codecov Report

Merging #16592 into master will increase coverage by 0.17%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16592      +/-   ##
==========================================
+ Coverage   90.75%   90.93%   +0.17%     
==========================================
  Files         161      161              
  Lines       51097    49261    -1836     
==========================================
- Hits        46372    44794    -1578     
+ Misses       4725     4467     -258

Flag	Coverage Δ
#multiple	`88.69% <100%> (+0.1%)`	⬆️
#single	`40.22% <6.89%> (+0.05%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/missing.py	`85.25% <100%> (+0.97%)`	⬆️
pandas/core/computation/expressions.py	`0% <0%> (-90.48%)`	⬇️
pandas/io/msgpack/_version.py	`44.65% <0%> (-0.22%)`	⬇️
pandas/io/formats/style.py	`100% <0%> (+4.07%)`	⬆️
pandas/core/util/hashing.py	`98.34% <0%> (+5.37%)`	⬆️
pandas/io/json/json.py	`100% <0%> (+9.63%)`	⬆️
pandas/util/testing.py	`100% <0%> (+19.01%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31e67d5...bf88fc7. Read the comment docs.

jreback · 2017-06-03T19:46:32Z

pandas/core/missing.py

+    return f_idx & b_idx
+
+
+def rolling_window(a, window):


maybe move this inside _interp_limit to avoid confusion that this is public and/or used externally

jreback

this all looks fine
i suspect a pure cython impl might be easier to read though (maybe even faster)

can tackle later if interest

TomAugspurger · 2017-06-04T01:28:51Z

i suspect a pure cython impl might be easier to read though (maybe even faster)

Yeah, it's kind of unreadable, and I have no idea what that striding stuff is doing.

I'll leave this open if you want to take a look. I'll tag 0.20.2 early tomorrow morning.
We can also do the cython for 0.21

jreback · 2017-06-04T01:32:13Z

no this is all fine maybe make an issue for 0.21 to have a look

* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation (cherry picked from commit 473615e)

* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation

Version 0.20.2 * tag 'v0.20.2': (68 commits) RLS: v0.20.2 DOC: Update release.rst DOC: Whatsnew fixups (pandas-dev#16596) ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460) BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444) PERF: vectorize _interp_limit (pandas-dev#16592) DOC: whatsnew 0.20.2 edits (pandas-dev#16587) API: Make is_strictly_monotonic_* private (pandas-dev#16576) BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565) Strictly monotonic (pandas-dev#16555) ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026) fix linting BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244) BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317) return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486) BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549) BUG: Fixed tput output on windows (pandas-dev#16496) Strictly monotonic (pandas-dev#16555) BUG: fixed wrong order of ordered labels in pd.cut() BUG: Fixed to_html ignoring index_names parameter ...

PERF: vectorize _interp_limit

b2bbd1b

jreback reviewed Jun 3, 2017

View reviewed changes

CLN: remove old implementation

dc956e8

TomAugspurger added this to the 0.20.2 milestone Jun 3, 2017

TomAugspurger added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Performance Memory or execution speed performance labels Jun 3, 2017

jreback reviewed Jun 3, 2017

View reviewed changes

jreback approved these changes Jun 4, 2017

View reviewed changes

fixup! CLN: remove old implementation

bf88fc7

TomAugspurger added the Needs Backport label Jun 4, 2017

TomAugspurger merged commit 473615e into pandas-dev:master Jun 4, 2017

TomAugspurger added a commit that referenced this pull request Jun 4, 2017

PERF: vectorize _interp_limit (#16592)

74d2ffa

* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation (cherry picked from commit 473615e)

TomAugspurger removed the Needs Backport label Jun 4, 2017

TomAugspurger deleted the interp-perf branch June 4, 2017 20:29

TomAugspurger mentioned this pull request Jun 5, 2017

PERF: Regressions between 0.20.1 and 0.20.x #16584

Closed

Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017

PERF: vectorize _interp_limit (pandas-dev#16592)

9d0be9d

* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

PERF: vectorize _interp_limit (pandas-dev#16592)

dd930d6

* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: vectorize _interp_limit #16592

PERF: vectorize _interp_limit #16592

TomAugspurger commented Jun 3, 2017

TomAugspurger commented Jun 3, 2017 •

edited

Loading

TomAugspurger commented Jun 3, 2017 •

edited

Loading

jreback Jun 3, 2017

TomAugspurger commented Jun 3, 2017 via email

codecov bot commented Jun 3, 2017

codecov bot commented Jun 3, 2017 •

edited

Loading

jreback Jun 3, 2017

jreback left a comment

TomAugspurger commented Jun 4, 2017

jreback commented Jun 4, 2017

PERF: vectorize _interp_limit #16592

PERF: vectorize _interp_limit #16592

Conversation

TomAugspurger commented Jun 3, 2017

TomAugspurger commented Jun 3, 2017 • edited Loading

TomAugspurger commented Jun 3, 2017 • edited Loading

jreback Jun 3, 2017

Choose a reason for hiding this comment

TomAugspurger commented Jun 3, 2017 via email

codecov bot commented Jun 3, 2017

Codecov Report

codecov bot commented Jun 3, 2017 • edited Loading

Codecov Report

jreback Jun 3, 2017

Choose a reason for hiding this comment

jreback left a comment

Choose a reason for hiding this comment

TomAugspurger commented Jun 4, 2017

jreback commented Jun 4, 2017

TomAugspurger commented Jun 3, 2017 •

edited

Loading

TomAugspurger commented Jun 3, 2017 •

edited

Loading

codecov bot commented Jun 3, 2017 •

edited

Loading