PERF: vectorize _interp_limit #16592
Also, the master and 0.20.1 versions scale poorly with
So we're slightly slower than 0.20.1 for the default, but much faster with any limit (notice the units). I'll add a benchmark for |
Just finished vs. 0.20.1, and they're essentially the same (I think the first one is my branch, 0.20.1 is second).
pandas/core/missing.py
Outdated
def _interp_limit0(invalid, fw_limit, bw_limit):
    "Get idx of values that won't be filled b/c they exceed the limits."
    for x in np.where(invalid)[0]:
        if invalid[max(0, x - fw_limit):x + bw_limit + 1].all():
should rewrite this as well
Oh sorry that's not used anymore. Just left it around for testing that my new implementation was the same. I'll remove it.
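For readers following along, the loop-based check under review can be sketched roughly as below. This is only an illustration: the name `interp_limit_loop` is hypothetical, and the `yield x` body is an assumption, since the quoted diff is cut off right after the `if` line.

```python
import numpy as np


def interp_limit_loop(invalid, fw_limit, bw_limit):
    """Yield indices of missing values that fall outside the fill limits.

    `invalid` is a 1-D boolean array, True where a value is missing.
    """
    for x in np.where(invalid)[0]:
        # If every value in the window around x is missing, there is no
        # valid value close enough to fill x within the limits.
        if invalid[max(0, x - fw_limit):x + bw_limit + 1].all():
            yield x  # assumed body; the quoted diff ends at the `if`


invalid = np.array([False, True, True, True, False])
unfilled = list(interp_limit_loop(invalid, fw_limit=1, bw_limit=0))
```

With a forward limit of 1 and no backward limit, only index 1 sits within one step of a valid value, so indices 2 and 3 are reported. The Python-level loop visits every missing value and slices the array each time, which is the part a vectorized version avoids.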
… On Jun 3, 2017, at 13:17, Jeff Reback ***@***.***> wrote:
@jreback commented on this pull request.
In pandas/core/missing.py:
> @@ -630,3 +632,65 @@ def fill_zeros(result, x, y, name, fill):
result = result.reshape(shape)
return result
+
+
+def _interp_limit0(invalid, fw_limit, bw_limit):
+ "Get idx of values that won't be filled b/c they exceed the limits."
+ for x in np.where(invalid)[0]:
+ if invalid[max(0, x - fw_limit):x + bw_limit + 1].all():
should rewrite this as well
Codecov Report
@@ Coverage Diff @@
## master #16592 +/- ##
==========================================
+ Coverage 90.75% 90.92% +0.17%
==========================================
Files 161 161
Lines 51097 49265 -1832
==========================================
- Hits 46372 44795 -1577
+ Misses 4725 4470 -255
Codecov Report
@@ Coverage Diff @@
## master #16592 +/- ##
==========================================
+ Coverage 90.75% 90.93% +0.17%
==========================================
Files 161 161
Lines 51097 49261 -1836
==========================================
- Hits 46372 44794 -1578
+ Misses 4725 4467 -258
pandas/core/missing.py
Outdated
    return f_idx & b_idx


def rolling_window(a, window):
maybe move this inside _interp_limit to avoid confusion that this is public and/or used externally
this all looks fine
i suspect a pure cython impl might be easier to read though (maybe even faster)
can tackle later if interest
Yeah, it's kind of unreadable, and I have no idea what that striding stuff is doing. I'll leave this open if you want to take a look. I'll tag 0.20.2 early tomorrow morning.
no, this is all fine; maybe make an issue for 0.21 to have a look
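Since the striding came up as hard to read, here is a minimal sketch of what a rolling-window helper of this kind typically does. It is not necessarily the code merged in this PR: it assumes a 1-D input with `window <= len(a)`, and it relies on NumPy's `as_strided`, which builds a view without copying data.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def rolling_window(a, window):
    """Return a 2-D read-only view whose row i is a[i:i + window]."""
    shape = (a.shape[0] - window + 1, window)
    # Stepping one row down or one column right both advance by one
    # element of `a`, so consecutive rows overlap in memory.
    strides = (a.strides[0], a.strides[0])
    return as_strided(a, shape=shape, strides=strides, writeable=False)


invalid = np.array([True, True, False, True, True, True, False])
windows = rolling_window(invalid, 3)

# A row that is all True marks three consecutive missing values;
# adding window - 1 converts the row number to the index where the run ends.
run_ends = np.where(windows.all(axis=1))[0] + 2
```

Here `windows` has five rows and only row 3 (`invalid[3:6]`) is all True, so `run_ends` contains index 5. The leading values of the array have no full window behind them and would need separate handling, which this sketch leaves out.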
* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation (cherry picked from commit 473615e)
* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation
Version 0.20.2

* tag 'v0.20.2': (68 commits)
  RLS: v0.20.2
  DOC: Update release.rst
  DOC: Whatsnew fixups (pandas-dev#16596)
  ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460)
  BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444)
  PERF: vectorize _interp_limit (pandas-dev#16592)
  DOC: whatsnew 0.20.2 edits (pandas-dev#16587)
  API: Make is_strictly_monotonic_* private (pandas-dev#16576)
  BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565)
  Strictly monotonic (pandas-dev#16555)
  ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026)
  fix linting
  BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244)
  BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317)
  return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486)
  BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549)
  BUG: Fixed tput output on windows (pandas-dev#16496)
  Strictly monotonic (pandas-dev#16555)
  BUG: fixed wrong order of ordered labels in pd.cut()
  BUG: Fixed to_html ignoring index_names parameter
  ...
xref #16584
Here are the timings vs. master
I still need to time vs 0.20.1 (will post those timings later, have to run for a bit now). I may have one more spot to optimize.