use partition_point instead of binary_search when looking up source lines #101999
r? @davidtwco (rust-highfive has picked a reviewer for you, use r? to override)
let's see if perf agrees. @bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
…ines In local benchmarks this results in 0.4% fewer cycles in a critical sequential section when compiling libcore.
3619b0c to 40b3726
@bors try
⌛ Trying commit 40b3726 with merge 28a407643f8543f58b9b3cc67e70649d5827ae9c...
☀️ Try build successful - checks-actions
Queued 28a407643f8543f58b9b3cc67e70649d5827ae9c with parent a37499a, future comparison URL.
Finished benchmarking commit (28a407643f8543f58b9b3cc67e70649d5827ae9c): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
@bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (3e50038): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
@the8472: Nice improvement! How did you find it?
I'm also surprised it's such a good win, given that
I noticed that libcore has a long sequential bottleneck and ran it under perf and filtered the results down to the bottlenecked thread. The binary search stood out as something that I was familiar with. Almost all of the bottleneck is still there if you want to take a stab 🗡️.
It turns an unpredictable 3-way branch (lt, eq, gt) into an also unpredictable 2-way branch, which I think then gets turned into cmovs, but I didn't check the assembly.
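For readers following along, here is a minimal sketch of the two lookup styles being compared, assuming the line starts are kept as a sorted slice of byte offsets. The function names and the `u32` offset type are hypothetical and this is not the compiler's actual code; only `slice::binary_search` and `slice::partition_point` are real standard library methods.

```rust
/// Hypothetical helper: index of the line containing `pos`, via `binary_search`.
/// `binary_search` distinguishes "found exactly" from "not found", so the
/// caller has to handle both outcomes.
fn lookup_line_binary_search(line_starts: &[u32], pos: u32) -> Option<usize> {
    match line_starts.binary_search(&pos) {
        // `pos` is exactly the first byte of line `i`.
        Ok(i) => Some(i),
        // `pos` would be inserted at `i`, so it lies inside line `i - 1`.
        // `checked_sub` yields `None` when `pos` precedes the first line.
        Err(i) => i.checked_sub(1),
    }
}

/// Hypothetical helper: the same lookup via `partition_point`.
/// The predicate only asks "does this line start at or before `pos`?",
/// so there is no separate equality case.
fn lookup_line_partition_point(line_starts: &[u32], pos: u32) -> Option<usize> {
    line_starts
        .partition_point(|&start| start <= pos)
        .checked_sub(1)
}
```

Both helpers should agree for any sorted slice of distinct line starts; for example, with starts `[0, 10, 25, 40]` a position of `24` falls in line index `1` either way.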
@nnethercote (On average, the item is found only in the last iteration, since all the iterations before the last one only looked at half the items. Thus checking for equality to early-exit makes all the iterations slower but on average only saves one iteration. It also adds a second exit condition to the loop, which makes it harder to canonicalize and thus can keep LLVM from being able to unroll it.)
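To make the loop-shape argument concrete, here is a rough sketch of the two kinds of loop, written by hand for illustration. These are not the standard library's implementations, and the function names are hypothetical; the point is only that the first loop has two exits (the equality `return` and the loop condition) while the second has one.

```rust
use std::cmp::Ordering;

/// Hypothetical three-way search with an early exit on equality.
/// Returns `Ok(index)` on a hit, or `Err(insertion_point)` on a miss.
fn search_with_early_exit(items: &[u32], target: u32) -> Result<usize, usize> {
    let mut lo = 0;
    let mut hi = items.len();
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match items[mid].cmp(&target) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            // Second exit from the loop: taken at most once per search,
            // but the comparison is paid on every iteration.
            Ordering::Equal => return Ok(mid),
        }
    }
    Err(lo)
}

/// Hypothetical two-way search in the style of `partition_point`.
/// Returns how many leading elements satisfy `pred`, assuming the slice is
/// partitioned so that all satisfying elements come first.
fn partition_point_sketch(items: &[u32], pred: impl Fn(u32) -> bool) -> usize {
    let mut lo = 0;
    let mut hi = items.len();
    // Single exit condition; each iteration is one yes/no decision.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if pred(items[mid]) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}
```

Whether the second form actually compiles to conditional moves or gets unrolled depends on the compiler and target, so treat this as an illustration of the control flow rather than a performance claim.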
…r=m-ou-se
More slice::partition_point examples
After seeing the discussion of `binary_search` vs `partition_point` in rust-lang#101999, I thought some more example code could be helpful.