-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Btreemap iter intertwined #71510
Btreemap iter intertwined #71510
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me -- while looking at this I also thought #71523 would be a nice cleanup, so posted that separately.
Not sure if the draft status is intended to be indicative that you're planning more work here. I'll hold off on approving until you ping me confirming you're happy with this. |
I made it draft partly because I was wondering whether the CI would succeed, since |
92d2969
to
8730227
Compare
Okay, I adapted the 2nd and 3rd commit to #71523 and squashed them, and it looks neat enough to me now. Performance now says:
so |
@bors r+ |
📌 Commit 8730227 has been approved by |
…r=Mark-Simulacrum Btreemap iter intertwined 3 commits: 1. Introduced benchmarks for `BTreeMap::iter()`. Benchmarks named `iter_20` were of the whole iteration process, so I renamed them. Also the benchmarks of `range` that I wrote earlier weren't very good. I included an (awkwardly named) one that compares `iter()` to `range(..)` on the same set, because the contrast is surprising: ``` name ns/iter btree::map::range_unbounded_unbounded 28,176 btree::map::range_unbounded_vs_iter 89,369 ``` Both dig up the same pair of leaf edges. `range(..)` also checks that some keys are correctly ordered, the only thing `iter()` does more is to copy the map's length. 2. Slightly refactoring the code to what I find more readable (not in chronological order of discovery), boosts performance: ``` >cargo-benchcmp.exe benchcmp a1 a2 --threshold 5 name a1 ns/iter a2 ns/iter diff ns/iter diff % speedup btree::map::find_rand_100 18 17 -1 -5.56% x 1.06 btree::map::first_and_last_10k 64 71 7 10.94% x 0.90 btree::map::iter_0 2,939 2,209 -730 -24.84% x 1.33 btree::map::iter_1 6,845 2,696 -4,149 -60.61% x 2.54 btree::map::iter_100 8,556 3,672 -4,884 -57.08% x 2.33 btree::map::iter_10k 9,292 5,884 -3,408 -36.68% x 1.58 btree::map::iter_1m 10,268 6,510 -3,758 -36.60% x 1.58 btree::map::iteration_mut_100000 478,575 453,050 -25,525 -5.33% x 1.06 btree::map::range_unbounded_unbounded 28,176 36,169 7,993 28.37% x 0.78 btree::map::range_unbounded_vs_iter 89,369 38,290 -51,079 -57.16% x 2.33 btree::set::clone_100_and_remove_all 4,801 4,245 -556 -11.58% x 1.13 btree::set::clone_10k_and_remove_all 529,450 496,030 -33,420 -6.31% x 1.07 ``` But you can tell from the `range_unbounded_*` lines that, despite an unwarranted, vengeful attack on the range_unbounded_unbounded benchmark, this change still doesn't allow `iter()` to catch up with `range(..)`. 3. I guess that `range(..)` copes so well because it intertwines the leftmost and rightmost descend towards leaf edges, doing the two root node accesses close together, perhaps exploiting a CPU's internal pipelining? So the third commit distils a version of `range_search` (which we can't use directly because of the `Ord` bound), and we get another boost: ``` cargo-benchcmp.exe benchcmp a2 a3 --threshold 5 name a2 ns/iter a3 ns/iter diff ns/iter diff % speedup btree::map::first_and_last_100 40 43 3 7.50% x 0.93 btree::map::first_and_last_10k 71 64 -7 -9.86% x 1.11 btree::map::iter_0 2,209 1,719 -490 -22.18% x 1.29 btree::map::iter_1 2,696 2,205 -491 -18.21% x 1.22 btree::map::iter_100 3,672 2,943 -729 -19.85% x 1.25 btree::map::iter_10k 5,884 3,929 -1,955 -33.23% x 1.50 btree::map::iter_1m 6,510 5,532 -978 -15.02% x 1.18 btree::map::iteration_mut_100000 453,050 476,667 23,617 5.21% x 0.95 btree::map::range_included_excluded 405,075 371,297 -33,778 -8.34% x 1.09 btree::map::range_included_included 427,577 397,440 -30,137 -7.05% x 1.08 btree::map::range_unbounded_unbounded 36,169 28,175 -7,994 -22.10% x 1.28 btree::map::range_unbounded_vs_iter 38,290 30,838 -7,452 -19.46% x 1.24 ``` But I think this is just fake news from the microbenchmarking media. `iter()` is still trying to catch up with `range(..)`. And we can sure do without another function. So I would skip this 3rd commit. r? @Mark-Simulacrum
…r=Mark-Simulacrum Btreemap iter intertwined 3 commits: 1. Introduced benchmarks for `BTreeMap::iter()`. Benchmarks named `iter_20` were of the whole iteration process, so I renamed them. Also the benchmarks of `range` that I wrote earlier weren't very good. I included an (awkwardly named) one that compares `iter()` to `range(..)` on the same set, because the contrast is surprising: ``` name ns/iter btree::map::range_unbounded_unbounded 28,176 btree::map::range_unbounded_vs_iter 89,369 ``` Both dig up the same pair of leaf edges. `range(..)` also checks that some keys are correctly ordered, the only thing `iter()` does more is to copy the map's length. 2. Slightly refactoring the code to what I find more readable (not in chronological order of discovery), boosts performance: ``` >cargo-benchcmp.exe benchcmp a1 a2 --threshold 5 name a1 ns/iter a2 ns/iter diff ns/iter diff % speedup btree::map::find_rand_100 18 17 -1 -5.56% x 1.06 btree::map::first_and_last_10k 64 71 7 10.94% x 0.90 btree::map::iter_0 2,939 2,209 -730 -24.84% x 1.33 btree::map::iter_1 6,845 2,696 -4,149 -60.61% x 2.54 btree::map::iter_100 8,556 3,672 -4,884 -57.08% x 2.33 btree::map::iter_10k 9,292 5,884 -3,408 -36.68% x 1.58 btree::map::iter_1m 10,268 6,510 -3,758 -36.60% x 1.58 btree::map::iteration_mut_100000 478,575 453,050 -25,525 -5.33% x 1.06 btree::map::range_unbounded_unbounded 28,176 36,169 7,993 28.37% x 0.78 btree::map::range_unbounded_vs_iter 89,369 38,290 -51,079 -57.16% x 2.33 btree::set::clone_100_and_remove_all 4,801 4,245 -556 -11.58% x 1.13 btree::set::clone_10k_and_remove_all 529,450 496,030 -33,420 -6.31% x 1.07 ``` But you can tell from the `range_unbounded_*` lines that, despite an unwarranted, vengeful attack on the range_unbounded_unbounded benchmark, this change still doesn't allow `iter()` to catch up with `range(..)`. 3. I guess that `range(..)` copes so well because it intertwines the leftmost and rightmost descend towards leaf edges, doing the two root node accesses close together, perhaps exploiting a CPU's internal pipelining? So the third commit distils a version of `range_search` (which we can't use directly because of the `Ord` bound), and we get another boost: ``` cargo-benchcmp.exe benchcmp a2 a3 --threshold 5 name a2 ns/iter a3 ns/iter diff ns/iter diff % speedup btree::map::first_and_last_100 40 43 3 7.50% x 0.93 btree::map::first_and_last_10k 71 64 -7 -9.86% x 1.11 btree::map::iter_0 2,209 1,719 -490 -22.18% x 1.29 btree::map::iter_1 2,696 2,205 -491 -18.21% x 1.22 btree::map::iter_100 3,672 2,943 -729 -19.85% x 1.25 btree::map::iter_10k 5,884 3,929 -1,955 -33.23% x 1.50 btree::map::iter_1m 6,510 5,532 -978 -15.02% x 1.18 btree::map::iteration_mut_100000 453,050 476,667 23,617 5.21% x 0.95 btree::map::range_included_excluded 405,075 371,297 -33,778 -8.34% x 1.09 btree::map::range_included_included 427,577 397,440 -30,137 -7.05% x 1.08 btree::map::range_unbounded_unbounded 36,169 28,175 -7,994 -22.10% x 1.28 btree::map::range_unbounded_vs_iter 38,290 30,838 -7,452 -19.46% x 1.24 ``` But I think this is just fake news from the microbenchmarking media. `iter()` is still trying to catch up with `range(..)`. And we can sure do without another function. So I would skip this 3rd commit. r? @Mark-Simulacrum
…r=Mark-Simulacrum Btreemap iter intertwined 3 commits: 1. Introduced benchmarks for `BTreeMap::iter()`. Benchmarks named `iter_20` were of the whole iteration process, so I renamed them. Also the benchmarks of `range` that I wrote earlier weren't very good. I included an (awkwardly named) one that compares `iter()` to `range(..)` on the same set, because the contrast is surprising: ``` name ns/iter btree::map::range_unbounded_unbounded 28,176 btree::map::range_unbounded_vs_iter 89,369 ``` Both dig up the same pair of leaf edges. `range(..)` also checks that some keys are correctly ordered, the only thing `iter()` does more is to copy the map's length. 2. Slightly refactoring the code to what I find more readable (not in chronological order of discovery), boosts performance: ``` >cargo-benchcmp.exe benchcmp a1 a2 --threshold 5 name a1 ns/iter a2 ns/iter diff ns/iter diff % speedup btree::map::find_rand_100 18 17 -1 -5.56% x 1.06 btree::map::first_and_last_10k 64 71 7 10.94% x 0.90 btree::map::iter_0 2,939 2,209 -730 -24.84% x 1.33 btree::map::iter_1 6,845 2,696 -4,149 -60.61% x 2.54 btree::map::iter_100 8,556 3,672 -4,884 -57.08% x 2.33 btree::map::iter_10k 9,292 5,884 -3,408 -36.68% x 1.58 btree::map::iter_1m 10,268 6,510 -3,758 -36.60% x 1.58 btree::map::iteration_mut_100000 478,575 453,050 -25,525 -5.33% x 1.06 btree::map::range_unbounded_unbounded 28,176 36,169 7,993 28.37% x 0.78 btree::map::range_unbounded_vs_iter 89,369 38,290 -51,079 -57.16% x 2.33 btree::set::clone_100_and_remove_all 4,801 4,245 -556 -11.58% x 1.13 btree::set::clone_10k_and_remove_all 529,450 496,030 -33,420 -6.31% x 1.07 ``` But you can tell from the `range_unbounded_*` lines that, despite an unwarranted, vengeful attack on the range_unbounded_unbounded benchmark, this change still doesn't allow `iter()` to catch up with `range(..)`. 3. I guess that `range(..)` copes so well because it intertwines the leftmost and rightmost descend towards leaf edges, doing the two root node accesses close together, perhaps exploiting a CPU's internal pipelining? So the third commit distils a version of `range_search` (which we can't use directly because of the `Ord` bound), and we get another boost: ``` cargo-benchcmp.exe benchcmp a2 a3 --threshold 5 name a2 ns/iter a3 ns/iter diff ns/iter diff % speedup btree::map::first_and_last_100 40 43 3 7.50% x 0.93 btree::map::first_and_last_10k 71 64 -7 -9.86% x 1.11 btree::map::iter_0 2,209 1,719 -490 -22.18% x 1.29 btree::map::iter_1 2,696 2,205 -491 -18.21% x 1.22 btree::map::iter_100 3,672 2,943 -729 -19.85% x 1.25 btree::map::iter_10k 5,884 3,929 -1,955 -33.23% x 1.50 btree::map::iter_1m 6,510 5,532 -978 -15.02% x 1.18 btree::map::iteration_mut_100000 453,050 476,667 23,617 5.21% x 0.95 btree::map::range_included_excluded 405,075 371,297 -33,778 -8.34% x 1.09 btree::map::range_included_included 427,577 397,440 -30,137 -7.05% x 1.08 btree::map::range_unbounded_unbounded 36,169 28,175 -7,994 -22.10% x 1.28 btree::map::range_unbounded_vs_iter 38,290 30,838 -7,452 -19.46% x 1.24 ``` But I think this is just fake news from the microbenchmarking media. `iter()` is still trying to catch up with `range(..)`. And we can sure do without another function. So I would skip this 3rd commit. r? @Mark-Simulacrum
…r=Mark-Simulacrum Btreemap iter intertwined 3 commits: 1. Introduced benchmarks for `BTreeMap::iter()`. Benchmarks named `iter_20` were of the whole iteration process, so I renamed them. Also the benchmarks of `range` that I wrote earlier weren't very good. I included an (awkwardly named) one that compares `iter()` to `range(..)` on the same set, because the contrast is surprising: ``` name ns/iter btree::map::range_unbounded_unbounded 28,176 btree::map::range_unbounded_vs_iter 89,369 ``` Both dig up the same pair of leaf edges. `range(..)` also checks that some keys are correctly ordered, the only thing `iter()` does more is to copy the map's length. 2. Slightly refactoring the code to what I find more readable (not in chronological order of discovery), boosts performance: ``` >cargo-benchcmp.exe benchcmp a1 a2 --threshold 5 name a1 ns/iter a2 ns/iter diff ns/iter diff % speedup btree::map::find_rand_100 18 17 -1 -5.56% x 1.06 btree::map::first_and_last_10k 64 71 7 10.94% x 0.90 btree::map::iter_0 2,939 2,209 -730 -24.84% x 1.33 btree::map::iter_1 6,845 2,696 -4,149 -60.61% x 2.54 btree::map::iter_100 8,556 3,672 -4,884 -57.08% x 2.33 btree::map::iter_10k 9,292 5,884 -3,408 -36.68% x 1.58 btree::map::iter_1m 10,268 6,510 -3,758 -36.60% x 1.58 btree::map::iteration_mut_100000 478,575 453,050 -25,525 -5.33% x 1.06 btree::map::range_unbounded_unbounded 28,176 36,169 7,993 28.37% x 0.78 btree::map::range_unbounded_vs_iter 89,369 38,290 -51,079 -57.16% x 2.33 btree::set::clone_100_and_remove_all 4,801 4,245 -556 -11.58% x 1.13 btree::set::clone_10k_and_remove_all 529,450 496,030 -33,420 -6.31% x 1.07 ``` But you can tell from the `range_unbounded_*` lines that, despite an unwarranted, vengeful attack on the range_unbounded_unbounded benchmark, this change still doesn't allow `iter()` to catch up with `range(..)`. 3. I guess that `range(..)` copes so well because it intertwines the leftmost and rightmost descend towards leaf edges, doing the two root node accesses close together, perhaps exploiting a CPU's internal pipelining? So the third commit distils a version of `range_search` (which we can't use directly because of the `Ord` bound), and we get another boost: ``` cargo-benchcmp.exe benchcmp a2 a3 --threshold 5 name a2 ns/iter a3 ns/iter diff ns/iter diff % speedup btree::map::first_and_last_100 40 43 3 7.50% x 0.93 btree::map::first_and_last_10k 71 64 -7 -9.86% x 1.11 btree::map::iter_0 2,209 1,719 -490 -22.18% x 1.29 btree::map::iter_1 2,696 2,205 -491 -18.21% x 1.22 btree::map::iter_100 3,672 2,943 -729 -19.85% x 1.25 btree::map::iter_10k 5,884 3,929 -1,955 -33.23% x 1.50 btree::map::iter_1m 6,510 5,532 -978 -15.02% x 1.18 btree::map::iteration_mut_100000 453,050 476,667 23,617 5.21% x 0.95 btree::map::range_included_excluded 405,075 371,297 -33,778 -8.34% x 1.09 btree::map::range_included_included 427,577 397,440 -30,137 -7.05% x 1.08 btree::map::range_unbounded_unbounded 36,169 28,175 -7,994 -22.10% x 1.28 btree::map::range_unbounded_vs_iter 38,290 30,838 -7,452 -19.46% x 1.24 ``` But I think this is just fake news from the microbenchmarking media. `iter()` is still trying to catch up with `range(..)`. And we can sure do without another function. So I would skip this 3rd commit. r? @Mark-Simulacrum
Rollup of 6 pull requests Successful merges: - rust-lang#71510 (Btreemap iter intertwined) - rust-lang#71727 (SipHasher with keys initialized to 0 should just use new()) - rust-lang#71889 (Explain our RwLock implementation) - rust-lang#71905 (Add command aliases from Cargo to x.py commands) - rust-lang#71914 (Backport 1.43.1 release notes to master) - rust-lang#71921 (explain the types used in the open64 call) Failed merges: r? @ghost
3 commits:
BTreeMap::iter()
. Benchmarks namediter_20
were of the whole iteration process, so I renamed them. Also the benchmarks ofrange
that I wrote earlier weren't very good. I included an (awkwardly named) one that comparesiter()
torange(..)
on the same set, because the contrast is surprising:Both dig up the same pair of leaf edges.
range(..)
also checks that some keys are correctly ordered, the only thingiter()
does more is to copy the map's length.But you can tell from the
range_unbounded_*
lines that, despite an unwarranted, vengeful attack on the range_unbounded_unbounded benchmark, this change still doesn't allowiter()
to catch up withrange(..)
.range(..)
copes so well because it intertwines the leftmost and rightmost descend towards leaf edges, doing the two root node accesses close together, perhaps exploiting a CPU's internal pipelining? So the third commit distils a version ofrange_search
(which we can't use directly because of theOrd
bound), and we get another boost:But I think this is just fake news from the microbenchmarking media.
iter()
is still trying to catch up withrange(..)
. And we can sure do without another function. So I would skip this 3rd commit.r? @Mark-Simulacrum