-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/char: Speed up to_digit()
for radix <= 10
#55932
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
Consider whether there should also be a benchmark for non-constant radix. I assume this will make that path slower, which is probably a good trade-off, but sounds worth quantifying. |
@scottmcm yeah, I was wondering about that too. I'm not sure how to write such a benchmark though. I assume if I use the following code it would just inline the variable and use the constant radix path too: let radix = 8u32;
c.to_digit(radix); any clues on how I can make sure that the compiler does not consider this a constant? would |
@Xanewok good idea. I assume that was just for blackboxing the output, but it seems it might also work for input. I'll try it out and report back 🤔 |
### Before ``` # Run 1 test char::methods::bench_to_digit_radix_10 ... bench: 16,265 ns/iter (+/- 1,774) test char::methods::bench_to_digit_radix_16 ... bench: 13,938 ns/iter (+/- 2,479) test char::methods::bench_to_digit_radix_2 ... bench: 13,090 ns/iter (+/- 524) test char::methods::bench_to_digit_radix_36 ... bench: 14,236 ns/iter (+/- 1,949) # Run 2 test char::methods::bench_to_digit_radix_10 ... bench: 16,176 ns/iter (+/- 1,589) test char::methods::bench_to_digit_radix_16 ... bench: 13,896 ns/iter (+/- 3,140) test char::methods::bench_to_digit_radix_2 ... bench: 13,158 ns/iter (+/- 1,112) test char::methods::bench_to_digit_radix_36 ... bench: 14,206 ns/iter (+/- 1,312) # Run 3 test char::methods::bench_to_digit_radix_10 ... bench: 16,221 ns/iter (+/- 2,423) test char::methods::bench_to_digit_radix_16 ... bench: 14,361 ns/iter (+/- 3,926) test char::methods::bench_to_digit_radix_2 ... bench: 13,097 ns/iter (+/- 671) test char::methods::bench_to_digit_radix_36 ... bench: 14,388 ns/iter (+/- 1,068) ``` ### After ``` # Run 1 test char::methods::bench_to_digit_radix_10 ... bench: 11,521 ns/iter (+/- 552) test char::methods::bench_to_digit_radix_16 ... bench: 12,926 ns/iter (+/- 684) test char::methods::bench_to_digit_radix_2 ... bench: 11,266 ns/iter (+/- 1,085) test char::methods::bench_to_digit_radix_36 ... bench: 14,213 ns/iter (+/- 614) # Run 2 test char::methods::bench_to_digit_radix_10 ... bench: 11,424 ns/iter (+/- 1,042) test char::methods::bench_to_digit_radix_16 ... bench: 12,854 ns/iter (+/- 1,193) test char::methods::bench_to_digit_radix_2 ... bench: 11,193 ns/iter (+/- 716) test char::methods::bench_to_digit_radix_36 ... bench: 14,249 ns/iter (+/- 3,514) # Run 3 test char::methods::bench_to_digit_radix_10 ... bench: 11,469 ns/iter (+/- 685) test char::methods::bench_to_digit_radix_16 ... bench: 12,852 ns/iter (+/- 568) test char::methods::bench_to_digit_radix_2 ... bench: 11,275 ns/iter (+/- 1,356) test char::methods::bench_to_digit_radix_36 ... bench: 14,188 ns/iter (+/- 1,501) ```
@scottmcm @Xanewok these are the results including the new Before
After
as predicted the variable radix case is slightly slower than before, but that seems like a good tradeoff to me as variable radix is probably much more rare than a constant radix. |
This seems to perform equally well
📌 Commit 7843e27 has been approved by |
core/char: Speed up `to_digit()` for `radix <= 10` I noticed that `char::to_digit()` seemed to do a bit of extra work for handling `[a-zA-Z]` characters. Since `to_digit(10)` seems to be the most common case (at least in the `rust` codebase) I thought it might be valuable to create a fast path for that case, and according to the benchmarks that I added in one of the commits it seems to pay off. I also created another fast path for the `radix < 10` case, which also seems to have a positive effect. It is very well possible that I'm measuring something entirely unrelated though, so please verify these numbers and let me know if I missed something! ### Before ``` # Run 1 test char::methods::bench_to_digit_radix_10 ... bench: 16,265 ns/iter (+/- 1,774) test char::methods::bench_to_digit_radix_16 ... bench: 13,938 ns/iter (+/- 2,479) test char::methods::bench_to_digit_radix_2 ... bench: 13,090 ns/iter (+/- 524) test char::methods::bench_to_digit_radix_36 ... bench: 14,236 ns/iter (+/- 1,949) # Run 2 test char::methods::bench_to_digit_radix_10 ... bench: 16,176 ns/iter (+/- 1,589) test char::methods::bench_to_digit_radix_16 ... bench: 13,896 ns/iter (+/- 3,140) test char::methods::bench_to_digit_radix_2 ... bench: 13,158 ns/iter (+/- 1,112) test char::methods::bench_to_digit_radix_36 ... bench: 14,206 ns/iter (+/- 1,312) # Run 3 test char::methods::bench_to_digit_radix_10 ... bench: 16,221 ns/iter (+/- 2,423) test char::methods::bench_to_digit_radix_16 ... bench: 14,361 ns/iter (+/- 3,926) test char::methods::bench_to_digit_radix_2 ... bench: 13,097 ns/iter (+/- 671) test char::methods::bench_to_digit_radix_36 ... bench: 14,388 ns/iter (+/- 1,068) ``` ### After ``` # Run 1 test char::methods::bench_to_digit_radix_10 ... bench: 11,521 ns/iter (+/- 552) test char::methods::bench_to_digit_radix_16 ... bench: 12,926 ns/iter (+/- 684) test char::methods::bench_to_digit_radix_2 ... bench: 11,266 ns/iter (+/- 1,085) test char::methods::bench_to_digit_radix_36 ... bench: 14,213 ns/iter (+/- 614) # Run 2 test char::methods::bench_to_digit_radix_10 ... bench: 11,424 ns/iter (+/- 1,042) test char::methods::bench_to_digit_radix_16 ... bench: 12,854 ns/iter (+/- 1,193) test char::methods::bench_to_digit_radix_2 ... bench: 11,193 ns/iter (+/- 716) test char::methods::bench_to_digit_radix_36 ... bench: 14,249 ns/iter (+/- 3,514) # Run 3 test char::methods::bench_to_digit_radix_10 ... bench: 11,469 ns/iter (+/- 685) test char::methods::bench_to_digit_radix_16 ... bench: 12,852 ns/iter (+/- 568) test char::methods::bench_to_digit_radix_2 ... bench: 11,275 ns/iter (+/- 1,356) test char::methods::bench_to_digit_radix_36 ... bench: 14,188 ns/iter (+/- 1,501) ``` I ran the benchmark using: ```sh python x.py bench src/libcore --stage 1 --keep-stage 0 --test-args "bench_to_digit" ```
Rollup of 16 pull requests Successful merges: - #54906 (Reattach all grandchildren when constructing specialization graph.) - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55827 (A few tweaks to iterations/collecting) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55935 (appveyor: Use VS2017 for all our images) - #55936 (save-analysis: be even more aggressive about ignorning macro-generated defs) - #55948 (submodules: update clippy from d8b4269 to 7e0ddef) - #55956 (add tests for some fixed ICEs)
core/char: Speed up `to_digit()` for `radix <= 10` I noticed that `char::to_digit()` seemed to do a bit of extra work for handling `[a-zA-Z]` characters. Since `to_digit(10)` seems to be the most common case (at least in the `rust` codebase) I thought it might be valuable to create a fast path for that case, and according to the benchmarks that I added in one of the commits it seems to pay off. I also created another fast path for the `radix < 10` case, which also seems to have a positive effect. It is very well possible that I'm measuring something entirely unrelated though, so please verify these numbers and let me know if I missed something! ### Before ``` # Run 1 test char::methods::bench_to_digit_radix_10 ... bench: 16,265 ns/iter (+/- 1,774) test char::methods::bench_to_digit_radix_16 ... bench: 13,938 ns/iter (+/- 2,479) test char::methods::bench_to_digit_radix_2 ... bench: 13,090 ns/iter (+/- 524) test char::methods::bench_to_digit_radix_36 ... bench: 14,236 ns/iter (+/- 1,949) # Run 2 test char::methods::bench_to_digit_radix_10 ... bench: 16,176 ns/iter (+/- 1,589) test char::methods::bench_to_digit_radix_16 ... bench: 13,896 ns/iter (+/- 3,140) test char::methods::bench_to_digit_radix_2 ... bench: 13,158 ns/iter (+/- 1,112) test char::methods::bench_to_digit_radix_36 ... bench: 14,206 ns/iter (+/- 1,312) # Run 3 test char::methods::bench_to_digit_radix_10 ... bench: 16,221 ns/iter (+/- 2,423) test char::methods::bench_to_digit_radix_16 ... bench: 14,361 ns/iter (+/- 3,926) test char::methods::bench_to_digit_radix_2 ... bench: 13,097 ns/iter (+/- 671) test char::methods::bench_to_digit_radix_36 ... bench: 14,388 ns/iter (+/- 1,068) ``` ### After ``` # Run 1 test char::methods::bench_to_digit_radix_10 ... bench: 11,521 ns/iter (+/- 552) test char::methods::bench_to_digit_radix_16 ... bench: 12,926 ns/iter (+/- 684) test char::methods::bench_to_digit_radix_2 ... bench: 11,266 ns/iter (+/- 1,085) test char::methods::bench_to_digit_radix_36 ... bench: 14,213 ns/iter (+/- 614) # Run 2 test char::methods::bench_to_digit_radix_10 ... bench: 11,424 ns/iter (+/- 1,042) test char::methods::bench_to_digit_radix_16 ... bench: 12,854 ns/iter (+/- 1,193) test char::methods::bench_to_digit_radix_2 ... bench: 11,193 ns/iter (+/- 716) test char::methods::bench_to_digit_radix_36 ... bench: 14,249 ns/iter (+/- 3,514) # Run 3 test char::methods::bench_to_digit_radix_10 ... bench: 11,469 ns/iter (+/- 685) test char::methods::bench_to_digit_radix_16 ... bench: 12,852 ns/iter (+/- 568) test char::methods::bench_to_digit_radix_2 ... bench: 11,275 ns/iter (+/- 1,356) test char::methods::bench_to_digit_radix_36 ... bench: 14,188 ns/iter (+/- 1,501) ``` I ran the benchmark using: ```sh python x.py bench src/libcore --stage 1 --keep-stage 0 --test-args "bench_to_digit" ```
Rollup of 17 pull requests Successful merges: - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55750 (Make `NodeId` and `HirLocalId` `newtype_index`) - #55778 (Wrap some query results in `Lrc`.) - #55781 (More precise spans for temps and their drops) - #55785 (Add mem::forget_unsized() for forgetting unsized values) - #55852 (Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint) - #55865 (Unix RwLock: avoid racy access to write_locked) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55956 (add tests for some fixed ICEs) Failed merges: r? @ghost
I noticed that
char::to_digit()
seemed to do a bit of extra work for handling[a-zA-Z]
characters. Sinceto_digit(10)
seems to be the most common case (at least in therust
codebase) I thought it might be valuable to create a fast path for that case, and according to the benchmarks that I added in one of the commits it seems to pay off. I also created another fast path for theradix < 10
case, which also seems to have a positive effect.It is very well possible that I'm measuring something entirely unrelated though, so please verify these numbers and let me know if I missed something!
Before
After
I ran the benchmark using:
python x.py bench src/libcore --stage 1 --keep-stage 0 --test-args "bench_to_digit"