-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A mish-mash of micro-optimizations #113116
Conversation
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3 Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
@bors try @rust-timer queue |
⌛ Trying commit 55e83afbd09a0533be2eb0be351787bb33cde518 with merge aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513... |
☀️ Try build successful - checks-actions |
1 similar comment
☀️ Try build successful - checks-actions |
@rust-timer build aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513 |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 662.942s -> 662.822s (-0.02%) |
`lookup_debug_loc` calls `SourceMap::lookup_line`, which does a binary search over the files, and then a binary search over the lines within the found file. It then calls `SourceFile::line_begin_pos`, which redoes the binary search over the lines within the found file. This commit removes the second binary search over the lines, instead getting the line starting pos directly using the result of the first binary search over the lines. (And likewise for `get_span_loc`, in the cranelift backend.)
`lookup_debug_loc` finds a file, line, and column, which requires two binary searches. But this call site only needs the file. This commit replaces the call with `lookup_source_file`, which does a single binary search.
This makes it (a) a little simpler, and (b) more similar to `SourceFile::lookup_line`.
I don't know why `SmallStr` was used here; some ad hoc profiling showed this code is not that hot, the string is usually empty, and when it's not empty it's usually very short. However, the use of a `SmallStr<1024>` does result in 1024 byte `memcpy` call on each execution, which shows up when I do `memcpy` profiling. So using a normal string makes the code both simpler and very slightly faster.
It no longer has any uses. If it's needed in the future, it can be easily reinstated. Or a crate such as `smallstr` can be used, much like we use `smallvec`.
Other callsites already do this, but these two were missed. This avoids some allocations.
They never have a length of more than two. So this commit changes them to `SmallVec<[_; 2]>`. Also, we possibly push `None` values and then filter those `None` values out again with `retain`. So this commit removes the `retain` and instead only pushes the values if they are `Some(_)`.
After the last commit, they contain `Option<&OperandBundleDef<'a>>` but the values are always `Some(_)`. This commit removes the needless `Option` wrapper. This also simplifies the type signatures of `LLVMRustBuild{Invoke,Call}`, which were relying on the fact that the represention of `Option<&T>` is the same as `&T` for non-`None` values.
`DerefChecker` can just hold a reference instead. This avoids quite a lot of allocations for some benchmarks.
55e83af
to
7e786e8
Compare
The walltime/cycles improvements of up to 8% for |
☀️ Test successful - checks-actions |
Finished benchmarking commit (8aed93d): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 662.146s -> 661.088s (-0.16%) |
A mish-mash of micro-optimizations These were aimed at speeding up LLVM codegen, but ended up affecting other places as well. r? `@bjorn3`
These were aimed at speeding up LLVM codegen, but ended up affecting other places as well.
r? @bjorn3