Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A mish-mash of micro-optimizations #113116

Merged
merged 9 commits into from
Jun 30, 2023
Merged

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Jun 28, 2023

These were aimed at speeding up LLVM codegen, but ended up affecting other places as well.

r? @bjorn3

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 28, 2023
@rustbot
Copy link
Collaborator

rustbot commented Jun 28, 2023

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@nnethercote nnethercote marked this pull request as draft June 28, 2023 07:30
@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@bors
Copy link
Contributor

bors commented Jun 28, 2023

⌛ Trying commit 55e83afbd09a0533be2eb0be351787bb33cde518 with merge aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513...

@bors
Copy link
Contributor

bors commented Jun 28, 2023

☀️ Try build successful - checks-actions
Build commit: aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513 (aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513)

1 similar comment
@bors
Copy link
Contributor

bors commented Jun 28, 2023

☀️ Try build successful - checks-actions
Build commit: aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513 (aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513)

@oli-obk
Copy link
Contributor

oli-obk commented Jun 28, 2023

@rust-timer build aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (aa657f7fbe68ce69f72c77fb7c6b8d0c12aa4513): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.9% [-1.8%, -0.4%] 11
Improvements ✅
(secondary)
-1.4% [-2.0%, -0.9%] 19
All ❌✅ (primary) -0.9% [-1.8%, -0.4%] 11

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.2% [2.2%, 5.0%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.6% [-1.6%, -1.6%] 1
All ❌✅ (primary) 3.2% [2.2%, 5.0%] 3

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.9% [1.9%, 1.9%] 1
Improvements ✅
(primary)
-1.9% [-2.9%, -1.4%] 5
Improvements ✅
(secondary)
-5.6% [-8.8%, -2.8%] 11
All ❌✅ (primary) -1.9% [-2.9%, -1.4%] 5

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 662.942s -> 662.822s (-0.02%)

`lookup_debug_loc` calls `SourceMap::lookup_line`, which does a binary
search over the files, and then a binary search over the lines within
the found file. It then calls `SourceFile::line_begin_pos`, which redoes
the binary search over the lines within the found file.

This commit removes the second binary search over the lines, instead
getting the line starting pos directly using the result of the first
binary search over the lines.

(And likewise for `get_span_loc`, in the cranelift backend.)
`lookup_debug_loc` finds a file, line, and column, which requires two
binary searches. But this call site only needs the file.

This commit replaces the call with `lookup_source_file`, which does a
single binary search.
This makes it (a) a little simpler, and (b) more similar to
`SourceFile::lookup_line`.
I don't know why `SmallStr` was used here; some ad hoc profiling showed
this code is not that hot, the string is usually empty, and when it's
not empty it's usually very short. However, the use of a
`SmallStr<1024>` does result in 1024 byte `memcpy` call on each
execution, which shows up when I do `memcpy` profiling. So using a
normal string makes the code both simpler and very slightly faster.
It no longer has any uses. If it's needed in the future, it can be
easily reinstated. Or a crate such as `smallstr` can be used, much like
we use `smallvec`.
Other callsites already do this, but these two were missed. This avoids
some allocations.
They never have a length of more than two. So this commit changes them
to `SmallVec<[_; 2]>`.

Also, we possibly push `None` values and then filter those `None` values
out again with `retain`. So this commit removes the `retain` and instead
only pushes the values if they are `Some(_)`.
After the last commit, they contain `Option<&OperandBundleDef<'a>>` but
the values are always `Some(_)`. This commit removes the needless
`Option` wrapper. This also simplifies the type signatures of
`LLVMRustBuild{Invoke,Call}`, which were relying on the fact that the
represention of `Option<&T>` is the same as `&T` for non-`None` values.
`DerefChecker` can just hold a reference instead. This avoids quite a
lot of allocations for some benchmarks.
@nnethercote
Copy link
Contributor Author

The walltime/cycles improvements of up to 8% for tt-muncher look spurious; I think it has just become noisy on those metrics. Everything else should be real, though. I think the last commit ("Avoid cloning LocalDecls) is the most effective change.

@nnethercote nnethercote marked this pull request as ready for review June 29, 2023 02:26
@nnethercote nnethercote changed the title Codegen optimizations A mish-mash of micro-optimizations Jun 29, 2023
@oli-obk
Copy link
Contributor

oli-obk commented Jun 29, 2023

r? @oli-obk

@bors r+

@bors
Copy link
Contributor

bors commented Jun 29, 2023

📌 Commit 7e786e8 has been approved by oli-obk

It is now in the queue for this repository.

@rustbot rustbot assigned oli-obk and unassigned bjorn3 Jun 29, 2023
@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 29, 2023
@bors
Copy link
Contributor

bors commented Jun 30, 2023

⌛ Testing commit 7e786e8 with merge 8aed93d...

@bors
Copy link
Contributor

bors commented Jun 30, 2023

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 8aed93d to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 30, 2023
@bors bors merged commit 8aed93d into rust-lang:master Jun 30, 2023
@rustbot rustbot added this to the 1.72.0 milestone Jun 30, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8aed93d): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.8% [-1.7%, -0.3%] 16
Improvements ✅
(secondary)
-1.4% [-2.3%, -0.8%] 15
All ❌✅ (primary) -0.8% [-1.7%, -0.3%] 16

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.4%, 1.0%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.0% [-2.0%, -2.0%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-2.0%, 1.0%] 5

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.5%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.5% [0.5%, 0.5%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 662.146s -> 661.088s (-0.16%)

@nnethercote nnethercote deleted the codegen-opts branch July 3, 2023 06:22
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Jul 22, 2023
A mish-mash of micro-optimizations

These were aimed at speeding up LLVM codegen, but ended up affecting other places as well.

r? `@bjorn3`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants