Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shrink TyKind::FnPtr. #128812

Merged
merged 7 commits into from
Aug 14, 2024
Merged

Shrink TyKind::FnPtr. #128812

merged 7 commits into from
Aug 14, 2024

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Aug 8, 2024

By splitting the FnSig within TyKind::FnPtr into FnSigTys and FnHeader, which can be packed more efficiently. This reduces the size of the hot TyKind type from 32 bytes to 24 bytes on 64-bit platforms. This reduces peak memory usage by a few percent on some benchmarks. It also reduces cache misses and page faults similarly, though this doesn't translate to clear cycles or wall-time improvements on CI.

r? @compiler-errors

@rustbot rustbot added PG-exploit-mitigations Project group: Exploit mitigations S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver) labels Aug 8, 2024
@rustbot
Copy link
Collaborator

rustbot commented Aug 8, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred in exhaustiveness checking

cc @Nadrieril

changes to the core type system

cc @compiler-errors, @lcnr

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred in compiler/rustc_sanitizers

cc @rust-lang/project-exploit-mitigations, @rcvalle

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@nnethercote nnethercote marked this pull request as draft August 8, 2024 07:20
@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 8, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 8, 2024
…try>

Shrink `TyKind::FnPtr`.

r? `@ghost`
@bors
Copy link
Contributor

bors commented Aug 8, 2024

⌛ Trying commit f8ed8f1 with merge 5cf59d6...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Aug 8, 2024

☀️ Try build successful - checks-actions
Build commit: 5cf59d6 (5cf59d66c11c39fb631bbe241ebfe4cf84296da7)

@rust-timer

This comment has been minimized.

@compiler-errors
Copy link
Member

(premature note given that this is still in testing phase, and you already are likely planning on it, but if i don't note it now i will certainly forget: given that this touches rustc type ir, make sure that either lcnr or i get to review this when it's ready)

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5cf59d6): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.5% [-0.5%, -0.5%] 1
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (primary -2.1%, secondary -0.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
6.9% [6.9%, 6.9%] 1
Regressions ❌
(secondary)
4.1% [1.9%, 5.6%] 5
Improvements ✅
(primary)
-3.4% [-5.9%, -0.5%] 7
Improvements ✅
(secondary)
-2.4% [-3.7%, -1.1%] 11
All ❌✅ (primary) -2.1% [-5.9%, 6.9%] 8

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 762.874s -> 762.449s (-0.06%)
Artifact size: 337.10 MiB -> 337.11 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 8, 2024
@nnethercote nnethercote force-pushed the shrink-TyKind-FnPtr branch 2 times, most recently from 790b78a to 3c7065d Compare August 9, 2024 00:57
@nnethercote
Copy link
Contributor Author

r? @compiler-errors

This is a peak memory use win in exchange for strictly worse ergonomics when dealing with TyKind::FnPtr. I think it's probably worth it: memory use wins are hard to find, TyKind is one of the most often instantiated types in the entire compiler, it has 21 variants and TyKind::FnPtr is the biggest one by eight bytes, and it's bigger only because the data it contains has too much padding. With my memory profiling hat on, it's screaming out to be optimized. If you check "Show non-relevant results" in the CI perf results you see it's a 0.4% peak memory reduction when averaged across every single benchmark, which might sound small but is actually pretty big. For several benchmarks the reduction is above 2%. Cache misses and page faults have similar numbers.

Here's the type size data:

print-type-size type: `rustc_middle::ty::ty_kind::TyKind<rustc_middle::ty::TyCtxt<'_>>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 1 bytes
print-type-size     variant `FnPtr`: 31 bytes
print-type-size         padding: 7 bytes
print-type-size         field `.0`: 24 bytes, alignment: 8 bytes

print-type-size type: `rustc_middle::infer::canonical::rustc_type_ir::Binder<rustc_middle::ty::TyCtxt<'_>, rustc_middle::ty::ty_kind::FnSig<rustc_middle::ty::TyCtxt<'_>>>`: 24 bytes, alignment: 8 bytes
print-type-size     field `.bound_vars`: 8 bytes
print-type-size     field `.value`: 16 bytes 

print-type-size type: `rustc_middle::ty::ty_kind::FnSig<rustc_middle::ty::TyCtxt<'_>>`: 16 bytes, alignment: 8 bytes
print-type-size     field `.inputs_and_output`: 8 bytes
print-type-size     field `.abi`: 2 bytes
print-type-size     field `.c_variadic`: 1 bytes
print-type-size     field `.safety`: 1 bytes
print-type-size     end padding: 4 bytes

The post-discriminant padding in TyKind is 7 bytes. When the FnSig in broken into two pieces, the compiler moves the four bytes abi/c_variadic/safety data into that 7 bytes.

I previously tried a different tack, interning FnSig and then putting a pointer to the interned value in TyKind::FnPtr, but the rustc_type_ir/rustc_middle split made it much uglier and more invasive than this approach.

@nnethercote nnethercote marked this pull request as ready for review August 9, 2024 03:41
@rustbot
Copy link
Collaborator

rustbot commented Aug 9, 2024

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

compiler/rustc_type_ir/src/ty_kind.rs Outdated Show resolved Hide resolved
compiler/rustc_type_ir/src/ty_kind.rs Outdated Show resolved Hide resolved
compiler/rustc_type_ir/src/ty_kind.rs Outdated Show resolved Hide resolved
compiler/rustc_type_ir/src/fast_reject.rs Outdated Show resolved Hide resolved
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 9, 2024
@compiler-errors
Copy link
Member

I'm generally of the opinion that the impact to the usability of ty::FnPtr is low enough, if you think from a performance perspective that this perf win is worthwhile. 0.4~2% peak memory reduction does seem pretty small to me lol, but I also don't know what those numbers mean in production.

I'm not certain if you're willing to waste time applying my naming nits now given that there's a chance that someone @rust-lang/types might have a differing opinion, given the nature of the change being more invasive than other "change the level of derefs that need to be applied to the type" type changes like interning data. I'd be happy to hold off on unnecessary churn until we can get anyone who wants to speak their mind a chance to do so.

@compiler-errors
Copy link
Member

Would be worthwhile also to fill out the PR description summarizing the motivation and effect of this change, for anyone who sees this PR tomorrow.

I think it's a little clearer and nicer that way.
By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and
`FnHeader`, which can be packed more efficiently. This reduces the size
of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms.
This reduces peak memory usage by a few percent on some benchmarks. It
also reduces cache misses and page faults similarly, though this doesn't
translate to clear cycles or wall-time improvements on CI.
@nnethercote
Copy link
Contributor Author

I updated the names. It wasn't hard and your suggestions were a clear improvement on the original version.

@lcnr
Copy link
Contributor

lcnr commented Aug 9, 2024

I think this change is acceptable to decrease the type size by 8 bytes 👍

Copy link
Member

@compiler-errors compiler-errors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK one final round of tweaks

cx.fn_ptr_backend_type(cx.fn_abi_of_fn_ptr(sig, ty::List::empty()))
}
ty::FnPtr(ins_out, csa) => cx
.fn_ptr_backend_type(cx.fn_abi_of_fn_ptr(ins_out.with(csa), ty::List::empty())),
Copy link
Member

@compiler-errors compiler-errors Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.fn_ptr_backend_type(cx.fn_abi_of_fn_ptr(ins_out.with(csa), ty::List::empty())),
.fn_ptr_backend_type(cx.fn_abi_of_fn_ptr(ins_out.with(csa), ty::List::empty())),

One last bikeshed: I think that it's probably better if we expose the with function on the header -- i.e. hdr.with(sig_tys). What do you think?

Copy link
Contributor Author

@nnethercote nnethercote Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much difference either way. There's no obvious natural order. Having said that, this order (sig_tys first, then hdr) matches the field order I used in the TyKind::FnPtr variant, which matches the field order in FnSig. So I'm inclined to leave it as is.

compiler/rustc_middle/src/ty/flags.rs Outdated Show resolved Hide resolved
compiler/rustc_middle/src/ty/layout.rs Outdated Show resolved Hide resolved
compiler/rustc_middle/src/ty/walk.rs Outdated Show resolved Hide resolved
@nnethercote
Copy link
Contributor Author

There are four new commits. I made the changes you suggested, except for the with case, for the reasons explained above.

Copy link
Member

@compiler-errors compiler-errors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more change that I don't think got applied

@compiler-errors
Copy link
Member

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Aug 14, 2024

📌 Commit bbd1c3a has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 14, 2024
@bors
Copy link
Contributor

bors commented Aug 14, 2024

⌛ Testing commit bbd1c3a with merge e9c965d...

@bors
Copy link
Contributor

bors commented Aug 14, 2024

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing e9c965d to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 14, 2024
@bors bors merged commit e9c965d into rust-lang:master Aug 14, 2024
7 checks passed
@rustbot rustbot added this to the 1.82.0 milestone Aug 14, 2024
@bors bors mentioned this pull request Aug 14, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e9c965d): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.6%, -0.1%] 3
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (primary -2.7%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [1.0%, 1.0%] 1
Regressions ❌
(secondary)
4.1% [2.4%, 5.3%] 4
Improvements ✅
(primary)
-3.6% [-6.1%, -1.0%] 4
Improvements ✅
(secondary)
-2.5% [-3.3%, -1.4%] 6
All ❌✅ (primary) -2.7% [-6.1%, 1.0%] 5

Cycles

Results (secondary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.1% [2.1%, 2.1%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 752.981s -> 754.011s (0.14%)
Artifact size: 341.50 MiB -> 341.48 MiB (-0.01%)

Jarcho pushed a commit to Jarcho/rust that referenced this pull request Aug 24, 2024
…ompiler-errors

Shrink `TyKind::FnPtr`.

By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and `FnHeader`, which can be packed more efficiently. This reduces the size of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms. This reduces peak memory usage by a few percent on some benchmarks. It also reduces cache misses and page faults similarly, though this doesn't translate to clear cycles or wall-time improvements on CI.

r? `@compiler-errors`
github-merge-queue bot pushed a commit to model-checking/kani that referenced this pull request Aug 28, 2024
Upgrades toolchain to 08/28

Culprit upstream changes:
1. rust-lang/rust#128812
2. rust-lang/rust#128703
3. rust-lang/rust#127679
4. rust-lang/rust-clippy#12993
5. rust-lang/cargo#14370
6. rust-lang/rust#128806

Resolves #3429
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 and MIT licenses.
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Sep 22, 2024
…ompiler-errors

Shrink `TyKind::FnPtr`.

By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and `FnHeader`, which can be packed more efficiently. This reduces the size of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms. This reduces peak memory usage by a few percent on some benchmarks. It also reduces cache misses and page faults similarly, though this doesn't translate to clear cycles or wall-time improvements on CI.

r? `@compiler-errors`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. PG-exploit-mitigations Project group: Exploit mitigations S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants