-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the niche optimization if other variant are small enough #70477
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
r? @eddyb |
src/librustc/ty/layout.rs
Outdated
// Rotate index array to put the largest niche first. | ||
// Since it is already the first amongst the types with the same alignement, | ||
// this will just move some of the potential padding within the structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh this is pretty clever.
This seems to address #46213, or at least partially? Something I've been meaning to do since last year is to solve #63866 by first computing the unoptimized That way we don't have to come up with clever conditions for when we should use a niche layout, as we will simply not use it if it's not helpful in size or in terms of its largest niche. Something else to keep in mind is that a niche layout has more complex codegen than a tagged one, so it might optimize worse, even if they are otherwise equal on size and largest niche. @ogoffart So would you be interested in changing the code to work like that (probably in a separate PR, that we can test on its own for perf regressions), i.e. only use an optimized (niche) layout if it's a benefit over the unoptimized (tagged) one? I think it might allow this PR to be simpler, or at the very least, reduce the risk of accidentally increasing some types' sizes, to 0. |
src/librustc/ty/layout.rs
Outdated
dataful_variant = None; | ||
break 'variants; | ||
if !max_size.is_aligned(align.abi) { | ||
// Don't perform niche optimisation if there is padding anyway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this fixes #63866. If it detects that the variant would contains padding, the padding will be used for the discriminant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but it seems harder to reason about than having two candidates for a layout and picking the best one out of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking computing the second candidate would be a waste of time.
right, this fixes #63866 and #46213, I did not look for these issue before.
I am not sure I understand this. I believe the current implementation will only use the Niche if it reduces the size of the enum. Anyway, I think there is more to do in the codegen because many test are failling. They passes in stage 1 (which i was running locally) but fail in stage 2 (on the CI) valgrind reports some use of uninitialized value. I need to investigate. |
I adjusted the ABI to fix the miscompilation issue, and added a test for issue #63866 |
☔ The latest upstream changes (presumably #70536) made this pull request unmergeable. Please resolve the merge conflicts. |
11f27ef
to
75f667c
Compare
☔ The latest upstream changes (presumably #70672) made this pull request unmergeable. Please resolve the merge conflicts. |
75f667c
to
e2b268f
Compare
Perhaps, but there's no formal proof of this and it's impossible to exhausively test, so we'd have to rely on convincing ourselves, and that's the main reason I kept putting this off: the conditions were too complex and hard to reason about, and they were also very easy to invalidate by other changes. I want to land this, but I'm not comfortable with it unless one of these two things happen:
We can measure this, it's not uncommon in rustc to check all possibilities. |
With this patch (when Maybe i should add a debug_assert that this is the case? |
(I added an assert that there was no padding in the layout, and so that the niche must be smaller than not using a niche) |
Ah so the logic here is that you're being conservative about niches and only using them when it would be impossible to add a tag without increasing the size of the type? I think this makes sense in the common case of byte tags, but what about this example? Am I right that this PR makes both |
Yes, that was right. I fixed that now. Now I understand better what you mean by "impossible to exhausively test". But I think it is fine to leave some corner case such as enum with huge amount of variants. For example, an Option<(NicheU16, u8)> would still not use the niche so its the bigger niche would "only" have 254 entries available. |
|
src/librustc_middle/ty/layout.rs
Outdated
if padding_bits >= 32 || count >> padding_bits == 0 { | ||
// Don't perform niche optimisation if there is enough padding bits for the discrimiment | ||
dataful_variant = None; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a more correct version of the "only use niche if it's physically impossible to fit a tag anywhere", right?
It's still conservative in that it uses the tag layout unless it knows for sure that the niche layout would be beneficial, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's right.
I changed it again to be sure than the padding is actually bigger than the biggest niche so we'd still use the niche in Option2<NicheU16, u8>
(which i meant instead of Option<(NicheU16, u8)>
from my previous comment which is its own type as you said)
@ogoffart Okay, can you try this example? I think this is the first example that actually tackles the differences in alignment sorting between This is the kind of thing that I would have a hard time reasoning about in my head when I was looking into this, and why I believe it to be much saner to just try both layouts and pick the best. |
Same as before the patch. We use the niche because 32 bit is bigger than the 16 bits padding. Note that |
If you want to keep things simple to reason about, we could limit niche optimizations to situations in which the tag would be a single byte, because we know the fields are sorted ascending in that case, and so any padding would "absorb" the tag, so it's easier to reason about than the general case. It seems like a practical compromise to say that we don't optimize enums with too many variants (specifically >= 256), although I worry it might result in downstream (e.g. codegen or miri) hidden assumptions that will cause bugs if we ever generalize it again. cc @rust-lang/compiler we might want to discuss this more broadly. |
Why "32-bit"? Are you saying it's because the niche has a lot more invalid values? |
@eddyb can you summarize a bit what you'd like input on? There's a lot of backscroll here =) It sounds like it would make things easier to limit niche optimizations to cases with <256 variants, and you're asking if that's ok? We don't currently make many guarantees about the niche optimization that I'm aware of, but the unsafe-code-optimizations draft did require that "Option-like" enums would be optimized, but that's it. Option-like means exactly 2 variants. |
Note that the pr 71045 only compute the two layout for enum with one single data varient, While this compute them also for enum with several data variant. Also it first need to perform the estimate. |
There are some performance regressions. The difference could be attributed to:
If the reason are step 1 or 2, it can be mitigated or live with. |
Could also be
trying some benchmark on the generated code would give us more information about which of these hypotheses is right. |
So the results on that separate PR are in, and they're neutral: #71045 (comment).
|
Since i do early return, the intern_layout is only done at the very end (on the actual return line, for both case). What could makes a difference is that i compute the niche layout for more enums, while your test pr only computes it for the enum for which we were already doing niche optimization before (the ones with a single data variant) |
Reduce the size of enums which have one bigger variant with a niche
Since we do not care much about the case where we can use the niche, but that the fields cannot be re-ordered
Instead of using estimation euristic to find out if we should do one or the other layout, always perform the non-niche layout and, when possible, the niche layout, and compare the resulting layout. In my opinion, this is less efficient, but was requested in the review because it is hard to proof that the estimation is correct in every case.
2b5f9a7
to
0d43af1
Compare
I did some micro benchmark to understand the difference between the niche and the tag. The code to get the discriminant from a bool niche has this prologue: mov (%rdi),%al
add $0xfe,%al
movzbl %al,%ecx
lea 0x1(%rcx),%rdx
xor %eax,%eax
cmp $0x2,%cl
cmovb %rdx,%rax compared to simply But the cmovb and the "normalization" of the discriminant is useless. This could be something that LLVM could be improved on. Maybe the best would be to not generate a "get discriminant" when doing a match. This would also give simpler code to LLVM to optimize with less instructions (which may also explain some of the slowdown. But the result does not seem to be worse for optimized build than for check). |
This does not yet realize the space savings it promises in the test case due to missing optimizations[1][2] in Rust, but should eventually manage to be barely larger than the payload arrays. [1]: rust-lang/rust#46213 [2]: rust-lang/rust#70477
The benchmarks are compile times, not runtime. So any slowdowns they detect are the compiler doing more work, not necessarily any codegen'd code being inefficient. (I'm really sorry I haven't come back to this already, the amount of different things happening in Rust got really overwhelming over the past few weeks)
We used to have a variant-based |
I beg to differ. I believe it is both. AFAIK, the compiler is compiled with itself, so if the change include some big regression (or improvement) in code generation, it would therefore also be slower (or faster) to compile.
Yes, having a SwitchInt(Discriminant(...)) would allow the codegen to be smarter: currently, this is generated (pseudo code) let discriminent = if (niche in [begin...end]) { niche - offset } else { data_variant };
match(discriminent) {
data_variant => ...,
other_variant1 => ...
other_variant2 => ...
} And llvm is not good enough to remove the if or the What could be generated is directly match(niche) {
other_variant1+offset => ...
other_variant2+offset => ...
_ => ... // (for the data_variant)
// begin..end => ... // alternative if the switch is not exhaustive
} (where other_variantx+offset is a constant) Then the switch looks almost exactly the same as for the tag variant, (and maybe would also be easier or LLVM to process) Anyway, my free time period is over and i won't have time to continue on this patch. |
Reduce the size of enums which have one bigger variant with a niche
I guess this should maybe be behind a compiler flag at first?
cc @eddyb