Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression with niche optimization #101872

Open
mikebenfield opened this issue Sep 15, 2022 · 8 comments
Open

Performance regression with niche optimization #101872

mikebenfield opened this issue Sep 15, 2022 · 8 comments
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance

Comments

@mikebenfield
Copy link
Contributor

After PR #94075 for more niche optimizations was merged, there were some performance regressions.

Some performance regressions are possibly just due to the extra arithmetic and branch or cmov required when getting a discriminant out of a tag. But notably when compiling syn, the regression is largely due to extra time in LLVM_lto_optimize.

It would be nice to understand this better, and ideally do something about it.

@Rageking8
Copy link
Contributor

@rustbot label +T-compiler +regression-untriaged

@rustbot rustbot added regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 16, 2022
@nnethercote
Copy link
Contributor

#102035 is a related example, where increased use of niche-filling causes some small but widespread instruction count regressions.

@inquisitivecrystal inquisitivecrystal added I-slow Issue: Problems and improvements with respect to performance of generated code. WG-compiler-performance Working group: Compiler Performance labels Sep 20, 2022
@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 20, 2022
@inquisitivecrystal inquisitivecrystal added WG-compiler-performance Working group: Compiler Performance I-compiletime Issue: Problems and improvements with respect to compile times. and removed WG-compiler-performance Working group: Compiler Performance I-slow Issue: Problems and improvements with respect to performance of generated code. labels Sep 23, 2022
@mikebenfield
Copy link
Contributor Author

A little more data:

I did some local testing using the "Self profile" mode of the perf tool, compiling syn for the Opt profile and scenario IncrPatched (the same profile at the link above). I compared a recent master to a modifed master turning off niche optimizations when more than one variant has data.

The results I get locally make a lot more sense to me. The linked data shows the LLVM_lto_optimize step taking 42% of total time, which seems odd. Locally it takes only around 0.1% of total time. Instead, a lot of time is taken in various particular LLVM passes, which also seems to be where all/most of the increase comes in with more niche optimizations. Notably InstCombinePass.

@mikebenfield
Copy link
Contributor Author

In addition to #102872, I have in mind two other optimizations for switching on a discriminant:

First, rather than this llvmir-like pseudocode:

discr =  select is_untagged, untagged_discr, tag
select disc [
    ....
]

we could instead do this:

if is_untagged { jump straight to untagged basic block }
select tag [
   ...
]

Replacing a conditional move with a jump. I am fairly certain this would be a win, as it would allow to skip the tag calculations whenever we have the untagged variant, it would let us remove the cmov, which has data dependencies on a lot of the other instructions, and the new jump is at least as predictable as the existing one.

I don't think it's possible to make this happen just by modifying the current codegen_get_discr. I think there are 3 possibilities:

  1. introduce a MIR pass to do this (but it would need to happen after monomorphization)
  2. introduce a new TerminatorKind::SwitchDiscr, and then make this happen during codegen
  3. do it in LLVM during some pass

The second optimization I have in mind would need to happen in LLVM.

@mikebenfield
Copy link
Contributor Author

I filed this LLVM issue, which if addressed would lead to better code in niche match statements in many cases.

mikebenfield pushed a commit to mikebenfield/rust that referenced this issue Nov 11, 2022
In some cases we can avoid arithmetic before checking whether a niche
represents an untagged variant.

This is relevant to rust-lang#101872
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 11, 2022
rustc_codegen_ssa: Better code generation for niche discriminants.

In some cases we can avoid arithmetic before checking whether a niche is a tag.

Also rename some identifiers around niches.

This is relevant to rust-lang#101872
@apiraino
Copy link
Contributor

@mikebenfield did #102872 solved this issue? Checking progress to unblock #102035. Thanks!

@mikebenfield
Copy link
Contributor Author

@apiraino Honestly I don’t know. It definitely improved some things, but it’s not clear to be which benchmarks are particularly relevant, so I’m not sure whether we can say this issue is solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance
Projects
None yet
Development

No branches or pull requests

6 participants