Performance regression with niche optimization #101872

mikebenfield · 2022-09-15T21:17:00Z

After PR #94075 for more niche optimizations was merged, there were some performance regressions.

Some performance regressions are possibly just due to the extra arithmetic and branch or cmov required when getting a discriminant out of a tag. But notably when compiling syn, the regression is largely due to extra time in LLVM_lto_optimize.
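To make the "extra arithmetic and branch or cmov" concrete, here is a minimal hand-written model of reading a discriminant out of a niche tag. It is only a sketch: the `NicheLayout` struct, its field names, and the `EXAMPLE` layout are hypothetical and invented for illustration. They are not rustc's internal types, and real layouts are chosen by the compiler.

```rust
/// Hypothetical description of a niche-encoded enum (illustrative names only,
/// not rustc's internal representation).
#[derive(Clone, Copy)]
struct NicheLayout {
    /// Tag value that encodes the first niche variant.
    niche_start: u8,
    /// Discriminant of the first niche variant.
    first_niche_variant: u8,
    /// Number of variants encoded in the niche.
    niche_variant_count: u8,
    /// Discriminant of the variant whose own data occupies the tag byte.
    untagged_variant: u8,
}

/// Assumed example: variant 0 stores a bool (0 or 1) in the tag byte,
/// while variants 1 and 2 are encoded as the otherwise unused tag values 2 and 3.
const EXAMPLE: NicheLayout = NicheLayout {
    niche_start: 2,
    first_niche_variant: 1,
    niche_variant_count: 2,
    untagged_variant: 0,
};

fn decode_discriminant(tag: u8, layout: NicheLayout) -> u8 {
    // Extra arithmetic: shift the tag into niche-relative space.
    let relative = tag.wrapping_sub(layout.niche_start);
    // Extra comparison: does the tag fall inside the niche range?
    let is_niche = relative < layout.niche_variant_count;
    // Extra select (a cmov or a branch once compiled).
    if is_niche {
        layout.first_niche_variant.wrapping_add(relative)
    } else {
        layout.untagged_variant
    }
}

fn main() {
    for tag in 0u8..4 {
        println!("tag {} -> discriminant {}", tag, decode_discriminant(tag, EXAMPLE));
    }
}
```

With a dedicated tag field, reading the discriminant would be a plain load; the subtraction, comparison, and select above are the extra work in this model.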

It would be nice to understand this better, and ideally do something about it.

Rageking8 · 2022-09-16T05:58:27Z

@rustbot label +T-compiler +regression-untriaged

nnethercote · 2022-09-20T04:27:11Z

#102035 is a related example, where increased use of niche-filling causes some small but widespread instruction count regressions.

apiraino · 2022-09-20T16:17:10Z

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

mikebenfield · 2022-10-07T22:53:20Z

A little more data:

I did some local testing using the "Self profile" mode of the perf tool, compiling syn for the Opt profile and scenario IncrPatched (the same profile at the link above). I compared a recent master to a modified master that turns off niche optimizations when more than one variant has data.

The results I get locally make a lot more sense to me. The linked data shows the LLVM_lto_optimize step taking 42% of total time, which seems odd. Locally it takes only around 0.1% of total time. Instead, a lot of time is taken in various particular LLVM passes, most notably InstCombinePass, which also seems to be where all or most of the increase comes from with more niche optimizations.

mikebenfield · 2022-10-10T08:30:12Z

In addition to #102872, I have in mind two other optimizations for switching on a discriminant:

First, rather than this LLVM IR-like pseudocode:

discr = select is_untagged, untagged_discr, tag
switch discr [
    ...
]

we could instead do this:

if is_untagged { jump straight to untagged basic block }
switch tag [
    ...
]

This replaces a conditional move with a jump. I am fairly certain this would be a win: it would let us skip the tag calculations whenever we have the untagged variant, it would remove the cmov, which has data dependencies on many of the other instructions, and the new jump is at least as predictable as the existing one.
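The two shapes can be modeled in plain Rust to show that they dispatch identically. This is a hand-written sketch, not what rustc emits: the constants and function names are hypothetical, the layout (one untagged variant, two niche variants at tag values 2 and 3) is assumed, and an optimizer is free to compile both functions to the same machine code.

```rust
// Assumed layout: tag values 2 and 3 encode niche variants 1 and 2;
// any other tag value belongs to the untagged variant 0.
const NICHE_START: u8 = 2;
const NICHE_VARIANT_COUNT: u8 = 2;
const FIRST_NICHE_DISCR: u8 = 1;
const UNTAGGED_DISCR: u8 = 0;

/// Shape 1: select the discriminant first, then do one switch over it.
fn dispatch_select(tag: u8) -> &'static str {
    let relative = tag.wrapping_sub(NICHE_START);
    let is_untagged = relative >= NICHE_VARIANT_COUNT;
    // Models `discr = select is_untagged, untagged_discr, tag`.
    let discr = if is_untagged {
        UNTAGGED_DISCR
    } else {
        FIRST_NICHE_DISCR + relative
    };
    match discr {
        0 => "untagged",
        1 => "niche variant 1",
        2 => "niche variant 2",
        _ => unreachable!(),
    }
}

/// Shape 2: branch straight to the untagged arm, then switch on the raw tag.
fn dispatch_branch(tag: u8) -> &'static str {
    let is_untagged = tag.wrapping_sub(NICHE_START) >= NICHE_VARIANT_COUNT;
    if is_untagged {
        // No discriminant is materialized on this path.
        return "untagged";
    }
    match tag {
        2 => "niche variant 1",
        3 => "niche variant 2",
        _ => unreachable!(),
    }
}

fn main() {
    for tag in 0u8..4 {
        println!("tag {}: {} / {}", tag, dispatch_select(tag), dispatch_branch(tag));
    }
}
```

In the second shape the untagged path never computes a discriminant value, and the switch arms are keyed by raw tag values instead of remapped discriminants.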

I don't think it's possible to make this happen just by modifying the current codegen_get_discr. I think there are 3 possibilities:

introduce a MIR pass to do this (but it would need to happen after monomorphization)
introduce a new TerminatorKind::SwitchDiscr, and then make this happen during codegen
do it in LLVM during some pass

The second optimization I have in mind would need to happen in LLVM.

mikebenfield · 2022-10-12T20:34:44Z

I filed this LLVM issue, which if addressed would lead to better code in niche match statements in many cases.

In some cases we can avoid arithmetic before checking whether a niche represents an untagged variant. This is relevant to rust-lang#101872

rustc_codegen_ssa: Better code generation for niche discriminants. In some cases we can avoid arithmetic before checking whether a niche is a tag. Also rename some identifiers around niches. This is relevant to rust-lang#101872

apiraino · 2023-01-11T12:35:54Z

@mikebenfield did #102872 solve this issue? Checking progress to unblock #102035. Thanks!

mikebenfield · 2023-01-11T15:46:01Z

@apiraino Honestly I don’t know. It definitely improved some things, but it’s not clear to me which benchmarks are particularly relevant, so I’m not sure whether we can say this issue is solved.

rustbot added regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 16, 2022

inquisitivecrystal added I-slow Issue: Problems and improvements with respect to performance of generated code. WG-compiler-performance Working group: Compiler Performance labels Sep 20, 2022

rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 20, 2022

cjgillot mentioned this issue Sep 25, 2022

Tell rustc about unused bits in Span. #102035

Closed

mikebenfield mentioned this issue Oct 10, 2022

rustc_codegen_ssa: Better code generation for niche discriminants. #102872

Merged

krdln mentioned this issue Oct 25, 2022

Simplify codegen for niche-encoded enums in simple cases #102901

Closed
