Collapsing two if-statements to a single if statement can result in a large performance decrease #111583

ClementTsang · 2023-05-15T08:19:53Z

Apologies if this has already been reported.

Let's say I have some code that looks like this (this is a simplified version of some code a friend was writing):

pub fn fast(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            if (z.to_bits() >> 8) & mask == 0 {
                if z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}

I might be tempted to collapse the if-statement in the middle, since it shouldn't change anything - in fact, clippy will even recommend that I change it to this:

pub fn slow(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}

However, if I pit these two against each other using criterion, then when I run a bench (on 1.69.0):

➜ cargo bench 2> out.txt

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

slow                    time:   [7.5115 ms 7.5313 ms 7.5583 ms]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

fast                    time:   [577.02 µs 578.91 µs 581.29 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

For some reason, collapsing the if branch leads to a massive performance regression! This is surprising as well since from my testing, where I set z = 0, the if branch should never run. Putting the two bits of code on Godbolt seems to also show that there's a bit of a difference in terms of assembly generation (fast, slow).

Furthermore, from some testing, commenting out either the eprintln or the println on both would result in them having similar performance.

~~I can set up a repo with my exact setup if that will be helpful.~~ Repo with code and benchmark: https://github.com/ClementTsang/collapse_if_slowdown

The text was updated successfully, but these errors were encountered:

workingjubilee · 2023-05-15T08:47:05Z

The usual suspect for this codegen difference, the distinction between & and &&, does not seem to be the culprit, as the following branch still gets different codegen than the original:

            if ((z.to_bits() >> 8) & mask == 0) & (z % 0.0625 < 1e-13) {

azizghuloum · 2023-05-15T11:49:28Z

Looks like rustc compiles the slow version as if you've written

pub fn slower(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            let tmp = (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13;
            if tmp {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}

I compared the mir graphs to confirm and the benchmark numbers also confirm.

That extra variable incurs an additional conditional check at runtime.

azizghuloum · 2023-05-16T11:23:56Z

It seems that during THIR -> MIR lowering, there is a case for handling && inside the if, but the && is obscured by a Use expression that is not handled (and gets deoptimized to a temporary variable).

azizghuloum · 2023-05-17T06:11:12Z

@ClementTsang with the PR that I opened, the bench output is

slow                    time:   [706.99 µs 716.80 µs 727.74 µs]
                        change: [-11.173% -8.6408% -6.1600%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

fast                    time:   [700.31 µs 714.30 µs 732.70 µs]
                        change: [-5.7267% -2.7412% +0.0544%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe

which is what you'd expect.

ClementTsang · 2023-05-17T06:43:56Z

Nice!

…=cjgillot Lower `Or` pattern without allocating place cc `@azizghuloum` `@cjgillot` Related to rust-lang#111583 and rust-lang#111644 While reviewing rust-lang#111644, it occurs to me that while we directly lower conjunctive predicates, which are connected with `&&`, into the desirable control flow, today we don't directly lower the disjunctive predicates, which are connected with `||`, in the similar fashion. Instead, we allocate a place for the boolean temporary to hold the result of evaluating the `||` expression. Usually I would expect optimization at later stages to "inline" the evaluation of boolean predicates into simple CFG, but rust-lang#111583 is an example where `&&` is failing to be optimized away and the assembly shows that both the expensive operands are evaluated. Therefore, I would like to make a small change to make the CFG a bit more straight-forward without invoking the `as_temp` machinery, and plus avoid allocating the place to hold the boolean result as well.

workingjubilee added the I-slow Issue: Problems and improvements with respect to performance of generated code. label May 15, 2023

azizghuloum mentioned this issue May 16, 2023

optimizing Use expressions inside if condition #111644

Closed

azizghuloum mentioned this issue May 18, 2023

Shorten drop scope of temporaries in conditions #111725

Closed

dingxiangfei2009 mentioned this issue May 19, 2023

Lower Or pattern without allocating place #111752

Merged

azizghuloum added a commit to azizghuloum/rust that referenced this issue May 19, 2023

added test case from issue rust-lang#111583

a1847f9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collapsing two if-statements to a single if statement can result in a large performance decrease #111583

Collapsing two if-statements to a single if statement can result in a large performance decrease #111583

ClementTsang commented May 15, 2023 •

edited

Loading

workingjubilee commented May 15, 2023

azizghuloum commented May 15, 2023

azizghuloum commented May 16, 2023 •

edited

Loading

azizghuloum commented May 17, 2023

ClementTsang commented May 17, 2023

Collapsing two if-statements to a single if statement can result in a large performance decrease #111583

Collapsing two if-statements to a single if statement can result in a large performance decrease #111583

Comments

ClementTsang commented May 15, 2023 • edited Loading

workingjubilee commented May 15, 2023

azizghuloum commented May 15, 2023

azizghuloum commented May 16, 2023 • edited Loading

azizghuloum commented May 17, 2023

ClementTsang commented May 17, 2023

ClementTsang commented May 15, 2023 •

edited

Loading

azizghuloum commented May 16, 2023 •

edited

Loading