Simplify logical operations CFG #83663

AngelicosPhosphoros · 2021-03-29T21:27:13Z

This is basically same commit as e38e954 which was reverted later in 676953f
In both cases, this changes weren't benchmarked.
e38e954 leads to missed optimization described in this issue
676953f leads to missed optimization described in this issue

rust-highfive · 2021-03-29T21:27:16Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @estebank (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

AngelicosPhosphoros · 2021-03-29T21:31:06Z

I would like to benchmark this changes.

This code like to fix the issue with failed SIMD (#83623)

Code:

pub struct Blueprint {
    pub fuel_tank_size: u32,
    pub payload: u32,
    pub wheel_diameter: u32,
    pub wheel_width: u32,
    pub storage: u32,
}
impl PartialEq for Blueprint{
   fn eq(&self, other: &Self)->bool{
      (self.fuel_tank_size == other.fuel_tank_size)
           && (self.payload == other.payload)
           && (self.wheel_diameter == other.wheel_diameter)
           && (self.wheel_width == other.wheel_width)
           && (self.storage == other.storage)
   }
}

LLVM IR:

; ModuleID = 'eq_test.3a1fbbbh-cgu.0'
source_filename = "eq_test.3a1fbbbh-cgu.0"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"

%Blueprint = type { [0 x i32], i32, [0 x i32], i32, [0 x i32], i32, [0 x i32], i32, [0 x i32], i32, [0 x i32] }

; <eq_test::Blueprint as core::cmp::PartialEq>::eq
; Function Attrs: norecurse nounwind readonly uwtable willreturn
define zeroext i1 @"_ZN59_$LT$eq_test..Blueprint$u20$as$u20$core..cmp..PartialEq$GT$2eq17he4e85086691c6c20E"(%Blueprint* noalias nocapture readonly align 4 dereferenceable(20) %self, %Blueprint* noalias nocapture readonly align 4 dereferenceable(20) %other) unnamed_addr #0 {
start:
  %0 = bitcast %Blueprint* %self to <4 x i32>*
  %1 = load <4 x i32>, <4 x i32>* %0, align 4
  %2 = bitcast %Blueprint* %other to <4 x i32>*
  %3 = load <4 x i32>, <4 x i32>* %2, align 4
  %4 = icmp eq <4 x i32> %1, %3
  %5 = getelementptr inbounds %Blueprint, %Blueprint* %self, i64 0, i32 9
  %_19 = load i32, i32* %5, align 4
  %6 = getelementptr inbounds %Blueprint, %Blueprint* %other, i64 0, i32 9
  %_20 = load i32, i32* %6, align 4
  %_18 = icmp eq i32 %_19, %_20
  %7 = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> %4)
  %8 = and i1 %7, %_18
  ret i1 %8
}

; Function Attrs: nofree nosync nounwind readnone willreturn
declare i1 @llvm.vector.reduce.and.v4i1(<4 x i1>) #1

attributes #0 = { norecurse nounwind readonly uwtable willreturn "target-cpu"="znver1" }
attributes #1 = { nofree nosync nounwind readnone willreturn }

!llvm.module.flags = !{!0}

!0 = !{i32 7, !"PIC Level", i32 2}

And ASM:

_ZN59_$LT$eq_test..Blueprint$u20$as$u20$core..cmp..PartialEq$GT$2eq17he4e85086691c6c20E:
	vmovdqu	(%rcx), %xmm0
	movl	16(%rcx), %eax
	vpcmpeqd	(%rdx), %xmm0, %xmm0
	cmpl	16(%rdx), %eax
	vmovmskps	%xmm0, %eax
	sete	%cl
	cmpb	$15, %al
	sete	%al
	andb	%cl, %al
	retq

bjorn3 · 2021-03-29T21:41:16Z

@bors try @rust-timer queue

rust-timer · 2021-03-29T21:41:17Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-03-29T21:41:25Z

⌛ Trying commit c343862c31c77abdb223df0c6de9a09f17c1557c with merge 6b7dccdb4b4fbb645bdfc176bf01a0f0b9941bb5...

nagisa · 2021-03-29T21:52:03Z

Before this lands an addition of a codegen test will be necessary so that we don't regress this case again.

bors · 2021-03-29T22:30:00Z

☀️ Try build successful - checks-actions
Build commit: 6b7dccdb4b4fbb645bdfc176bf01a0f0b9941bb5 (6b7dccdb4b4fbb645bdfc176bf01a0f0b9941bb5)

rust-timer · 2021-03-29T22:30:02Z

Queued 6b7dccdb4b4fbb645bdfc176bf01a0f0b9941bb5 with parent 2917eda, future comparison URL.

rust-timer · 2021-03-30T03:02:40Z

Finished benchmarking try commit (6b7dccdb4b4fbb645bdfc176bf01a0f0b9941bb5): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

mati865 · 2021-03-30T07:16:58Z

Benchmark result shows slight improvement in compilation time for MIR and LLVM passes.
It's too close to the noise to determine runtime changes.

estebank · 2021-03-30T16:34:05Z

I don't feel comfortable reviewing this. @nagisa would you mind taking this?

nagisa · 2021-03-30T16:37:45Z

r? @nagisa

AngelicosPhosphoros · 2021-03-30T18:31:26Z

Benchmark result shows slight improvement in compilation time for MIR and LLVM passes.
It's too close to the noise to determine runtime changes.

Should we merge this and should I continue to work on this then?
@nagisa

AngelicosPhosphoros · 2021-03-30T18:45:24Z

Hacky code from eq2 case (#83623) and #[derive(PartialEq)] cannot be vectorized when comparing arrays of structure on current compiler.
godbolt link

Code from this PR handles vectorizing of slices of such struct fine:

code

#[derive(PartialEq)]
pub struct Blueprint {
    pub fuel_tank_size: u32,
    pub payload: u32,
    pub wheel_diameter: u32,
    pub wheel_width: u32,
    pub storage: u32,
}

#[no_mangle]
pub fn compare_two_arrays(a: &[Blueprint], b: &[Blueprint])->bool{
    a==b
}

compare_two_arrays:
	cmpq	%r9, %rdx
	jne	.LBB0_1
	incq	%rdx
	movl	$16, %r9d
	.p2align	4, 0x90
.LBB0_3:
	decq	%rdx
	sete	%al
	je	.LBB0_6
	vmovdqu	-16(%rcx,%r9), %xmm0
	vpcmpeqd	-16(%r8,%r9), %xmm0, %xmm0
	vmovmskps	%xmm0, %r10d
	cmpb	$15, %r10b
	jne	.LBB0_6
	movl	(%r8,%r9), %r10d
	leaq	20(%r9), %r11
	cmpl	%r10d, (%rcx,%r9)
	movq	%r11, %r9
	je	.LBB0_3
.LBB0_6:
	retq
.LBB0_1:
	xorl	%eax, %eax
	retq

nagisa · 2021-03-30T18:47:51Z

I think it is worthwhile to pursue this based on the assembly & IR that gets produced once this is applied. I would be surprised if the compiler exercised the kinds of code paths we're seeing here to produce good motivation for or against merging this. Microbenchmarks would be a significantly better tool in this situation, I suspect.

I'm somewhat surprised that #62993 was sufficient motivation to revert this in the first place, but from what I can tell it might have happened because there was no information suggesting that the code derived by PartialEq becomes significantly worse without it ("reduces number of basic blocks" seems "meh" enough of an improvement if it regresses an actual code example)

AngelicosPhosphoros · 2021-03-30T21:39:21Z

I wrote a benchmark for more cases.
I run it on toolchains from HEAD~1 and HEAD for my working branch, and used target-cpu=znver1.

So, results:

Legend:

u32s struct:
   5 us32 fields struct
u32s struct/Self - comparison with same cloned vec
u32s struct/Random field - comparison with elements differ only in one random field
u32s struct/First field - comparison with elements differ only in first field
u32s struct/Last field - comparison with elements differ only in last field
u32s struct/Every Second - comparison with vec which every even element is equal and every odd is inequal

String struct:
  Struct with 4 u32 fields and one string field. String fields generated as random char from A to Z.
cmp/u32s struct/Self - with same cloned Vec
cmp/String struct/u32 field - with random u32 field
cmp/String struct/String field - with String field.

HEAD~1

     Running unittests (target\release\deps\bench_u32s-967e14898f5691eb.exe)

Gnuplot not found, using plotters backend
cmp/u32s struct/Self    time:   [6.7752 us 6.7880 us 6.8033 us]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
cmp/u32s struct/Random field
                        time:   [15.176 us 15.204 us 15.234 us]
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe
cmp/u32s struct/First field
                        time:   [2.6016 us 2.6084 us 2.6149 us]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
cmp/u32s struct/Last field
                        time:   [6.7269 us 6.7357 us 6.7459 us]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
cmp/u32s struct/Every Second
                        time:   [12.925 us 12.940 us 12.957 us]
Found 18 outliers among 100 measurements (18.00%)
  18 (18.00%) high severe

     Running unittests (target\release\deps\bench_with_strings-77b5d3b11469681f.exe)

Gnuplot not found, using plotters backend
cmp/String struct/Self  time:   [65.674 us 65.750 us 65.823 us]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cmp/String struct/u32 field
                        time:   [17.712 us 17.733 us 17.756 us]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
cmp/String struct/String field
                        time:   [10.087 us 10.102 us 10.116 us]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild

With PR commit

     Running unittests (target\release\deps\bench_u32s-967e14898f5691eb.exe)
Gnuplot not found, using plotters backend
cmp/u32s struct/Self    time:   [7.1867 us 7.2374 us 7.2920 us]
                        change: [+6.1698% +7.0979% +7.9865%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
cmp/u32s struct/Random field
                        time:   [6.7970 us 6.8126 us 6.8319 us]
                        change: [-54.664% -53.917% -53.142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
cmp/u32s struct/First field
                        time:   [6.7780 us 6.7868 us 6.7966 us]
                        change: [+158.65% +159.32% +160.07%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
cmp/u32s struct/Last field
                        time:   [6.7810 us 6.7919 us 6.8034 us]
                        change: [+0.3407% +0.6019% +0.9169%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
cmp/u32s struct/Every Second
                        time:   [6.7663 us 6.7758 us 6.7861 us]
                        change: [-48.465% -48.193% -47.936%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

     Running unittests (target\release\deps\bench_with_strings-77b5d3b11469681f.exe)

Gnuplot not found, using plotters backend
cmp/String struct/Self  time:   [59.658 us 59.743 us 59.833 us]
                        change: [-9.1402% -8.9131% -8.6605%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
cmp/String struct/u32 field
                        time:   [3.9043 us 3.9140 us 3.9253 us]
                        change: [-77.864% -77.797% -77.726%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
cmp/String struct/String field
                        time:   [6.9662 us 6.9746 us 6.9833 us]
                        change: [-31.134% -30.958% -30.788%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

So, it looks like that code generated by trunk version is faster when it's execution finishes on first branch and branch predictor correctly predicts this. When branch predictor fails to do this, it is slower 7 times.
SIMD version shows much less variance and it is significantly slower in First field case and faster in worst Random field.
First field case slowed down by 4.1784 microseconds and Random field sped up by 8.3914

I think, it is a clear win.

For anyone who want to look benchmarks code or run it on own machine, I add zipped project.
bench_changes.zip

@nagisa I would add codegen tests now and I wonder if I need to squash my commits myself or it would be done automatically.

nagisa · 2021-03-30T21:53:10Z

bors merges commits as is so you do need to cleanup the history in your PR manually.

…

On Wed, 31 Mar 2021, 00:39 AngelicosPhosphoros, ***@***.***> wrote: I wrote a benchmark for more cases. I run it on toolchains from HEAD~1 and HEAD for my working branch, and used target-cpu=znver1. So, results: Legend: u32s struct: 5 us32 fields struct u32s struct/Self - comparison with same cloned vec u32s struct/Random field - comparison with elements differ only in one random field u32s struct/First field - comparison with elements differ only in first field u32s struct/Last field - comparison with elements differ only in last field u32s struct/Every Second - comparison with vec which every even element is equal and every odd is inequal String struct: Struct with 4 u32 fields and one string field. String fields generated as random char from A to Z. cmp/u32s struct/Self - with same cloned Vec cmp/String struct/u32 field - with random u32 field cmp/String struct/String field - with String field. HEAD~1 Running unittests (target\release\deps\bench_u32s-967e14898f5691eb.exe) Gnuplot not found, using plotters backend cmp/u32s struct/Self time: [6.7752 us 6.7880 us 6.8033 us] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high mild cmp/u32s struct/Random field time: [15.176 us 15.204 us 15.234 us] Found 10 outliers among 100 measurements (10.00%) 9 (9.00%) high mild 1 (1.00%) high severe cmp/u32s struct/First field time: [2.6016 us 2.6084 us 2.6149 us] Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cmp/u32s struct/Last field time: [6.7269 us 6.7357 us 6.7459 us] Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe cmp/u32s struct/Every Second time: [12.925 us 12.940 us 12.957 us] Found 18 outliers among 100 measurements (18.00%) 18 (18.00%) high severe Running unittests (target\release\deps\bench_with_strings-77b5d3b11469681f.exe) Gnuplot not found, using plotters backend cmp/String struct/Self time: [65.674 us 65.750 us 65.823 us] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild cmp/String struct/u32 field time: [17.712 us 17.733 us 17.756 us] Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild cmp/String struct/String field time: [10.087 us 10.102 us 10.116 us] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 5 (5.00%) high mild With PR commit Running unittests (target\release\deps\bench_u32s-967e14898f5691eb.exe) Gnuplot not found, using plotters backend cmp/u32s struct/Self time: [7.1867 us 7.2374 us 7.2920 us] change: [+6.1698% +7.0979% +7.9865%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild cmp/u32s struct/Random field time: [6.7970 us 6.8126 us 6.8319 us] change: [-54.664% -53.917% -53.142%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe cmp/u32s struct/First field time: [6.7780 us 6.7868 us 6.7966 us] change: [+158.65% +159.32% +160.07%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe cmp/u32s struct/Last field time: [6.7810 us 6.7919 us 6.8034 us] change: [+0.3407% +0.6019% +0.9169%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe cmp/u32s struct/Every Second time: [6.7663 us 6.7758 us 6.7861 us] change: [-48.465% -48.193% -47.936%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high mild Running unittests (target\release\deps\bench_with_strings-77b5d3b11469681f.exe) Gnuplot not found, using plotters backend cmp/String struct/Self time: [59.658 us 59.743 us 59.833 us] change: [-9.1402% -8.9131% -8.6605%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 3 (3.00%) low mild 5 (5.00%) high mild 1 (1.00%) high severe cmp/String struct/u32 field time: [3.9043 us 3.9140 us 3.9253 us] change: [-77.864% -77.797% -77.726%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe cmp/String struct/String field time: [6.9662 us 6.9746 us 6.9833 us] change: [-31.134% -30.958% -30.788%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe So, it looks like that code generated by trunk version is faster when it's execution finishes on first branch and branch predictor correctly predicts this. When branch predictor fails to do this, it is slower 7 times. SIMD version shows much less variance and it is significantly slower in *First field* case and faster in worst *Random field*. *First field* case slowed down by 4.1784 microseconds and *Random field* sped up by 8.3914 I think, it is a clear win. For anyone who want to look benchmarks code or run it on own machine, I add zipped project. bench_changes.zip <https://github.com/rust-lang/rust/files/6232364/bench_changes.zip> @nagisa <https://github.com/nagisa> I would add codegen tests now and I wonder if I need to squash my commits myself or it would be done automatically. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#83663 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFFZUR5DSZ3U5K5PU2ZAN3TGJAJTANCNFSM42AOW7HA> .

AngelicosPhosphoros · 2021-03-30T22:06:44Z

src/test/codegen/issue-83623-SIMD-PartialEq.rs

@@ -0,0 +1,45 @@
+// This test checks that jumps generated by logical operators can be optimized away


I don't know how to write codegen tests so I made this by example.
I can miss something.

nagisa · 2021-03-30T22:23:58Z

src/test/codegen/issue-83623-SIMD-PartialEq.rs

@@ -0,0 +1,45 @@
+// This test checks that jumps generated by logical operators can be optimized away
+
+// compile-flags: -O3


-Copt-level=3 or perhaps just -O should work.

nagisa · 2021-03-30T22:25:53Z

src/test/codegen/issue-83623-SIMD-PartialEq.rs

+// This test checks that jumps generated by logical operators can be optimized away
+
+// compile-flags: -O3
+// only-64bit


This test might fail for some of the 64-bit targets, and may need to be limited to just x86_64, but lets see if this works out, first.

nagisa · 2021-03-30T22:32:22Z

@bors r+ rollup=never

richkadel · 2021-03-31T14:25:38Z

You may need to set profiler = true in your config.toml

Sorry. I wish that was a default.

richkadel · 2021-03-31T14:30:06Z

I recommend Ubuntu for generating the new test results. Not Windows.

And if it keeps ignoring the tests, you may need to remove the test results in your build directory and run again.

richkadel · 2021-03-31T14:36:34Z

Just thinking, you may need to build LLVM, and it has to build with 'optimize = false"

Some constraints I only recently realized. Thanks.

AngelicosPhosphoros · 2021-03-31T22:25:04Z

Can you help me with building LLVM please?

Missed headers

>grep 'No such file or directory'  build/x86_64-unknown-linux-gnu/llvm/buil
d/CMakeFiles/CMakeError.log
CheckIncludeFile.c:1:10: fatal error: malloc/malloc.h: No such file or directory
CheckIncludeFile.c:1:10: fatal error: valgrind/valgrind.h: No such file or directory
CheckIncludeFile.c:1:10: fatal error: mach/mach.h: No such file or directory
CheckIncludeFile.c:1:10: fatal error: histedit.h: No such file or directory
CheckIncludeFile.c:1:10: fatal error: CrashReporterClient.h: No such file or directory
CheckSymbolExists.c:2:10: fatal error: malloc_np.h: No such file or directory
CheckSymbolExists.c:2:10: fatal error: malloc/malloc.h: No such file or directory
CheckSymbolExists.c:2:10: fatal error: os/signpost.h: No such file or directory

richkadel · 2021-03-31T22:38:01Z

I don't normally have to do anything special, but best practice (for me) in this situation would be:

Completely delete your build directory
Copy the current config.toml.example to config.toml and apply the following:

[llvm]
...
optimize = false # required for `llvm-cov show --debug`
...
[build]
...
profiler = true

I also generally enable [rust] debug = true but that's probably not necessary.

Then:

$ ./x.py test src/test/run-make-fulldeps/coverage --bless

should work.

AngelicosPhosphoros · 2021-04-01T15:27:00Z

@richkadel @nagisa

I finally managed to make it compile.
I looks like that I got expected results:

It looks like the only problems are basic block numbers (bb) changed. Should be OK to re-bless if that's the case.

$ ./x.py test src/test/run-make-fulldeps/coverage --bless

As long as the only changed files are spanview files, you're probably good to go (adding the changed expected results to your PR).

If it changes any of the coverage-reports/* expected results, we can take a closer look, but I don't see that in the above failure.

JYI to make it work I changed this:

[llvm]
...
optimize = false
...
[build]
...
profiler = true
...
[target.x86_64-unknown-linux-gnu]
cc = "clang-11"
cxx = "clang++-11"

So, I failed to build LLVM because it used distribution provided C/C++ compiler (gcc) at first.
Your words about optimize = false are true too because using optimize = true deletes coverage reports.

Also, it was very incomfortable to build LLVM on my machine because I needed to limit parallelism since my build constantly killed by OOM on 55 gb of used memory (especially, during linking). Also, rustc compiles very long time (4 hours) if it uses LLVM without optimizations.

Are there any thoughts to move generation of such changes to some build agents? I think, we cut some possible contributors from work because they just haven't such good machines (I have 64 gb RAM on my PC but I don't know anyone other who has).
It woulds be nice if we can call bors to generate commit with such report.

nagisa · 2021-04-01T15:30:14Z

Also, it was very incomfortable to build LLVM on my machine because I needed to limit parallelism since my build constantly killed by OOM on 55 gb of used memory (especially, during linking). Also, rustc compiles very long time (4 hours) if it uses LLVM without optimizations.

Right. Linking will take a ton of memory because debug info is just that huge. There's definitely some improvements to be had around the coverage tests – having to build a unoptimized LLVM any time you need to bless a test suite potentially prevents contributors from making changes at all (what if they only have a machine with, say, 8GB of memory?)

nagisa · 2021-04-01T15:30:20Z

@bors r+

bors · 2021-04-01T15:30:21Z

📌 Commit 87264fa002169ac242f0c74f69a66032274cd88d has been approved by nagisa

This is basically same commit as e38e954 which was reverted later in 676953f In both cases, this changes weren't benchmarked. e38e954 leads to missed optimization described in [this issue](#62993) 676953f leads to missed optimization described in [this issue](#83623) Also it changes some src/test/run-make-fulldeps/coverage-spanview/expected_mir_dump* files automatically.

AngelicosPhosphoros · 2021-04-01T15:36:32Z

@nagisa
Just force-pushed with little change:
Removed #[no_mangle] here because it's not valid attribute on impl.

// && chains should not prevent SIMD optimizations for primitives
#[no_mangle]
impl PartialEq for Blueprint{

richkadel · 2021-04-01T16:08:22Z

having to build a unoptimized LLVM any time you need to bless a test suite potentially prevents contributors from making changes at all (what if they only have a machine with, say, 8GB of memory?)

I agree. I only just made the connection in my head. I'm going to push a PR later this week that removes spanview files and removes the debug output.

The spanview files are no longer as valuable as they used to be (in tests) so that doesn't hurt much.

It's frustrating that I can't call llvm-cov show --debug to show the counters without setting optimize = false for the whole LLVM build. There must be a way to be more granular than that but I'm not aware of it.

But that's not your problem. I think it's worth removing.

cc: @wesleywiser @tmandry

AngelicosPhosphoros · 2021-04-01T16:13:21Z

I also made a thread in forum for this: https://internals.rust-lang.org/t/blessing-tests-for-rustc-too-hard-for-average-pc/14396

nagisa · 2021-04-01T16:17:21Z

@bors r+

It's frustrating that I can't call llvm-cov show --debug to show the counters without setting optimize = false

In my experience RelWithDebInfo builds still retain --debug output at least in llc and opt, so that may be an option to pursue. Doesn't solve the issue with linkers though as debug info is a major contributor to the linkers' memory use.

bors · 2021-04-01T16:17:22Z

📌 Commit 4464cc2 has been approved by nagisa

richkadel · 2021-04-01T20:43:10Z

I assume #83663 will land before #83755, in which case, I'll need to rebase my changes.

But if for some reason #83755 lands first, you can just wipe out the spanview directory:

src/test/run-make-fulldeps/coverage-spanview/

You shouldn't need to re-bless since (AFAICT) only spanview files were changed in this PR. But if you ever do need to rebless, you no longer need to build your own LLVM (once my PR lands). The LLVM debug options are no longer required.

Thanks!

nikic · 2021-04-01T20:49:09Z

FYI usually debug flags in LLVM are controlled by assertions = true. It shouldn't be necessary to use a debug build of LLVM.

bors · 2021-04-02T00:48:35Z

⌛ Testing commit 4464cc2 with merge d1065e6...

bors · 2021-04-02T03:39:21Z

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing d1065e6 to master...

rust-highfive assigned estebank Mar 29, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 29, 2021

AngelicosPhosphoros mentioned this pull request Mar 29, 2021

&& operator chains (and ||, possibly) generates unoptimizable LLVM IR #83623

Open

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 29, 2021

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 30, 2021

rust-highfive assigned nagisa and unassigned estebank Mar 30, 2021

AngelicosPhosphoros commented Mar 30, 2021

View reviewed changes

This comment has been minimized.

Sign in to view

nagisa reviewed Mar 30, 2021

View reviewed changes

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 1, 2021

richkadel mentioned this pull request Apr 1, 2021

Simplify coverage tests #83755

Merged

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 2, 2021

bors merged commit d1065e6 into rust-lang:master Apr 2, 2021

rustbot added this to the 1.53.0 milestone Apr 2, 2021

This was referenced Apr 2, 2021

2229: Fix diagnostic issue when using FakeReads in closures #83521

Merged

Rollup of 6 pull requests #83750

Closed

Rollup of 7 pull requests #83758

Closed

AngelicosPhosphoros deleted the simplify_binary_and_to_get_better_asm branch April 2, 2021 10:17

AngelicosPhosphoros mentioned this pull request Apr 2, 2021

Add benchmarks for known LLVM regressions anp/lolbench#81

Open

		@@ -0,0 +1,45 @@
		// This test checks that jumps generated by logical operators can be optimized away

		@@ -0,0 +1,45 @@
		// This test checks that jumps generated by logical operators can be optimized away

		// compile-flags: -O3

Simplify logical operations CFG #83663

Simplify logical operations CFG #83663

Conversation

AngelicosPhosphoros commented Mar 29, 2021

rust-highfive commented Mar 29, 2021

AngelicosPhosphoros commented Mar 29, 2021

bjorn3 commented Mar 29, 2021

rust-timer commented Mar 29, 2021

bors commented Mar 29, 2021

nagisa commented Mar 29, 2021

bors commented Mar 29, 2021

rust-timer commented Mar 29, 2021

rust-timer commented Mar 30, 2021

mati865 commented Mar 30, 2021

estebank commented Mar 30, 2021

nagisa commented Mar 30, 2021

AngelicosPhosphoros commented Mar 30, 2021

AngelicosPhosphoros commented Mar 30, 2021

nagisa commented Mar 30, 2021

AngelicosPhosphoros commented Mar 30, 2021

nagisa commented Mar 30, 2021 via email

AngelicosPhosphoros Mar 30, 2021

Choose a reason for hiding this comment

This comment has been minimized.

nagisa Mar 30, 2021

Choose a reason for hiding this comment

nagisa Mar 30, 2021

Choose a reason for hiding this comment

nagisa commented Mar 30, 2021

richkadel commented Mar 31, 2021

richkadel commented Mar 31, 2021

richkadel commented Mar 31, 2021

AngelicosPhosphoros commented Mar 31, 2021

richkadel commented Mar 31, 2021

AngelicosPhosphoros commented Apr 1, 2021 • edited Loading

nagisa commented Apr 1, 2021

nagisa commented Apr 1, 2021

bors commented Apr 1, 2021

AngelicosPhosphoros commented Apr 1, 2021

richkadel commented Apr 1, 2021

AngelicosPhosphoros commented Apr 1, 2021

nagisa commented Apr 1, 2021

bors commented Apr 1, 2021

richkadel commented Apr 1, 2021

nikic commented Apr 1, 2021 • edited Loading

bors commented Apr 2, 2021

bors commented Apr 2, 2021

AngelicosPhosphoros commented Apr 1, 2021 •

edited

Loading

nikic commented Apr 1, 2021 •

edited

Loading