[RISCV] Relax RISCVInsertVSETVLI output VL peeking to cover registers #96200
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same VLMAX, we treat it as the AVL of said vsetvli. This allows us to remove a true dependency as well as treating VSETVLIInfos as equal in more places and avoid toggles.

We do this in two places, needVSETVLI and computeInfoForInstr. However we don't do this in computeInfoForInstr's vsetvli equivalent, getInfoForVSETVLI. We also have a restriction, only in computeInfoForInstr, that the AVL can't be a register, as we want to avoid extending live ranges.

This patch does two interlinked things:

1) It adds this AVL "peeking" to getInfoForVSETVLI
2) It relaxes the constraint that the AVL can't be a register in computeInfoForInstr, since it removes a use of the output VL, which can actually reduce register pressure. E.g. see the diff in @vector_init_vsetvli_N and @test6

Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we can remove the check in needVSETVLI.

Full diff: https://github.com/llvm/llvm-project/pull/96200.diff

3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 877535513c721..7e53a8202ea53 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -962,6 +962,18 @@ RISCVInsertVSETVLI::getInfoForVSETVLI(const MachineInstr &MI) const {
}
NewInfo.setVTYPE(MI.getOperand(2).getImm());
+ // If AVL is defined by a vsetvli with the same VLMAX, we can replace the
+ // AVL operand with the AVL of the defining vsetvli.
+ if (NewInfo.hasAVLReg()) {
+ if (const MachineInstr *DefMI = NewInfo.getAVLDefMI(LIS);
+ DefMI && isVectorConfigInstr(*DefMI)) {
+ VSETVLIInfo DefInstrInfo = getInfoForVSETVLI(*DefMI);
+ if (DefInstrInfo.hasSameVLMAX(NewInfo))
+ NewInfo.setAVL(DefInstrInfo);
+ }
+ }
+
return NewInfo;
}
@@ -1050,15 +1062,12 @@ RISCVInsertVSETVLI::computeInfoForInstr(const MachineInstr &MI) const {
InstrInfo.setVTYPE(VLMul, SEW, TailAgnostic, MaskAgnostic);
// If AVL is defined by a vsetvli with the same VLMAX, we can replace the
- // AVL operand with the AVL of the defining vsetvli. We avoid general
- // register AVLs to avoid extending live ranges without being sure we can
- // kill the original source reg entirely.
+ // AVL operand with the AVL of the defining vsetvli.
if (InstrInfo.hasAVLReg()) {
if (const MachineInstr *DefMI = InstrInfo.getAVLDefMI(LIS);
DefMI && isVectorConfigInstr(*DefMI)) {
VSETVLIInfo DefInstrInfo = getInfoForVSETVLI(*DefMI);
- if (DefInstrInfo.hasSameVLMAX(InstrInfo) &&
- (DefInstrInfo.hasAVLImm() || DefInstrInfo.hasAVLVLMAX()))
+ if (DefInstrInfo.hasSameVLMAX(InstrInfo))
InstrInfo.setAVL(DefInstrInfo);
}
}
@@ -1146,9 +1155,13 @@ void RISCVInsertVSETVLI::insertVSETVLI(MachineBasicBlock &MBB,
LIS->InsertMachineInstrInMaps(*MI);
// Normally the AVL's live range will already extend past the inserted
// vsetvli because the pseudos below will already use the AVL. But this
- // isn't always the case, e.g. PseudoVMV_X_S doesn't have an AVL operand.
- LIS->getInterval(AVLReg).extendInBlock(
- LIS->getMBBStartIdx(&MBB), LIS->getInstructionIndex(*MI).getRegSlot());
+ // isn't always the case, e.g. PseudoVMV_X_S doesn't have an AVL operand or
+ // we've taken the AVL from the VL output of another vsetvli.
+ LiveInterval &LI = LIS->getInterval(AVLReg);
+ // Need to get non-const VNInfo
+ VNInfo *VNI = LI.getValNumInfo(Info.getAVLVNInfo()->id);
+ LI.addSegment(LiveInterval::Segment(
+ VNI->def, LIS->getInstructionIndex(*MI).getRegSlot(), VNI));
}
}
@@ -1163,19 +1176,6 @@ bool RISCVInsertVSETVLI::needVSETVLI(const DemandedFields &Used,
if (CurInfo.isCompatible(Used, Require, LIS))
return false;
- // We didn't find a compatible value. If our AVL is a virtual register,
- // it might be defined by a VSET(I)VLI. If it has the same VLMAX we need
- // and the last VL/VTYPE we observed is the same, we don't need a
- // VSETVLI here.
- if (Require.hasAVLReg() && CurInfo.hasCompatibleVTYPE(Used, Require)) {
- if (const MachineInstr *DefMI = Require.getAVLDefMI(LIS);
- DefMI && isVectorConfigInstr(*DefMI)) {
- VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
- if (DefInfo.hasSameAVL(CurInfo) && DefInfo.hasSameVLMAX(CurInfo))
- return false;
- }
- }
-
return true;
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
index 7eb6cacf1ca43..5a6364967eba2 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
@@ -234,24 +234,24 @@ if.end6: ; preds = %if.else5, %if.then4
define <vscale x 1 x double> @test6(i64 %avl, i8 zeroext %cond, <vscale x 1 x double> %a, <vscale x 1 x double> %b) nounwind {
; CHECK-LABEL: test6:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: andi a3, a1, 1
-; CHECK-NEXT: vsetvli a2, a0, e64, m1, ta, ma
-; CHECK-NEXT: bnez a3, .LBB5_3
+; CHECK-NEXT: andi a2, a1, 1
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT: bnez a2, .LBB5_3
; CHECK-NEXT: # %bb.1: # %if.else
; CHECK-NEXT: vfsub.vv v8, v8, v9
; CHECK-NEXT: andi a1, a1, 2
; CHECK-NEXT: beqz a1, .LBB5_4
; CHECK-NEXT: .LBB5_2: # %if.then4
-; CHECK-NEXT: lui a0, %hi(.LCPI5_0)
-; CHECK-NEXT: addi a0, a0, %lo(.LCPI5_0)
-; CHECK-NEXT: vlse64.v v9, (a0), zero
-; CHECK-NEXT: lui a0, %hi(.LCPI5_1)
-; CHECK-NEXT: addi a0, a0, %lo(.LCPI5_1)
-; CHECK-NEXT: vlse64.v v10, (a0), zero
+; CHECK-NEXT: lui a1, %hi(.LCPI5_0)
+; CHECK-NEXT: addi a1, a1, %lo(.LCPI5_0)
+; CHECK-NEXT: vlse64.v v9, (a1), zero
+; CHECK-NEXT: lui a1, %hi(.LCPI5_1)
+; CHECK-NEXT: addi a1, a1, %lo(.LCPI5_1)
+; CHECK-NEXT: vlse64.v v10, (a1), zero
; CHECK-NEXT: vfadd.vv v9, v9, v10
-; CHECK-NEXT: lui a0, %hi(scratch)
-; CHECK-NEXT: addi a0, a0, %lo(scratch)
-; CHECK-NEXT: vse64.v v9, (a0)
+; CHECK-NEXT: lui a1, %hi(scratch)
+; CHECK-NEXT: addi a1, a1, %lo(scratch)
+; CHECK-NEXT: vse64.v v9, (a1)
; CHECK-NEXT: j .LBB5_5
; CHECK-NEXT: .LBB5_3: # %if.then
; CHECK-NEXT: vfadd.vv v8, v8, v9
@@ -259,16 +259,16 @@ define <vscale x 1 x double> @test6(i64 %avl, i8 zeroext %cond, <vscale x 1 x do
; CHECK-NEXT: bnez a1, .LBB5_2
; CHECK-NEXT: .LBB5_4: # %if.else5
; CHECK-NEXT: vsetvli zero, a0, e32, m1, ta, ma
-; CHECK-NEXT: lui a0, 260096
-; CHECK-NEXT: vmv.v.x v9, a0
-; CHECK-NEXT: lui a0, 262144
-; CHECK-NEXT: vmv.v.x v10, a0
+; CHECK-NEXT: lui a1, 260096
+; CHECK-NEXT: vmv.v.x v9, a1
+; CHECK-NEXT: lui a1, 262144
+; CHECK-NEXT: vmv.v.x v10, a1
; CHECK-NEXT: vfadd.vv v9, v9, v10
-; CHECK-NEXT: lui a0, %hi(scratch)
-; CHECK-NEXT: addi a0, a0, %lo(scratch)
-; CHECK-NEXT: vse32.v v9, (a0)
+; CHECK-NEXT: lui a1, %hi(scratch)
+; CHECK-NEXT: addi a1, a1, %lo(scratch)
+; CHECK-NEXT: vse32.v v9, (a1)
; CHECK-NEXT: .LBB5_5: # %if.end10
-; CHECK-NEXT: vsetvli zero, a2, e64, m1, ta, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; CHECK-NEXT: vfmul.vv v8, v8, v8
; CHECK-NEXT: ret
entry:
@@ -328,7 +328,8 @@ define <vscale x 1 x double> @test8(i64 %avl, i8 zeroext %cond, <vscale x 1 x do
; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a2, a2, 1
; CHECK-NEXT: sub sp, sp, a2
-; CHECK-NEXT: vsetvli s0, a0, e64, m1, ta, ma
+; CHECK-NEXT: mv s0, a0
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; CHECK-NEXT: beqz a1, .LBB6_2
; CHECK-NEXT: # %bb.1: # %if.then
; CHECK-NEXT: vfadd.vv v8, v8, v9
@@ -387,7 +388,8 @@ define <vscale x 1 x double> @test9(i64 %avl, i8 zeroext %cond, <vscale x 1 x do
; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a2, a2, 1
; CHECK-NEXT: sub sp, sp, a2
-; CHECK-NEXT: vsetvli s0, a0, e64, m1, ta, ma
+; CHECK-NEXT: mv s0, a0
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; CHECK-NEXT: beqz a1, .LBB7_2
; CHECK-NEXT: # %bb.1: # %if.then
; CHECK-NEXT: vfadd.vv v9, v8, v9
@@ -722,7 +724,7 @@ define void @vector_init_vsetvli_N(i64 %N, ptr %c) {
; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: .LBB14_2: # %for.body
; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: vsetvli zero, a3, e64, m1, ta, ma
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
; CHECK-NEXT: vse64.v v8, (a1)
; CHECK-NEXT: add a2, a2, a3
; CHECK-NEXT: add a1, a1, a4
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
index da0c1cfb50097..7f01fd4d945c6 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
@@ -258,7 +258,6 @@ entry:
define <vscale x 1 x double> @test14(i64 %avl, <vscale x 1 x double> %a, <vscale x 1 x double> %b) nounwind {
; CHECK-LABEL: test14:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetvli a0, a0, e32, mf2, ta, ma
; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
; CHECK-NEXT: vfadd.vv v8, v8, v9
; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
@@ -328,7 +328,8 @@ define <vscale x 1 x double> @test8(i64 %avl, i8 zeroext %cond, <vscale x 1 x do
; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a2, a2, 1
; CHECK-NEXT: sub sp, sp, a2
-; CHECK-NEXT: vsetvli s0, a0, e64, m1, ta, ma
+; CHECK-NEXT: mv s0, a0
+; CHECK-NEXT: vsetvli zero, a0, e64, m1, ta, ma
The mv here comes from the call below, which clobbers a0, but I don't think this is a regression anyway since we're removing the true dependency.
Agree. And I think the hardware may implement mv elimination.
In-order cores can't really implement mv elimination since it requires register renaming.
Force-pushed from bac3107 to c9b4345
Stacked on llvm#96200

Currently we try and detect when the VL doesn't change between two vsetvlis in emitVSETVLIs, and insert a VL-preserving vsetvli x0,x0 then and there. Doing it in situ has some drawbacks:

- We lose information about what the VL is, which can prevent doLocalPostpass from coalescing some vsetvlis further down the line
- We have to explicitly handle x0,x0 form vsetvlis in coalesceVSETVLIs, whereas we don't in the top-down passes
- This prevents us from sharing the VSETVLIInfo compatibility logic between the two, hence why we have canMutatePriorConfig

This patch changes emitVSETVLIs to just emit regular vsetvlis, and adds a separate pass after coalesceVSETVLIs to convert vsetvlis to x0,x0 when possible. By removing the edge cases needed to handle x0,x0s, we can unify how we check vsetvli compatibility between coalesceVSETVLIs and emitVSETVLIs, and remove the duplicated logic in areCompatibleVTYPEs and canMutatePriorConfig.

Note that when converting to x0,x0, we reuse the block data computed from the dataflow analysis despite it taking place after coalesceVSETVLIs. This turns out to be fine since coalesceVSETVLIs never changes the exit state (only the local state within the block), and so the entry states stay the same too.
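For context on the x0,x0 form mentioned above: `vsetvli x0, x0, <vtype>` rewrites VTYPE while preserving the current VL, and is only usable when the new SEW/LMUL ratio keeps VLMAX unchanged. Below is a minimal standalone C++ sketch of that decision; the `VState` record and `lowerVSETVLI` helper are illustrative stand-ins, not the actual pass code.

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Toy stand-in for the state a vsetvli establishes (integer LMUL only).
struct VState {
  uint64_t AVLReg; // virtual register holding the AVL
  unsigned SEW;    // element width in bits
  unsigned LMUL;   // register group multiplier

  // Same VLMAX iff the SEW/LMUL ratio matches (VLEN cancels out).
  bool sameVLMAX(const VState &O) const { return SEW * O.LMUL == O.SEW * LMUL; }
  bool sameAVL(const VState &O) const { return AVLReg == O.AVLReg; }
};

// Sketch of the post-processing step described above: once ordinary vsetvlis
// are placed, one can be rewritten into the VL-preserving "x0, x0" form when
// the incoming state already yields the same VL (same AVL and same VLMAX),
// so only VTYPE needs to change.
std::string lowerVSETVLI(const VState &Prev, const VState &Cur) {
  std::string VType = "e" + std::to_string(Cur.SEW) + ", m" +
                      std::to_string(Cur.LMUL) + ", ta, ma";
  if (Cur.sameAVL(Prev) && Cur.sameVLMAX(Prev))
    return "vsetvli x0, x0, " + VType; // keep VL, change VTYPE only
  return "vsetvli zero, a" + std::to_string(Cur.AVLReg) + ", " + VType;
}

int main() {
  VState Prev{/*AVLReg=*/0, /*SEW=*/64, /*LMUL=*/2};
  VState Cur{/*AVLReg=*/0, /*SEW=*/32, /*LMUL=*/1}; // same SEW/LMUL ratio
  std::cout << lowerVSETVLI(Prev, Cur) << "\n"; // vsetvli x0, x0, e32, m1, ta, ma
  return 0;
}
```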
Seems like LLVM :: Transforms/LoopStrengthReduce/RISCV/lsr-drop-solution.ll also needs updating?
If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same VLMAX, we treat it as the AVL of said vsetvli. This allows us to remove a true dependency as well as treating VSETVLIInfos as equal in more places and avoid toggles.

We do this in two places, needVSETVLI and computeInfoForInstr. However we don't do this in computeInfoForInstr's vsetvli equivalent, getInfoForVSETVLI. We also have a restriction only in computeInfoForInstr that the AVL can't be a register as we want to avoid extending live ranges.

This patch does two interlinked things:

1) It adds this AVL "peeking" to getInfoForVSETVLI
2) It relaxes the constraint that the AVL can't be a register in computeInfoForInstr, since it removes a use of the output VL which can actually reduce register pressure. E.g. see the diff in @vector_init_vsetvli_N and @test6

Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we can remove the check in needVSETVLI.
Force-pushed from c9b4345 to eaa1f53
Whoops yes, should be updated now
LGTM
LLVM Buildbot has detected a new failure on builder
Full details are available at: https://lab.llvm.org/buildbot/#/builders/139/builds/375
Here is the relevant piece of the build log for the reference:
LLVM Buildbot has detected a new failure on builder
Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/96
Here is the relevant piece of the build log for the reference:
This adds a separate test case for an existing issue fixed in #96200, where we were failing to extend the live range of an AVL when inserting a vsetvli if the AVL was from a different block.
…TVLI

In llvm#96200 we handled extending AVL LiveIntervals across basic blocks, which fixed a crash in a test case in 133ab9a. This was done by manually adding a single segment to the LiveInterval to extend it from AVL def -> inserted vsetvli, but in hindsight this was too simple and fails to handle cases where the vsetvli is located before the AVL def. This patch fixes this by using LiveIntervals::extendToIndices instead, which can handle these cases. (The crash that this fixes is separate from the crash in llvm#97264)
…TVLI (#97512)

In #96200 we handled extending AVL LiveIntervals across basic blocks, which fixed a crash in a test case in 133ab9a. This was done by manually adding a single segment to the LiveInterval to extend it from AVL def -> inserted vsetvli, but in hindsight this was too simple and fails to handle cases where the vsetvli is located before the AVL def. This patch fixes this by using LiveIntervals::extendToIndices instead, which can handle these cases. (The crash that this fixes is separate from the crash in #97264)
This adds a separate test case for an existing issue fixed in llvm#96200, where we were failing to extend the live range of an AVL when inserting a vsetvli if the AVL was from a different block.
…TVLI (llvm#97512)

In llvm#96200 we handled extending AVL LiveIntervals across basic blocks, which fixed a crash in a test case in 133ab9a. This was done by manually adding a single segment to the LiveInterval to extend it from AVL def -> inserted vsetvli, but in hindsight this was too simple and fails to handle cases where the vsetvli is located before the AVL def. This patch fixes this by using LiveIntervals::extendToIndices instead, which can handle these cases. (The crash that this fixes is separate from the crash in llvm#97264)
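To illustrate why the single-segment approach from this patch turned out to be too simple, here is a toy C++ model of the failure mode the follow-up describes. The `ToySegment` type and the integer "slot indices" are illustrative assumptions, not the real SlotIndex/LiveInterval machinery.

```cpp
#include <cassert>
#include <iostream>

// Toy model: a live-range segment is a half-open [Start, End) interval over
// instruction slots, so it is only well-formed when the def comes before the use.
struct ToySegment {
  int Start, End;
  ToySegment(int S, int E) : Start(S), End(E) {
    assert(S < E && "inverted segment: the use slot precedes the def slot");
  }
};

int main() {
  int AVLDefSlot = 40;         // slot where the AVL register is defined
  int LaterVSETVLISlot = 50;   // vsetvli inserted after the def: fine
  int EarlierVSETVLISlot = 10; // vsetvli inserted in an earlier block

  // The llvm#96200 approach: a single segment from the AVL def to the inserted
  // vsetvli. This works when the def comes before the insertion point.
  ToySegment Fine(AVLDefSlot, LaterVSETVLISlot);
  std::cout << "ok: [" << Fine.Start << ", " << Fine.End << ")\n";

  // But when the vsetvli sits before the AVL def, the segment would be
  // inverted, which is why the follow-up switched to
  // LiveIntervals::extendToIndices, which extends the range through the CFG to
  // whatever value actually reaches that slot.
  // ToySegment Broken(AVLDefSlot, EarlierVSETVLISlot); // would assert
  (void)EarlierVSETVLISlot;
  return 0;
}
```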
A regression, but only since llvm#96200. This just brings us back to square one.
…TVLI (llvm#97512)

In llvm#96200 we handled extending AVL LiveIntervals across basic blocks, which fixed a crash in a test case in 133ab9a. This was done by manually adding a single segment to the LiveInterval to extend it from AVL def -> inserted vsetvli, but in hindsight this was too simple and fails to handle cases where the vsetvli is located before the AVL def. This patch fixes this by using LiveIntervals::extendToIndices instead, which can handle these cases. (The crash that this fixes is separate from the crash in llvm#97264)
…llvm#96200)

If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same VLMAX, we treat it as the AVL of said vsetvli. This allows us to remove a true dependency as well as treating VSETVLIInfos as equal in more places and avoid toggles.

We do this in two places, needVSETVLI and computeInfoForInstr. However we don't do this in computeInfoForInstr's vsetvli equivalent, getInfoForVSETVLI. We also have a restriction only in computeInfoForInstr that the AVL can't be a register as we want to avoid extending live ranges.

This patch does two interlinked things:

1) It adds this AVL "peeking" to getInfoForVSETVLI
2) It relaxes the constraint that the AVL can't be a register in computeInfoForInstr, since it removes a use of the output VL which can actually reduce register pressure. E.g. see the diff in @vector_init_vsetvli_N and @test6

Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we can remove the check in needVSETVLI.

We also need to update how we update LiveIntervals in insertVSETVLI, as we can now end up needing to extend the LiveRange of the AVL across blocks.
If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same VLMAX, we treat it as the AVL of said vsetvli.
This allows us to remove a true dependency as well as treating VSETVLIInfos as equal in more places and avoid toggles.
We do this in two places, needVSETVLI and computeInfoForInstr. However we don't do this in computeInfoForInstr's vsetvli equivalent, getInfoForVSETVLI.
We also have a restriction only in computeInfoForInstr that the AVL can't be a register as we want to avoid extending live ranges.
This patch does two interlinked things:
1) It adds this AVL "peeking" to getInfoForVSETVLI
2) It relaxes the constraint that the AVL can't be a register in computeInfoForInstr, since it removes a use of the output VL, which can actually reduce register pressure. E.g. see the diff in @vector_init_vsetvli_N and @test6
Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we can remove the check in needVSETVLI.
We also need to update how we update LiveIntervals in insertVSETVLI, as we can now end up needing to extend the LiveRange of the AVL across blocks.
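To make the AVL "peeking" concrete outside the pass itself, here is a minimal standalone C++ sketch. The `VSetVLIState` record and `peekThroughVSETVLI` helper are illustrative stand-ins, not the real VSETVLIInfo or getInfoForVSETVLI/computeInfoForInstr, which operate on MachineInstrs and LiveIntervals.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>

// Illustrative stand-in for what VSETVLIInfo tracks about an AVL.
enum class AVLKind { Imm, Reg, VLMAX };

struct VSetVLIState {
  AVLKind Kind;
  uint64_t ImmOrReg; // immediate value, or a virtual register id
  unsigned SEW;      // element width in bits
  unsigned LMUL;     // register group multiplier (integer LMUL only)

  // Two configurations have the same VLMAX iff the SEW/LMUL ratio matches.
  bool hasSameVLMAX(const VSetVLIState &Other) const {
    return SEW * Other.LMUL == Other.SEW * LMUL;
  }
};

// If Info's AVL is the VL output of a defining vsetvli with the same VLMAX,
// replace it with that vsetvli's own AVL. This is the "peeking" the patch adds
// to getInfoForVSETVLI and, with the register restriction dropped, keeps in
// computeInfoForInstr.
void peekThroughVSETVLI(VSetVLIState &Info,
                        const std::optional<VSetVLIState> &DefVSetVLI) {
  if (Info.Kind != AVLKind::Reg || !DefVSetVLI)
    return;
  if (DefVSetVLI->hasSameVLMAX(Info)) {
    Info.Kind = DefVSetVLI->Kind;
    Info.ImmOrReg = DefVSetVLI->ImmOrReg;
  }
}

int main() {
  // %vl = vsetvli %avl, e64, m1   ; defines the VL register %vl from AVL %avl
  VSetVLIState Def{AVLKind::Reg, /*%avl=*/1, /*SEW=*/64, /*LMUL=*/1};
  // vsetvli zero, %vl, e64, m1    ; its AVL is the VL output of the first one
  VSetVLIState Use{AVLKind::Reg, /*%vl=*/2, /*SEW=*/64, /*LMUL=*/1};

  peekThroughVSETVLI(Use, Def);
  // The second vsetvli can now use %avl directly, dropping the true dependency
  // on the first vsetvli's VL output.
  std::cout << "AVL is now virtual register %" << Use.ImmOrReg << "\n"; // %1
  return 0;
}
```

The diff in @test6 above shows the same effect at the machine level: `vsetvli a2, a0, e64, m1, ta, ma` followed later by `vsetvli zero, a2, e64, m1, ta, ma` becomes `vsetvli zero, a0, ...` in both places, freeing up a2.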