[RISCV] Emit VP strided loads/stores in RISCVGatherScatterLowering #98111
Conversation
This combine is a duplication of the transform in RISCVGatherScatterLowering but at the SelectionDAG level, so similarly to llvm#98111 we can replace the use of riscv_masked_strided_load with a VP strided load. Unlike llvm#98111 we don't require llvm#97800 or llvm#97798 since it only operates on fixed vectors with a non-zero stride.
After llvm#98112 and llvm#98111 this should be the last use of riscv_masked_strided_load. The diff is due to vp_load not having the same generic combine for bitcasts where `(conv (load x)) -> (load (conv*)x)`. I don't think this makes much of a difference on RVV, and it doesn't seem to affect other patterns.
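For context, the generic combine referred to here folds a conversion of a loaded value into a load of the converted type. A rough IR-level analogue (hypothetical example; the actual combine operates on SelectionDAG nodes) is:
define <2 x i64> @conv_of_load(ptr %p) {
  ; Load followed by a conversion of the result...
  %v = load <4 x i32>, ptr %p, align 16
  %c = bitcast <4 x i32> %v to <2 x i64>
  ret <2 x i64> %c
}
define <2 x i64> @load_of_conv(ptr %p) {
  ; ...folds to a load of the converted type directly (with opaque pointers the
  ; pointer "conversion" is a no-op).
  %c = load <2 x i64>, ptr %p, align 16
  ret <2 x i64> %c
}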
It looks like all the mentioned changes have landed; is this ready for a rebase?
#98140 was reverted due to the tests getting out of sync; once that's relanded, this should be good to go.
This patch makes zero-strided VP loads always expand to a scalar load and splat, even if +optimized-zero-stride-load is present. Expanding them allows more .vx splat patterns to be matched, which is needed to prevent regressions in llvm#98111. If the feature is present, RISCVISelDAGToDAG will combine it back to a zero-strided load. The RV32 test diff also shows how we need to emit a zero-strided load either way after expanding an SEW=64 strided load. We could maybe fix this in a later patch by not doing the expand if SEW>XLEN.
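As a rough IR-level sketch of the equivalence this relies on (hypothetical types and function names; the actual expansion happens during lowering): with all lanes active, a zero-stride load reads the same address for every element, so it can be rewritten as a scalar load plus a splat, and RISCVISelDAGToDAG can later fold that back into a zero-strided load when the feature is present.
declare <4 x i64> @llvm.experimental.vp.strided.load.v4i64.p0.i64(ptr, i64, <4 x i1>, i32)
define <4 x i64> @zero_stride_before(ptr %p) {
  ; Stride of 0: every active lane reads from %p.
  %v = call <4 x i64> @llvm.experimental.vp.strided.load.v4i64.p0.i64(ptr %p, i64 0, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, i32 4)
  ret <4 x i64> %v
}
define <4 x i64> @zero_stride_after(ptr %p) {
  ; Scalar load plus splat; the splat can now match .vx patterns.
  %s = load i64, ptr %p
  %head = insertelement <4 x i64> poison, i64 %s, i64 0
  %splat = shufflevector <4 x i64> %head, <4 x i64> poison, <4 x i32> zeroinitializer
  ret <4 x i64> %splat
}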
Force-pushed from d0899ba to e223af2
@llvm/pr-subscribers-llvm-ir

Author: Luke Lau (lukel97)

Changes

RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after #98131 and #98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count as the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after #97800 (a scalable-vector sketch of the emitted IR follows the truncated diff below).

Patch is 69.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98111.diff

8 Files Affected:
diff --git a/llvm/include/llvm/IR/IntrinsicsRISCV.td b/llvm/include/llvm/IR/IntrinsicsRISCV.td
index 2da154c300344..57702aa50774b 100644
--- a/llvm/include/llvm/IR/IntrinsicsRISCV.td
+++ b/llvm/include/llvm/IR/IntrinsicsRISCV.td
@@ -1710,20 +1710,6 @@ let TargetPrefix = "riscv" in {
defm vsuxseg # nf : RISCVISegStore<nf>;
}
- // Strided loads/stores for fixed vectors.
- def int_riscv_masked_strided_load
- : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
- [LLVMMatchType<0>, llvm_anyptr_ty,
- llvm_anyint_ty,
- LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
- [NoCapture<ArgIndex<1>>, IntrReadMem]>;
- def int_riscv_masked_strided_store
- : DefaultAttrsIntrinsic<[],
- [llvm_anyvector_ty, llvm_anyptr_ty,
- llvm_anyint_ty,
- LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
- [NoCapture<ArgIndex<1>>, IntrWriteMem]>;
-
// Segment loads/stores for fixed vectors.
foreach nf = [2, 3, 4, 5, 6, 7, 8] in {
def int_riscv_seg # nf # _load
diff --git a/llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp b/llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp
index d9971791a2cfa..881be28bfe79e 100644
--- a/llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp
@@ -515,17 +515,23 @@ bool RISCVGatherScatterLowering::tryCreateStridedLoadStore(IntrinsicInst *II,
Builder.SetInsertPoint(II);
+ Value *EVL = Builder.CreateElementCount(
+ IntegerType::get(Ctx, 32), cast<VectorType>(DataType)->getElementCount());
+
CallInst *Call;
- if (II->getIntrinsicID() == Intrinsic::masked_gather)
+ if (II->getIntrinsicID() == Intrinsic::masked_gather) {
Call = Builder.CreateIntrinsic(
- Intrinsic::riscv_masked_strided_load,
+ Intrinsic::experimental_vp_strided_load,
{DataType, BasePtr->getType(), Stride->getType()},
- {II->getArgOperand(3), BasePtr, Stride, II->getArgOperand(2)});
- else
+ {BasePtr, Stride, II->getArgOperand(2), EVL});
+ Call = Builder.CreateIntrinsic(
+ Intrinsic::vp_select, {DataType},
+ {II->getOperand(2), Call, II->getArgOperand(3), EVL});
+ } else
Call = Builder.CreateIntrinsic(
- Intrinsic::riscv_masked_strided_store,
+ Intrinsic::experimental_vp_strided_store,
{DataType, BasePtr->getType(), Stride->getType()},
- {II->getArgOperand(0), BasePtr, Stride, II->getArgOperand(3)});
+ {II->getArgOperand(0), BasePtr, Stride, II->getArgOperand(3), EVL});
Call->takeName(II);
II->replaceAllUsesWith(Call);
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 8b5e56bff4097..23185ec5116de 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1621,12 +1621,6 @@ bool RISCVTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
Info.flags = MachineMemOperand::MOLoad | MachineMemOperand::MOStore |
MachineMemOperand::MOVolatile;
return true;
- case Intrinsic::riscv_masked_strided_load:
- return SetRVVLoadStoreInfo(/*PtrOp*/ 1, /*IsStore*/ false,
- /*IsUnitStrided*/ false);
- case Intrinsic::riscv_masked_strided_store:
- return SetRVVLoadStoreInfo(/*PtrOp*/ 1, /*IsStore*/ true,
- /*IsUnitStrided*/ false);
case Intrinsic::riscv_seg2_load:
case Intrinsic::riscv_seg3_load:
case Intrinsic::riscv_seg4_load:
@@ -9401,81 +9395,6 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_W_CHAIN(SDValue Op,
switch (IntNo) {
default:
break;
- case Intrinsic::riscv_masked_strided_load: {
- SDLoc DL(Op);
- MVT XLenVT = Subtarget.getXLenVT();
-
- // If the mask is known to be all ones, optimize to an unmasked intrinsic;
- // the selection of the masked intrinsics doesn't do this for us.
- SDValue Mask = Op.getOperand(5);
- bool IsUnmasked = ISD::isConstantSplatVectorAllOnes(Mask.getNode());
-
- MVT VT = Op->getSimpleValueType(0);
- MVT ContainerVT = VT;
- if (VT.isFixedLengthVector())
- ContainerVT = getContainerForFixedLengthVector(VT);
-
- SDValue PassThru = Op.getOperand(2);
- if (!IsUnmasked) {
- MVT MaskVT = getMaskTypeFor(ContainerVT);
- if (VT.isFixedLengthVector()) {
- Mask = convertToScalableVector(MaskVT, Mask, DAG, Subtarget);
- PassThru = convertToScalableVector(ContainerVT, PassThru, DAG, Subtarget);
- }
- }
-
- auto *Load = cast<MemIntrinsicSDNode>(Op);
- SDValue VL = getDefaultVLOps(VT, ContainerVT, DL, DAG, Subtarget).second;
- SDValue Ptr = Op.getOperand(3);
- SDValue Stride = Op.getOperand(4);
- SDValue Result, Chain;
-
- // TODO: We restrict this to unmasked loads currently in consideration of
- // the complexity of handling all falses masks.
- MVT ScalarVT = ContainerVT.getVectorElementType();
- if (IsUnmasked && isNullConstant(Stride) && ContainerVT.isInteger()) {
- SDValue ScalarLoad =
- DAG.getExtLoad(ISD::EXTLOAD, DL, XLenVT, Load->getChain(), Ptr,
- ScalarVT, Load->getMemOperand());
- Chain = ScalarLoad.getValue(1);
- Result = lowerScalarSplat(SDValue(), ScalarLoad, VL, ContainerVT, DL, DAG,
- Subtarget);
- } else if (IsUnmasked && isNullConstant(Stride) && isTypeLegal(ScalarVT)) {
- SDValue ScalarLoad = DAG.getLoad(ScalarVT, DL, Load->getChain(), Ptr,
- Load->getMemOperand());
- Chain = ScalarLoad.getValue(1);
- Result = DAG.getSplat(ContainerVT, DL, ScalarLoad);
- } else {
- SDValue IntID = DAG.getTargetConstant(
- IsUnmasked ? Intrinsic::riscv_vlse : Intrinsic::riscv_vlse_mask, DL,
- XLenVT);
-
- SmallVector<SDValue, 8> Ops{Load->getChain(), IntID};
- if (IsUnmasked)
- Ops.push_back(DAG.getUNDEF(ContainerVT));
- else
- Ops.push_back(PassThru);
- Ops.push_back(Ptr);
- Ops.push_back(Stride);
- if (!IsUnmasked)
- Ops.push_back(Mask);
- Ops.push_back(VL);
- if (!IsUnmasked) {
- SDValue Policy =
- DAG.getTargetConstant(RISCVII::TAIL_AGNOSTIC, DL, XLenVT);
- Ops.push_back(Policy);
- }
-
- SDVTList VTs = DAG.getVTList({ContainerVT, MVT::Other});
- Result =
- DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, DL, VTs, Ops,
- Load->getMemoryVT(), Load->getMemOperand());
- Chain = Result.getValue(1);
- }
- if (VT.isFixedLengthVector())
- Result = convertFromScalableVector(VT, Result, DAG, Subtarget);
- return DAG.getMergeValues({Result, Chain}, DL);
- }
case Intrinsic::riscv_seg2_load:
case Intrinsic::riscv_seg3_load:
case Intrinsic::riscv_seg4_load:
@@ -9555,47 +9474,6 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_VOID(SDValue Op,
switch (IntNo) {
default:
break;
- case Intrinsic::riscv_masked_strided_store: {
- SDLoc DL(Op);
- MVT XLenVT = Subtarget.getXLenVT();
-
- // If the mask is known to be all ones, optimize to an unmasked intrinsic;
- // the selection of the masked intrinsics doesn't do this for us.
- SDValue Mask = Op.getOperand(5);
- bool IsUnmasked = ISD::isConstantSplatVectorAllOnes(Mask.getNode());
-
- SDValue Val = Op.getOperand(2);
- MVT VT = Val.getSimpleValueType();
- MVT ContainerVT = VT;
- if (VT.isFixedLengthVector()) {
- ContainerVT = getContainerForFixedLengthVector(VT);
- Val = convertToScalableVector(ContainerVT, Val, DAG, Subtarget);
- }
- if (!IsUnmasked) {
- MVT MaskVT = getMaskTypeFor(ContainerVT);
- if (VT.isFixedLengthVector())
- Mask = convertToScalableVector(MaskVT, Mask, DAG, Subtarget);
- }
-
- SDValue VL = getDefaultVLOps(VT, ContainerVT, DL, DAG, Subtarget).second;
-
- SDValue IntID = DAG.getTargetConstant(
- IsUnmasked ? Intrinsic::riscv_vsse : Intrinsic::riscv_vsse_mask, DL,
- XLenVT);
-
- auto *Store = cast<MemIntrinsicSDNode>(Op);
- SmallVector<SDValue, 8> Ops{Store->getChain(), IntID};
- Ops.push_back(Val);
- Ops.push_back(Op.getOperand(3)); // Ptr
- Ops.push_back(Op.getOperand(4)); // Stride
- if (!IsUnmasked)
- Ops.push_back(Mask);
- Ops.push_back(VL);
-
- return DAG.getMemIntrinsicNode(ISD::INTRINSIC_VOID, DL, Store->getVTList(),
- Ops, Store->getMemoryVT(),
- Store->getMemOperand());
- }
case Intrinsic::riscv_seg2_store:
case Intrinsic::riscv_seg3_store:
case Intrinsic::riscv_seg4_store:
@@ -17509,43 +17387,6 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
// By default we do not combine any intrinsic.
default:
return SDValue();
- case Intrinsic::riscv_masked_strided_load: {
- MVT VT = N->getSimpleValueType(0);
- auto *Load = cast<MemIntrinsicSDNode>(N);
- SDValue PassThru = N->getOperand(2);
- SDValue Base = N->getOperand(3);
- SDValue Stride = N->getOperand(4);
- SDValue Mask = N->getOperand(5);
-
- // If the stride is equal to the element size in bytes, we can use
- // a masked.load.
- const unsigned ElementSize = VT.getScalarStoreSize();
- if (auto *StrideC = dyn_cast<ConstantSDNode>(Stride);
- StrideC && StrideC->getZExtValue() == ElementSize)
- return DAG.getMaskedLoad(VT, DL, Load->getChain(), Base,
- DAG.getUNDEF(XLenVT), Mask, PassThru,
- Load->getMemoryVT(), Load->getMemOperand(),
- ISD::UNINDEXED, ISD::NON_EXTLOAD);
- return SDValue();
- }
- case Intrinsic::riscv_masked_strided_store: {
- auto *Store = cast<MemIntrinsicSDNode>(N);
- SDValue Value = N->getOperand(2);
- SDValue Base = N->getOperand(3);
- SDValue Stride = N->getOperand(4);
- SDValue Mask = N->getOperand(5);
-
- // If the stride is equal to the element size in bytes, we can use
- // a masked.store.
- const unsigned ElementSize = Value.getValueType().getScalarStoreSize();
- if (auto *StrideC = dyn_cast<ConstantSDNode>(Stride);
- StrideC && StrideC->getZExtValue() == ElementSize)
- return DAG.getMaskedStore(Store->getChain(), DL, Value, Base,
- DAG.getUNDEF(XLenVT), Mask,
- Value.getValueType(), Store->getMemOperand(),
- ISD::UNINDEXED, false);
- return SDValue();
- }
case Intrinsic::riscv_vcpop:
case Intrinsic::riscv_vcpop_mask:
case Intrinsic::riscv_vfirst:
diff --git a/llvm/test/CodeGen/RISCV/pr89833.ll b/llvm/test/CodeGen/RISCV/pr89833.ll
deleted file mode 100644
index 54a985040e758..0000000000000
--- a/llvm/test/CodeGen/RISCV/pr89833.ll
+++ /dev/null
@@ -1,16 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=riscv64 -mattr=+v | FileCheck %s
-
-declare void @llvm.riscv.masked.strided.store.nxv16i8.p0.i64(<vscale x 16 x i8>, ptr, i64, <vscale x 16 x i1>)
-
-define void @test(<vscale x 16 x i16> %value, <vscale x 16 x i1> %mask) {
-; CHECK-LABEL: test:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
-; CHECK-NEXT: vnsrl.wi v12, v8, 0
-; CHECK-NEXT: vse8.v v12, (zero), v0.t
-; CHECK-NEXT: ret
- %trunc = trunc <vscale x 16 x i16> %value to <vscale x 16 x i8>
- call void @llvm.riscv.masked.strided.store.nxv16i8.p0.i64(<vscale x 16 x i8> %trunc, ptr null, i64 1, <vscale x 16 x i1> %mask)
- ret void
-}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store.ll
index ab5885a604443..d723c2f6df1af 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store.ll
@@ -16,7 +16,8 @@ define void @gather(ptr noalias nocapture %A, ptr noalias nocapture readonly %B)
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND_SCALAR:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[VEC_IND_NEXT_SCALAR:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[VEC_IND_SCALAR]]
-; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.riscv.masked.strided.load.v32i8.p0.i64(<32 x i8> undef, ptr [[TMP0]], i64 5, <32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
+; CHECK-NEXT: [[TMP1:%.*]] = call <32 x i8> @llvm.experimental.vp.strided.load.v32i8.p0.i64(ptr [[TMP0]], i64 5, <32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, i32 32)
+; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.vp.select.v32i8(<32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <32 x i8> [[TMP1]], <32 x i8> undef, i32 32)
; CHECK-NEXT: [[I2:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <32 x i8>, ptr [[I2]], align 1
; CHECK-NEXT: [[I4:%.*]] = add <32 x i8> [[WIDE_LOAD]], [[WIDE_MASKED_GATHER]]
@@ -58,7 +59,8 @@ define void @gather_masked(ptr noalias nocapture %A, ptr noalias nocapture reado
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND_SCALAR:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[VEC_IND_NEXT_SCALAR:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[VEC_IND_SCALAR]]
-; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.riscv.masked.strided.load.v32i8.p0.i64(<32 x i8> [[MASKEDOFF:%.*]], ptr [[TMP0]], i64 5, <32 x i1> <i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 true, i1 true>)
+; CHECK-NEXT: [[TMP1:%.*]] = call <32 x i8> @llvm.experimental.vp.strided.load.v32i8.p0.i64(ptr [[TMP0]], i64 5, <32 x i1> <i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 true, i1 true>, i32 32)
+; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.vp.select.v32i8(<32 x i1> <i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false, i1 true, i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 true, i1 true, i1 true, i1 true>, <32 x i8> [[TMP1]], <32 x i8> [[MASKEDOFF:%.*]], i32 32)
; CHECK-NEXT: [[I2:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <32 x i8>, ptr [[I2]], align 1
; CHECK-NEXT: [[I4:%.*]] = add <32 x i8> [[WIDE_LOAD]], [[WIDE_MASKED_GATHER]]
@@ -100,7 +102,8 @@ define void @gather_negative_stride(ptr noalias nocapture %A, ptr noalias nocapt
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND_SCALAR:%.*]] = phi i64 [ 155, [[ENTRY]] ], [ [[VEC_IND_NEXT_SCALAR:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[VEC_IND_SCALAR]]
-; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.riscv.masked.strided.load.v32i8.p0.i64(<32 x i8> undef, ptr [[TMP0]], i64 -5, <32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
+; CHECK-NEXT: [[TMP1:%.*]] = call <32 x i8> @llvm.experimental.vp.strided.load.v32i8.p0.i64(ptr [[TMP0]], i64 -5, <32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, i32 32)
+; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.vp.select.v32i8(<32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <32 x i8> [[TMP1]], <32 x i8> undef, i32 32)
; CHECK-NEXT: [[I2:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <32 x i8>, ptr [[I2]], align 1
; CHECK-NEXT: [[I4:%.*]] = add <32 x i8> [[WIDE_LOAD]], [[WIDE_MASKED_GATHER]]
@@ -142,7 +145,8 @@ define void @gather_zero_stride(ptr noalias nocapture %A, ptr noalias nocapture
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND_SCALAR:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[VEC_IND_NEXT_SCALAR:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[VEC_IND_SCALAR]]
-; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.riscv.masked.strided.load.v32i8.p0.i64(<32 x i8> undef, ptr [[TMP0]], i64 0, <32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
+; CHECK-NEXT: [[TMP1:%.*]] = call <32 x i8> @llvm.experimental.vp.strided.load.v32i8.p0.i64(ptr [[TMP0]], i64 0, <32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, i32 32)
+; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <32 x i8> @llvm.vp.select.v32i8(<32 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <32 x i8> [[TMP1]], <32 x i8> undef, i32 32)
; CHECK-NEXT: [[I2:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <32 x i8>, ptr [[I2]], align 1
; CHECK-NEXT: [[I4:%.*]] = add <32 x i8> [[WIDE_LOAD]], [[WIDE_MASKED_GATHER]]
@@ -190,9 +194,10 @@ define void @scatter(ptr noalias nocapture %A, ptr noalias nocapture readonly %B
; CHECK-NEXT: [[I:%.*]] = getelementptr inbounds i8, ptr [[B:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <32 x i8>, ptr [[I]], align 1
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[VEC_IND_SCALAR]]
-; CHECK-NEXT: [[W...
[truncated]
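For the scalable-vector case, which the truncated fixed-vector tests above don't show, the EVL becomes a vscale-based element count rather than a constant. A hand-written sketch of the kind of IR the pass now emits for a strided gather of <vscale x 2 x i64> (names and the exact element-count expansion are illustrative, not taken from the patch):
declare i32 @llvm.vscale.i32()
declare <vscale x 2 x i64> @llvm.experimental.vp.strided.load.nxv2i64.p0.i64(ptr, i64, <vscale x 2 x i1>, i32)
declare <vscale x 2 x i64> @llvm.vp.select.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>, i32)
define <vscale x 2 x i64> @strided_gather(ptr %base, i64 %stride, <vscale x 2 x i1> %mask, <vscale x 2 x i64> %passthru) {
  ; EVL = element count of the vector type, i.e. vscale x 2, which
  ; RISCVVectorPeephole should be able to fold back to VLMAX.
  %vscale = call i32 @llvm.vscale.i32()
  %evl = shl i32 %vscale, 1
  %load = call <vscale x 2 x i64> @llvm.experimental.vp.strided.load.nxv2i64.p0.i64(ptr %base, i64 %stride, <vscale x 2 x i1> %mask, i32 %evl)
  ; vp.select recovers the passthru (mask-undisturbed) semantics of the old intrinsic.
  %res = call <vscale x 2 x i64> @llvm.vp.select.nxv2i64(<vscale x 2 x i1> %mask, <vscale x 2 x i64> %load, <vscale x 2 x i64> %passthru, i32 %evl)
  ret <vscale x 2 x i64> %res
}
The fixed-vector path is the same shape except that the EVL is the constant number of elements (i32 32 in the tests above).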
Rebased and updated to remove
@@ -116,7 +116,7 @@ define void @stride_one_store(i64 %n, ptr %p) {
 ; RV64: # %bb.0:
 ; RV64-NEXT: vsetvli a0, zero, e64, m1, ta, ma
 ; RV64-NEXT: vmv.v.i v8, 0
-; RV64-NEXT: vs1r.v v8, (a1)
+; RV64-NEXT: vse64.v v8, (a1)
I think this is because we're missing a generic DAG combine from unit strided VP load/store -> regular load/store. I'm not sure if we have any precedent for canonicalising VP ops to non-VP ops already?
Do we need to fix this? The semantics are the same; the difference is that vs1r doesn't depend on vl while vse64 does.
Yeah, I guess the added dependency on vl is a regression, but I'm not overly worried about it, since I don't think the loop vectorizer will emit a unit-strided masked gather in the first place. But I'm still happy to address it if others want.
I'm OK with this; maybe just add a FIXME here? The extra complexity may not be worth it just to fix this.
Not saying we need to fix this before landing, but we do need to fix this. There's an existing combine for converting an element-strided riscv_masked_strided_* into a masked_*. We should replicate that over the vp.strided form before calling the transition complete.
(replying to self to continue a thought)
The needed conversion would be from vp.strided.load to a vp.load, not a normal or masked load. Once that was done, we could in turn convert it to a normal load, but that's a separate step. The first step is simply getting rid of the "strided" part of the intrinsic.
Off the top of my head, there's already a generic unit-strided VP load -> VP load combine; my original comment should have said "full length AVL and all ones mask".
Yeah, you're right. We're getting to the vse64.v, just not the vs1r. So it's the second step we're missing.
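A rough sketch of that two-step chain on the load side, with hypothetical types (the hunk above is the store analogue):
declare <4 x i32> @llvm.experimental.vp.strided.load.v4i32.p0.i64(ptr, i64, <4 x i1>, i32)
declare <4 x i32> @llvm.vp.load.v4i32.p0(ptr, <4 x i1>, i32)
; Step 0: unit-strided VP load (stride == element size in bytes for i32).
define <4 x i32> @step0(ptr %p, <4 x i1> %m, i32 %evl) {
  %v = call <4 x i32> @llvm.experimental.vp.strided.load.v4i32.p0.i64(ptr %p, i64 4, <4 x i1> %m, i32 %evl)
  ret <4 x i32> %v
}
; Step 1 (the existing combine): drop the "strided" part, leaving a plain VP load.
define <4 x i32> @step1(ptr %p, <4 x i1> %m, i32 %evl) {
  %v = call <4 x i32> @llvm.vp.load.v4i32.p0(ptr %p, <4 x i1> %m, i32 %evl)
  ret <4 x i32> %v
}
; Step 2 (the missing piece): with an all-ones mask and full-length EVL this is
; just an ordinary load.
define <4 x i32> @step2(ptr %p) {
  %v = load <4 x i32>, ptr %p, align 4
  ret <4 x i32> %v
}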
Rather than do a vp_load -> load combine, I've instead replaced the whole register load patterns with a more generic peephole in #100116. I think that should fix this case.
After thinking about it for a bit, I'm a little hesitant to start converting vp ops to non-vp ops, since it goes somewhat against the roadmap goal to lift existing combines to vp ops.
LGTM.
I'll hold off until the LLVM 19 release branch is cut before landing this; I'm not worried about it, but it's not particularly urgent either.
Force-pushed from ab71f0a to 44f5687
RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after llvm#98131 and llvm#98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path. riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count as the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after llvm#97800.
Force-pushed from 44f5687 to 23db2ae
LGTM
Shall we backport these commits?
This patch should be more or less NFC, so I don't think it's important to backport it. #100116 might provide some minor optimisations, though, if people want to see that in LLVM 19.
On second thought, though, this does remove the @llvm.riscv.masked.strided.load and @llvm.riscv.masked.strided.store intrinsics, which is a breaking change. Maybe there's an argument for backporting so we don't have people using them between now and LLVM 20.
I actively oppose backporting this. Backporting has a risk, and there's no obvious benefit to outweigh that.
If you believe that somebody external is going to use them or is using them, then we should add AutoUpgrade.cpp support for upgrading them. Independent of whether we backport or not.
I don't think anyone would start using them given that they're not documented and they could just use
…98111) Summary: RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after #98131 and #98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path. riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count as the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after #97800. For loads we also use a vp_select to get passthru (mask undisturbed) behaviour. Differential Revision: https://phabricator.intern.facebook.com/D60250735
…lvm#98111) RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after llvm#98131 and llvm#98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path. riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count as the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after llvm#97800. For loads we also use a vp_select to get passthru (mask undisturbed) behaviour.
RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after #98131 and #98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path.
riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count as the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after #97800.
For loads we also use a vp_select to get passthru (mask undisturbed) behaviour.
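As a small illustration of that last point (hypothetical values, not taken from the patch): lanes where the mask is true take the loaded value and lanes where it is false keep the passthru, matching the passthru operand of the removed intrinsic.
declare <4 x i32> @llvm.vp.select.v4i32(<4 x i1>, <4 x i32>, <4 x i32>, i32)
define <4 x i32> @passthru_semantics(<4 x i32> %loaded, <4 x i32> %passthru) {
  ; True lanes come from %loaded, false lanes keep %passthru.
  %r = call <4 x i32> @llvm.vp.select.v4i32(<4 x i1> <i1 true, i1 false, i1 true, i1 false>, <4 x i32> %loaded, <4 x i32> %passthru, i32 4)
  ret <4 x i32> %r
}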