[AArch64] Add invalid 1 x vscale costs for reductions and reduction-operations. (#102105)

The code-generator is currently not able to handle scalable vectors of
<vscale x 1 x eltty>. Until this is supported, the usual "fix" is to give
loads/stores of these types an invalid cost, preventing the vectorizer
from vectorizing at those factors. On rare occasions, however, a loop
contains no loads or stores, only reductions.

So, whilst this is still unsupported, return an invalid cost to avoid
selecting vscale x 1 VFs. The cost of a reduction is not currently used
by the vectorizer, so this adds the invalid cost to the add/mul/and/or/xor
or min/max operation that feeds the reduction. The reduction cost hooks
return an invalid cost too, for completeness. This change will be removed
when code-generation for these types is sufficiently reliable.

Fixes #99760
davemgreen authored Aug 9, 2024
1 parent 1953629 commit 0b745a1
Showing 6 changed files with 107 additions and 0 deletions.
32 changes: 32 additions & 0 deletions llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -541,7 +541,15 @@ static InstructionCost getHistogramCost(const IntrinsicCostAttributes &ICA) {
InstructionCost
AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                                      TTI::TargetCostKind CostKind) {
  // The code-generator is currently not able to handle scalable vectors
  // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
  // it. This change will be removed when code-generation for these types is
  // sufficiently reliable.
  auto *RetTy = ICA.getReturnType();
  if (auto *VTy = dyn_cast<ScalableVectorType>(RetTy))
    if (VTy->getElementCount() == ElementCount::getScalable(1))
      return InstructionCost::getInvalid();

  switch (ICA.getID()) {
  case Intrinsic::experimental_vector_histogram_add:
    if (!ST->hasSVE2())
@@ -3070,6 +3078,14 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
    ArrayRef<const Value *> Args,
    const Instruction *CxtI) {

  // The code-generator is currently not able to handle scalable vectors
  // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
  // it. This change will be removed when code-generation for these types is
  // sufficiently reliable.
  if (auto *VTy = dyn_cast<ScalableVectorType>(Ty))
    if (VTy->getElementCount() == ElementCount::getScalable(1))
      return InstructionCost::getInvalid();

  // TODO: Handle more cost kinds.
  if (CostKind != TTI::TCK_RecipThroughput)
    return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
@@ -3844,6 +3860,14 @@ InstructionCost
AArch64TTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
                                       FastMathFlags FMF,
                                       TTI::TargetCostKind CostKind) {
  // The code-generator is currently not able to handle scalable vectors
  // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
  // it. This change will be removed when code-generation for these types is
  // sufficiently reliable.
  if (auto *VTy = dyn_cast<ScalableVectorType>(Ty))
    if (VTy->getElementCount() == ElementCount::getScalable(1))
      return InstructionCost::getInvalid();

  std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);

  if (LT.second.getScalarType() == MVT::f16 && !ST->hasFullFP16())
@@ -3888,6 +3912,14 @@ InstructionCost
AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *ValTy,
                                           std::optional<FastMathFlags> FMF,
                                           TTI::TargetCostKind CostKind) {
  // The code-generator is currently not able to handle scalable vectors
  // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
  // it. This change will be removed when code-generation for these types is
  // sufficiently reliable.
  if (auto *VTy = dyn_cast<ScalableVectorType>(ValTy))
    if (VTy->getElementCount() == ElementCount::getScalable(1))
      return InstructionCost::getInvalid();

  if (TTI::requiresOrderedReduction(FMF)) {
    if (auto *FixedVTy = dyn_cast<FixedVectorType>(ValTy)) {
      InstructionCost BaseCost =
4 changes: 4 additions & 0 deletions llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
@@ -8,6 +8,7 @@ define void @fadd() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F16 = fadd <vscale x 4 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8F16 = fadd <vscale x 8 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16F16 = fadd <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V1F32 = fadd <vscale x 1 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2F32 = fadd <vscale x 2 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F32 = fadd <vscale x 4 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8F32 = fadd <vscale x 8 x float> undef, undef
@@ -19,6 +20,7 @@ define void @fadd() {
  %V8F16 = fadd <vscale x 8 x half> undef, undef
  %V16F16 = fadd <vscale x 16 x half> undef, undef

  %V1F32 = fadd <vscale x 1 x float> undef, undef
  %V2F32 = fadd <vscale x 2 x float> undef, undef
  %V4F32 = fadd <vscale x 4 x float> undef, undef
  %V8F32 = fadd <vscale x 8 x float> undef, undef
@@ -34,6 +36,7 @@ define void @fsub() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F16 = fsub <vscale x 4 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8F16 = fsub <vscale x 8 x half> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V16F16 = fsub <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V1F32 = fsub <vscale x 1 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2F32 = fsub <vscale x 2 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4F32 = fsub <vscale x 4 x float> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8F32 = fsub <vscale x 8 x float> undef, undef
@@ -45,6 +48,7 @@ define void @fsub() {
  %V8F16 = fsub <vscale x 8 x half> undef, undef
  %V16F16 = fsub <vscale x 16 x half> undef, undef

  %V1F32 = fsub <vscale x 1 x float> undef, undef
  %V2F32 = fsub <vscale x 2 x float> undef, undef
  %V4F32 = fsub <vscale x 4 x float> undef, undef
  %V8F32 = fsub <vscale x 8 x float> undef, undef
2 changes: 2 additions & 0 deletions llvm/test/Analysis/CostModel/AArch64/cttz_elts.ll
@@ -3,6 +3,7 @@

define void @foo_no_vscale_range() {
; CHECK-LABEL: 'foo_no_vscale_range'
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %res.i64.nxv1i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv1i1(<vscale x 1 x i1> undef, i1 true)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res.i64.nxv2i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv2i1(<vscale x 2 x i1> undef, i1 true)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res.i64.nxv4i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv4i1(<vscale x 4 x i1> undef, i1 true)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res.i64.nxv8i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv8i1(<vscale x 8 x i1> undef, i1 true)
@@ -45,6 +46,7 @@ define void @foo_no_vscale_range() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %res.i32.v32i1.nzip = call i32 @llvm.experimental.cttz.elts.i32.v32i1(<32 x i1> undef, i1 false)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
  %res.i64.nxv1i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv1i1(<vscale x 1 x i1> undef, i1 true)
  %res.i64.nxv2i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv2i1(<vscale x 2 x i1> undef, i1 true)
  %res.i64.nxv4i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv4i1(<vscale x 4 x i1> undef, i1 true)
  %res.i64.nxv8i1.zip = call i64 @llvm.experimental.cttz.elts.i64.nxv8i1(<vscale x 8 x i1> undef, i1 true)
21 changes: 21 additions & 0 deletions llvm/test/Analysis/CostModel/AArch64/sve-arith.ll
@@ -43,13 +43,34 @@ define void @scalable_mul() #0 {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mul_nxv8i16 = mul <vscale x 8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mul_nxv4i32 = mul <vscale x 4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mul_nxv2i64 = mul <vscale x 2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %mul_nxv1i64 = mul <vscale x 1 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
entry:
  %mul_nxv16i8 = mul <vscale x 16 x i8> undef, undef
  %mul_nxv8i16 = mul <vscale x 8 x i16> undef, undef
  %mul_nxv4i32 = mul <vscale x 4 x i32> undef, undef
  %mul_nxv2i64 = mul <vscale x 2 x i64> undef, undef
  %mul_nxv1i64 = mul <vscale x 1 x i64> undef, undef

  ret void
}

define void @scalable_add() #0 {
; CHECK-LABEL: 'scalable_add'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv16i8 = add <vscale x 16 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv8i16 = add <vscale x 8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv4i32 = add <vscale x 4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %add_nxv2i64 = add <vscale x 2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Invalid cost for instruction: %add_nxv1i64 = add <vscale x 1 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
entry:
  %add_nxv16i8 = add <vscale x 16 x i8> undef, undef
  %add_nxv8i16 = add <vscale x 8 x i16> undef, undef
  %add_nxv4i32 = add <vscale x 4 x i32> undef, undef
  %add_nxv2i64 = add <vscale x 2 x i64> undef, undef
  %add_nxv1i64 = add <vscale x 1 x i64> undef, undef

  ret void
}