Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM] Add NEON support for ISD::ABDS/ABDU nodes. #94504

Merged
merged 1 commit into from
Jun 7, 2024
Merged

Conversation

RKSimon
Copy link
Collaborator

@RKSimon RKSimon commented Jun 5, 2024

As noted on #94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns. This patch flags basic ABDS/ABDU for neon types as legal.

Ideally all the VABD/VABA/VABDL/VABAL handling should be moved to using the abds/abdu nodes - but am I on the right track?

Fixes #94466

@llvmbot
Copy link
Member

llvmbot commented Jun 5, 2024

@llvm/pr-subscribers-backend-arm

Author: Simon Pilgrim (RKSimon)

Changes

As noted on #94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns. This patch flags basic ABDS/ABDU for neon types as legal.

I'm not clear how ARM handles intrinsic -> ISD mapping, so I've left all the intrinsics handling in place so far at the moment.

Ideally all the VABD/VABA/VABDL/VABAL handling should be moved to using the abds/abdu nodes - but am I on the right track?

Fixes #94466


Full diff: https://github.com/llvm/llvm-project/pull/94504.diff

2 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+3-3)
  • (modified) llvm/lib/Target/ARM/ARMInstrNEON.td (+30-2)
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 5212d2c620b75..0f4eafa8e5baf 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -205,9 +205,9 @@ void ARMTargetLowering::addTypeForNEON(MVT VT, MVT PromotedLdStVT) {
   setOperationAction(ISD::SDIVREM, VT, Expand);
   setOperationAction(ISD::UDIVREM, VT, Expand);
 
-  if (!VT.isFloatingPoint() &&
-      VT != MVT::v2i64 && VT != MVT::v1i64)
-    for (auto Opcode : {ISD::ABS, ISD::SMIN, ISD::SMAX, ISD::UMIN, ISD::UMAX})
+  if (!VT.isFloatingPoint() && VT != MVT::v2i64 && VT != MVT::v1i64)
+    for (auto Opcode : {ISD::ABS, ISD::ABDS, ISD::ABDU, ISD::SMIN, ISD::SMAX,
+                        ISD::UMIN, ISD::UMAX})
       setOperationAction(Opcode, VT, Legal);
   if (!VT.isFloatingPoint())
     for (auto Opcode : {ISD::SADDSAT, ISD::UADDSAT, ISD::SSUBSAT, ISD::USUBSAT})
diff --git a/llvm/lib/Target/ARM/ARMInstrNEON.td b/llvm/lib/Target/ARM/ARMInstrNEON.td
index 21a5817252aea..f25706a267741 100644
--- a/llvm/lib/Target/ARM/ARMInstrNEON.td
+++ b/llvm/lib/Target/ARM/ARMInstrNEON.td
@@ -5655,6 +5655,34 @@ def  VABDhq   : N3VQInt<1, 0, 0b11, 0b1101, 0, N3RegFrm, IIC_VBINQ,
                         "vabd", "f16", v8f16, v8f16, int_arm_neon_vabds, 1>,
                 Requires<[HasNEON, HasFullFP16]>;
 
+let Predicates = [HasNEON] in {
+def : Pat<(v2i32 (abds (v2i32 DPR:$opA), (v2i32 DPR:$opB))),
+          (VABDsv2i32 DPR:$opA, DPR:$opB)>;
+def : Pat<(v4i16 (abds (v4i16 DPR:$opA), (v4i16 DPR:$opB))),
+          (VABDsv4i16 DPR:$opA, DPR:$opB)>;
+def : Pat<(v8i8 (abds (v8i8 DPR:$opA), (v8i8 DPR:$opB))),
+          (VABDsv8i8 DPR:$opA, DPR:$opB)>;
+def : Pat<(v4i32 (abds (v4i32 QPR:$opA), (v4i32 QPR:$opB))),
+          (VABDsv4i32 QPR:$opA, QPR:$opB)>;
+def : Pat<(v8i16 (abds (v8i16 QPR:$opA), (v8i16 QPR:$opB))),
+          (VABDsv8i16 QPR:$opA, QPR:$opB)>;
+def : Pat<(v16i8 (abds (v16i8 QPR:$opA), (v16i8 QPR:$opB))),
+          (VABDsv16i8 QPR:$opA, QPR:$opB)>;
+
+def : Pat<(v2i32 (abdu (v2i32 DPR:$opA), (v2i32 DPR:$opB))),
+          (VABDuv2i32 DPR:$opA, DPR:$opB)>;
+def : Pat<(v4i16 (abdu (v4i16 DPR:$opA), (v4i16 DPR:$opB))),
+          (VABDuv4i16 DPR:$opA, DPR:$opB)>;
+def : Pat<(v8i8 (abdu (v8i8 DPR:$opA), (v8i8 DPR:$opB))),
+          (VABDuv8i8 DPR:$opA, DPR:$opB)>;
+def : Pat<(v4i32 (abdu (v4i32 QPR:$opA), (v4i32 QPR:$opB))),
+          (VABDuv4i32 QPR:$opA, QPR:$opB)>;
+def : Pat<(v8i16 (abdu (v8i16 QPR:$opA), (v8i16 QPR:$opB))),
+          (VABDuv8i16 QPR:$opA, QPR:$opB)>;
+def : Pat<(v16i8 (abdu (v16i8 QPR:$opA), (v16i8 QPR:$opB))),
+          (VABDuv16i8 QPR:$opA, QPR:$opB)>;
+}
+
 //   VABDL    : Vector Absolute Difference Long (Q = | D - D |)
 defm VABDLs   : N3VLIntExt_QHS<0,1,0b0111,0, IIC_VSUBi4Q,
                                "vabdl", "s", int_arm_neon_vabds, zext, 1>;
@@ -5662,9 +5690,9 @@ defm VABDLu   : N3VLIntExt_QHS<1,1,0b0111,0, IIC_VSUBi4Q,
                                "vabdl", "u", int_arm_neon_vabdu, zext, 1>;
 
 let Predicates = [HasNEON] in {
-def : Pat<(v8i16 (abs (sub (zext (v8i8 DPR:$opA)), (zext (v8i8 DPR:$opB))))),
+def : Pat<(v8i16 (zext (abdu (v8i8 DPR:$opA), (v8i8 DPR:$opB)))),
           (VABDLuv8i16 DPR:$opA, DPR:$opB)>;
-def : Pat<(v4i32 (abs (sub (zext (v4i16 DPR:$opA)), (zext (v4i16 DPR:$opB))))),
+def : Pat<(v4i32 (zext (abdu (v4i16 DPR:$opA), (v4i16 DPR:$opB)))),
           (VABDLuv4i32 DPR:$opA, DPR:$opB)>;
 }
 

@davemgreen
Copy link
Collaborator

Yeah it sounds OK to me. I think I would expect the intrinsics to be converted to ISD::ABD nodes, as we do for AArch64.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 5, 2024

Cheers, I'll tidy this up and update all the patterns.

RKSimon added a commit that referenced this pull request Jun 6, 2024
RKSimon added a commit that referenced this pull request Jun 6, 2024
@RKSimon RKSimon marked this pull request as draft June 6, 2024 08:32
Copy link

github-actions bot commented Jun 6, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

(zext node:$in2)), (i32 $shift))>;

let Predicates = [HasNEON] in {
def : Pat<(xor (v2i64 (abd_shr (v2i32 DPR:$opA), (v2i32 DPR:$opB), 63)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible to keep, or is the pattern now different?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to avoid the need for it in this patch, otherwise it can wait until #92576

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failed (and I was making a mess trying to fix it in #94601) - so I've simplified this patch to just replace the intrinsics initially, then we can remove the rest of it for #94601.

RKSimon added a commit that referenced this pull request Jun 6, 2024
RKSimon added a commit that referenced this pull request Jun 6, 2024
…nd hasOperation.

Avoids the need to explicitly test both commuted variants and doesn't match custom lowering after legalization.

Cleanup for #94504
@RKSimon RKSimon marked this pull request as ready for review June 6, 2024 15:47
@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 6, 2024

Should be ready now.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure lets do it. LGTM

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 6, 2024

Sure lets do it. LGTM

Cheers - it turned into a bit of chicken-and-egg situation :)

As noted on llvm#94466, NEON has ABDS/ABDU instructions but only handles them via intrinsics, plus some VABDL custom patterns.

Fixes llvm#94466
@RKSimon RKSimon merged commit c0b4685 into llvm:main Jun 7, 2024
4 of 6 checks passed
@RKSimon RKSimon deleted the arm-abd branch June 7, 2024 09:18
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ARM][NEON] NEON lowering doesn't legalize ISD::ABDS/ABDU nodes
3 participants