Adding basic support for recognizing and handling SIMD intrinsics as HW intrinsics #35421

tannergooding · 2020-04-24T19:21:24Z

This makes progress towards #956 and is based on the prototype discussed and implemented here: #9766 (comment)

Rather than reimplementing the SIMD intrinsics in managed code, or duplicating much of the HWIntrinsic support for containment and VEX encoding for the SIMD intrinsics, this change simply recognizes the SIMD intrinsics during importation via an alternative path and replaces them with equivalent HWIntrinsic nodes.
This lets the SIMD intrinsics automatically pick up features that have already been added to the HWIntrinsics support, such as VEX awareness, containment, and other minor optimizations.

This does not cover all of the SIMD intrinsics yet, but does lay a foundational framework for the remaining intrinsics to be ported as well. It does cover x86, x64, and ARM64.
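A minimal sketch of the table-driven recognition described above, assuming simplified, hypothetical identifiers (SimdIntrinsic, HwIntrinsic, Target, lookupSimdAsHw) rather than the JIT's real NamedIntrinsic values and per-target list headers. Each SIMD intrinsic maps to an equivalent HWIntrinsic per target, and a missing mapping tells the importer to leave the call to the existing path.

```cpp
#include <optional>

// Hypothetical, simplified identifiers for illustration only.
enum class SimdIntrinsic { Add, Subtract, Dot };
enum class HwIntrinsic { None, SSE_Add, SSE_Subtract, AdvSimd_Add, AdvSimd_Subtract };
enum class Target { Xarch, Arm64 };

struct SimdAsHwEntry {
    SimdIntrinsic simd;
    HwIntrinsic   xarch; // replacement on x86/x64
    HwIntrinsic   arm64; // replacement on ARM64
};

// One row per SIMD intrinsic; HwIntrinsic::None means "not ported yet".
constexpr SimdAsHwEntry kSimdAsHwTable[] = {
    {SimdIntrinsic::Add, HwIntrinsic::SSE_Add, HwIntrinsic::AdvSimd_Add},
    {SimdIntrinsic::Subtract, HwIntrinsic::SSE_Subtract, HwIntrinsic::AdvSimd_Subtract},
    {SimdIntrinsic::Dot, HwIntrinsic::None, HwIntrinsic::None},
};

// Returns the HW intrinsic to import in place of `id`, or std::nullopt when
// the caller should fall back to the existing GT_SIMD import path.
std::optional<HwIntrinsic> lookupSimdAsHw(SimdIntrinsic id, Target target) {
    for (const SimdAsHwEntry& entry : kSimdAsHwTable) {
        if (entry.simd == id) {
            HwIntrinsic hw = (target == Target::Xarch) ? entry.xarch : entry.arm64;
            if (hw == HwIntrinsic::None) {
                return std::nullopt;
            }
            return hw;
        }
    }
    return std::nullopt;
}
```

Keeping the mapping as data means porting another intrinsic is mostly a matter of filling in a row. The real tables also have to account for the vector's base type, which this sketch leaves out.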

tannergooding · 2020-04-24T19:21:57Z

CC. @CarolEidt, @echesakovMSFT

Will post a jit diff shortly.

tannergooding · 2020-04-24T20:59:12Z

src/coreclr/src/jit/compiler.h

    // Returns the codegen type for a given SIMD size.
-    var_types getSIMDTypeForSize(unsigned size)
+    static var_types getSIMDTypeForSize(unsigned size)


We could also change GenTreeHWIntrinsic to track the SimdType rather than the SimdSize, but it is a more involved change.
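For illustration, a simplified sketch of what the helper computes, using a stand-in var_types enum and a stand-in Compiler struct (both assumed here, not the JIT's full definitions). Because the mapping depends only on the size, making it static lets call sites such as codegen use Compiler::getSIMDTypeForSize(...) without a Compiler instance.

```cpp
#include <cassert>

// Stand-in for the JIT's var_types; only the SIMD members matter here.
enum var_types { TYP_UNDEF, TYP_SIMD8, TYP_SIMD12, TYP_SIMD16, TYP_SIMD32 };

struct Compiler {
    // Depends only on `size`, so it can be static.
    static var_types getSIMDTypeForSize(unsigned size) {
        switch (size) {
            case 8:
                return TYP_SIMD8;
            case 12:
                return TYP_SIMD12;
            case 16:
                return TYP_SIMD16;
            case 32:
                return TYP_SIMD32;
            default:
                assert(!"unexpected SIMD size");
                return TYP_UNDEF;
        }
    }
};
```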

tannergooding · 2020-04-24T21:00:09Z

src/coreclr/src/jit/importer.cpp

+
+            if ((ni > NI_SIMD_AS_HWINTRINSIC_START) && (ni < NI_SIMD_AS_HWINTRINSIC_END))
+            {
+                return impSimdAsHWIntrinsic(ni, clsHnd, method, sig, mustExpand);


If the intrinsic isn't currently handled or ends up returning nullptr, we will currently still hit the existing impSIMDIntrinsic path and produce a GT_SIMD node later.
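A simplified model of that flow, collapsing the importer's two stages into one function for brevity. The Node type, the two importer stubs, and the intrinsic names other than the range markers are hypothetical; in the actual importer the fallback happens later, when impSIMDIntrinsic runs, rather than inline as shown here.

```cpp
#include <cassert>

// Hypothetical stand-ins: Node models an imported tree, and the two stubs
// model impSimdAsHWIntrinsic (new path) and impSIMDIntrinsic (GT_SIMD path).
struct Node {
    const char* oper;
};

enum NamedIntrinsic {
    NI_SIMD_AS_HWINTRINSIC_START,
    NI_VectorT_Add, // hypothetical: ported to the new path
    NI_VectorT_Dot, // hypothetical: not ported yet
    NI_SIMD_AS_HWINTRINSIC_END,
};

Node hwNode{"GT_HWINTRINSIC"};
Node simdNode{"GT_SIMD"};

Node* impSimdAsHWIntrinsic(NamedIntrinsic ni) {
    // Returns nullptr for anything it does not (yet) handle.
    return (ni == NI_VectorT_Add) ? &hwNode : nullptr;
}

Node* impSIMDIntrinsic(NamedIntrinsic ni) {
    assert((ni > NI_SIMD_AS_HWINTRINSIC_START) && (ni < NI_SIMD_AS_HWINTRINSIC_END));
    return &simdNode;
}

Node* importIntrinsic(NamedIntrinsic ni) {
    if ((ni > NI_SIMD_AS_HWINTRINSIC_START) && (ni < NI_SIMD_AS_HWINTRINSIC_END)) {
        if (Node* node = impSimdAsHWIntrinsic(ni)) {
            return node; // replaced with an equivalent HWIntrinsic node
        }
    }
    // Not handled, or returned nullptr: fall back to the GT_SIMD path.
    return impSIMDIntrinsic(ni);
}
```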

tannergooding · 2020-04-24T21:00:54Z

src/coreclr/src/jit/lowerxarch.cpp


    GenTree* op1 = node->gtGetOp1();
    GenTree* op2 = node->gtGetOp2();
    GenTree* op3 = nullptr;

-    if (!HWIntrinsicInfo::SupportsContainment(intrinsicId))
+    if (!HWIntrinsicInfo::SupportsContainment(intrinsicId) || (simdSize == 8) || (simdSize == 12))


We should be able to support containment for simdSize == 12 when the operand is a local or one of the other cases where we allocate 16 bytes of storage for it.

I believe that morph will retype TYP_SIMD12 locals as TYP_SIMD16 where possible, so I think it's probably reasonable to assume it's not safe here (or fix the cases where we don't widen it if we should).

Just noting: it is retyped, but the simdSize isn't changed, and even if node is retyped, that doesn't mean op1 or op2 are retyped.
So getting this right will require a few changes, but it shouldn't be too difficult overall.
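One way to express the relaxation discussed in this thread, as a hedged sketch with a hypothetical Operand description in place of GenTree and its local variable descriptor: a contained memory operand is read at full register width (16 bytes for a 12-byte vector), so a 12-byte operand is only safe to contain when 16 bytes of storage back it. A real check would also have to deal with simdSize not being updated on retyping, as noted above.

```cpp
#include <cassert>

// Hypothetical operand description for this sketch.
struct Operand {
    unsigned simdSize;     // logical size recorded on the node: 8, 12, 16, or 32
    unsigned storageBytes; // bytes actually allocated for the operand in memory
};

// A 12-byte operand is loaded as 16 bytes when contained, so it is only safe
// to contain when at least 16 bytes of storage back it.
bool isSafeToContain(const Operand& op) {
    assert(op.simdSize == 8 || op.simdSize == 12 || op.simdSize == 16 || op.simdSize == 32);
    if (op.simdSize == 8) {
        return false; // stays conservative, as in the diff above
    }
    if (op.simdSize == 12) {
        return op.storageBytes >= 16;
    }
    return true;
}
```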

tannergooding · 2020-04-25T00:48:03Z

x64 Windows AVX2 Diff:

Found 271 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -741 (-0.00% of base)
    diff is an improvement.

Top file regressions (bytes):
          21 : diff\System.Text.Encodings.Web.dasm (0.06% of base)

Top file improvements (bytes):
        -754 : diff\System.Private.CoreLib.dasm (-0.02% of base)
          -4 : diff\System.Net.WebSockets.dasm (-0.01% of base)
          -4 : diff\System.Net.WebSockets.WebSocketProtocol.dasm (-0.01% of base)

4 total files with Code Size differences (3 improved, 1 regressed), 262 unchanged.

Top method regressions (bytes):
          14 ( 4.96% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           7 ( 7.22% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
           4 ( 0.33% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4
           4 ( 8.00% of base) : diff\System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           4 ( 8.70% of base) : diff\System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float

Top method improvements (bytes):
         -40 (-4.87% of base) : diff\System.Private.CoreLib.dasm - Vector:AndNot(Vector`1,Vector`1):Vector`1 (6 methods)
         -38 (-15.02% of base) : diff\System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -34 (-12.36% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -32 (-11.64% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-9.76% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_UnaryNegation(Vector`1):Vector`1 (6 methods)
         -20 (-4.07% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-3.18% of base) : diff\System.Private.CoreLib.dasm - Vector`1:ConditionalSelect(Vector`1,Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-2.88% of base) : diff\System.Private.CoreLib.dasm - Vector:OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-7.58% of base) : diff\System.Private.CoreLib.dasm - Vector:EqualsAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)
         -20 (-3.15% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThanOrEqualAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)
         -20 (-3.15% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThanOrEqualAny(Vector`1,Vector`1):bool (6 methods)

Top method regressions (percentages):
           4 ( 8.70% of base) : diff\System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float
           4 ( 8.00% of base) : diff\System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           7 ( 7.22% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
          14 ( 4.96% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           4 ( 0.33% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4

Top method improvements (percentages):
          -8 (-24.24% of base) : diff\System.Private.CoreLib.dasm - Vector4:Clamp(Vector4,Vector4,Vector4):Vector4
          -4 (-17.39% of base) : diff\System.Private.CoreLib.dasm - Vector4:op_UnaryNegation(Vector4):Vector4
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Add(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Subtract(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Multiply(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Divide(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:op_Multiply(Vector4,float):Vector4
          -4 (-16.00% of base) : diff\System.Private.CoreLib.dasm - Vector4:op_Multiply(float,Vector4):Vector4
          -2 (-15.38% of base) : diff\System.Private.CoreLib.dasm - Sse42:Crc32(int,ubyte):int
          -5 (-15.15% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_Multiply(int,Vector`1):Vector`1
         -38 (-15.02% of base) : diff\System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
          -4 (-14.81% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_Multiply(Vector`1,double):Vector`1
          -2 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Sse42:Crc32(int,ushort):int
          -4 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_Multiply(double,Vector`1):Vector`1
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:Add(Vector`1,Vector`1):Vector`1 (6 methods)

60 total methods with Code Size differences (55 improved, 5 regressed), 244683 unchanged.

1 files had text diffs but no metric diffs.
diff\System.Text.Json.dasm had 16 diffs

ARM64 AdvSIMD diff:

Found 274 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: 0 (0.00% of base)

0 total files with Code Size differences (0 improved, 0 regressed), 266 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 244744 unchanged.

8 files had text diffs but no metric diffs.
System.Private.CoreLib.dasm had 366 diffs
System.Runtime.Numerics.dasm had 48 diffs
xunit.console.dasm had 6 diffs
Microsoft.CSharp.dasm had 4 diffs
System.ComponentModel.Annotations.dasm had 4 diffs
System.Data.Common.dasm had 4 diffs
System.Data.OleDb.dasm had 4 diffs
System.Security.Cryptography.Primitives.dasm had 2 diffs

CarolEidt

A couple of initial comments. I've only skimmed through the simdashwintrinsic* files. I'd like to see the impact on compile time and table sizes at some point.

CarolEidt · 2020-04-25T00:08:47Z

src/coreclr/src/jit/lowerxarch.cpp


    GenTree* op1 = node->gtGetOp1();
    GenTree* op2 = node->gtGetOp2();
    GenTree* op3 = nullptr;

-    if (!HWIntrinsicInfo::SupportsContainment(intrinsicId))
+    if (!HWIntrinsicInfo::SupportsContainment(intrinsicId) || (simdSize == 8) || (simdSize == 12))


I believe that morph will retype TYP_SIMD12 locals as TYP_SIMD16 where possible, so I think it's probably reasonable to assume it's not safe here (or fix the cases where we don't widen it if we should).

tannergooding · 2020-04-25T01:14:21Z

I'd like to see the impact on compile time and table sizes at some point.

What is the best way to collect this information?

CarolEidt · 2020-04-25T01:17:38Z

@tannergooding - For the tables, you can just look at file sizes. For JIT time, the best way is to use SuperPmi to compile a bunch of methods. SuperPmi is described here: https://github.com/dotnet/runtime/blob/master/src/coreclr/scripts/superpmi.md and I usually measure using pin. If it's a hassle I can probably measure for you.

tannergooding · 2020-04-25T02:13:35Z

I should be able to collect the SuperPMI diffs, although I don't see anything related to pin listed in the doc (or any other SuperPMI reference).
For now, I tried running python .\src\coreclr\scripts\superpmi.py asmdiffs D:\tagoo\Repos\runtime_base\artifacts\tests\coreclr\Windows_NT.x64.Checked\Tests\Core_Root\clrjit.dll and it immediately exits with an assert in SuperPMI, the same as when doing just a replay against runtime_base.

As for file sizes:

File             Before            After             Diff
clrjit.dll       1,245,696 bytes   1,252,352 bytes    +6,656 bytes
linuxnonjit.dll  1,047,040 bytes   1,057,280 bytes   +10,240 bytes
protononjit.dll    957,440 bytes     962,560 bytes    +5,120 bytes
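As a quick check of the size table, the deltas and relative growth can be computed directly; BinarySize, deltaBytes, and growthPercent are illustrative names only. Each binary grows by well under 1% (roughly 0.53% for clrjit.dll, 0.98% for linuxnonjit.dll, and 0.53% for protononjit.dll).

```cpp
#include <cassert>

// Illustrative record of one row of the size table.
struct BinarySize {
    const char* name;
    long long before; // bytes
    long long after;  // bytes
};

long long deltaBytes(const BinarySize& b) {
    return b.after - b.before;
}

// Relative growth as a percentage of the original size.
double growthPercent(const BinarySize& b) {
    assert(b.before > 0);
    return 100.0 * static_cast<double>(deltaBytes(b)) / static_cast<double>(b.before);
}
```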

We will naturally be able to win back some or all of this as more intrinsics are ported and we can start removing the GT_SIMD path.

I've listed the jit-pmi-diff for x64 and ARM64 here: #35421 (comment)

tannergooding · 2020-04-25T02:56:45Z

and it immediately exits with an assert in superpmi, same as when doing just replay against runtime_base.

Looks like it's because it expects something to be 226 bytes but it is actually 228 bytes. I'm guessing the JIT/EE version changed, which I believe means I'll need to do a new collection.

Removing the [Intrinsic] attribute from some Vector2/3/4 methods which aren't intrinsic

There were a couple of operator * methods marked as intrinsic that weren't actually intrinsic. This isn't a problem for GT_SIMD since it tracks the expected argument kinds and checks them against the method signature.
I've resolved the issue and updated the jit-pmi-diff entries above.
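A hedged sketch of the kind of guard described here, with a hypothetical IntrinsicInfo descriptor and ArgKind classification: the GT_SIMD path only expands a call whose signature matches the argument kinds it expects, so a mis-attributed [Intrinsic] method is simply left alone. A table-driven path without such a check would presumably treat the method as intrinsic, which is why removing the attribute matters.

```cpp
#include <vector>

// Hypothetical argument classification for this sketch.
enum class ArgKind { Vector, Scalar };

// Hypothetical descriptor: the argument kinds an intrinsic expects.
struct IntrinsicInfo {
    std::vector<ArgKind> expectedArgs;
};

// Expand only when the call's actual argument kinds match exactly; otherwise
// the call stays a normal method call.
bool signatureMatches(const IntrinsicInfo& info, const std::vector<ArgKind>& actualArgs) {
    return info.expectedArgs == actualArgs;
}
```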

tannergooding · 2020-04-25T03:24:49Z

Still need to fix the OpOrEqual comparisons; likewise, division for 8- and 12-byte vectors needs to keep zeroing the upper bits after the operation completes.
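To illustrate why the upper bits need re-zeroing, a scalar model of a 16-byte register holding a Vector2 or Vector3 (the Lanes alias and the helper names are assumptions for this sketch): the unused lanes are zero in both operands, and 0.0f / 0.0f yields NaN, so after a full-width divide those lanes must be cleared again.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Lanes = std::array<float, 4>; // one 16-byte register of floats

// Lane-wise divide across the full register, as a divps-style instruction does.
Lanes divideAllLanes(const Lanes& a, const Lanes& b) {
    Lanes result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = a[i] / b[i];
    }
    return result;
}

// Divide for an 8-byte (2 lanes) or 12-byte (3 lanes) vector: the padding
// lanes computed 0/0 = NaN, so zero them again to keep the invariant.
Lanes divideSmallVector(const Lanes& a, const Lanes& b, std::size_t usedLanes) {
    assert(usedLanes == 2 || usedLanes == 3);
    Lanes result = divideAllLanes(a, b);
    for (std::size_t i = usedLanes; i < result.size(); ++i) {
        result[i] = 0.0f;
    }
    return result;
}
```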

tannergooding · 2020-05-01T23:27:07Z

I believe I've resolved the couple of minor things I found and am collecting new diffs for the Abs methods and the additional scenarios covered by properly handling the Vector static class.

CarolEidt

I believe you've addressed all my questions and concerns. And I've looked over the recent changes.

tannergooding · 2020-05-02T00:29:58Z

Thanks @CarolEidt, I'm just working on ensuring the diffs are correct again now, as there were a couple of surprises with the Vector static class and its APIs differing from the equivalent APIs on Vector<T> 😄

tannergooding · 2020-05-02T06:12:54Z

Diff is back to expected and shows a few additional improvements.

Found 271 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -847 (-0.00% of base)
    diff is an improvement.

Top file regressions (bytes):
          21 : System.Text.Encodings.Web.dasm (0.06% of base)

Top file improvements (bytes):
        -852 : System.Private.CoreLib.dasm (-0.02% of base)
          -8 : System.Text.Json.dasm (-0.00% of base)
          -4 : System.Net.WebSockets.dasm (-0.01% of base)
          -4 : System.Net.WebSockets.WebSocketProtocol.dasm (-0.01% of base)

5 total files with Code Size differences (4 improved, 1 regressed), 261 unchanged.

Top method regressions (bytes):
          14 ( 4.96% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           7 ( 7.22% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
           4 ( 0.33% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4
           4 ( 8.00% of base) : System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           4 ( 8.70% of base) : System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float

Top method improvements (bytes):
         -40 (-6.37% of base) : System.Private.CoreLib.dasm - Vector`1:ConditionalSelect(Vector`1,Vector`1,Vector`1):Vector`1 (6 methods)
         -40 (-4.87% of base) : System.Private.CoreLib.dasm - Vector:AndNot(Vector`1,Vector`1):Vector`1 (6 methods)
         -38 (-15.02% of base) : System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : System.Private.CoreLib.dasm - Vector:LessThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : System.Private.CoreLib.dasm - Vector:GreaterThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -34 (-12.36% of base) : System.Private.CoreLib.dasm - Vector:GreaterThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -32 (-11.64% of base) : System.Private.CoreLib.dasm - Vector:LessThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
         -22 (-14.97% of base) : System.Private.CoreLib.dasm - Vector:Abs(Vector`1):Vector`1 (6 methods)
         -20 (-9.76% of base) : System.Private.CoreLib.dasm - Vector`1:op_UnaryNegation(Vector`1):Vector`1 (6 methods)
         -20 (-4.07% of base) : System.Private.CoreLib.dasm - Vector`1:op_OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-2.88% of base) : System.Private.CoreLib.dasm - Vector:OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-7.58% of base) : System.Private.CoreLib.dasm - Vector:EqualsAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : System.Private.CoreLib.dasm - Vector:LessThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)
         -20 (-3.15% of base) : System.Private.CoreLib.dasm - Vector:LessThanOrEqualAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : System.Private.CoreLib.dasm - Vector:GreaterThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)

Top method regressions (percentages):
           4 ( 8.70% of base) : System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float
           4 ( 8.00% of base) : System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           7 ( 7.22% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
          14 ( 4.96% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           4 ( 0.33% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4

Top method improvements (percentages):
          -8 (-24.24% of base) : System.Private.CoreLib.dasm - Vector4:Clamp(Vector4,Vector4,Vector4):Vector4
          -4 (-17.39% of base) : System.Private.CoreLib.dasm - Vector4:op_UnaryNegation(Vector4):Vector4
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Add(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Subtract(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Multiply(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Divide(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:op_Multiply(Vector4,float):Vector4
          -4 (-16.00% of base) : System.Private.CoreLib.dasm - Vector4:op_Multiply(float,Vector4):Vector4
          -2 (-15.38% of base) : System.Private.CoreLib.dasm - Sse42:Crc32(int,ubyte):int
          -5 (-15.15% of base) : System.Private.CoreLib.dasm - Vector`1:op_Multiply(int,Vector`1):Vector`1
         -38 (-15.02% of base) : System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
         -22 (-14.97% of base) : System.Private.CoreLib.dasm - Vector:Abs(Vector`1):Vector`1 (6 methods)
          -4 (-14.81% of base) : System.Private.CoreLib.dasm - Vector`1:op_Multiply(Vector`1,double):Vector`1
          -2 (-14.29% of base) : System.Private.CoreLib.dasm - Sse42:Crc32(int,ushort):int
          -4 (-14.29% of base) : System.Private.CoreLib.dasm - Vector`1:op_Multiply(double,Vector`1):Vector`1
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)

tannergooding · 2020-05-03T04:16:58Z

Will wait for @echesakovMSFT to review before merging

echesakov · 2020-05-04T17:40:50Z

Will wait for @echesakovMSFT to review before merging

I am taking a look now, sorry for the wait

echesakov

Overall, looks good. I left a couple of questions/suggestions.
Do we need to update simdashwintrinsiclistarm64.h as further progress on Arm64 intrinsics is made?

echesakov · 2020-05-04T22:25:43Z

src/coreclr/src/jit/hwintrinsiccodegenarm64.cpp

@@ -207,7 +207,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
    }
    else
    {
-        emitSize = EA_SIZE(node->gtSIMDSize);
+        emitSize = emitActualTypeSize(Compiler::getSIMDTypeForSize(node->gtSIMDSize));


Why is this needed? To support Vector3 on Arm64?

Yes, it's required to support Vector3, since its size is 12 but its actual size is 16.
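A minimal sketch of the size rule involved (emitSizeForSimdSize is a hypothetical name, not the JIT's emitActualTypeSize): ARM64 AdvSIMD instructions operate on 8-byte (D) or 16-byte (Q) register forms, so a 12-byte Vector3 has to be emitted at its actual 16-byte register size rather than at EA_SIZE(12).

```cpp
#include <cassert>

// Hypothetical helper: the register width codegen must use for a SIMD value
// of the given logical size. There is no 12-byte form, so round up to 16.
unsigned emitSizeForSimdSize(unsigned simdSize) {
    assert(simdSize == 8 || simdSize == 12 || simdSize == 16);
    return (simdSize == 12) ? 16 : simdSize;
}
```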

echesakov · 2020-05-04T22:53:23Z

src/coreclr/src/jit/hwintrinsic.h

+                }
+
+                assert(id != NI_AVX_CompareGreaterThan);
+                return static_cast<int>(FloatComparisonMode::OrderedLessThanSignaling);


This could be confusing to someone who doesn't know that we expect later in the JIT to swap the intrinsic arguments.

IMO, it would be clearer to leave these as special import intrinsics and move all the plumbing related to opportunisticallyDependsOnAVX to one place.

I can add a comment, but the entire point of moving it to lowering is so the rest of the JIT doesn't need to care that AVX supports a proper GreaterThan while pre-AVX hardware emulates it.

As we continue adding more JIT optimizations around HWIntrinsics, the distinction doesn't matter to anything except for codegen and so handling the fixup in lowering makes this trivial for everything else.

Will submit a follow up PR that adds the comment.
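A simplified model of the lowering-time fixup being discussed, not the JIT's actual lookupIval or lowering code: the enumerator values match the cmpps/vcmpps predicate immediates (and .NET's FloatComparisonMode members of the same name), while LoweredCompare, lowerGreaterThan, and evaluate are hypothetical names. Since legacy (non-VEX) cmpps only encodes predicates 0 through 7, a > b is emitted as b < a with swapped operands; both forms are ordered comparisons, so a NaN input yields false either way.

```cpp
#include <cassert>
#include <utility>

// Predicate immediates used by cmpps/vcmpps.
enum FloatComparisonMode : int {
    OrderedLessThanSignaling = 1,
    OrderedLessThanOrEqualSignaling = 2,
    OrderedGreaterThanOrEqualSignaling = 13,
    OrderedGreaterThanSignaling = 14,
};

struct LoweredCompare {
    FloatComparisonMode mode;
    bool swapOperands;
};

// With AVX, use the native predicate; without it, invert and swap operands.
LoweredCompare lowerGreaterThan(bool hasAvx, bool orEqual) {
    if (hasAvx) {
        return {orEqual ? OrderedGreaterThanOrEqualSignaling : OrderedGreaterThanSignaling, false};
    }
    return {orEqual ? OrderedLessThanOrEqualSignaling : OrderedLessThanSignaling, true};
}

// Scalar model of evaluating the lowered comparison on one lane.
bool evaluate(const LoweredCompare& c, float a, float b) {
    if (c.swapOperands) {
        std::swap(a, b);
    }
    switch (c.mode) {
        case OrderedLessThanSignaling:
            return a < b;
        case OrderedLessThanOrEqualSignaling:
            return a <= b;
        case OrderedGreaterThanSignaling:
            return a > b;
        case OrderedGreaterThanOrEqualSignaling:
            return a >= b;
    }
    return false;
}
```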

echesakov · 2020-05-04T23:02:33Z

src/coreclr/src/jit/importer.cpp

@@ -4143,7 +4143,7 @@ GenTree* Compiler::impIntrinsic(GenTree*                newobjThis,
            case NI_System_MathF_FusedMultiplyAdd:
            {
 #ifdef TARGET_XARCH
-                if (compExactlyDependsOn(InstructionSet_FMA))
+                if (compExactlyDependsOn(InstructionSet_FMA) && supportSIMDTypes())


Not related to this change, but I wonder if this optimization can be implemented without requiring SIMD type support.

Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 24, 2020

tannergooding commented Apr 24, 2020 on src/coreclr/src/jit/simdashwintrinsiclistxarch.h (outdated, resolved)

tannergooding force-pushed the simd-as-hwintrinsic branch from 940e753 to 759df57 Compare April 25, 2020 00:27

CarolEidt reviewed Apr 25, 2020

tannergooding force-pushed the simd-as-hwintrinsic branch from 759df57 to c6c4005 Compare April 25, 2020 01:12

tannergooding force-pushed the simd-as-hwintrinsic branch from 4072878 to 6d47adf Compare April 25, 2020 01:51

jkotas mentioned this pull request Apr 25, 2020

Delete redundant CoreCLR regression tests #35437

Merged

tannergooding force-pushed the simd-as-hwintrinsic branch from 2b7c331 to 2169790 Compare April 26, 2020 07:53

tannergooding added 12 commits April 26, 2020 07:15

Adding basic support for recognizing and handling SIMD intrinsics as HW intrinsics

703ab93

Applying formatting patch

32209f3

Fixing a preprocessor concatenation for non windows

a5ad01c

Add a default case to workaround a compiler warning on FreeBSD

3b5f8f4

Fixing a noway_assert to include GT_HWINTRINSIC

8744b7b

Fixing some asserts that were being triggered

4beacab

Use getSIMDVectorRegisterByteLength

3af99b2

Applying formatting patch

e229ca0

Fixing ARM64 to use the actual type size

92ec83c

Removing the [Intrinsic] attribute from some Vector2/3/4 methods which aren't intrinsic

e9e7b89

Updating SSE/SSE2 CompareGreaterThan and related functions to be table driven

f788049

Fixing the SimdAsHWIntrinsic relational operations to match the GT_SIMD behavior

0cf2a0b

tannergooding commented May 1, 2020 on src/coreclr/src/jit/simdashwintrinsiclistxarch.h (outdated, resolved)

tannergooding commented May 1, 2020 on src/coreclr/src/jit/simdashwintrinsic.cpp (outdated, resolved)

tannergooding added 3 commits May 1, 2020 12:47

Fixing SSSE3_Abs and AVX2_Abs to get the base type from the first argument

017fe54

Ensure we adjust the class handle used for intrinsics from the Vector static class

06bec3e

Ensure we populate the handle cache for clsHnd even if it isn't used

bd6e87a

CarolEidt approved these changes May 2, 2020

tannergooding force-pushed the simd-as-hwintrinsic branch from 3ef1601 to 7291546 Compare May 2, 2020 00:28

tannergooding force-pushed the simd-as-hwintrinsic branch from 7291546 to 4bdc51c Compare May 2, 2020 00:44

Fix where we grab the base type from for the static Vector class

341aac8

tannergooding force-pushed the simd-as-hwintrinsic branch 2 times, most recently from 7127164 to 73f7315 Compare May 2, 2020 03:54

Fixing ConditionalSelect and improving the messages used for impCloneExpr in SimdAsHWIntrinsic

b6494ee

tannergooding force-pushed the simd-as-hwintrinsic branch from 73f7315 to b6494ee Compare May 2, 2020 05:02

tannergooding added 2 commits May 2, 2020 09:01

Ensure we clone the constVectorDup before using it

470f627

Applying formatting patch

03840fa

echesakov approved these changes May 4, 2020

tannergooding merged commit 56518a7 into dotnet:master May 5, 2020

jaredpar mentioned this pull request May 5, 2020

OSX machines are de-provisioned during CI / PR runs leading to failures #34472

Closed

This was referenced May 5, 2020

Adding a clarifying comment as to why HWIntrinsicInfo::lookupIval returns an inverted comparison op #35867

Merged

Arm64: Improve code generation for Vector<T> comparision #31685

Closed

kunalspathak mentioned this pull request May 9, 2020

Optimize ToScalar() and GetElement() to use arm64 intrinsic #36156

Merged

This was referenced May 11, 2020

Test failure: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt #36198

Closed

Porting more of the SIMD intrinsics to be implemented as HWIntrinsics #36579

Merged

tannergooding mentioned this pull request May 18, 2020

Assertion failed 'varDsc->lvExactSize == 12' #36586

Closed

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
