Optimize Vector64 and Vector128.Create methods #36267
Conversation
CC @kunalspathak, @CarolEidt, @echesakovMSFT
Also CC @TamarChristinaArm
@@ -197,6 +197,7 @@ HARDWARE_INTRINSIC(AdvSimd_Arm64, CompareTest, 1
HARDWARE_INTRINSIC(AdvSimd_Arm64, CompareTestScalar, 8, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cmtst, INS_cmtst, INS_invalid, INS_cmtst}, HW_Category_SIMDScalar, HW_Flag_Commutative)
HARDWARE_INTRINSIC(AdvSimd_Arm64, Divide, -1, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_fdiv, INS_fdiv}, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(AdvSimd_Arm64, DuplicateSelectedScalarToVector128, 16, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_dup, INS_dup, INS_invalid, INS_dup}, HW_Category_IMM, HW_Flag_SupportsContainment|HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AdvSimd_Arm64, DuplicateToVector64, 16, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_mov, INS_mov, INS_invalid, INS_fmov}, HW_Category_SimpleSIMD, HW_Flag_SupportsContainment|HW_Flag_SpecialCodeGen)
ARM actually exposes native intrinsics for these under the same name as the other primitive types (`vdup_n_s64`, etc.). I'm going to include this in the misc proposal for APIs that may be missing.
// Arguments:
//    node - The hardware intrinsic node.
//
void Lowering::LowerHWIntrinsicCreate(GenTreeHWIntrinsic* node)
This ended up being quite a bit simpler than x86, as ARM defines insert intrinsics as part of the baseline.
if ((simdSize == 8) && (simdType == TYP_DOUBLE))
{
    simdType = TYP_SIMD8;
This is because `impFixupStructReturnType` changes the node from `TYP_SIMD8` to `TYP_DOUBLE`. I'd imagine it would be desirable to preserve it as `TYP_SIMD8` and adjust the various code paths to recognize `TYP_SIMD8`.
Is that covered by an existing issue?
This is related to the work that @sandreenko is doing to avoid retyping return types.
idx = comp->gtNewIconNode(N, TYP_INT);
BlockRange().InsertBefore(opN, idx);

tmp1 = comp->gtNewSimdHWIntrinsicNode(simdType, tmp1, idx, opN, NI_AdvSimd_Insert, baseType, simdSize);
We are currently exposing `AdvSimd.Insert` for `long`, `ulong`, and `double`. However, I don't see how these can be supported on ARM32. Could someone please indicate what instruction would be used, because it seems like something we need to go fix.
These should become `VMOV`. I have a bigger explanation in #35037.
> We are currently exposing AdvSimd.Insert for long, ulong, and double. However, I don't see how these can be supported on ARM32...

I put implementation details in the `<summary></summary>` comments for those:
runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.cs
Lines 5256 to 5261 in 583fa67
/// <summary>
/// float64x2_t vsetq_lane_f64 (float64_t a, float64x2_t v, const int lane)
///   A32: VMOV.F64 Dd, Dm
///   A64: INS Vd.D[lane], Vn.D[0]
/// </summary>
public static Vector128<double> Insert(Vector128<double> vector, byte index, double data) => Insert(vector, index, data);
runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.cs
Lines 5277 to 5282 in 583fa67
/// <summary>
/// int64x2_t vsetq_lane_s64 (int64_t a, int64x2_t v, const int lane)
///   A32: VMOV.64 Dd, Rt, Rt2
///   A64: INS Vd.D[lane], Xn
/// </summary>
public static Vector128<long> Insert(Vector128<long> vector, byte index, long data) => Insert(vector, index, data);
runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.cs
Lines 5312 to 5317 in 583fa67
/// <summary>
/// uint64x2_t vsetq_lane_u64 (uint64_t a, uint64x2_t v, const int lane)
///   A32: VMOV.64 Dd, Rt, Rt2
///   A64: INS Vd.D[lane], Xn
/// </summary>
public static Vector128<ulong> Insert(Vector128<ulong> vector, byte index, ulong data) => Insert(vector, index, data);
@@ -1100,141 +1100,8 @@ void Lowering::LowerHWIntrinsic(GenTreeHWIntrinsic* node)
    ContainCheckHWIntrinsic(node);
}

union VectorConstant {
@@ -317,6 +317,139 @@ class Lowering final : public Phase
    void LowerHWIntrinsicCC(GenTreeHWIntrinsic* node, NamedIntrinsic newIntrinsicId, GenCondition condition);
    void LowerHWIntrinsicCreate(GenTreeHWIntrinsic* node);
    void LowerFusedMultiplyAdd(GenTreeHWIntrinsic* node);

union VectorConstant {
This was just moved from https://github.com/dotnet/runtime/pull/36267/files#diff-55de8576070c2b77ed274e8e03c6676fL1103
src/coreclr/src/jit/importer.cpp
@@ -4272,7 +4272,8 @@ GenTree* Compiler::impIntrinsic(GenTree* newobjThis,

 if (mustExpand && (retNode == nullptr))
 {
-    NO_WAY("JIT must expand the intrinsic!");
+    assert(!"Unhandled must expand intrinsic, throwing PlatformNotSupportedException");
+    return impUnsupportedHWIntrinsic(CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED, method, sig, mustExpand);
Previously the JIT would fail fast with "The JIT compiler encountered invalid IL code or an internal limitation." Now, it will assert in debug mode but will throw `PlatformNotSupportedException` at runtime.
This was changed to not use `impUnsupportedHWIntrinsic` and instead use `gtNewMustThrowException` directly. `impUnsupportedHWIntrinsic` was renamed to `impUnsupportedNamedIntrinsic` and moved out of `FEATURE_HW_INTRINSIC`.
}
else

if (result == NI_Illegal)
This path accounts for `mustExpand` for unrecognized HWIntrinsics (and thus will never hit the above `assert`), in particular as we know that `throw PlatformNotSupportedException` is the desired behavior for anything not recognized and which is recursive under the `System.Runtime.Intrinsics` namespace.
All of the code paths are either directly recursive (`void MyMethod() => MyMethod()`) and under one of `S.R.I.X86` or `S.R.I.Arm`, or are indirectly recursive and directly under `S.R.I`. An indirectly recursive intrinsic is one like `Vector64.Create`, which is:

```csharp
if (AdvSimd.IsSupported)
{
    return Vector64.Create(...);
}
return SoftwareFallback(...);
```

So the `throw PNSE` will be generated under the dead `AdvSimd.IsSupported` code path under minOpts and will never actually be hit.
I'm a bit confused about this, as I thought that we always expanded intrinsics, even under minopts.
That said, I think that this code section could use a comment describing the "indirectly recursive" case. In fact, for developers who are not intimately familiar with this code, a few words in general about how/why we reach this case would be useful.
This isn't necessarily about `minopts` vs `optimized`; it's about platforms like ARM where HWIntrinsics aren't supported at all, or cases like `Vector64` on x86.
In both of those scenarios, it is recognized as intrinsic (due to the `[Intrinsic]` attribute) and as `mustExpand` (due to the recursion), but it isn't possible to actually expand the call since it isn't recognized, and so it was hitting https://github.com/dotnet/runtime/pull/36267/files#diff-b8d003a58643e5595d2920ca5993b952L4275
This was really only a problem for classes like `Vector64`/`Vector128`/`Vector256`, as they are shared between all architectures (where classes like `x86.Sse` or `Arm.AdvSimd` have two copies: one which is recursive and one which throws). It looks to have only been showing for minOpts because we drop the dead code entirely otherwise, and so we never try to expand the call.
I added a comment elaborating as to why here: https://github.com/dotnet/runtime/pull/36267/files#diff-b8d003a58643e5595d2920ca5993b952R4501-R4517
}
assert((argCnt == 1) || (argCnt == (simdSize / genTypeSize(baseType))));

if (argCnt == cnsArgCnt)
If `argCnt == cnsArgCnt == 1` and the constant is small enough for `mov` or `fmov` (immediate), I presume that is better and we should prefer it here?
@tannergooding Indeed. Though when testing the values you have to keep all aliases of `mov` in mind.
But also, if the alternative is a literal load, I think you should allow some `mov`/`movk` as well, as I believe currently the address calculation for the literal pool itself takes a couple of instructions, so you might as well use those for making the constant.
Thanks!
> But also if the alternative is a literal load I think you should allow some mov/movk as well as I believe currently the address calculation for the literal pool itself takes a couple of instructions, so you might as well use those for making the constant.
I will log an issue for this in particular. It seems like something we should more generally support and which we likely don't support today; but I don't think it's worth blocking this PR on getting completed.
Overall LGTM, just a couple of suggestions
src/coreclr/src/jit/importer.cpp
impPopStack();
}

return gtNewMustThrowException(CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED, JITtype2varType(sig->retType),
Excellent - this is a much better way to handle this. Thanks!
argList = argList->Rest();
}

assert(N == (argCnt - 1));
It seems like it would be more straightforward to include this last case in the loop above; unless I'm missing something, it's identical to all the other cases except when there are 2 args, and checking for that inside the loop seems pretty low cost and quite a bit cleaner.
It's different because it modifies the original node, rather than introducing a new node.
Ah, makes sense.
The diff is slightly "smaller" due to the additional methods that were introduced in S.P.Corelib to separate out the Software Fallback. Diffs are largely cases like:

```asm
- mov w0, #0xd1ffab1e
- dup v8.8h, w0
+ ldr q8, [@RWD00]
...
+ RWD00 db 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh, 080h, 07Fh
```

We should also see a couple other wins once #36267 (comment) is resolved.
LGTM - thanks!
This is basically the same as #35857, but for ARM64, and updates the `Vector64`/`Vector128.Create` methods to be intrinsic. They are handled entirely in lowering, where they will be replaced with the corresponding constant (as was done for GT_SIMD nodes) or lowered to the correct sequence of HWIntrinsics (which allows containment and other checks to "just work"). This should make it rather trivial to support "partial constants" as well (that is, a vector where, say, 50% of the inputs are constant and the other half are not).
It might be beneficial to eventually create a proper GenTreeVecCns node and to also try and handle this earlier (which would allow constants to be deduplicated and other features), but that is a more involved change.
I'm working on getting a jit-diff and will post when it is available.