[RyuJIT] Emit shlx, sarx, shrx on x64 #67182

JulieLeeMSFT · 2022-03-26T00:55:36Z

Fixes #41881.
Generates shlx, sarx, and shrx for 64-bit shifts when the platform supports BMI2.

ulong Shlx(ulong x, int y) => x << y;
long Sarx(long x, int y) => x >> y;
ulong Shrx(ulong x, int y) => x >> y;
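For reference, the semantics these three methods rely on can be modeled in a few lines. The sketch below is an illustration only, written in C++ rather than taken from the JIT, and the function names are hypothetical. It assumes the usual rule that a 64-bit shift uses only the low six bits of the count, which holds both for C# shifts on long/ulong and for the 64-bit forms of the BMI2 instructions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the 64-bit BMI2 shifts; this is not JIT code.
// In every case only the low six bits of the count are used.
uint64_t ModelShlx64(uint64_t src, uint64_t count) { return src << (count & 63); }

uint64_t ModelShrx64(uint64_t src, uint64_t count) { return src >> (count & 63); }

int64_t ModelSarx64(int64_t src, uint64_t count)
{
    const unsigned n    = static_cast<unsigned>(count & 63);
    const uint64_t bits = static_cast<uint64_t>(src);
    // For negative inputs, shift the complement so that ones fill in from the top.
    const uint64_t shifted = (src < 0) ? ~(~bits >> n) : (bits >> n);
    return static_cast<int64_t>(shifted);
}
```

Unlike shl/sar/shr with a variable count, the BMI2 forms take the count in any general-purpose register rather than cl and do not modify the flags, which is why the extra mov into ecx can disappear.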

Current codegen:

; Method  Test:Shlx(long,int):long
       mov      eax, ecx
       mov      ecx, edx
       shl      eax, cl
       ret      

; Method Test:Sarx(long,int):long
       mov      eax, ecx
       mov      ecx, edx
       sar      eax, cl
       ret      

; Method Test:Shrx(long,int):long
       mov      eax, ecx
       mov      ecx, edx
       shr      eax, cl
       ret

New codegen:

for method Test:Shlx(long,int):long
    C4E2E9F7C1           shlx     rax, rcx, rdx

for method Test:Sarx(long,int):long
    C4E2EAF7C1           sarx     rax, rcx, rdx

for method Test:Shrx(long,int):long
    C4E2EBF7C1           shrx     rax, rcx, rdx
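The five-byte encodings above can be read field by field. The following is a minimal sketch of a decoder for exactly this register-only, three-byte-VEX form; the struct and function names are hypothetical and it is not the JIT's disassembler. It assumes the standard VEX layout: inverted R/X/B bits and the opcode map in the second byte, then W, inverted vvvv, L, and pp in the third.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of "C4 xx xx F7 /r" (register-only form).
struct VexShift
{
    bool     w;      // VEX.W: 1 selects 64-bit operand size
    unsigned pp;     // 1 = 66 (shlx), 2 = F3 (sarx), 3 = F2 (shrx)
    unsigned count;  // register number from VEX.vvvv (stored inverted)
    unsigned dest;   // register number from ModRM.reg (extended by VEX.R)
    unsigned source; // register number from ModRM.rm  (extended by VEX.B)
};

VexShift DecodeVexShift(const std::array<uint8_t, 5>& b)
{
    assert(b[0] == 0xC4);          // three-byte VEX escape
    assert((b[1] & 0x1F) == 0x02); // opcode map 0F 38
    assert(b[3] == 0xF7);          // opcode shared by shlx/sarx/shrx
    assert((b[4] >> 6) == 3);      // ModRM.mod == 11: register operand

    const unsigned r = ((b[1] >> 7) & 1) ^ 1; // R and B are stored inverted
    const unsigned x = ((b[1] >> 5) & 1) ^ 1;

    VexShift d;
    d.w      = (b[2] & 0x80) != 0;
    d.pp     = b[2] & 0x03;
    d.count  = ~(b[2] >> 3) & 0x0F;
    d.dest   = (r << 3) | ((b[4] >> 3) & 7);
    d.source = (x << 3) | (b[4] & 7);
    return d;
}
```

With the x64 register numbering rax = 0, rcx = 1, rdx = 2, decoding C4 E2 E9 F7 C1 yields dest = 0, source = 1, count = 2 and pp = 1, matching `shlx rax, rcx, rdx`.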

Further work is needed to remove the mov when an operand is a memory address rather than a register; that will be handled when the contained form is enabled in #67314.

ulong ShrxRef(ulong *x, int y) => *x >> y;
    488B01              mov    rax, qword ptr [rcx]
    C4E2EBF7C0          shrx    rax, rax, rdx

x86 support still needs to be enabled (added to #67314).

ghost · 2022-03-26T00:55:45Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	JulieLeeMSFT
Assignees:	JulieLeeMSFT
Labels:	`area-CodeGen-coreclr`
Milestone:	-

tannergooding · 2022-03-27T06:13:17Z

src/coreclr/jit/codegenxarch.cpp

+
+        regNumber shiftByReg = shiftBy->GetRegNum();
+        emitAttr  size       = emitTypeSize(tree);
+        GetEmitter()->emitIns_R_R_R(ins, size, tree->GetRegNum(), shiftByReg, operandReg);


Might be worth a note that we don't currently support the contained form due to more complex changes being needed in the emitter.

Ideally we'd fix up the emitter and use inst_RV_RV_TT instead so that we can emit shlx r32a, r/m32, r32b. Someone would need to walk through the relevant IF_RWR_RRD_*RD formats and ensure that it's all handled correctly (noting that technically the format is IF_RWR_*RD_RRD but that should be the same as IF_RWR_RRD_*RD with swapping op1/op2, like we do for a couple other BMI2 instructions, namely bextr and bzhi).

Opened #67314.

As @tannergooding suggested, let's add a comment about containment, and also note that the operands are swapped here because of the way operands are encoded in these three instructions.

Added comments.
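To make the operand placement discussed in this thread concrete, here is a small sketch of an encoder for the register-only form. It is an assumption-laden illustration, not the emitter's code: `ShiftKind` and `EncodeShiftX` are hypothetical names. The point it shows is that the shift count lands in VEX.vvvv (stored inverted) while the value being shifted lands in ModRM.rm, so the encoding order differs from the assembly operand order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical names; the values are the VEX.pp field for each instruction.
enum class ShiftKind : uint8_t { Shlx = 1, Sarx = 2, Shrx = 3 };

// Encodes "<ins> dst, src, count" for 64-bit registers numbered 0..15.
std::array<uint8_t, 5> EncodeShiftX(ShiftKind kind, unsigned dst, unsigned src, unsigned count)
{
    assert(dst < 16 && src < 16 && count < 16);

    const uint8_t notR = static_cast<uint8_t>(((dst >> 3) & 1) ^ 1); // extends ModRM.reg
    const uint8_t notB = static_cast<uint8_t>(((src >> 3) & 1) ^ 1); // extends ModRM.rm
    const uint8_t notX = 1;                                          // no index register

    std::array<uint8_t, 5> bytes{};
    bytes[0] = 0xC4;                                                      // three-byte VEX
    bytes[1] = static_cast<uint8_t>((notR << 7) | (notX << 6) | (notB << 5) | 0x02); // map 0F 38
    bytes[2] = static_cast<uint8_t>(0x80                                  // W = 1 (64-bit)
                                    | ((~count & 0x0F) << 3)              // count in vvvv
                                    | static_cast<uint8_t>(kind));        // L = 0, pp
    bytes[3] = 0xF7;
    bytes[4] = static_cast<uint8_t>(0xC0 | ((dst & 7) << 3) | (src & 7)); // mod = 11
    return bytes;
}
```

Calling `EncodeShiftX(ShiftKind::Shlx, 0, 1, 2)` (rax, rcx, rdx) reproduces the bytes C4 E2 E9 F7 C1 quoted in the PR description.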

tannergooding · 2022-03-27T06:23:19Z

Change generally LGTM. Left a suggestion around the ordering of the opportunistic check and potentially logging an issue or a comment around adding support for containment in the future.

JulieLeeMSFT · 2022-03-29T23:11:00Z

Change generally LGTM. Left a suggestion around the ordering of the opportunistic check and potentially logging an issue or a comment around adding support for containment in the future.

Opened #67314.

JulieLeeMSFT · 2022-03-29T23:25:37Z

@kunalspathak PTAL.
cc @dotnet/jit-contrib.

kunalspathak · 2022-03-30T05:00:26Z

Can you check why there is a regression in coreclr_tests windows/x64?

Total bytes of base: 59717535 (overridden on cmd)
Total bytes of diff: 59717754 (overridden on cmd)
Total bytes of delta: 219 (0.00 % of base)
    diff is a regression.
    relative diff is an improvement.

kunalspathak

Added some comments. I think we need to understand the regression in coreclr/libraries test.

kunalspathak · 2022-03-30T05:03:07Z

src/coreclr/jit/lowerxarch.cpp

@@ -4754,7 +4754,7 @@ void Lowering::ContainCheckDivOrMod(GenTreeOp* node)
 void Lowering::ContainCheckShiftRotate(GenTreeOp* node)
 {
    assert(node->OperIsShiftOrRotate());
-#ifdef TARGET_X86
+#if defined(TARGET_X86)


you could just revert this change...

kunalspathak · 2022-03-30T05:06:19Z

src/coreclr/jit/lsraxarch.cpp

@@ -932,6 +932,16 @@ int LinearScan::BuildShiftRotate(GenTree* tree)
    {
        assert(shiftBy->OperIsConst());
    }
+#if defined(TARGET_64BIT)


Is this instruction only applicable for x64? I thought it is also valid for x86? @tannergooding ?

I wonder why it worked for x86 without changing this part of the code?

Is this instruction only applicable for x64? I thought it is also valid for x86? @tannergooding ?

Isn't InstructionSet_BMI2 the 32-bit version, and for 64-bit you'd want InstructionSet_BMI2_X64? If so, and you checked 64-bit correctly elsewhere and then 32-bit here, that would explain it working for 32-bit.

Double checked that my change handles only x64 case. Will open an issue to handle x86.

kunalspathak · 2022-03-30T05:07:31Z

src/coreclr/jit/codegenxarch.cpp

@@ -4378,6 +4378,7 @@ void CodeGen::genCodeForShift(GenTree* tree)
            int shiftByValue = (int)shiftBy->AsIntConCommon()->IconValue();

 #if defined(TARGET_64BIT)
+


delete the extra line.

kunalspathak · 2022-03-30T05:08:01Z

src/coreclr/jit/codegenxarch.cpp

@@ -4397,6 +4398,36 @@ void CodeGen::genCodeForShift(GenTree* tree)
            inst_RV_SH(ins, size, tree->GetRegNum(), shiftByValue);
        }
    }
+#if defined(TARGET_64BIT)


same here...why is it only for x64?

Handling for x64 only for now.

kunalspathak · 2022-03-30T05:10:36Z

src/coreclr/jit/codegenxarch.cpp

+
+        regNumber shiftByReg = shiftBy->GetRegNum();
+        emitAttr  size       = emitTypeSize(tree);
+        GetEmitter()->emitIns_R_R_R(ins, size, tree->GetRegNum(), shiftByReg, operandReg);


As @tannergooding suggested, let's add a comment about containment, and also note that the operands are swapped here because of the way operands are encoded in these three instructions.

kunalspathak · 2022-03-30T05:12:38Z

src/coreclr/jit/emitxarch.cpp

@@ -749,6 +749,9 @@ bool emitter::TakesRexWPrefix(instruction ins, emitAttr attr)
            case INS_pdep:
            case INS_pext:
            case INS_rorx:
+            case INS_shlx:


As per the changes at other places, we have the code to have it supported only for TARGET_64BIT, but here, it is under TARGET_ARM64. What is the difference and should it be consistent?

It is under #ifdef TARGET_AMD64, not ARM64. So, I guess it is correct.

kunalspathak · 2022-03-30T05:13:51Z

src/coreclr/jit/emitxarch.cpp

@@ -987,17 +990,25 @@ unsigned emitter::emitOutputRexOrVexPrefixIfNeeded(instruction ins, BYTE* dst, c
                                case INS_rorx:
                                case INS_pdep:
                                case INS_mulx:
+                                case INS_shrx:


this code path is also for x86?

Added #ifdef TARGET_64BIT.

kunalspathak · 2022-03-30T05:15:03Z

src/coreclr/jit/emitxarch.cpp

            {
-                // BMI bextr and bzhi encodes the reg2 in VEX.vvvv and reg3 in modRM,
+                // BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM,


Suggested change

// BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM,

// BMI bextr, bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM,

We need to have similar comment where we swap the operands I pointed out.

Added comment.

nit: You have already added a comment in codegenxarch.cpp. No need here. The above comment covers it.

kunalspathak · 2022-03-30T05:15:17Z

src/coreclr/jit/emitxarch.cpp

@@ -10288,6 +10302,7 @@ BYTE* emitter::emitOutputAM(BYTE* dst, instrDesc* id, code_t code, CnsVal* addc)
    // For this format, moves do not support a third operand, so we only need to handle the binary ops.
    if (TakesVexPrefix(ins))
    {
+


kunalspathak · 2022-03-30T05:18:28Z

src/coreclr/jit/emitxarch.cpp

+        case INS_sarx:
+        case INS_shrx:
+        {
+            result.insLatency    = PERFSCORE_LATENCY_2C;


Suggested change

result.insLatency = PERFSCORE_LATENCY_2C;

result.insLatency += PERFSCORE_LATENCY_1C;

It should be similar to that of rorx AFAIK.

Where are you getting that number from? I think that matches the newer hardware and not Skylake, which is what we have used for the other numbers.

Updated to result.insLatency += PERFSCORE_LATENCY_1C;.

BruceForstall · 2022-03-30T23:21:30Z

src/tests/JIT/SIMD/ShiftOperations.cs

+		if (resUInt != expectedUInt)
+		{
+			Console.Write(" != {0} Failed.\n", expectedUInt);
+			return 101;


You should consider letting all the tests run, and not exiting on the first failure. Just make sure to return 101 if any test fails.
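One way to structure that, sketched in C++ rather than the test's C# and with hypothetical names (`TestCase`, `RunAllTests`), is to record failures while iterating and decide the exit code only at the end. The 100/101 values follow the pass/fail convention used in this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <vector>

constexpr int kPass = 100; // exit code for success, as used by the test in this PR
constexpr int kFail = 101; // exit code for failure

// Hypothetical test descriptor: a name plus a check that returns true on success.
struct TestCase
{
    const char*           name;
    std::function<bool()> check;
};

// Runs every test, reports each failure, and fails overall if any test failed.
int RunAllTests(const std::vector<TestCase>& tests)
{
    int failures = 0;
    for (const TestCase& test : tests)
    {
        if (!test.check())
        {
            std::printf("%s Failed.\n", test.name);
            ++failures; // keep going so later tests still run
        }
    }
    return (failures == 0) ? kPass : kFail;
}
```

A `main` would simply return `RunAllTests(tests)`, so a single early failure no longer hides the results of the remaining cases.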

BruceForstall · 2022-03-30T23:23:06Z

src/coreclr/jit/codegenxarch.cpp

+        genProduceReg(tree);
+
+        return;


Delete these lines, and let the normal fall-through make this call and return

JulieLeeMSFT · 2022-04-22T21:12:10Z

Reran asmdiffs and it is an improvement.

[14:06:59] Summary of Code Size diffs:
[14:06:59] (Lower is better)
[14:06:59]
[14:06:59] Total bytes of base: 129839366 (overridden on cmd)
[14:06:59] Total bytes of diff: 129826617 (overridden on cmd)
[14:06:59] Total bytes of delta: -12749 (-0.01 % of base)
[14:06:59]
[14:06:59]
[14:06:59] 0 total files with Code Size differences (0 improved, 0 regressed), 1125 unchanged.
[14:06:59]
[14:06:59] 0 total methods with Code Size differences (0 improved, 0 regressed), 0 unchanged.

Enabled this only for x64, not x86. Will open a new issue to address it for x86.
Edit: Added comment in the existing issue: #67314

JulieLeeMSFT · 2022-04-23T02:11:52Z

All tests passed. @kunalspathak PTAL.
cc @dotnet/jit-contrib.

kunalspathak

Overall looks good. Added some minor suggestions.

kunalspathak · 2022-04-23T02:49:47Z

src/tests/issues.targets

@@ -1474,7 +1473,10 @@
        <ExcludeList Include="$(XunitTestBinBase)/JIT/SIMD/Vector3Interop_ro/*">
            <Issue>https://github.com/dotnet/runtime/issues/46174</Issue>
        </ExcludeList>
-
+        <ExcludeList Include="$(XunitTestBinBase)/JIT/SIMD/ShiftOperations/*">
+          <Issue>There is a known undefined behavior with shifts and 0x0FFFFFFFF overflows, so skip the test for mono.</Issue>


Suggested change

<Issue>There is a known undefined behavior with shifts and 0x0FFFFFFFF overflows, so skip the test for mono.</Issue>

<Issue>There is a known undefined behavior with shifts and 0xFFFFFFFF overflows, so skip the test for mono.</Issue>

kunalspathak · 2022-04-23T02:51:09Z

src/coreclr/jit/lowerxarch.cpp

@@ -4803,7 +4803,7 @@ void Lowering::ContainCheckShiftRotate(GenTreeOp* node)
        assert(source->OperGet() == GT_LONG);
        MakeSrcContained(node, source);
    }
-#endif // !TARGET_X86
+#endif


revert this change.

This has already been removed.

I mean put it back (revert the deletion of the comment // !TARGET_X86).

kunalspathak · 2022-04-23T02:53:56Z

src/coreclr/jit/codegenxarch.cpp

+                unreached();
+        }
+
+        // It handles all register forms, but it does not handle contained form for memory operand.


This comment makes more sense next to your lsraxarch.cpp change.

Moved it to lsraxarch.cpp.

kunalspathak · 2022-04-23T02:59:45Z

src/coreclr/jit/emitxarch.cpp

            {
-                // BMI bextr and bzhi encodes the reg2 in VEX.vvvv and reg3 in modRM,
+                // BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM,


nit: You have already added a comment in codegenxarch.cpp. No need here. The above comment covers it.

kunalspathak · 2022-04-23T03:02:03Z

src/tests/JIT/SIMD/ShiftOperations.cs

@@ -0,0 +1,501 @@
+using System;


Include the license.

kunalspathak · 2022-04-23T03:05:36Z

src/tests/JIT/SIMD/ShiftOperations.cs

+            shiftBy = 31;
+            resUInt = Shlx32bit(valUInt, shiftBy);
+            expectedUInt = (uint) (valUInt * Math.Pow(2, (shiftBy % MOD32)));
+            Console.Write("UnitTest Shlx32bit({0},{1}): {2}", valUInt, shiftBy, resUInt);


Why not create a Validate() function that will do most of these things at one place?

public int Validate(..., actual, ...)
{
    expected = ...;
    if (expected != actual)
    {
        Console.WriteLine("Fail");
        return 101;
    }
    return 100;
}

You can then just have an input array and iterate over it or something like that.
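A rough sketch of that shape follows, in C++ rather than the test's C# and with hypothetical names throughout. `ShiftLeftUnderTest` is only a stand-in for the method being tested, and the reference value is computed by repeated doubling so that it does not depend on the shift operator being validated.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

constexpr int kPass = 100;
constexpr int kFail = 101;

// Stand-in for the method under test (hypothetical; the real test calls C# methods).
uint64_t ShiftLeftUnderTest(uint64_t x, int y) { return x << (y & 63); }

// Independent reference: double the value (shiftBy mod 64) times; unsigned math wraps.
uint64_t ExpectedShl64(uint64_t value, int shiftBy)
{
    uint64_t result = value;
    for (int i = 0; i < (shiftBy & 63); ++i)
    {
        result *= 2;
    }
    return result;
}

// Single place that compares, reports, and maps the outcome to 100/101.
int Validate(const char* name, uint64_t value, int shiftBy, uint64_t actual, uint64_t expected)
{
    if (actual != expected)
    {
        std::printf("%s(%llu, %d): %llu != %llu Failed.\n", name,
                    static_cast<unsigned long long>(value), shiftBy,
                    static_cast<unsigned long long>(actual),
                    static_cast<unsigned long long>(expected));
        return kFail;
    }
    return kPass;
}

struct Input
{
    uint64_t value;
    int      shiftBy;
};

// Iterates over an input table instead of repeating the same block per case.
int RunShlxTests()
{
    const Input inputs[] = {{1, 1}, {1, 63}, {1, 64}, {0xFFFFFFFFFFFFFFFFull, 4}, {0x0123456789ABCDEFull, 65}};
    int result = kPass;
    for (const Input& in : inputs)
    {
        const uint64_t actual   = ShiftLeftUnderTest(in.value, in.shiftBy);
        const uint64_t expected = ExpectedShl64(in.value, in.shiftBy);
        if (Validate("Shlx64bit", in.value, in.shiftBy, actual, expected) == kFail)
        {
            result = kFail;
        }
    }
    return result;
}
```

Adding a new case then means adding one row to the table, and the same `Validate` helper can be reused for the sarx and shrx variants with their own reference functions.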

kunalspathak · 2022-04-23T03:07:39Z

Also, there are some significant regressions. Did you figure out why? Eg. here is from asp.net collection.

          16 (8.51 % of base) : 6903.dasm - System.Number:ComputeProductApproximation(int,long,long):System.ValueTuple`2[UInt64,UInt64]
          14 (0.87 % of base) : 13765.dasm - V8.Crypto.BigInteger:modPow(V8.Crypto.BigInteger,V8.Crypto.BigInteger):V8.Crypto.BigInteger:this
           9 (1.80 % of base) : 13749.dasm - V8.Crypto.BigInteger:subTo(V8.Crypto.BigInteger,V8.Crypto.BigInteger):this
           8 (0.27 % of base) : 26580.dasm - System.Text.RegularExpressions.Symbolic.SymbolicRegexMatcher`1[UInt64][System.UInt64]:FindEndPositionCapturing(System.ReadOnlySpan`1[Char],int,byref,PerThreadData[UInt64]):int:this
           8 (1.61 % of base) : 13767.dasm - V8.Crypto.BigInteger:addTo(V8.Crypto.BigInteger,V8.Crypto.BigInteger):this
           7 (4.43 % of base) : 15897.dasm - BenchmarksGame.ByteString:GetHashCode():int:this
           7 (0.08 % of base) : 7387.dasm - System.Text.RegularExpressions.RegexInterpreter:TryMatchAtCurrentPosition(System.ReadOnlySpan`1[Char]):bool:this
           6 (0.34 % of base) : 27178.dasm - Microsoft.CodeAnalysis.PEModule:GetTypeAndConstructor(System.Reflection.Metadata.MetadataReader,System.Reflection.Metadata.CustomAttributeHandle,byref,byref):bool
           6 (1.61 % of base) : 17686.dasm - System.Collections.BitArray:set_Length(int):this
           6 (5.45 % of base) : 3915.dasm - System.Text.Encodings.Web.TextEncoderSettings:AllowRange(System.Text.Unicode.UnicodeRange):this
           5 (1.18 % of base) : 16464.dasm - System.Number:AssembleFloatingPointBits(byref,long,int,bool):long
           4 (0.05 % of base) : 28137.dasm - Microsoft.CodeAnalysis.CSharp.Binder:FoldNeverOverflowBinaryOperators(int,Microsoft.CodeAnalysis.ConstantValue,Microsoft.CodeAnalysis.ConstantValue):System.Object
           4 (0.36 % of base) : 27480.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.Metadata.PE.PENamedTypeSymbol:MakeDeclaredBaseType():Microsoft.CodeAnalysis.CSharp.Symbols.NamedTypeSymbol:this
           4 (0.40 % of base) : 27192.dasm - Microsoft.CodeAnalysis.MetadataReaderExtensions:IsTheObjectClass(System.Reflection.Metadata.MetadataReader,System.Reflection.Metadata.TypeDefinition):bool
           4 (4.30 % of base) : 13741.dasm - MontgomeryReducer:.ctor(V8.Crypto.BigInteger):this
           4 (1.53 % of base) : 5738.dasm - System.Net.Http.HPack.IntegerEncoder:Encode(int,int,System.Span`1[Byte],byref):bool
           4 (3.45 % of base) : 16465.dasm - System.Number:RightShiftWithRounding(long,int,bool):long
           4 (4.40 % of base) : 12332.dasm - System.Text.RegularExpressions.Symbolic.BDD:GetMin():long:this
           3 (1.48 % of base) : 1197.dasm - <>c:<EmitMatchCharacterClass>b__122_1(System.Span`1[Char],System.String):this
           3 (0.18 % of base) : 9584.dasm - BenchmarksGame.BinaryTrees_5:Bench(int,bool):int

JulieLeeMSFT · 2022-05-12T02:22:43Z

Also, there are some significant regressions. Did you figure out why? Eg. here is from asp.net collection.

They are due to register assignment changes. The byte count regressed in some methods, but the PerfScores are better. Some diffs add a 4-byte NOP.

benchmarks.run.windows.x64.checked.mch:

Top method regressions (bytes):
16 (8.51 % of base) : 6903.dasm
14 (0.87 % of base) : 13765.dasm

16 (8.51 % of base) : 6903.dasm: better perfscore for diff.

Base:

       shr      r11, cl
						;; size=13 bbWeight=0.50 PerfScore 1.12

Diff:

       shrx     r11, r11, rdx
						;; size=15 bbWeight=0.50 PerfScore 0.38

14 (0.87 % of base) : 13765.dasm

Base:

       sub      ecx, eax
       sar      edx, cl
       and      edx, dword ptr [rsp+7CH]
       jmp      SHORT G_M1573_IG21
						;; size=40 bbWeight=2    PerfScore 37.50

Diff:

       mov      edx, ebx
       sub      edx, eax
       sarx     ecx, ecx, edx
       and      ecx, dword ptr [rsp+7CH]
       jmp      G_M1573_IG21
						;; size=46 bbWeight=2    PerfScore 34.50

aspnet.run.windows.x64.checked.mch:

Top method regressions (bytes):

31 (11.79 % of base) : 29879.dasm
- The shlx block is 3 bytes larger, but its PerfScore is 28.71 compared to the base PerfScore of 34.65.
19 (8.56 % of base) : 6388.dasm
Base:
; Total bytes of code 222, prolog size 14, PerfScore 3321.21, instruction count 69
Diff:
; Total bytes of code 241, prolog size 14, PerfScore 2796.93, instruction count 69
19 (8.56 % of base) : 5120.dasm
- It adds a 4-byte NOP.
Base:

G_M35765_IG02:        ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byref, isz
       test     rdi, rdi
       je       SHORT G_M35765_IG09
       xor      ebx, ebx
       mov      ebp, dword ptr [rdi+8]
       test     ebp, ebp
       jle      SHORT G_M35765_IG08
						;; size=18 bbWeight=1    PerfScore 4.75

Diff

G_M35765_IG02:        ; gcrefRegs=000000C0 {rsi rdi}, byrefRegs=00000000 {}, byref
       test     rdi, rdi
       je       G_M35765_IG09
       xor      ebx, ebx
       mov      ebp, dword ptr [rdi+8]
       test     ebp, ebp
       jle      SHORT G_M35765_IG08
		  ;; NOP compensation instructions of 4 bytes.
						;; size=22 bbWeight=1    PerfScore 4.75

JulieLeeMSFT · 2022-05-13T19:39:44Z

Why is this only for x64, not for x86 as well? Is there significant work to enable it for x86? Shouldn't it just fall out?

It was not a simple fall out, so I discussed with Kunal to skip x86 for now.

BruceForstall · 2022-05-13T19:47:28Z

It was not a simple fall out, so I discussed with Kunal to skip x86 for now.

Can you describe what problems were encountered, and what is required to implement it for x86? The tracking issue #67314 doesn't have any details.

JulieLeeMSFT · 2022-05-13T19:59:02Z

It was not a simple fall out, so I discussed with Kunal to skip x86 for now.

Can you describe what problems were encountered, and what is required to implement it for x86? The tracking issue #67314 doesn't have any details.

It has been a while so I cannot remember exactly, but it was not generating the right code. lsraxarch and emitxarch need to be looked into. Updated #67314.

JulieLeeMSFT · 2022-05-13T20:04:44Z

/azp list

azure-pipelines · 2022-05-13T20:04:49Z

CI/CD Pipelines for this repository: runtime-coreclr outerloop runtime-coreclr jitstress runtime-coreclr jitstressregs runtime-coreclr jitstress2-jitstressregs runtime-coreclr gcstress0x3-gcstress0xc runtime-coreclr gcstress-extra runtime-coreclr r2r-extra runtime-coreclr jitstress-isas-x86 runtime-coreclr jitstress-isas-arm runtime-coreclr jitstressregs-x86 runtime-coreclr libraries-jitstressregs runtime-coreclr libraries-jitstress2-jitstressregs runtime-coreclr r2r runtime-coreclr runincontext runtime-coreclr crossgen2 runtime-libraries-coreclr outerloop runtime-libraries-coreclr outerloop-windows runtime-libraries-coreclr outerloop-linux runtime-libraries-coreclr outerloop-osx runtime runtime-libraries stress-http runtime-libraries stress-ssl runtime-dev-innerloop runtime-coreclr crossgen2 outerloop coreclr-release-outerloop-nightly sync-runtime-to-mono runtime-coreclr crossgen2-composite runtime-jit-experimental runtime-coreclr libraries-jitstress dotnet-linker-tests runtime-coreclr ilasm runtime-coreclr crossgen2-composite gcstress runtime-libraries-mono outerloop runtime-staging runtime-coreclr pgo runtime-coreclr libraries-pgo coreclr-gc-regions Antigen runtime-community Fuzzlyn runtime-coreclr superpmi-replay runtime-wasm runtime-coreclr superpmi-diffs runtime-coreclr superpmi-asmdiffs-checked-release runtime-manual runtime-extra-platforms jit-cfg perf-wasm runtime-llvm

JulieLeeMSFT · 2022-05-14T00:19:48Z

@kunalspathak and @BruceForstall it is ready to review.

kunalspathak · 2022-05-14T00:39:03Z

src/tests/JIT/SIMD/ShiftOperations.cs

+            long expectedLong = 0;
+            int MOD64 = 64;
+
+/* TODO: Enable 32bit test when x86 shift is enabled.


I would suggest deleting the commented code.

OOF

src/coreclr/jit/lsraxarch.cpp

kunalspathak · 2022-05-17T17:58:28Z

src/tests/JIT/SIMD/ShiftOperations.cs

+
+        try
+        {
+            ulong valULong = 0;


I would define these variables closer to their usage.

kunalspathak · 2022-05-17T17:58:45Z

src/tests/JIT/SIMD/ShiftOperations.cs

+            ulong valULong = 0;
+            long valLong = 0;
+            int shiftBy = 0;
+            ulong resULong = 0;


Is it worth adding short and ints as test cases?

kunalspathak

LGTM...added some suggestions.

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>

kunalspathak · 2022-05-19T04:56:46Z

src/tests/JIT/SIMD/ShiftOperations.cs

+        switch (x)
+        {
+            case ulong a:
+                ulong resUlong = ((ulong)a) << y;


Why doesn't this generic work?

T res = ((T)x) << y; return (R)Convert.ChangeType(res,typeof(R));

shift operators do not work on generics.
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/bitwise-and-shift-operators
Because the shift operators are defined only for the int, uint, long, and ulong types, the result of an operation always contains at least 32 bits. If the left-hand operand is of another integral type (sbyte, byte, short, ushort, or char), its value is converted to the int type

@kunalspathak, this is the error message.
error CS0019: Operator '<<' cannot be applied to operands of type 'T' and 'int'
All tests passed, so merging it now. Thanks for all the code reviews.

kunalspathak · 2022-06-02T16:33:31Z

Some improvements in windows x64: System.Collections.Tests.Perf_BitArray dotnet/perf-autofiling-issues#5495

kunalspathak · 2022-06-02T16:42:51Z

Improvements in linux/x64 #67182

kunalspathak · 2022-06-02T16:51:53Z

Improvements windows/x64: dotnet/perf-autofiling-issues#5460

ghost assigned JulieLeeMSFT Mar 26, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 26, 2022

JulieLeeMSFT added this to the 7.0.0 milestone Mar 26, 2022

am11 reviewed Mar 26, 2022

src/tests/JIT/SIMD/ShiftOperations.csproj

Wraith2 reviewed Mar 26, 2022

src/coreclr/jit/emitxarch.cpp

tannergooding reviewed Mar 27, 2022

src/coreclr/jit/codegenxarch.cpp

src/coreclr/jit/lsraxarch.cpp

JulieLeeMSFT force-pushed the 41881_shrx branch from a1a37bd to fb558b3 Compare March 29, 2022 20:05

JulieLeeMSFT changed the title ~~[RyuJIT] Emit shlx, sarx, shrx on x64~~ [RyuJIT] Emit shlx, sarx, shrx on x64 and x86 Mar 29, 2022

JulieLeeMSFT mentioned this pull request Mar 29, 2022

[RyuJIT] Enable contained form for shlx, sarx, shrx and on x86 #67314

Open

JulieLeeMSFT requested a review from kunalspathak March 29, 2022 23:26

kunalspathak requested changes Mar 30, 2022

ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Mar 30, 2022

BruceForstall reviewed Mar 30, 2022

ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Mar 31, 2022

JulieLeeMSFT force-pushed the 41881_shrx branch from fb558b3 to 53575ca Compare April 22, 2022 21:14

JulieLeeMSFT changed the title ~~[RyuJIT] Emit shlx, sarx, shrx on x64 and x86~~ [RyuJIT] Emit shlx, sarx, shrx on x64 Apr 22, 2022

kunalspathak requested changes Apr 23, 2022

ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 23, 2022

[Emit shlx, sarx, shrx] Removed x86 unit tests.

2292e05

JulieLeeMSFT closed this May 13, 2022

JulieLeeMSFT reopened this May 13, 2022

kunalspathak reviewed May 14, 2022

[Emit shlx, sarx, shrx] Remove 32bit test cases.

f85d34a

kunalspathak reviewed May 17, 2022

src/coreclr/jit/lsraxarch.cpp

kunalspathak approved these changes May 17, 2022

JulieLeeMSFT and others added 2 commits May 18, 2022 09:01

[Emit shlx, sarx, shrx] Update a comment in lsraxarch.cpp

36b26a2

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>

[Emit shlx, sarx, shrx] Added uint, int, ushort and short test cases.

5b454bb

kunalspathak reviewed May 19, 2022

JulieLeeMSFT merged commit c0c614c into dotnet:main May 20, 2022

kunalspathak mentioned this pull request May 24, 2022

Regressions in System.Collections.Tests.Perf_BitArray #69728

Closed

This was referenced Jun 2, 2022

[Perf] Changes at 5/20/2022 1:06:58 AM dotnet/perf-autofiling-issues#5476

Closed

[Perf] Changes at 5/20/2022 1:06:58 AM dotnet/perf-autofiling-issues#5460

Closed

JulieLeeMSFT mentioned this pull request Jun 3, 2022

What's new in .NET 7 Preview 5 [WIP] dotnet/core#7441

Closed

ghost locked as resolved and limited conversation to collaborators Jul 2, 2022

JulieLeeMSFT deleted the 41881_shrx branch February 1, 2023 02:41

JulieLeeMSFT added the arch-riscv Related to the RISC-V architecture label May 23, 2023
