Arm64: Add xtn and xtn2 intrinsics codegen, api and tests. #33108

TamarChristinaArm · 2020-03-03T15:29:28Z

Hi All,

This adds the Extract and Narrow intrinsics.

The special codegen is there because I couldn't get it to correctly determine the right size for the intrinsic.

If there's a simpler way to do this, do let me know.

/CC @tannergooding @CarolEidt @echesakovMSFT

Implements parts of #31324

Thanks,
Tamar

Dotnet-GitSync-Bot · 2020-03-03T15:29:32Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

TamarChristinaArm · 2020-03-12T09:01:11Z

Those test failures don't seem to have anything to do with the intrinsics. Before I rebase, did you have any comments, @tannergooding or @echesakovMSFT?

src/coreclr/src/jit/emitarm64.cpp

src/coreclr/src/jit/hwintrinsicarm64.cpp

src/coreclr/src/jit/hwintrinsiccodegenarm64.cpp

...braries/System.Runtime.Intrinsics.Experimental/ref/System.Runtime.Intrinsics.Experimental.cs

src/coreclr/tests/src/JIT/HardwareIntrinsics/Arm/Shared/GenerateTests.csx

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.cs

src/coreclr/src/jit/hwintrinsiccodegenarm64.cpp

TamarChristinaArm · 2020-03-25T13:28:01Z

@tannergooding @echesakovMSFT So when I try to do it with just:

+HARDWARE_INTRINSIC(AdvSimd,         ExtractAndNarrowLow,                       -1,               8,           1,     {INS_invalid,           INS_invalid,        INS_xtn,            INS_xtn,            INS_xtn,            INS_xtn,            INS_xtn,            INS_xtn,            INS_invalid,        INS_invalid},           HW_Category_SimpleSIMD,             HW_Flag_NoContainment|HW_Flag_UnfixedSIMDSize)
+HARDWARE_INTRINSIC(AdvSimd,         ExtractAndNarrowHigh,                      -1,              16,           2,     {INS_invalid,           INS_invalid,        INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_invalid,        INS_invalid},           HW_Category_SimpleSIMD,             HW_Flag_NoContainment|HW_Flag_UnfixedSIMDSize)

i.e. with no special treatment, it just crashes:

Assert failure(PID 26767 [0x0000688f], Thread: 26767 [0x688f]): Assertion failed '!"Unexpected HW Intrinsic"' in 'JIT.HardwareIntrinsics.Arm.SimpleUnaryOpTest__ExtractAndNarrowLow_Vector128_Int16:RunBasicScenario_UnsafeRead():this' during 'Importation' (IL size 87)

    File: ~/git/runtime/src/coreclr/src/jit/hwintrinsic.cpp Line: 675
    Image: ~/git/runtime/artifacts/tests/coreclr/Linux.arm64.Debug/Tests/Core_Root/corerun

So the intrinsic lookup seems to fail. From what I remember,

HWIntrinsicInfo::lookupIns(intrinsic, baseType)

fails, which is why I added HW_Flag_BaseTypeFromFirstArg and HW_Flag_BaseTypeFromSecondArg the first time.

The problem is that the intrinsic doesn't exist for a base type of byte, since you can't narrow that any further. So using the return type here is wrong.
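To make the failure mode concrete, here is a minimal sketch of a per-base-type instruction table. The enum names, the `lookupIns` helper, and the slot order (byte, ubyte, short, ushort, int, uint, long, ulong, float, double) are assumptions made for illustration; they are stand-ins, not the JIT's actual definitions. The row mirrors the ExtractAndNarrowLow entry above, which is keyed by the source element type.

```cpp
#include <array>

// Hypothetical stand-ins for the JIT's base-type and instruction enums.
// The slot order is an assumption made for this sketch.
enum BaseType { Byte, UByte, Short, UShort, Int, UInt, Long, ULong, Float, Double, BaseTypeCount };
enum Ins { INS_invalid, INS_xtn };

// One table row: an instruction per base type, mirroring the
// ExtractAndNarrowLow entry (nothing for byte, ubyte, float, double).
constexpr std::array<Ins, BaseTypeCount> kExtractAndNarrowLow = {
    INS_invalid, INS_invalid, INS_xtn, INS_xtn, INS_xtn,
    INS_xtn,     INS_xtn,     INS_xtn, INS_invalid, INS_invalid};

// Hypothetical lookup: index the row by base type.
Ins lookupIns(BaseType baseType) { return kExtractAndNarrowLow[baseType]; }

// Element type of the narrowed result. Only meaningful for the
// 16-, 32- and 64-bit integer source types (Short through ULong).
BaseType narrowedType(BaseType source) { return static_cast<BaseType>(source - 2); }
```

With a Vector128 of Int16 as input, the return type has a base type of byte, so a lookup keyed on the return type lands on an `INS_invalid` slot, while a lookup keyed on the argument type finds `INS_xtn`.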

tannergooding · 2020-03-25T14:57:53Z

> The problem is that the intrinsic doesn't exist for a base type of byte, since you can't narrow that any further. So using the return type here is wrong.

Are there any issues with encoding the instruction based on the return type, rather than the input type? Is the input type size needed for importation or codegen elsewhere?

TamarChristinaArm · 2020-03-25T15:56:30Z

> > The problem is that the intrinsic doesn't exist for a base type of byte, since you can't narrow that any further. So using the return type here is wrong.

> Are there any issues with encoding the instruction based on the return type, rather than the input type? Is the input type size needed for importation or codegen elsewhere?

I think that should be ok. I'll give that a try. It would make the narrowing and widening intrinsics a bit odd in the intrinsics list though, but I guess a comment could fix that :)

echesakov · 2020-03-26T03:25:34Z

I think the way JIT emitter implements xtn and xtn2 is what causes confusion here.

In my opinion, the size and opts arguments that we pass in emitIns_R_R should correspond to the source register and not to the destination register. We already had to add this for SIMD

runtime/src/coreclr/src/jit/codegenarm64.cpp

Lines 4332 to 4355 in db23750

    
    // This is not the same as genGetSimdInsOpt()
    // Basetype is the source operand type
    // However encoding is based on the destination operand type which is 1/2 the basetype.
    switch (baseType)
    {
        case TYP_ULONG:
        case TYP_LONG:
            opt  = INS_OPTS_2S;
            opt2 = INS_OPTS_4S;
            break;
        case TYP_UINT:
        case TYP_INT:
            opt  = INS_OPTS_4H;
            opt2 = INS_OPTS_8H;
            break;
        case TYP_USHORT:
        case TYP_SHORT:
            opt  = INS_OPTS_8B;
            opt2 = INS_OPTS_16B;
            break;
        default:
            assert(!"Unsupported narrowing element type");
            unreached();
    }

in order to overcome this behavior.

I propose we fix the emitter so that emitIns_R_R expects EA_16BYTE for both instructions and the vector arrangement parameter corresponds to the source register.

Then we do the following:

Keep HW_Flag_BaseTypeFromFirstArg for NI_AdvSimd_ExtractAndNarrowLow
Keep HW_Flag_BaseTypeFromSecondArg for NI_AdvSimd_ExtractAndNarrowHigh
We can make NI_AdvSimd_ExtractAndNarrowLow table-driven but will have to manually codegen for NI_AdvSimd_ExtractAndNarrowHigh - since we need to do if (targetReg != op1Reg) mov targetReg, op1Reg as discussed above.
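The reason the High variant needs the extra move can be seen from a small software model of the two instructions. This is only an illustrative sketch for the 16-bit to 8-bit case; the type aliases and function names are hypothetical and are not JIT or library code. It models xtn as writing the lower half of the destination and clearing the upper half, and xtn2 as writing the upper half while leaving the lower half untouched.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec8x16 = std::array<std::uint16_t, 8>;  // 128-bit vector viewed as 8 x 16-bit lanes
using Vec16x8 = std::array<std::uint8_t, 16>;  // 128-bit register viewed as 16 x 8-bit lanes

// Model of xtn: truncate each 16-bit lane to 8 bits and write the
// results to the lower 8 lanes; the upper 8 lanes are zeroed.
Vec16x8 xtn(const Vec8x16& src) {
    Vec16x8 dst{};
    for (std::size_t i = 0; i < 8; ++i) {
        dst[i] = static_cast<std::uint8_t>(src[i]);
    }
    return dst;
}

// Model of xtn2: write the truncated lanes to the upper 8 lanes of dst,
// leaving the lower 8 lanes as they were (read-modify-write).
void xtn2(Vec16x8& dst, const Vec8x16& src) {
    for (std::size_t i = 0; i < 8; ++i) {
        dst[8 + i] = static_cast<std::uint8_t>(src[i]);
    }
}

// Model of ExtractAndNarrowHigh(lower, value): the copy of `lower` plays the
// role of "mov targetReg, op1Reg"; xtn2 then fills in the upper half.
Vec16x8 extractAndNarrowHigh(const Vec16x8& lower, const Vec8x16& value) {
    Vec16x8 target = lower;
    xtn2(target, value);
    return target;
}
```

Because xtn2 only overwrites half of its destination, the destination register must already hold the first operand, which is why a table-driven two-register emit is not enough on its own for the High variant.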

tannergooding · 2020-03-26T04:03:06Z

> In my opinion, the size and opts arguments that we pass in emitIns_R_R should correspond to the source register and not to the destination register

I don't think it matters which one we use, as long as we are consistent overall. For instructions which have the same sized inputs/outputs, it doesn't matter.
For instructions like XTN, we have to encode the size of the source and the destination so as long as we pick one and can easily determine the other, it should be fine.
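As a rough illustration of "pick one and determine the other", the sketch below maps between source and destination vector arrangements for xtn and xtn2. The `Arrangement` enum and both function names are hypothetical; the pairs themselves follow the switch quoted earlier in the thread (for example, a source of 8H narrows to 8B for xtn and 16B for xtn2).

```cpp
#include <stdexcept>

// Hypothetical arrangement enum; names follow Arm64 assembly syntax
// (lane count followed by element size: B, H, S, D).
enum class Arrangement { A8B, A16B, A4H, A8H, A2S, A4S, A2D };

// Source arrangement -> destination arrangement.
// upperHalf == false models xtn, upperHalf == true models xtn2.
Arrangement narrowedArrangement(Arrangement source, bool upperHalf) {
    switch (source) {
        case Arrangement::A8H: return upperHalf ? Arrangement::A16B : Arrangement::A8B;
        case Arrangement::A4S: return upperHalf ? Arrangement::A8H : Arrangement::A4H;
        case Arrangement::A2D: return upperHalf ? Arrangement::A4S : Arrangement::A2S;
        default: throw std::invalid_argument("not a valid xtn/xtn2 source arrangement");
    }
}

// Destination arrangement -> source arrangement (the inverse mapping).
Arrangement sourceArrangement(Arrangement dest) {
    switch (dest) {
        case Arrangement::A8B:
        case Arrangement::A16B: return Arrangement::A8H;
        case Arrangement::A4H:
        case Arrangement::A8H:  return Arrangement::A4S;
        case Arrangement::A2S:
        case Arrangement::A4S:  return Arrangement::A2D;
        default: throw std::invalid_argument("not a valid xtn/xtn2 destination arrangement");
    }
}
```

Either direction is a three-entry mapping, so the choice of which side the emitter takes as input is mostly a matter of keeping callers consistent.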

The default behavior on x86 was that the type came from the destination and you explicitly opted in using the first/second arg only when necessary (basically just when returning a non-SIMD type or void), because this is by far the most common and is how most of the underlying instructions were actually differentiated.

tannergooding · 2020-03-26T04:06:40Z

> since we need to do if (targetReg != op1Reg) mov targetReg, op1Reg as discussed above.

Can we not just detect this and handle it more generally via some flag (like if it is marked RMW)? We have a number of intrinsics like this and they will all need the same handling.

We have similar logic shared across x86 where we will do the if (op1Reg != targetReg) { mov targetReg, op1Reg } ins targetReg, op2Reg on non-VEX but just emit ins targetReg, op1Reg, op2Reg on VEX enabled hardware.
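The shared pattern described here can be sketched as a tiny helper that produces the instruction sequence as text. `emitRmw` is a hypothetical name and the output is simplified pseudo-assembly without vector arrangements; it is not the JIT's emitter API. The sketch also assumes the register allocator never assigns the target to the same register as the second operand when it differs from the first, since the move would then clobber that operand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: emit a read-modify-write instruction whose destination
// must already contain op1. A move is inserted only when the registers differ.
std::vector<std::string> emitRmw(const std::string& ins,
                                 const std::string& targetReg,
                                 const std::string& op1Reg,
                                 const std::string& op2Reg) {
    // Assumption for this sketch: the move below must not overwrite op2.
    assert(targetReg == op1Reg || targetReg != op2Reg);

    std::vector<std::string> out;
    if (targetReg != op1Reg) {
        out.push_back("mov " + targetReg + ", " + op1Reg);
    }
    out.push_back(ins + " " + targetReg + ", " + op2Reg);
    return out;
}
```

For example, `emitRmw("xtn2", "v0", "v1", "v2")` yields a move followed by the xtn2, while `emitRmw("xtn2", "v0", "v0", "v2")` yields only the xtn2.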

echesakov · 2020-03-26T18:30:33Z

> Can we not just detect this and handle it more generally via some flag (like if it is marked RMW)? We have a number of intrinsics like this and they will all need the same handling.

> We have similar logic shared across x86 where we will do the if (op1Reg != targetReg) { mov targetReg, op1Reg } ins targetReg, op2Reg on non-VEX but just emit ins targetReg, op1Reg, op2Reg on VEX enabled hardware.

Yes, this was the plan. As I mentioned in #33889 (comment), I am changing the NoRMW flag to HasRMW on Arm64, and I planned to add automatic handling of RMW intrinsics in codegen at the same time. However, I was not sure whether that PR would be up before Tamar finishes this one, so I asked for this case to be handled manually.

TamarChristinaArm · 2020-04-01T09:25:43Z

Looks like the two failures are infrastructure-related.

...ies/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.PlatformNotSupported.cs

src/coreclr/src/jit/emitarm64.cpp

src/coreclr/src/jit/hwintrinsiclistarm64.h

echesakov

LGTM. Thank you Tamar!

Dotnet-GitSync-Bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and new-api-needs-documentation labels Mar 3, 2020

TamarChristinaArm force-pushed the implement-extract-and-narrow branch from af54747 to 3fb7571 Compare March 3, 2020 16:24

tannergooding reviewed Mar 12, 2020

src/coreclr/src/jit/emitarm64.cpp

tannergooding reviewed Mar 12, 2020

src/coreclr/src/jit/hwintrinsicarm64.cpp

tannergooding reviewed Mar 12, 2020

src/coreclr/src/jit/hwintrinsiccodegenarm64.cpp

tannergooding reviewed Mar 12, 2020

...braries/System.Runtime.Intrinsics.Experimental/ref/System.Runtime.Intrinsics.Experimental.cs

echesakov reviewed Mar 13, 2020

TamarChristinaArm force-pushed the implement-extract-and-narrow branch from 3fb7571 to f733248 Compare March 31, 2020 19:05

This was referenced Apr 1, 2020

Errors installing the SDK during builds #34015

Closed

Add retry to install script for most of the error dotnet/sdk#11001

Closed

tannergooding approved these changes Apr 1, 2020

echesakov reviewed Apr 1, 2020

...ies/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.PlatformNotSupported.cs

src/coreclr/src/jit/emitarm64.cpp

src/coreclr/src/jit/hwintrinsiclistarm64.h

Arm64: Add xtn and xtn2 intrinsics codegen, api and tests.

323a277

TamarChristinaArm force-pushed the implement-extract-and-narrow branch from f733248 to 323a277 Compare April 2, 2020 16:21

echesakov approved these changes Apr 2, 2020

echesakov merged commit 8259df7 into dotnet:master Apr 3, 2020

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
