Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64: Add xtn and xtn2 intrinsics codegen, api and tests. #33108

Merged

Conversation

TamarChristinaArm
Copy link
Contributor

@TamarChristinaArm TamarChristinaArm commented Mar 3, 2020

Hi All,

This adds the Extract and Narrow intrinsics.

The special codegen is because I couldn't get it
to correctly determine the right size for the intrinsic.

If there's a simpler way to do this do let me know.

/CC @tannergooding @CarolEidt @echesakovMSFT

Implements parts of #31324

Thanks,
Tamar

@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels Mar 3, 2020
@TamarChristinaArm TamarChristinaArm force-pushed the implement-extract-and-narrow branch from af54747 to 3fb7571 Compare March 3, 2020 16:24
@TamarChristinaArm
Copy link
Contributor Author

Those test failures don't seem to have anything to do with the intrinsics.. before I rebase did you have any comments @tannergooding or @echesakovMSFT ?

@TamarChristinaArm
Copy link
Contributor Author

@tannergooding @echesakovMSFT So when I try to do it with just

+HARDWARE_INTRINSIC(AdvSimd,         ExtractAndNarrowLow,                       -1,               8,           1,     {INS_invalid,           INS_invalid,        INS_xtn,            INS_xtn,            INS_xtn,            INS_xtn,            INS_xtn,            INS_xtn,            INS_invalid,        INS_invalid},           HW_Category_SimpleSIMD,             HW_Flag_NoContainment|HW_Flag_UnfixedSIMDSize)
+HARDWARE_INTRINSIC(AdvSimd,         ExtractAndNarrowHigh,                      -1,              16,           2,     {INS_invalid,           INS_invalid,        INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_xtn2,           INS_invalid,        INS_invalid},           HW_Category_SimpleSIMD,             HW_Flag_NoContainment|HW_Flag_UnfixedSIMDSize)

i.e. no special treatment it just crashes

Assert failure(PID 26767 [0x0000688f], Thread: 26767 [0x688f]): Assertion failed '!"Unexpected HW Intrinsic"' in 'JIT.HardwareIntrinsics.Arm.SimpleUnaryOpTest__ExtractAndNarrowLow_Vector128_Int16:RunBasicScenario_UnsafeRead():this' during 'Importation' (IL size 87)

    File: ~/git/runtime/src/coreclr/src/jit/hwintrinsic.cpp Line: 675
    Image: ~/git/runtime/artifacts/tests/coreclr/Linux.arm64.Debug/Tests/Core_Root/corerun

Since the intrinsics lookup seems to fail...

from what I remember

HWIntrinsicInfo::lookupIns(intrinsic, baseType)

fails which is why I added the HW_Flag_BaseTypeFromFirstArg and HW_Flag_BaseTypeFromSecondArg the first time.

The problem is that the intrinsics doesn't exist for a type of byte since you can't narrow that any further. So using the return type here is wrong.

@tannergooding
Copy link
Member

The problem is that the intrinsics doesn't exist for a type of byte since you can't narrow that any further. So using the return type here is wrong.

Are there any issues with encoding the instruction based on the return type, rather than the input type? Is the input type size needed for importation or codegen elsewhere?

@TamarChristinaArm
Copy link
Contributor Author

The problem is that the intrinsics doesn't exist for a type of byte since you can't narrow that any further. So using the return type here is wrong.

Are there any issues with encoding the instruction based on the return type, rather than the input type? Is the input type size needed for importation or codegen elsewhere?

I think that should be ok. I'll give that a try. It would make the narrowing and widening intrinsics a bit odd in the intrinsics list though, but I guess a comment could fix that :)

@echesakov
Copy link
Contributor

I think the way JIT emitter implements xtn and xtn2 is what causes confusion here.

In my opinion, the size and opts arguments that we pass in emitIns_R_R should correspond to the source register and not to the destination register. We already had to add this for SIMD

// This is not the same as genGetSimdInsOpt()
// Basetype is the soure operand type
// However encoding is based on the destination operand type which is 1/2 the basetype.
switch (baseType)
{
case TYP_ULONG:
case TYP_LONG:
opt = INS_OPTS_2S;
opt2 = INS_OPTS_4S;
break;
case TYP_UINT:
case TYP_INT:
opt = INS_OPTS_4H;
opt2 = INS_OPTS_8H;
break;
case TYP_USHORT:
case TYP_SHORT:
opt = INS_OPTS_8B;
opt2 = INS_OPTS_16B;
break;
default:
assert(!"Unsupported narrowing element type");
unreached();
}

in order to overcome this behavior.

I propose we fix the emitter such that emitIns_R_R expects EA_16BYTE for both instructions and the vector arrangement parameters corresponds to the source register

Then we do the following:

  1. Keep HW_Flag_BaseTypeFromFirstArg for NI_AdvSimd_ExtractAndNarrowLow
  2. Keep HW_Flag_BaseTypeFromSecondArg for NI_AdvSimd_ExtractAndNarrowHigh
  3. We can make NI_AdvSimd_ExtractAndNarrowLow table-driven but will have to manually codegen for NI_AdvSimd_ExtractAndNarrowHigh - since we need to do if (targetReg != op1Reg) mov targetReg, op1Reg as discussed above.

@tannergooding
Copy link
Member

In my opinion, the size and opts arguments that we pass in emitIns_R_R should correspond to the source register and not to the destination register

I don't think which matters as long as we are consistent overall. For instructions which have the same sized inputs/outputs, it doesn't matter.
For instructions like XTN, we have to encode the size of the source and the destination so as long as we pick one and can easily determine the other, it should be fine.

The default behavior on x86 was that the type came from the destination and you explicitly opted in using the first/second arg only when necessary (basically just when returning a non-SIMD type or void), because this is by far the most common and is how most of the underlying instructions were actually differentiated.

@tannergooding
Copy link
Member

tannergooding commented Mar 26, 2020

since we need to do if (targetReg != op1Reg) mov targetReg, op1Reg as discussed above.

Can we not just detect this and handle it more generally via some flag (like if it is marked RMW)? We have a number of intrinsics like this and they will all need the same handling.

We have similar logic shared across x86 where we will do the if (op1Reg != targetReg) { mov targetReg, op1Reg } ins targetReg, op2Reg on non-VEX but just emit ins targetReg, op1Reg, op2Reg on VEX enabled hardware.

@echesakov
Copy link
Contributor

Can we not just detect this and handle it more generally via some flag (like if it is marked RMW)? We have a number of intrinsics like this and they will all need the same handling.

We have similar logic shared across x86 where we will do the if (op1Reg != targetReg) { mov targetReg, op1Reg } ins targetReg, op2Reg on non-VEX but just emit ins targetReg, op1Reg, op2Reg on VEX enabled hardware.

Yes, this was a plan. As I mentioned in #33889 (comment) I am changing NoRMW to HasRMW flag on Arm64 and I planned to add automatic handling of RMW intrinsics in codegen at the same time. However, I was not sure whether that PR will be up before Tamar finishes this one so I asked to manually handle this case.

@TamarChristinaArm TamarChristinaArm force-pushed the implement-extract-and-narrow branch from 3fb7571 to f733248 Compare March 31, 2020 19:05
@TamarChristinaArm
Copy link
Contributor Author

Looks like the two failures are infrastructure related

@TamarChristinaArm TamarChristinaArm force-pushed the implement-extract-and-narrow branch from f733248 to 323a277 Compare April 2, 2020 16:21
Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you Tamar!

@echesakov echesakov merged commit 8259df7 into dotnet:master Apr 3, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants