-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64-SVE: Add AddAcross #101674
JIT ARM64-SVE: Add AddAcross #101674
Conversation
Note regarding the
|
@@ -5314,11 +5314,11 @@ void CodeGen::genArm64EmitterUnitTestsSve() | |||
#endif // ALL_ARM64_EMITTER_UNIT_TESTS_SVE_UNSUPPORTED | |||
|
|||
// IF_SVE_AI_3A | |||
theEmitter->emitIns_R_R_R(INS_sve_saddv, EA_1BYTE, REG_V1, REG_P4, REG_V2, | |||
theEmitter->emitIns_R_R_R(INS_sve_saddv, EA_SCALABLE, REG_V1, REG_P4, REG_V2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the codegen changes:
For these instructions, arg2 (EA_1BYTE
etc) is never used as the return value is dependent on the input type which is already specified in opt.
Switching arg2 to EA_SCALABLE means there is no need to write special hwinstrinsiccodegen code.
I've changed the bare minimal of instructions needed to make this patch work. There are quite a few more reduction like instructions - we should do those as we get to them in the API
@@ -0,0 +1,302 @@ | |||
// Licensed to the .NET Foundation under one or more agreements. | |||
// The .NET Foundation licenses this file to you under the MIT license. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is a copy of _SveUnaryOpTestTemplate.template with conditional tests removed as they can't be used for reduction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do cndSel(mask, AddAcross(a), falseVal)
. The result of this would be:
- If 0th lane was active in mask, it would have the result of AddAcross(a)
- For all the other lanes, it will either have 0 (for active lanes) or falseVal for in-active lanes
If that makes sense, can you add those in your new template? This might not be super meaningful but want to make sure that we test the jit code paths at least.
cc: @tannergooding
a0aaade
to
df9385b
Compare
df9385b
to
a3d0161
Compare
@kunalspathak @dotnet/arm64-contrib |
superpmi-* failures are #101685 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some questions and minor changes.
@@ -0,0 +1,302 @@ | |||
// Licensed to the .NET Foundation under one or more agreements. | |||
// The .NET Foundation licenses this file to you under the MIT license. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do cndSel(mask, AddAcross(a), falseVal)
. The result of this would be:
- If 0th lane was active in mask, it would have the result of AddAcross(a)
- For all the other lanes, it will either have 0 (for active lanes) or falseVal for in-active lanes
If that makes sense, can you add those in your new template? This might not be super meaningful but want to make sure that we test the jit code paths at least.
cc: @tannergooding
@@ -1769,6 +1803,8 @@ private static byte HighNarrowing(ushort op1, bool round) | |||
|
|||
public static ushort AddWidening(ushort op1, byte op2) => (ushort)(op1 + op2); | |||
|
|||
public static ulong AddWidening(ulong op1, byte op2) => (ulong)(op1 + (ulong)op2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we also not need the following?
public static uint AddWidening(uint op1, byte op2) => (uint)(op1 + (uint)op2);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for the testing. Sve.AddAcross()
always widens to 64bits regardless of the input.
helper.cs
seems to be taking the approach of only adding helper functions when they are needed.
return reduceOp(op1[0], op1[1]); | ||
} | ||
|
||
if (op1.Length % 2 != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have input parameters that exercise this condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This would require an vector where the number of elements was not a power of 2.
I wasn't sure if there was a way of raising an exception in the testing. So instead, NaN would ensure the test failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/ba-g Failure is #101721 |
* JIT ARM64-SVE: Add AddAcross * Remove enum changes * Fix SVE tests max vector size to 512bit * fix zip test cases --------- Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
* JIT ARM64-SVE: Add AddAcross * Remove enum changes * Fix SVE tests max vector size to 512bit * fix zip test cases --------- Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
Contributes towards #99957