Add fmov arm64 intrinsic in JIT to implement Vector*.CreateScalarUnsafe API #34485
cc: @tannergooding, @echesakovMSFT, @BruceForstall |
This is the mapping that I can think of at this point. For the APIs below, which take 32-bit or 64-bit data types, use the form shown after the list:
public static Vector64<int> CreateScalarUnsafe(int value);
public static Vector64<uint> CreateScalarUnsafe(uint value);
public static Vector64<float> CreateScalarUnsafe(float value);
public static Vector128<int> CreateScalarUnsafe(int value);
public static Vector128<uint> CreateScalarUnsafe(uint value);
public static Vector128<float> CreateScalarUnsafe(float value);
public static Vector128<long> CreateScalarUnsafe(long value);
public static Vector128<ulong> CreateScalarUnsafe(ulong value);
public static Vector128<double> CreateScalarUnsafe(double value);

fmov Sn/Dn, Xn/Wn

However, for the ones whose arguments are narrower than 32 bits, use the forms shown after the list:
public static Vector64<byte> CreateScalarUnsafe(byte value);
public static Vector64<sbyte> CreateScalarUnsafe(sbyte value);
public static Vector64<short> CreateScalarUnsafe(short value);
public static Vector64<ushort> CreateScalarUnsafe(ushort value);
public static Vector128<byte> CreateScalarUnsafe(byte value);
public static Vector128<sbyte> CreateScalarUnsafe(sbyte value);
public static Vector128<short> CreateScalarUnsafe(short value);
public static Vector128<ushort> CreateScalarUnsafe(ushort value);

ins Sn.b[0], w0 # sbyte
ins Sn.h[0], w0 # short

Note: We can use
@TamarChristinaArm, @tannergooding, @echesakovMSFT |
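For readers skimming the thread, the mapping proposed in the comment above can be sketched as a small lookup keyed on the element type. This is only an illustration of the proposal: `MoveKind`, `CreateScalarUnsafeMapping`, and `Suggest` are hypothetical names made up here, not JIT code, and a later comment in the thread treats `float`/`double` separately from the integer types.

```csharp
using System;

// Hypothetical labels for the instruction forms named in the comment above.
public enum MoveKind
{
    InsByte,   // ins ...b[0], Wn  (byte, sbyte)
    InsHalf,   // ins ...h[0], Wn  (short, ushort)
    Fmov32,    // fmov Sn, Wn      (int, uint, float)
    Fmov64     // fmov Dn, Xn      (long, ulong, double)
}

public static class CreateScalarUnsafeMapping
{
    // Hypothetical helper: picks the proposed instruction form from the
    // element type passed to CreateScalarUnsafe.
    public static MoveKind Suggest(Type elementType)
    {
        switch (Type.GetTypeCode(elementType))
        {
            case TypeCode.Byte:
            case TypeCode.SByte:
                return MoveKind.InsByte;
            case TypeCode.Int16:
            case TypeCode.UInt16:
                return MoveKind.InsHalf;
            case TypeCode.Int32:
            case TypeCode.UInt32:
            case TypeCode.Single:
                return MoveKind.Fmov32;
            case TypeCode.Int64:
            case TypeCode.UInt64:
            case TypeCode.Double:
                return MoveKind.Fmov64;
            default:
                throw new NotSupportedException(elementType.Name);
        }
    }
}
```

For example, `CreateScalarUnsafeMapping.Suggest(typeof(ushort))` yields `MoveKind.InsHalf`, while `typeof(double)` yields `MoveKind.Fmov64`.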
For |
@kunalspathak As you work on System.Runtime.Intrinsics cases, can you please edit/update the top comment in #33308 to be more reflective of the sets of work to do there? When I created that issue, I was a little sloppy for this particular namespace, since there are so many overloads. I'd prefer that the "checkbox items" be split up as appropriate such that if a PR implements some subset, that becomes a checkbox we can check in the overall tracking issue. |
Sure, I will update that list as I discover them. |
Hmm is this because you can pass this type on to a Vector function by accident? There's no instruction that would use the register as a scalar input that would read the top bits. C compilers usually don't care about this since you can't really do this easily, e.g.
produces
So we don't really care about the top bits going into a scalar operation or a by-element as far as I can tell. |
The former is the default and it ensures you get deterministic results. The latter exists primarily for perf reasons as there may be scenarios where ensuring the upper bits get zeroed (either explicitly by another instruction or implicitly by an operation) may be undesirable. A couple examples are because the upper elements are unused or because you will be explicitly setting them in a later operation.

X86

On x86, the scalar operations preserve the upper elements by default. That is, if you do
What this means is that if you want to ensure "deterministic" results, then you have the following codegen:

; Vector128<float> CreateScalar(float value)
vxorps xmm0, xmm0, xmm0 ; Zero the xmm0 register
vmovss xmm0, xmm0, xmm1 ; Move scalar value (xmm1) into element 0 of xmm0; elements 1, 2, and 3 are taken from xmm0 which was set to zero
; Vector128<float> CreateScalarUnsafe(float value)
vmovss xmm0, xmm0, xmm1 ; Move scalar value (xmm1) into element 0 of xmm0; elements 1, 2, and 3 are taken from xmm0 and could be any value

The latter can also be fully elided from codegen (we can just use the value from

ARM

As I understand it, ARM is basically the opposite. It zeroes the upper elements by default except for specific instructions like
However, it seems like there are likely some cases (like |
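The contract difference described in this comment can be seen from the C# side. A minimal sketch, assuming a runtime that ships `System.Runtime.Intrinsics` (.NET Core 3.0 or later); the class name `ScalarCreationDemo` is made up for illustration, and nothing here says which instructions the JIT actually emits.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class ScalarCreationDemo
{
    public static void Main()
    {
        // CreateScalar: element 0 holds the value, the remaining elements
        // are guaranteed to be zero.
        Vector128<float> deterministic = Vector128.CreateScalar(1.5f);
        Console.WriteLine(deterministic.GetElement(0) == 1.5f); // element 0 is the value
        Console.WriteLine(deterministic.GetElement(3) == 0.0f); // upper element is zero

        // CreateScalarUnsafe: only element 0 is defined. Elements 1..3 may
        // hold anything, so code must not depend on them.
        Vector128<float> fast = Vector128.CreateScalarUnsafe(1.5f);
        Console.WriteLine(fast.ToScalar() == 1.5f); // element 0 is the value
    }
}
```

Reading `fast.GetElement(1)` would compile and run, but the result is unspecified, which is exactly the freedom the "Unsafe" variant gives the code generator.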
I gave more thought to this while working on #35030 and I believe the implementation of for byte, sbyte, short, ushort, int, uint, long, ulong
for float or double
@kunalspathak Let me know if you plan to continue working on this or you want me to do this as continuation of #35030. |
Thanks @echesakovMSFT ! Let me take a stab at it. I will ping you if I hit any blocker. |
@echesakovMSFT 2b is generally the preferred approach. for If it is possible to use the |
@TamarChristinaArm Thanks for the reply! cc @dotnet/jit-contrib This is related to our discussion of "constant pool vs sequence of movs" we had yesterday. |
Add support for fmov in the JIT to move values between general-purpose registers and float/vector registers. With that, we will be able to generate that instruction when a user calls the CreateScalar() C# API. See #33495 (comment); this is also needed for #33496.
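As background for why `fmov` is the instruction requested here: when it moves between a general-purpose and a floating-point register, it copies the bits unchanged rather than converting the number. The sketch below is a C#-level analogy using `BitConverter`; the class name `BitMoveDemo` is made up, and no claim is made that the JIT lowers these calls to `fmov`.

```csharp
using System;

public static class BitMoveDemo
{
    public static void Main()
    {
        // 0x3FC00000 is the IEEE 754 single-precision bit pattern of 1.5.
        int bits = 0x3FC00000;

        // Bitwise move: the 32 bits are reinterpreted as a float.
        float moved = BitConverter.Int32BitsToSingle(bits);

        // Numeric conversion: the integer value 1069547520 becomes a float.
        float converted = (float)bits;

        Console.WriteLine(moved == 1.5f);            // same bits, read as a float
        Console.WriteLine(converted == 1069547520f); // same number, different bits

        // The reverse direction is also a pure bit copy.
        Console.WriteLine(BitConverter.SingleToInt32Bits(moved) == bits);
    }
}
```

The distinction matters for the integer overloads of `CreateScalar`/`CreateScalarUnsafe`: the value has to land in the vector register with its bits intact, not be converted to floating point.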