JIT ARM64-SVE: Add Sve.LoadVector*ZeroExtendTo*() #101291

Merged
merged 7 commits into from
Apr 25, 2024

mikabl-arm
Contributor

Add the following APIs:

LoadVectorByteZeroExtendToInt16
LoadVectorByteZeroExtendToInt32
LoadVectorByteZeroExtendToInt64
LoadVectorByteZeroExtendToUInt16
LoadVectorByteZeroExtendToUInt32
LoadVectorByteZeroExtendToUInt64
LoadVectorInt16SignExtendToInt32
LoadVectorInt16SignExtendToInt64
LoadVectorInt16SignExtendToUInt32
LoadVectorInt16SignExtendToUInt64
LoadVectorInt32SignExtendToInt64
LoadVectorInt32SignExtendToUInt64
LoadVectorSByteSignExtendToInt16
LoadVectorSByteSignExtendToInt32
LoadVectorSByteSignExtendToInt64
LoadVectorSByteSignExtendToUInt16
LoadVectorSByteSignExtendToUInt32
LoadVectorSByteSignExtendToUInt64
LoadVectorUInt16ZeroExtendToInt32
LoadVectorUInt16ZeroExtendToInt64
LoadVectorUInt16ZeroExtendToUInt32
LoadVectorUInt16ZeroExtendToUInt64
LoadVectorUInt32ZeroExtendToInt64
LoadVectorUInt32ZeroExtendToUInt64
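
As a rough illustration of what the API names above imply, here is a scalar Python model (not the JIT implementation and not the real Sve API). Each call reads narrow elements from memory and widens every one into a wider lane, either filling the high bits with zeros or replicating the sign bit; the widened bits are then viewed as the signed or unsigned destination type. The helper names `zero_extend_load`, `sign_extend_load`, and `_wrap` are hypothetical and exist only for this sketch.

```python
def _wrap(value, bits, signed):
    """Reinterpret an integer as a `bits`-wide signed or unsigned lane value."""
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def zero_extend_load(memory, src_bits, dst_bits, dst_signed):
    """Model of LoadVector<Src>ZeroExtendTo<Dst>: high bits are filled with 0."""
    return [_wrap(_wrap(v, src_bits, False), dst_bits, dst_signed) for v in memory]


def sign_extend_load(memory, src_bits, dst_bits, dst_signed):
    """Model of LoadVector<Src>SignExtendTo<Dst>: high bits copy the sign bit."""
    return [_wrap(_wrap(v, src_bits, True), dst_bits, dst_signed) for v in memory]


# Byte -> Int16, zero-extended: 0xFF stays 255 because no sign bit is copied.
as_int16 = zero_extend_load([0x01, 0x80, 0xFF], 8, 16, dst_signed=True)

# SByte -> UInt16, sign-extended: -1 becomes 0xFFFF when viewed as unsigned.
as_uint16 = sign_extend_load([1, -128, -1], 8, 16, dst_signed=False)
```

The model suggests why both a signed and an unsigned destination exist for each source type: the bits written to the lane are the same, and only the element type of the returned vector differs.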

Test results (output for the pre-existing APIs whose tests start with SveLoad has been removed):

~/dotnet/runtime$ $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll SveLoad
Supported ISAs:
  AdvSimd:   True
  Aes:       True
  ArmBase:   True
  Crc32:     True
  Dp:        True
  Rdm:       True
  Sha1:      True
  Sha256:    True
  Sve:       True

13:57:16.550 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToInt16()
Beginning scenario: RunBasicScenario_Load
13:57:16.556 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToInt16()
13:57:16.558 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.563 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToInt32()
13:57:16.565 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.570 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToInt64()
13:57:16.572 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToUInt16()
Beginning scenario: RunBasicScenario_Load
13:57:16.578 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToUInt16()
13:57:16.580 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToUInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.585 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToUInt32()
13:57:16.587 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToUInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.593 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToUInt64()
13:57:16.595 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.600 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToInt32()
13:57:16.602 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.608 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToInt64()
13:57:16.609 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToUInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.615 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToUInt32()
13:57:16.617 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToUInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.622 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt16SignExtendToUInt64()
13:57:16.624 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt32SignExtendToInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.630 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt32SignExtendToInt64()
13:57:16.631 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt32SignExtendToUInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.637 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorInt32SignExtendToUInt64()
13:57:16.639 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToInt16()
Beginning scenario: RunBasicScenario_Load
13:57:16.645 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToInt16()
13:57:16.647 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.652 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToInt32()
13:57:16.654 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.660 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToInt64()
13:57:16.662 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToUInt16()
Beginning scenario: RunBasicScenario_Load
13:57:16.668 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToUInt16()
13:57:16.669 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToUInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.675 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToUInt32()
13:57:16.677 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToUInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.683 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorSByteSignExtendToUInt64()
13:57:16.685 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.690 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToInt32()
13:57:16.692 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.698 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToInt64()
13:57:16.700 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToUInt32()
Beginning scenario: RunBasicScenario_Load
13:57:16.705 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToUInt32()
13:57:16.707 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToUInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.712 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt16ZeroExtendToUInt64()
13:57:16.714 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt32ZeroExtendToInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.720 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt32ZeroExtendToInt64()
13:57:16.722 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt32ZeroExtendToUInt64()
Beginning scenario: RunBasicScenario_Load
13:57:16.727 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorUInt32ZeroExtendToUInt64()


Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@mikabl-arm
Contributor Author

@dotnet-policy-service agree company="Arm"

@mikabl-arm
Contributor Author

@kunalspathak @dotnet/arm64-contrib @a74nh

@JulieLeeMSFT
Member

Contributes to #99957.
Thanks @mikabl-arm for the PR for loads. When you merge the PR, could you check off the boxes from 99957?

@kunalspathak kunalspathak self-requested a review April 22, 2024 01:48
@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Apr 22, 2024
/// <summary>
/// svint16_t svld1ub_s16(svbool_t pg, const uint8_t *base)
/// LD1B Zresult.H, Pg/Z, [Xarray, Xindex]
/// LD1B Zresult.H, Pg/Z, [Xbase, #0, MUL VL]
Member

Could you please confirm which of the ld1* instructions this API maps to? Is it https://docsmirror.github.io/A64/2023-06/ld1b_z_p_ai.html?

Contributor Author

I believe it's https://docsmirror.github.io/A64/2023-06/ld1b_z_p_bi.html . @a74nh, could you confirm this JIC?

Contributor Author

The disassembly gives ld1b { z16.h }, p7/z, [x0]:

Beginning scenario: RunBasicScenario_Load
10:50:41.891 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVector_ulong()
10:50:41.892 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.SveLoadVectorByteZeroExtendToInt16()
; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16:RunBasicScenario_Load():this (Tier0)
; Emitting BLENDED_CODE for generic ARM64 - Unix
; Tier0 code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00    ] (  1,  1   )     ref  ->  [fp+0x48]  do-not-enreg[] this class-hnd <JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16>
;  V01 loc0         [V01    ] (  1,  1   )  simd16  ->  [fp+0x30]  HFA(simd16)  do-not-enreg[S] must-init <System.Numerics.Vector`1[short]>
;  V02 loc1         [V02    ] (  1,  1   )  simd16  ->  [fp+0x20]  HFA(simd16)  do-not-enreg[S] must-init <System.Numerics.Vector`1[short]>
;# V03 OutArgs      [V03    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V04 tmp1         [V04    ] (  1,  1   )    long  ->  [fp+0x18]  do-not-enreg[] "non-inline candidate call"
;  V05 tmp2         [V05    ] (  1,  1   )    long  ->  [fp+0x10]  do-not-enreg[] "argument with side effect"
;
; Lcl frame size = 64

G_M30813_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x50]!
            mov     fp, sp
            str     xzr, [fp, #0x30]	// [V01 loc0]
            str     xzr, [fp, #0x38]	// [V01 loc0+0x08]
            str     xzr, [fp, #0x20]	// [V02 loc1]
            str     xzr, [fp, #0x28]	// [V02 loc1+0x08]
            str     x0, [fp, #0x48]	// [V00 this]
						;; size=28 bbWeight=1 PerfScore 6.50
G_M30813_IG02:  ;; offset=0x001C
            movz    x0, #0xBE18
            movk    x0, #0x8423 LSL #16
            movk    x0, #0xFFFF LSL #32
            movz    x1, #0x7438      // code for TestLibrary.TestFramework:BeginScenario(System.String)
            movk    x1, #0x3AB9 LSL #16
            movk    x1, #0xFFFF LSL #32
            ldr     x1, [x1]
            blr     x1
            ptrue   p7.h
            mov     z16.h, p7/z, #1
            str     q16, [fp, #0x30]	// [V01 loc0]
            ldr     x0, [fp, #0x48]	// [V00 this]
            ldrsb   wzr, [x0]
            ldr     x0, [fp, #0x48]	// [V00 this]
            add     x0, x0, #16
            movz    x1, #0x7DE0      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16+DataTable:get_inArray1Ptr():ulong:this
            movk    x1, #0x3AD1 LSL #16
            movk    x1, #0xFFFF LSL #32
            ldr     x1, [x1]
            blr     x1
            ldr     q16, [fp, #0x30]	// [V01 loc0]
            ptrue   p7.h
            cmpne   p7.h, p7/z, z16.h, #0
            ld1b    { z16.h }, p7/z, [x0]
            str     q16, [fp, #0x20]	// [V02 loc1]
            ldr     x0, [fp, #0x48]	// [V00 this]
            ldrsb   wzr, [x0]
            ldr     x0, [fp, #0x48]	// [V00 this]
            add     x0, x0, #16
            movz    x1, #0x7DF8      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16+DataTable:get_outArrayPtr():ulong:this
            movk    x1, #0x3AD1 LSL #16
            movk    x1, #0xFFFF LSL #32
            ldr     x1, [x1]
            blr     x1
            ldr     q16, [fp, #0x20]	// [V02 loc1]
            str     q16, [x0]
            ldr     x0, [fp, #0x48]	// [V00 this]
            ldrsb   wzr, [x0]
            ldr     x0, [fp, #0x48]	// [V00 this]
            add     x0, x0, #16
            movz    x1, #0x7DE0      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16+DataTable:get_inArray1Ptr():ulong:this
            movk    x1, #0x3AD1 LSL #16
            movk    x1, #0xFFFF LSL #32
            ldr     x1, [x1]
            blr     x1
            str     x0, [fp, #0x18]	// [V04 tmp1]
            ldr     x0, [fp, #0x48]	// [V00 this]
            ldrsb   wzr, [x0]
            ldr     x0, [fp, #0x48]	// [V00 this]
            add     x0, x0, #16
            movz    x1, #0x7DF8      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16+DataTable:get_outArrayPtr():ulong:this
            movk    x1, #0x3AD1 LSL #16
            movk    x1, #0xFFFF LSL #32
            ldr     x1, [x1]
            blr     x1
            str     x0, [fp, #0x10]	// [V05 tmp2]
            ldr     x2, [fp, #0x10]	// [V05 tmp2]
            ldr     x1, [fp, #0x18]	// [V04 tmp1]
            ldr     x0, [fp, #0x48]	// [V00 this]
            movz    x3, #0xBE18
            movk    x3, #0x8423 LSL #16
            movk    x3, #0xFFFF LSL #32
            movz    x4, #0x7EE8      // code for JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16:ValidateResult(ulong,ulong,System.String):this
            movk    x4, #0x3AD1 LSL #16
            movk    x4, #0xFFFF LSL #32
            ldr     x4, [x4]
            blr     x4
						;; size=268 bbWeight=1 PerfScore 98.00
G_M30813_IG03:  ;; offset=0x0128
            ldp     fp, lr, [sp], #0x50
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00

; Total bytes of code 304, prolog size 24, PerfScore 106.50, instruction count 76, allocated bytes for code 304 (MethodHash=c5fe87a2) for method JIT.HardwareIntrinsics.Arm._Sve.LoadUnaryOpTest__SveLoadVectorByteZeroExtendToInt16:RunBasicScenario_Load():this (Tier0)
; ============================================================

Contributor

The disassembly gives ld1b { z16.h }, p7/z, [x0]:

Just one address register. Due to the shortcut in emitInsSve_R_R_R(), it drops into emitIns_R_R_R_I with an immediate of 0. Which means.....

I believe it's https://docsmirror.github.io/A64/2023-06/ld1b_z_p_bi.html . @a74nh, could you confirm this JIC?

.....Yes, that one. Contiguous load unsigned bytes to vector (immediate index).

So, we only want the one in the API file:

        ///   LD1B Zresult.H, Pg/Z, [Xbase, #0, MUL VL]
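
To make the link between the printed `[x0]` and the `[Xbase, #0, MUL VL]` form concrete, here is a small Python sketch of the address arithmetic. The formula (immediate times the number of lanes times the bytes read per lane) is an assumption based on a reading of the Arm pseudocode for the scalar-plus-immediate form and should be checked against the Arm documentation; the function name `ld1b_imm_address` and its parameters are hypothetical.

```python
def ld1b_imm_address(base, imm, vl_bits, esize_bits, msize_bits=8):
    """Start address for LD1B Zt.<T>, Pg/Z, [Xbase, #imm, MUL VL] (assumed formula)."""
    if not -8 <= imm <= 7:
        raise ValueError("imm is assumed to be a signed 4-bit field")
    elements = vl_bits // esize_bits   # lanes in the destination vector
    mbytes = msize_bits // 8           # bytes read from memory per lane
    return base + imm * elements * mbytes


# With an immediate of 0 the address is just the base register, which is
# consistent with the disassembler printing the operand as [x0].
addr_zero = ld1b_imm_address(0x1000, 0, vl_bits=128, esize_bits=16)

# With an immediate of 1 and .H lanes at a 128-bit vector length, the load
# would start 8 bytes later (8 lanes, 1 byte read per lane).
addr_next = ld1b_imm_address(0x1000, 1, vl_bits=128, esize_bits=16)
```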

Member

Thanks. That entry should be removed from all of the API documentation.

("SveLoadMaskedUnOpTest.template", new Dictionary<string, string> {["TestName"] = "SveLoadVectorSByteSignExtendToUInt32", ["Isa"] = "Sve", ["Method"] = "LoadVectorSByteSignExtendToUInt32", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt32", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt32", ["Op2BaseType"] = "SByte", ["LargestVectorSize"] = "8", ["NextValueOp2"] = "TestLibrary.Generator.GetSByte()", ["ValidateIterResult"] = "firstOp[i] != result[i]"}),
("SveLoadMaskedUnOpTest.template", new Dictionary<string, string> {["TestName"] = "SveLoadVectorSByteSignExtendToUInt64", ["Isa"] = "Sve", ["Method"] = "LoadVectorSByteSignExtendToUInt64", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt64", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt64", ["Op2BaseType"] = "SByte", ["LargestVectorSize"] = "8", ["NextValueOp2"] = "TestLibrary.Generator.GetSByte()", ["ValidateIterResult"] = "(ulong)firstOp[i] != result[i]"}),
("SveLoadMaskedUnOpTest.template", new Dictionary<string, string> {["TestName"] = "SveLoadVectorUInt16ZeroExtendToInt32", ["Isa"] = "Sve", ["Method"] = "LoadVectorUInt16ZeroExtendToInt32", ["RetVectorType"] = "Vector", ["RetBaseType"] = "Int32", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "Int32", ["Op2BaseType"] = "UInt16", ["LargestVectorSize"] = "8", ["NextValueOp2"] = "TestLibrary.Generator.GetUInt16()", ["ValidateIterResult"] = "firstOp[i] != result[i]"}),
("SveLoadMaskedUnOpTest.template", new Dictionary<string, string> {["TestName"] = "SveLoadVectorUInt16ZeroExtendToInt64", ["Isa"] = "Sve", ["Method"] = "LoadVectorUInt16ZeroExtendToInt64", ["RetVectorType"] = "Vector", ["RetBaseType"] = "Int64", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "Int64", ["Op2BaseType"] = "UInt16", ["LargestVectorSize"] = "8", ["NextValueOp2"] = "TestLibrary.Generator.GetUInt16()", ["ValidateIterResult"] = "firstOp[i] != result[i]"}),
Member

This comment applies to LoadVector as well, but it seems that the mask vector gets populated with TestLibrary.Generator.Get*(), which is mostly non-zero. We should also add a way to have in-active elements in the mask and make sure that they remain untouched.

Contributor

We should also add a way to have in-active elements in the mask and make sure that they remain untouched.

Agreed. The tricky part is creating the mask and knowing which elements are set. We could use the API methods, but most haven't been implemented yet.

Alternatively, add a helper which constructs a Vector filled randomly with 0 and 1 elements. The implicit mask-to-vector conversions mean that this can be used for the load mask. Then validateResult needs to be passed the mask so it can check that the 0 entries in the mask correspond to 0s in the result.

Member

We should also add a way to have in-active elements in the mask and make sure that they remain untouched.

Agreed. The tricky part is creating the mask and knowing which elements are set. We could use the API methods, but most haven't been implemented yet.

This is how I created it and that is giving good test coverage - https://github.com/dotnet/runtime/pull/100743/files#diff-fd9c6bd33b62670bf0ba80ff74092bd4abfa7d8fb85b0b729a570289531e39d4R181

Alternatively, add a helper which constructs a Vector filled randomly with 0 and 1 elements. The implicit mask-to-vector conversions mean that this can be used for the load mask. Then validateResult needs to be passed the mask so it can check that the 0 entries in the mask correspond to 0s in the result.

Yes, pretty much.
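
The helper idea discussed above can be sketched in Python as a scalar model. This is only an illustration under stated assumptions, not the test template that was eventually written: `make_random_mask`, `masked_zero_extend_load`, and `validate_result` are hypothetical names, and the model assumes a zeroing (Pg/Z) load, so inactive lanes are expected to be 0. A lane counts as active when its mask element is non-zero, mirroring the `cmpne ..., #0` that the disassembly uses to turn a vector into a predicate.

```python
import random


def make_random_mask(lanes, rng):
    """Hypothetical helper: a vector of 0/1 elements usable as a load mask."""
    return [rng.randint(0, 1) for _ in range(lanes)]


def masked_zero_extend_load(mask, memory):
    """Model of a zeroing byte load: inactive lanes produce 0."""
    return [(b & 0xFF) if m != 0 else 0 for m, b in zip(mask, memory)]


def validate_result(mask, memory, result):
    """Active lanes must hold the loaded value; inactive lanes must be 0."""
    for m, b, r in zip(mask, memory, result):
        expected = (b & 0xFF) if m != 0 else 0
        if r != expected:
            return False
    return True


rng = random.Random(1234)  # fixed seed so a failing run can be reproduced
memory = [rng.randrange(1, 256) for _ in range(8)]  # non-zero input bytes
mask = make_random_mask(8, rng)
result = masked_zero_extend_load(mask, memory)
ok = validate_result(mask, memory, result)
```

Using non-zero input bytes matters in a sketch like this: if the input could be 0, a lane that was wrongly loaded would be indistinguishable from a lane that was correctly zeroed.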

Member

created #101475

@a74nh
Contributor

a74nh commented Apr 22, 2024

Contributes to #99957. Thanks @mikabl-arm for the PR for loads. When you merge the PR, could you check off the boxes from 99957?

@JulieLeeMSFT I think only Kunal has the permissions to do this.

Remove comments that mention instructions that the APIs are never mapped to.
Member

@kunalspathak kunalspathak left a comment

Could you please fix the summary docs and the formatting errors? I think it should be good after that.

@mikabl-arm
Contributor Author

Could you please fix the summary docs and the formatting errors? I think it should be good after that.

Done, please check.

Member

@kunalspathak kunalspathak left a comment

LGTM. Thanks for your contribution!

@kunalspathak
Member

I have updated this PR to resolve merge conflicts and to mark the APIs as HW_Flag_ExplicitMaskedOperation, which is needed after my change in #100743.

@kunalspathak kunalspathak merged commit a66dcfc into dotnet:main Apr 25, 2024
161 of 168 checks passed
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
* JIT ARM64-SVE: Add Sve.LoadVector*ZeroExtendTo*()

* cleanup: remove unwanted comments

Remove comments that mention instructions that the APIs are never mapped to.

* fix merge conflict

* fix merge conflict

* fix spacing

* Mark LoadVector*Extend* as having HW_Flag_ExplicitMaskedOperation

---------

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 25, 2024
Labels
area-System.Runtime.Intrinsics arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation

4 participants