diff --git a/docs/design/coreclr/botr/vectors-and-intrinsics.md b/docs/design/coreclr/botr/vectors-and-intrinsics.md
index 232c0af136975..f507b55bd622f 100644
--- a/docs/design/coreclr/botr/vectors-and-intrinsics.md
+++ b/docs/design/coreclr/botr/vectors-and-intrinsics.md
@@ -15,7 +15,7 @@ Most hardware intrinsics support is tied to the use of various Vector apis. Ther
 - The fixed length float vectors. `Vector2`, `Vector3`, and `Vector4`. These vector types represent a struct of floats of various lengths. For type layout, ABI, and interop purposes they are represented in exactly the same way as a structure with an appropriate number of floats in it. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
 - The variable length `Vector<T>`. This represents vector data of runtime-determined length. In any given process the length of a `Vector<T>` is the same in all methods, but this length may differ between various machines or environment variable settings read at startup of the process. The `T` type variable may be the following types (`System.Byte`, `System.SByte`, `System.Int16`, `System.UInt16`, `System.Int32`, `System.UInt32`, `System.Int64`, `System.UInt64`, `System.Single`, and `System.Double`), and allows use of integer or double data within a vector. The length and alignment of `Vector<T>` is unknown to the developer at compile time (although discoverable at runtime by using the `Vector<T>.Count` api), and `Vector<T>` may not exist in any interop signature. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations if the `Vector.IsHardwareAccelerated` api returns true.
-- `Vector64<T>`, `Vector128<T>`, and `Vector256<T>` represent fixed-sized vectors that closely resemble the fixed-sized vectors available in C++. These structures can be used in any code that runs, but very few features are supported directly on these types other than creation. They are used primarily in the processor specific hardware intrinsics apis.
+- `Vector64<T>`, `Vector128<T>`, `Vector256<T>`, and `Vector512<T>` represent fixed-sized vectors that closely resemble the fixed-sized vectors available in C++. These structures can be used in any code that runs, but very few features are supported directly on these types other than creation. They are used primarily in the processor specific hardware intrinsics apis.
 - Processor specific hardware intrinsics apis such as `System.Runtime.Intrinsics.X86.Ssse3`. These apis map directly to individual instructions or short instruction sequences that are specific to a particular hardware instruction. These apis are only usable on hardware that supports the particular instruction. See https://github.com/dotnet/designs/blob/master/accepted/2018/platform-intrinsics.md for the design of these.
 
 # How to use intrinsics apis
 
@@ -23,7 +23,7 @@ Most hardware intrinsics support is tied to the use of various Vector apis. Ther
 There are 3 models for use of intrinsics apis.
 
 1. Usage of `Vector2`, `Vector3`, `Vector4`, and `Vector<T>`. For these, it's always safe to just use the types. The jit will generate code that is as optimal as it can for the logic, and will do so unconditionally.
-2. Usage of `Vector64<T>`, `Vector128<T>`, and `Vector256<T>`. These types may be used unconditionally, but are only truly useful when also using the platform specific hardware intrinsics apis.
+2. Usage of `Vector64<T>`, `Vector128<T>`, `Vector256<T>`, and `Vector512<T>`. These types may be used unconditionally, but are only truly useful when also using the platform specific hardware intrinsics apis.
 3. Usage of platform intrinsics apis. All usage of these apis should be wrapped in an `IsSupported` check of the appropriate kind. Then, within the `IsSupported` check the platform specific api may be used. If multiple instruction sets are used, then the application developer must have checks for the instruction sets as used on each one of them.
 
 # Effect of usage of hardware intrinsics on how code is generated
@@ -142,7 +142,7 @@ public class BitOperations
 #### Crossgen implementation rules
 - Any code which uses an intrinsic from the `System.Runtime.Intrinsics.Arm` or `System.Runtime.Intrinsics.X86` namespace will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
 - Any code which uses `Vector<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_SIMD_NGEN_DISALLOWED`)
-- Any code which uses `Vector64<T>`, `Vector128<T>` or `Vector256<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
+- Any code which uses `Vector64<T>`, `Vector128<T>`, `Vector256<T>`, or `Vector512<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
 - Non-platform intrinsics which require more hardware support than the minimum supported hardware capability will not take advantage of that capability. In particular the code generated for Vector2/3/4 is sub-optimal. MethodImplOptions.AggressiveOptimization may be used to disable compilation of this sub-par code.
 
 #### Characteristics which result from rules
@@ -160,10 +160,10 @@ There are 2 sets of instruction sets known to the compiler.
 - The baseline instruction set which defaults to (Sse, Sse2), but may be adjusted via compiler option.
 - The optimistic instruction set which defaults to (Sse3, Ssse3, Sse41, Sse42, Popcnt, Pclmulqdq, and Lzcnt).
 
-Code will be compiled using the optimistic instruction set to drive compilation, but any use of an instruction set beyond the baseline instruction set will be recorded, as will any attempt to use an instruction set beyond the optimistic set if that attempted use has a semantic effect. If the baseline instruction set includes `Avx2` then the size and characteristics of `Vector<T>` is known. Any other decisions about ABI may also be encoded. For instance, it is likely that the ABI of `Vector256<T>` will vary based on the presence/absence of `Avx` support.
+Code will be compiled using the optimistic instruction set to drive compilation, but any use of an instruction set beyond the baseline instruction set will be recorded, as will any attempt to use an instruction set beyond the optimistic set if that attempted use has a semantic effect. If the baseline instruction set includes `Avx2` then the size and characteristics of `Vector<T>` is known. Any other decisions about ABI may also be encoded. For instance, it is likely that the ABI of `Vector256<T>` and `Vector512<T>` will vary based on the presence/absence of `Avx` support.
 
 - Any code which uses `Vector<T>` will not be compiled AOT unless the size of `Vector<T>` is known.
-- Any code which passes a `Vector256<T>` as a parameter on a Linux or Mac machine will not be compiled AOT unless the support for the `Avx` instruction set is known.
+- Any code which passes a `Vector256<T>` or `Vector512<T>` as a parameter on a Linux or Mac machine will not be compiled AOT unless the support for the `Avx` instruction set is known.
 - Non-platform intrinsics which require more hardware support than the optimistic supported hardware capability will not take advantage of that capability. MethodImplOptions.AggressiveOptimization may be used to disable compilation of this sub-par code.
 - Code which takes advantage of instructions sets in the optimistic set will not be used on a machine which only supports the baseline instruction set.
 - Code which attempts to use instruction sets outside of the optimistic set will generate code that will not be used on machines with support for the instruction set.
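The baseline/optimistic recording scheme described above can be sketched in a few lines. This is a minimal illustrative model only, not the actual JIT implementation: the member name `opportunisticallyDependsOn` mirrors the JIT helper of a similar name, but the `std::string`-based sets and the `CompilationModel` type are invented here for clarity.

```cpp
#include <set>
#include <string>

// Minimal model of AOT instruction-set recording: code may freely use the
// baseline set; uses of optimistic sets are recorded so the runtime can
// reject the precompiled code on machines that lack them.
struct CompilationModel
{
    std::set<std::string> baseline;       // guaranteed at runtime (e.g. Sse, Sse2)
    std::set<std::string> optimistic;     // compiled for, but must be re-checked at runtime
    std::set<std::string> recordedUsages; // uses the AOT image must guard on

    bool opportunisticallyDependsOn(const std::string& isa)
    {
        bool supported = (baseline.count(isa) != 0) || (optimistic.count(isa) != 0);
        if (supported && baseline.count(isa) == 0)
        {
            // Use beyond the baseline: record it so the generated code is
            // only used on machines that actually support this ISA.
            recordedUsages.insert(isa);
        }
        return supported;
    }
};
```

With the default sets, a query for `Sse2` succeeds without recording anything, a query for `Sse41` succeeds but is recorded, and a query for `Avx2` simply fails.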
@@ -194,7 +194,6 @@ While the above api exists, it is not expected that general purpose code within
 |`compExactlyDependsOn(isa)`| Use when making a decision to use or not use an instruction set when the decision will affect the semantics of the generated code. Should never be used in an assert. | Return whether or not an instruction set is supported. Calls notifyInstructionSetUsage with the result of that computation.
 |`compOpportunisticallyDependsOn(isa)`| Use when making an opportunistic decision to use or not use an instruction set. Use when the instruction set usage is a "nice to have optimization opportunity", but do not use when a false result may change the semantics of the program. Should never be used in an assert. | Return whether or not an instruction set is supported. Calls notifyInstructionSetUsage if the instruction set is supported.
 |`compIsaSupportedDebugOnly(isa)` | Use to assert whether or not an instruction set is supported | Return whether or not an instruction set is supported. Does not report anything. Only available in debug builds.
-|`getSIMDSupportLevel()`| Use when determining what codegen to generate for code that operates on `Vector<T>`, `Vector2`, `Vector3` or `Vector4`.| Queries the instruction sets supported using `compOpportunisticallyDependsOn`, and finds a set of instructions available to use for working with the platform agnostic vector types.
 |`getSIMDVectorType()`| Use to get the TYP of the `Vector<T>` type. | Determine the TYP of the `Vector<T>` type. If on the architecture the TYP may vary depending on whatever rules, this function will make sufficient use of the `notifyInstructionSetUsage` api to ensure that the TYP is consistent between compile time and runtime.
 |`getSIMDVectorRegisterByteLength()` | Use to get the size of a `Vector<T>` value. | Determine the size of the `Vector<T>` type. If on the architecture the size may vary depending on whatever rules, this function will make sufficient use of the `notifyInstructionSetUsage` api to ensure that the size is consistent between compile time and runtime.
 |`maxSIMDStructBytes()`| Get the maximum number of bytes that might be used in a SIMD type during this compilation. | Query the set of instruction sets supported, and determine the largest simd type supported. Use `compOpportunisticallyDependsOn` to perform the queries so that the maximum size needed is the only one recorded.
diff --git a/src/coreclr/jit/codegencommon.cpp b/src/coreclr/jit/codegencommon.cpp
index 4536cd3509096..08c46a06b421d 100644
--- a/src/coreclr/jit/codegencommon.cpp
+++ b/src/coreclr/jit/codegencommon.cpp
@@ -1731,6 +1731,7 @@ void CodeGen::genGenerateMachineCode()
 
         printf(" for ");
 
+#if defined(TARGET_X86)
         if (compiler->info.genCPU == CPU_X86)
         {
             printf("generic X86 CPU");
@@ -1739,9 +1740,14 @@ void CodeGen::genGenerateMachineCode()
         {
             printf("Pentium 4");
         }
-        else if (compiler->info.genCPU == CPU_X64)
+#elif defined(TARGET_AMD64)
+        if (compiler->info.genCPU == CPU_X64)
         {
-            if (compiler->canUseVexEncoding())
+            if (compiler->canUseEvexEncoding())
+            {
+                printf("X64 CPU with AVX512");
+            }
+            else if (compiler->canUseVexEncoding())
             {
                 printf("X64 CPU with AVX");
             }
@@ -1750,18 +1756,22 @@ void CodeGen::genGenerateMachineCode()
                 printf("X64 CPU with SSE2");
             }
         }
-        else if (compiler->info.genCPU == CPU_ARM)
+#elif defined(TARGET_ARM)
+        if (compiler->info.genCPU == CPU_ARM)
         {
             printf("generic ARM CPU");
         }
-        else if (compiler->info.genCPU == CPU_ARM64)
+#elif defined(TARGET_ARM64)
+        if (compiler->info.genCPU == CPU_ARM64)
         {
             printf("generic ARM64 CPU");
         }
-        else if (compiler->info.genCPU == CPU_LOONGARCH64)
+#elif defined(TARGET_LOONGARCH64)
+        if (compiler->info.genCPU == CPU_LOONGARCH64)
         {
             printf("generic LOONGARCH64 CPU");
         }
+#endif
         else
         {
             printf("unknown architecture");
diff --git a/src/coreclr/jit/codegenxarch.cpp b/src/coreclr/jit/codegenxarch.cpp
index e837c87a5c001..624ef3a514adb 100644
--- a/src/coreclr/jit/codegenxarch.cpp
+++ b/src/coreclr/jit/codegenxarch.cpp
@@ -10709,7 +10709,6 @@ void CodeGen::genZeroInitFrameUsingBlockInit(int untrLclHi, int untrLclLo, regNu
     assert(compiler->compGeneratingProlog);
     assert(genUseBlockInit);
     assert(untrLclHi > untrLclLo);
-    assert(compiler->getSIMDSupportLevel() >= SIMD_SSE2_Supported);
 
     emitter*  emit     = GetEmitter();
     regNumber frameReg = genFramePointerReg();
diff --git a/src/coreclr/jit/compiler.cpp b/src/coreclr/jit/compiler.cpp
index 1e454ef74668f..791c0740835ed 100644
--- a/src/coreclr/jit/compiler.cpp
+++ b/src/coreclr/jit/compiler.cpp
@@ -2238,11 +2238,8 @@ void Compiler::compSetProcessor()
         info.genCPU = CPU_X86_PENTIUM_4;
     else
         info.genCPU = CPU_X86;
-
 #elif defined(TARGET_LOONGARCH64)
-
-    info.genCPU = CPU_LOONGARCH64;
-
+    info.genCPU = CPU_LOONGARCH64;
 #endif
 
     //
diff --git a/src/coreclr/jit/compiler.h b/src/coreclr/jit/compiler.h
index 81c553640ed8e..24406af654b3c 100644
--- a/src/coreclr/jit/compiler.h
+++ b/src/coreclr/jit/compiler.h
@@ -8323,34 +8323,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     }
 #endif // DEBUG
 
-    // Get highest available level for SIMD codegen
-    SIMDLevel getSIMDSupportLevel()
-    {
-#if defined(TARGET_XARCH)
-        if (compOpportunisticallyDependsOn(InstructionSet_AVX2))
-        {
-            if (compOpportunisticallyDependsOn(InstructionSet_Vector512))
-            {
-                return SIMD_Vector512_Supported;
-            }
-
-            return SIMD_AVX2_Supported;
-        }
-
-        if (compOpportunisticallyDependsOn(InstructionSet_SSE42))
-        {
-            return SIMD_SSE4_Supported;
-        }
-
-        // min bar is SSE2
-        return SIMD_SSE2_Supported;
-#else
-        assert(!"Available instruction set(s) for SIMD codegen is not defined for target arch");
-        unreached();
-        return SIMD_Not_Supported;
-#endif
-    }
-
     bool isIntrinsicType(CORINFO_CLASS_HANDLE clsHnd)
     {
         return info.compCompHnd->isIntrinsicType(clsHnd);
@@ -8781,16 +8753,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     var_types getSIMDVectorType()
    {
 #if defined(TARGET_XARCH)
-        // TODO-XArch-AVX512 : Return TYP_SIMD64 once Vector<T> supports AVX512.
-        if (getSIMDSupportLevel() >= SIMD_AVX2_Supported)
+        if (compOpportunisticallyDependsOn(InstructionSet_AVX2))
         {
+            // TODO-XArch-AVX512 : Return TYP_SIMD64 once Vector<T> supports AVX512.
             return TYP_SIMD32;
         }
         else
         {
-            // Verify and record that AVX2 isn't supported
             compVerifyInstructionSetUnusable(InstructionSet_AVX2);
-            assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
             return TYP_SIMD16;
         }
 #elif defined(TARGET_ARM64)
@@ -8823,16 +8793,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     unsigned getSIMDVectorRegisterByteLength()
     {
 #if defined(TARGET_XARCH)
-        // TODO-XArch-AVX512 : Return ZMM_REGSIZE_BYTES once Vector<T> supports AVX512.
-        if (getSIMDSupportLevel() >= SIMD_AVX2_Supported)
+        if (compOpportunisticallyDependsOn(InstructionSet_AVX2))
         {
+            // TODO-XArch-AVX512 : Return ZMM_REGSIZE_BYTES once Vector<T> supports AVX512.
             return YMM_REGSIZE_BYTES;
         }
         else
         {
-            // Verify and record that AVX2 isn't supported
             compVerifyInstructionSetUnusable(InstructionSet_AVX2);
-            assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
             return XMM_REGSIZE_BYTES;
         }
 #elif defined(TARGET_ARM64)
@@ -8847,9 +8815,11 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     // maxSIMDStructBytes
     // The minimum SIMD size supported by System.Numeric.Vectors or System.Runtime.Intrinsic
-    // SSE:  16-byte Vector<T> and Vector128<T>
-    // AVX:  32-byte Vector256<T> (Vector<T> is 16-byte)
-    // AVX2: 32-byte Vector<T> and Vector256<T>
+    // Arm.AdvSimd: 16-byte Vector<T> and Vector128<T>
+    // X86.SSE:     16-byte Vector<T> and Vector128<T>
+    // X86.AVX:     16-byte Vector<T> and Vector256<T>
+    // X86.AVX2:    32-byte Vector<T> and Vector256<T>
+    // X86.AVX512F: 32-byte Vector<T> and Vector512<T>
     unsigned int maxSIMDStructBytes()
     {
 #if defined(FEATURE_HW_INTRINSICS) && defined(TARGET_XARCH)
@@ -8859,17 +8829,22 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
            {
                 return ZMM_REGSIZE_BYTES;
             }
-            return YMM_REGSIZE_BYTES;
+            else
+            {
+                compVerifyInstructionSetUnusable(InstructionSet_AVX512F);
+                return YMM_REGSIZE_BYTES;
+            }
         }
         else
         {
-            // Verify and record that AVX2 isn't supported
-            compVerifyInstructionSetUnusable(InstructionSet_AVX2);
-            assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
+            compVerifyInstructionSetUnusable(InstructionSet_AVX);
             return XMM_REGSIZE_BYTES;
         }
+#elif defined(TARGET_ARM64)
+        return FP_REGSIZE_BYTES;
 #else
-        return getSIMDVectorRegisterByteLength();
+        assert(!"maxSIMDStructBytes() unimplemented on target arch");
+        unreached();
 #endif
     }
@@ -9134,13 +9109,10 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 #endif
     }
 
+#ifdef TARGET_XARCH
     bool canUseVexEncoding() const
     {
-#ifdef TARGET_XARCH
         return compOpportunisticallyDependsOn(InstructionSet_AVX);
-#else
-        return false;
-#endif
     }
 
     //------------------------------------------------------------------------
@@ -9151,8 +9123,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     //
     bool canUseEvexEncoding() const
     {
-#ifdef TARGET_XARCH
-
 #ifdef DEBUG
         if (JitConfig.JitForceEVEXEncoding())
         {
@@ -9161,9 +9131,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 #endif // DEBUG
 
         return compOpportunisticallyDependsOn(InstructionSet_AVX512F);
-#else
-        return false;
-#endif
     }
 
     //------------------------------------------------------------------------
@@ -9174,7 +9141,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     //
     bool DoJitStressEvexEncoding() const
     {
-#if defined(TARGET_XARCH) && defined(DEBUG)
+#ifdef DEBUG
         // Using JitStressEVEXEncoding flag will force instructions which would
         // otherwise use VEX encoding but can be EVEX encoded to use EVEX encoding
         // This requires AVX512VL support. JitForceEVEXEncoding forces this encoding, thus
@@ -9184,14 +9151,16 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
         {
             return true;
         }
+
         if (JitConfig.JitStressEvexEncoding() && compOpportunisticallyDependsOn(InstructionSet_AVX512F_VL))
         {
             return true;
         }
-#endif // TARGET_XARCH && DEBUG
+#endif // DEBUG
 
         return false;
     }
+#endif // TARGET_XARCH
 
 /*
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
diff --git a/src/coreclr/jit/emit.cpp b/src/coreclr/jit/emit.cpp
index c28eebc6bbf4d..7f3ad3e38c32b 100644
--- a/src/coreclr/jit/emit.cpp
+++ b/src/coreclr/jit/emit.cpp
@@ -2609,17 +2609,16 @@ void emitter::emitSetFrameRangeArgs(int offsLo, int offsHi)
 
 /*****************************************************************************
  *
- *  A conversion table used to map an operand size value (in bytes) into its
- *  small encoding (0 through 3), and vice versa.
+ *  A conversion table used to map an operand size value (in bytes) into its emitAttr
  */
 
-const emitter::opSize emitter::emitSizeEncode[] = {
-    emitter::OPSZ1, emitter::OPSZ2, emitter::OPSZ4, emitter::OPSZ8, emitter::OPSZ16, emitter::OPSZ32, emitter::OPSZ64,
+const emitAttr emitter::emitSizeDecode[emitter::OPSZ_COUNT] = {
+    EA_1BYTE, EA_2BYTE, EA_4BYTE, EA_8BYTE, EA_16BYTE,
+#if defined(TARGET_XARCH)
+    EA_32BYTE, EA_64BYTE,
+#endif // TARGET_XARCH
 };
 
-const emitAttr emitter::emitSizeDecode[emitter::OPSZ_COUNT] = {EA_1BYTE, EA_2BYTE, EA_4BYTE, EA_8BYTE,
-                                                               EA_16BYTE, EA_32BYTE, EA_64BYTE};
-
 /*****************************************************************************
 *
 *  Allocate an instruction descriptor for an instruction that uses both
diff --git a/src/coreclr/jit/emit.h b/src/coreclr/jit/emit.h
index d517b4ac58f3d..e6bc3b68a5608 100644
--- a/src/coreclr/jit/emit.h
+++ b/src/coreclr/jit/emit.h
@@ -511,25 +511,30 @@ class emitter
     enum opSize : unsigned
     {
-        OPSZ1      = 0,
-        OPSZ2      = 1,
-        OPSZ4      = 2,
-        OPSZ8      = 3,
-        OPSZ16     = 4,
+        OPSZ1  = 0,
+        OPSZ2  = 1,
+        OPSZ4  = 2,
+        OPSZ8  = 3,
+        OPSZ16 = 4,
+
+#if defined(TARGET_XARCH)
         OPSZ32     = 5,
         OPSZ64     = 6,
         OPSZ_COUNT = 7,
+#else
+        OPSZ_COUNT = 5,
+#endif
+
 #ifdef TARGET_AMD64
         OPSZP = OPSZ8,
 #else
-        OPSZP      = OPSZ4,
+        OPSZP = OPSZ4,
 #endif
     };
 
 #define OPSIZE_INVALID ((opSize)0xffff)
 
-    static const emitter::opSize emitSizeEncode[];
-    static const emitAttr        emitSizeDecode[];
+    static const emitAttr emitSizeDecode[];
 
     static emitter::opSize emitEncodeSize(emitAttr size);
     static emitAttr        emitDecodeSize(emitter::opSize ensz);
@@ -3082,16 +3087,13 @@ inline emitAttr emitActualTypeSize(T type)
 /* static */
 inline emitter::opSize emitter::emitEncodeSize(emitAttr size)
 {
-    assert(size == EA_1BYTE || size == EA_2BYTE || size == EA_4BYTE || size == EA_8BYTE || size == EA_16BYTE ||
-           size == EA_32BYTE || size == EA_64BYTE);
-
+    assert((size != EA_UNKNOWN) && ((size & EA_SIZE_MASK) == size));
     return static_cast<emitter::opSize>(genLog2(size));
 }
 
 /* static */
 inline emitAttr emitter::emitDecodeSize(emitter::opSize ensz)
 {
-    assert(((unsigned)ensz) < OPSZ_COUNT);
-
+    assert(static_cast<unsigned>(ensz) < OPSZ_COUNT);
     return emitSizeDecode[ensz];
 }
diff --git a/src/coreclr/jit/emitxarch.h b/src/coreclr/jit/emitxarch.h
index a081a162d3af6..6d44a1fc14681 100644
--- a/src/coreclr/jit/emitxarch.h
+++ b/src/coreclr/jit/emitxarch.h
@@ -730,14 +730,12 @@ void emitAdjustStackDepth(instruction ins, ssize_t val);
 
 inline emitter::opSize emitEncodeScale(size_t scale)
 {
     assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
-
-    return static_cast<emitter::opSize>(genLog2((unsigned int)scale));
+    return static_cast<emitter::opSize>(genLog2(static_cast<unsigned int>(scale)));
 }
 
 inline emitAttr emitDecodeScale(unsigned ensz)
 {
     assert(ensz < 4);
-
     return emitter::emitSizeDecode[ensz];
 }
diff --git a/src/coreclr/jit/instr.h b/src/coreclr/jit/instr.h
index 228870ca0b96a..72295fa640af6 100644
--- a/src/coreclr/jit/instr.h
+++ b/src/coreclr/jit/instr.h
@@ -350,9 +350,14 @@ enum emitAttr : unsigned
     EA_4BYTE  = 0x004,
     EA_8BYTE  = 0x008,
     EA_16BYTE = 0x010,
+
+#if defined(TARGET_XARCH)
     EA_32BYTE = 0x020,
     EA_64BYTE = 0x040,
     EA_SIZE_MASK = 0x07F,
+#else
+    EA_SIZE_MASK = 0x01F,
+#endif
 
 #ifdef TARGET_64BIT
     EA_PTRSIZE = EA_8BYTE,
diff --git a/src/coreclr/jit/simd.h b/src/coreclr/jit/simd.h
index 127f223e378a4..e04895ff9563a 100644
--- a/src/coreclr/jit/simd.h
+++ b/src/coreclr/jit/simd.h
@@ -4,42 +4,6 @@
 #ifndef _SIMD_H_
 #define _SIMD_H_
 
-// Underlying hardware information
-// This type is used to control
-// 1. The length of System.Numerics.Vector<T>.
-// 2. Codegen of System.Numerics.Vectors.
-// 3. Codegen of floating-point arithmetics (VEX-encoding or not).
-//
-// Note
-// - Hardware SIMD support is classified to the levels. Do not directly use
-//   InstructionSet (instr.h) for System.Numerics.Vectors.
-// - Values of SIMDLevel have strictly increasing order that each SIMD level
-//   is a superset of the previous levels.
-enum SIMDLevel
-{
-    SIMD_Not_Supported = 0,
-#ifdef TARGET_XARCH
-    // SSE2 - The min bar of SIMD ISA on x86/x64.
-    // Vector<T> length is 128-bit.
-    // Floating-point instructions are legacy SSE encoded.
-    SIMD_SSE2_Supported = 1,
-
-    // SSE4 - RyuJIT may generate SSE3, SSSE3, SSE4.1 and SSE4.2 instructions for certain intrinsics.
-    // Vector<T> length is 128-bit.
-    // Floating-point instructions are legacy SSE encoded.
-    SIMD_SSE4_Supported = 2,
-
-    // AVX2 - Hardware has AVX and AVX2 instruction set.
-    // Vector<T> length is 256-bit and SIMD instructions are VEX-256 encoded.
-    // Floating-point instructions are VEX-128 encoded.
-    SIMD_AVX2_Supported = 3,
-
-    // Vector512 - Hardware has AVX, AVX2 and AVX512F instruction set.
-    // Floating-point instructions are EVEX encoded.
-    SIMD_Vector512_Supported = 4
-#endif
-};
-
 struct simd8_t
 {
     union
     {
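The emit.cpp/emit.h changes above drop the `emitSizeEncode[]` lookup table: because every `emitAttr` size is a power of two (`EA_1BYTE = 0x001` through `EA_64BYTE = 0x040`), the small `opSize` encoding is simply the log2 of the size. The following self-contained sketch illustrates that relationship; the enum and the `genLog2` helper here are simplified stand-ins for the JIT's real types, not the actual headers.

```cpp
// Model of the emitAttr operand sizes: powers of two, so the compact
// encoding (OPSZ1 = 0 ... OPSZ64 = 6) is just log2 of the byte size.
enum emitAttrModel : unsigned
{
    EA_1BYTE  = 0x001,
    EA_2BYTE  = 0x002,
    EA_4BYTE  = 0x004,
    EA_8BYTE  = 0x008,
    EA_16BYTE = 0x010,
    EA_32BYTE = 0x020,
    EA_64BYTE = 0x040,
};

// Stand-in for the JIT's genLog2 utility: integer log2 of a power of two.
inline unsigned genLog2(unsigned value)
{
    unsigned result = 0;
    while (value > 1)
    {
        value >>= 1;
        result++;
    }
    return result;
}

// Mirrors the reworked emitter::emitEncodeSize: compute the small encoding
// directly instead of searching a table.
inline unsigned encodeSize(emitAttrModel size)
{
    return genLog2(static_cast<unsigned>(size));
}
```

This is why the patch can keep only the decode direction (`emitSizeDecode[]`, index back to `emitAttr`) and, on non-xarch targets, drop the 32- and 64-byte entries entirely.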