[API Proposal]: VPCLMULQDQ Intrinsics #95772

saucecontrol · 2023-12-07T23:37:51Z

Background and motivation

VPCLMULQDQ is supported by Intel in the Ice Lake and newer architectures, and by AMD in Zen 4. It allows for parallel pclmulqdq in Vector256 and Vector512 and is important for implementing vectorized CRC32 among other things.

API Proposal

namespace System.Runtime.Intrinsics.X86;

public abstract class Pclmulqdq : Sse2
{
    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<long> CarrylessMultiply(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
        public static Vector256<ulong> CarrylessMultiply(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        public static Vector512<long> CarrylessMultiply(Vector512<long> left, Vector512<long> right, [ConstantExpected] byte control);
        public static Vector512<ulong> CarrylessMultiply(Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected] byte control);
    }
}

API Usage

Examples of vectorized CRC32 implementations using the equivalent C intrinsics abound. One such example: https://github.com/corsix/fast-crc32/blob/main/sample_avx512_vpclmulqdq_crc32c_v4s5x3.c

Alternative Designs

The Pclmulqdq256 and Pclmulqdq512 classes could be nested under Pclmulqdq rather than being top-level classes inheriting from it. Since this ISA includes only a single instruction, that may be preferable.

A case could be made for making Avx the base of Pclulqdq256, as VEX encoding is required for vpclmulqdq. Likewise, Pclmulqdq512 could have Avx512F as a base given its requirement of EVEX encoding. However, the relationship will change with AVX10, where EVEX support will not imply 512-bit vector support.

Risks

N/A

The text was updated successfully, but these errors were encountered:

ghost · 2023-12-07T23:37:58Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

VPCLMULQDQ is supported by Intel in the Ice Lake and newer architectures, and by AMD in Zen 4. It allows for parallel pclmulqdq in Vector256 and Vector512 and is important for implementing vectorized CRC32 among other things.

API Proposal

namespace System.Runtime.Intrinsics.X86;

/// <summary>
/// This class provides access to Intel VPCLMULQDQ hardware instructions via intrinsics
/// </summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class Pclmulqdq256 : Pclmulqdq
{
    internal Pclmulqdq256() { }

    // This would depend on the VPCLMULQDQ CPUID bit for VEX encoding and VPCLMULQDQ + AVX512VL for EVEX
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Pclmulqdq.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }

    /// <summary>
    /// __m256i _mm256_clmulepi64_si128 (__m256i a, __m256i b, const int imm8)
    ///   VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
    /// </summary>
    public static Vector256<long> CarrylessMultiply(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
    /// __m256i _mm256_clmulepi64_si128 (__m256i a, __m256i b, const int imm8)
    ///   VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
    /// </summary>
    public static Vector256<ulong> CarrylessMultiply(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
}


/// <summary>
/// This class provides access to Intel AVX-512 VPCLMULQDQ hardware instructions via intrinsics
/// </summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class Pclmulqdq512 : Pclmulqdq
{
    internal Pclmulqdq256() { }

    // This would depend on the VPCLMULQDQ + AVX512F CPUID bits
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Pclmulqdq.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }

    /// <summary>
    /// __m512i _mm512_clmulepi64_si128 (__m512i a, __m512i b, const int imm8)
    ///   VPCLMULQDQ ymm1, ymm2, ymm3/m512, imm8
    /// </summary>
    public static Vector512<long> CarrylessMultiply(Vector512<long> left, Vector512<long> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
    /// __m512i _mm512_clmulepi64_si128 (__m512i a, __m512i b, const int imm8)
    ///   VPCLMULQDQ ymm1, ymm2, ymm3/m512, imm8
    /// </summary>
    public static Vector512<ulong> CarrylessMultiply(Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
}

API Usage

Examples of vectorized CRC32 implementations using the equivalent C intrinsics abound. One such example: https://github.com/corsix/fast-crc32/blob/main/sample_avx512_vpclmulqdq_crc32c_v4s5x3.c

Alternative Designs

The Pclmulqdq256 and Pclmulqdq256 could be nested under Pclmulqdq rather than being top-level classes inheriting from it. Since this ISA includes only a single instruction, that may be preferable.

A case could be made for making Avx the base of Pclulqdq256, as VEX encoding is required for vpclmulqdq. Likewise, Pclulqdq512 could have Avx512F as a base given its requirement of EVEX encoding. However, the relationship will change with AVX10, where EVEX support will not imply 512-bit vector support.

Risks

N/A

Author:	saucecontrol
Assignees:	-
Labels:	`api-suggestion`, `area-System.Runtime.Intrinsics`
Milestone:	-

MichalPetryka · 2023-12-08T19:22:52Z

Pclmulqdq512 could maybe be changed to inherit from Pclmulqdq256 if it implies that that's supported too.

bartonjs · 2024-01-09T19:54:59Z

Video

Looks good as proposed.

namespace System.Runtime.Intrinsics.X86;

/// <summary>
/// This class provides access to Intel VPCLMULQDQ hardware instructions via intrinsics
/// </summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class Pclmulqdq256 : Pclmulqdq
{
    internal Pclmulqdq256() { }

    // This would depend on the VPCLMULQDQ CPUID bit for VEX encoding and VPCLMULQDQ + AVX512VL for EVEX
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Pclmulqdq.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }

    /// <summary>
    /// __m256i _mm256_clmulepi64_si128 (__m256i a, __m256i b, const int imm8)
    ///   VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
    /// </summary>
    public static Vector256<long> CarrylessMultiply(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
    /// __m256i _mm256_clmulepi64_si128 (__m256i a, __m256i b, const int imm8)
    ///   VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
    /// </summary>
    public static Vector256<ulong> CarrylessMultiply(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
}


/// <summary>
/// This class provides access to Intel AVX-512 VPCLMULQDQ hardware instructions via intrinsics
/// </summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class Pclmulqdq512 : Pclmulqdq
{
    internal Pclmulqdq512() { }

    // This would depend on the VPCLMULQDQ + AVX512F CPUID bits
    public static new bool IsSupported { get => IsSupported; }

    [Intrinsic]
    public new abstract class X64 : Pclmulqdq.X64
    {
        internal X64() { }

        public static new bool IsSupported { get => IsSupported; }
    }

    /// <summary>
    /// __m512i _mm512_clmulepi64_si128 (__m512i a, __m512i b, const int imm8)
    ///   VPCLMULQDQ zmm1, zmm2, zmm3/m512, imm8
    /// </summary>
    public static Vector512<long> CarrylessMultiply(Vector512<long> left, Vector512<long> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
    /// __m512i _mm512_clmulepi64_si128 (__m512i a, __m512i b, const int imm8)
    ///   VPCLMULQDQ zmm1, zmm2, zmm3/m512, imm8
    /// </summary>
    public static Vector512<ulong> CarrylessMultiply(Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected] byte control) => CarrylessMultiply(left, right, control);
}

saucecontrol · 2024-10-19T03:20:38Z

For consistency with the AVX10 surface (and #86952), this should probably be revised to

namespace System.Runtime.Intrinsics.X86;

public abstract class Pclmulqdq : Sse2
{
    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<long> CarrylessMultiply(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
        public static Vector256<ulong> CarrylessMultiply(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        public static Vector512<long> CarrylessMultiply(Vector512<long> left, Vector512<long> right, [ConstantExpected] byte control);
        public static Vector512<ulong> CarrylessMultiply(Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected] byte control);
    }
}

bartonjs · 2024-10-22T17:42:48Z

Video

Looks good as proposed

namespace System.Runtime.Intrinsics.X86;

public abstract class Pclmulqdq : Sse2
{
    public abstract class V256
    {
        public static new bool IsSupported { get; }

        public static Vector256<long> CarrylessMultiply(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
        public static Vector256<ulong> CarrylessMultiply(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        public static Vector512<long> CarrylessMultiply(Vector512<long> left, Vector512<long> right, [ConstantExpected] byte control);
        public static Vector512<ulong> CarrylessMultiply(Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected] byte control);
    }
}

saucecontrol added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 7, 2023

dotnet-issue-labeler bot added the area-System.Runtime.Intrinsics label Dec 7, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Dec 7, 2023

tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Dec 8, 2023

bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jan 9, 2024

tannergooding mentioned this issue Feb 8, 2024

[API Proposal]: Expose System.Runtime.Intrinsics.X86.Aes256 and Aes512 #86952

Open

stephentoub added this to the 10.0.0 milestone Jul 19, 2024

tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-approved API was approved in API review, it can be implemented labels Oct 21, 2024

bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 22, 2024

saucecontrol mentioned this issue Oct 23, 2024

Add VPCLMULQDQ intrinsics #109137

Merged

dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Oct 23, 2024

tannergooding closed this as completed in #109137 Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: VPCLMULQDQ Intrinsics #95772

[API Proposal]: VPCLMULQDQ Intrinsics #95772

saucecontrol commented Dec 7, 2023 •

edited by tannergooding

Loading

ghost commented Dec 7, 2023

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

MichalPetryka commented Dec 8, 2023

bartonjs commented Jan 9, 2024 •

edited by dotnet-api-review bot

Loading

saucecontrol commented Oct 19, 2024 •

edited

Loading

bartonjs commented Oct 22, 2024 •

edited by dotnet-api-review bot

Loading

[API Proposal]: VPCLMULQDQ Intrinsics #95772

[API Proposal]: VPCLMULQDQ Intrinsics #95772

Comments

saucecontrol commented Dec 7, 2023 • edited by tannergooding Loading

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

ghost commented Dec 7, 2023

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

MichalPetryka commented Dec 8, 2023

bartonjs commented Jan 9, 2024 • edited by dotnet-api-review bot Loading

saucecontrol commented Oct 19, 2024 • edited Loading

bartonjs commented Oct 22, 2024 • edited by dotnet-api-review bot Loading

saucecontrol commented Dec 7, 2023 •

edited by tannergooding

Loading

bartonjs commented Jan 9, 2024 •

edited by dotnet-api-review bot

Loading

saucecontrol commented Oct 19, 2024 •

edited

Loading

bartonjs commented Oct 22, 2024 •

edited by dotnet-api-review bot

Loading