Add the barebones support for using embedded masking with AVX512 #97675

Merged: 4 commits, Jan 31, 2024

tannergooding
Member

As per the title, this adds minimal recognition of, and support for, emitting AVX-512 instructions with embedded masking.

It currently only targets Avx512F.Add, but can be trivially expanded to other instructions via a table-driven approach after this first PR is merged.

For something like:

public static Vector512<double> M(Vector512<double> x, Vector512<double> y)
{
    return Vector512.ConditionalSelect(Vector512.Equals(x, x), x + x, y + y);
}

We will emit the following:

vmovups  zmm0, zmmword ptr [rdx]
vcmppd   k1, zmm0, zmm0, 0
vmovups  zmm1, zmmword ptr [r8]
vaddpd   zmm1, zmm1, zmm1
vaddpd   zmm1 {k1}, zmm0, zmm0   ; <--- embedded mask used here
vmovups  zmmword ptr [rcx], zmm1
mov      rax, rcx
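
To make the semantics of the masked `vaddpd` concrete, here is a minimal scalar model in C++. It is only a sketch: `Vec8d`, `compareEqual`, `addAll`, `maskedAdd`, and `conditionalSelectSketch` are hypothetical names invented for illustration and are not part of the JIT. It assumes merge-masking (the `{k1}` form without `{z}`), where lanes whose mask bit is clear keep the previous destination value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for one zmm register holding 8 doubles.
using Vec8d = std::array<double, 8>;

// Models `vcmppd k, a, b, 0` (predicate 0 is ordered-equal):
// bit i of the result is set when a[i] == b[i]; NaN lanes compare false.
inline std::uint8_t compareEqual(const Vec8d& a, const Vec8d& b)
{
    std::uint8_t mask = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (a[i] == b[i])
        {
            mask = static_cast<std::uint8_t>(mask | (1u << i));
        }
    }
    return mask;
}

// Models an unmasked `vaddpd dst, a, b`.
inline Vec8d addAll(const Vec8d& a, const Vec8d& b)
{
    Vec8d r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = a[i] + b[i];
    }
    return r;
}

// Models `vaddpd dst {k}, a, b` with merge-masking: lanes whose mask bit
// is set receive a[i] + b[i]; all other lanes keep the old value of dst.
inline Vec8d maskedAdd(Vec8d dst, std::uint8_t k, const Vec8d& a, const Vec8d& b)
{
    for (std::size_t i = 0; i < dst.size(); ++i)
    {
        if ((k >> i) & 1u)
        {
            dst[i] = a[i] + b[i];
        }
    }
    return dst;
}

// Mirrors the emitted sequence for
// ConditionalSelect(Equals(x, x), x + x, y + y).
inline Vec8d conditionalSelectSketch(const Vec8d& x, const Vec8d& y)
{
    const std::uint8_t k1 = compareEqual(x, x); // vcmppd k1, zmm0, zmm0, 0
    Vec8d result = addAll(y, y);                // vaddpd zmm1, zmm1, zmm1
    return maskedAdd(result, k1, x, x);         // vaddpd zmm1 {k1}, zmm0, zmm0
}
```

In this model the select and the second add collapse into a single step, which is the saving the embedded mask provides over a separate blend instruction.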

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Jan 29, 2024
ghost assigned tannergooding on Jan 29, 2024
ghost commented Jan 29, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Comment on lines +771 to +773
// We have several pieces of information we need to encode but which are only applicable
// to a subset of instrDescs. To accommodate that, we define a several _idCustom# bitfields
// and then some defineds to make accessing them simpler
Member Author

These bits are "expensive" and impact the maximum size of "small" constants, so I opted to repurpose these existing 3 bits that are only used for IF_LABEL, IF_METHOD, and related formats. They will never conflict with the SIMD instructions so this ends up being a nice way to fit it in, IMO.
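
The bit-sharing idea can be sketched as follows. This is a hypothetical model, not the layout in emit.h: `InstrDescSketch`, `Format`, the field widths, and the choice of which bit holds the TLS flag are all assumptions made for illustration. The point it shows is that the same three bits carry different meanings depending on the instruction format, so every accessor has to check the format first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical formats; the real JIT has many more (IF_LABEL, IF_METHOD, ...).
enum class Format : std::uint8_t { Label, Method, Simd };

struct InstrDescSketch
{
    Format        format;
    std::uint32_t custom   : 3;  // shared storage: meaning depends on `format`
    std::uint32_t smallCns : 29; // remaining bits, e.g. for a small constant

    // Interpretation A (assumed bit position): a TLS flag that is only
    // meaningful for non-SIMD formats.
    bool isTlsGD() const
    {
        return (format != Format::Simd) && ((custom & 0x1) != 0);
    }

    void setTlsGD()
    {
        assert(format != Format::Simd);
        custom |= 0x1;
    }

    // Interpretation B: the 3-bit EVEX.aaa mask selector, only meaningful
    // for SIMD formats. Other formats report 0, i.e. "no mask".
    unsigned evexAaa() const
    {
        return (format == Format::Simd) ? custom : 0;
    }

    void setEvexAaa(unsigned aaa)
    {
        assert(format == Format::Simd);
        assert(aaa < 8);
        custom = aaa;
    }
};
```

Widening the struct would avoid the format checks, but under the constraint described above each extra bit would come out of the space available for small constants.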

Member

No TP impact, so this works!

{
regNumber maskReg = static_cast<regNumber>(id->idGetEvexAaaContext() + KBASE);

if (maskReg == REG_K0)
Member Author

K0 is special and basically means "don't mask"
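
As a small illustration of that rule, the sketch below renders a destination operand from the 3-bit EVEX `aaa` selector. `formatDest` is a hypothetical helper, not a JIT function; it only assumes that `aaa == 0` selects k0 and therefore means the instruction is unmasked.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: render the destination operand of an EVEX
// instruction given the 3-bit aaa selector. aaa == 0 selects k0, which
// means "no masking", so no {k} decoration is printed.
inline std::string formatDest(const std::string& reg, unsigned aaa)
{
    assert(aaa < 8);

    if (aaa == 0)
    {
        return reg;
    }
    return reg + " {k" + std::to_string(aaa) + "}";
}
```

For example, `formatDest("zmm1", 1)` yields `zmm1 {k1}`, matching the disassembly in the PR description, while `formatDest("zmm1", 0)` yields plain `zmm1`.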

Comment on lines +19791 to +19794
// TODO-AVX512F-CQ: Expand this to the full set of APIs and make it table driven
// using IsEmbMaskingCompatible. For now, however, limit it to some explicit ids
// for prototyping purposes.
return (AsHWIntrinsic()->GetHWIntrinsicId() == NI_AVX512F_Add);
Member Author

This is the main check that drives which intrinsics can support embedded masking. In practice most intrinsics can, so the flag we'll actually add to the table is HW_Flag_EmbMaskingIncompatible, which marks the few that can't.

Just wanted to get the baseline support up first and then go and plumb through the minor connecting points after to avoid any excess churn based on feedback, etc.
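
A table-driven version of that check might look roughly like the sketch below. The two flag values are the ones quoted from the hwintrinsic header in this PR; everything else (`IntrinsicIdSketch`, `kFlagsTable`, its entries, and the body of `isEmbMaskingCompatible`) is hypothetical and only illustrates the opt-out design.

```cpp
#include <cassert>
#include <cstdint>

// Flag values as quoted in this PR's hwintrinsic header diff.
enum HWIntrinsicFlagSketch : std::uint32_t
{
    HW_Flag_EmbRoundingCompatible  = 0x10000000,
    HW_Flag_EmbMaskingIncompatible = 0x20000000,
};

// Hypothetical intrinsic ids; the real table is far larger.
enum IntrinsicIdSketch
{
    Sketch_Add,
    Sketch_Multiply,
    Sketch_SomeOptOut, // stands in for one of the few incompatible intrinsics
    Sketch_Count
};

// Hypothetical per-intrinsic flags, indexed by IntrinsicIdSketch.
constexpr std::uint32_t kFlagsTable[Sketch_Count] = {
    0,                              // Sketch_Add
    0,                              // Sketch_Multiply
    HW_Flag_EmbMaskingIncompatible, // Sketch_SomeOptOut
};

// Opt-out design: an intrinsic supports embedded masking unless its table
// entry carries the "incompatible" flag.
inline bool isEmbMaskingCompatible(IntrinsicIdSketch id)
{
    return (kFlagsTable[id] & HW_Flag_EmbMaskingIncompatible) == 0;
}
```

Marking the exceptions rather than the supported set keeps the table small when, as stated above, most intrinsics qualify.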

@@ -556,6 +556,10 @@ enum GenTreeFlags : unsigned int
GTF_MDARRLEN_NONFAULTING = 0x20000000, // GT_MDARR_LENGTH -- An MD array length operation that cannot fault. Same as GT_IND_NONFAULTING.

GTF_MDARRLOWERBOUND_NONFAULTING = 0x20000000, // GT_MDARR_LOWER_BOUND -- An MD array lower bound operation that cannot fault. Same as GT_IND_NONFAULTING.

#ifdef TARGET_XARCH
GTF_HW_EM_OP = 0x10000000, // GT_HWINTRINSIC -- node is used as an operand to an embedded mask
Member Author

I tried to figure out a way to do this without a flag, but we have several intrinsics that can be contained for multiple reasons, so it ends up being a bit cleaner to tag the node that is contained for embedded masking purposes to differentiate it.

After we finish plumbing through the rest of the instructions, we might find it's actually not needed (we might be able to introduce helper IDs to disambiguate, for example), and we can remove it then if appropriate.

Member

can this be under ifdef FEATURE_HW_INTRINSICS?

Member Author

It can be, but there isn't a strong need for it, so I'd prefer to handle it in a follow-up to avoid the additional CI churn (I will fix it in this PR if I need to push anything for the seemingly unrelated tls_InlinedThreadStatics failure, though).

{
GenTree* op2 = node->Op(2);

if (op2->IsEmbMaskOp())
Member Author

Much like with the embedded rounding support, this approach allows us to avoid recursion or tracking unnecessary state separately. Instead we can recognize the special scenario up front and extract the info to the tracked insOpt and pass it through. This allows us to reuse all the existing code paths.

Comment on lines +2497 to +2498
// TODO-AVX512-CQ: Ensure we can support embedded operations on RMW intrinsics
assert(!op2->isRMWHWIntrinsic(compiler));
Member Author

This will need some more handling for cases like FMA which are RMW already. Those cases may need to emit a movaps with embedded masking + the underlying instruction with embedded masking to ensure we get all the correct codegen.

@ryujit-bot

Diff results for #97675

Assembly diffs

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,249,675 contexts (981,298 MinOpts, 1,268,377 FullOpts).

MISSED contexts: 134 (0.01%)

Overall (-6 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 69,142,945 +0
benchmarks.run_tiered.linux.x64.checked.mch 15,896,118 +0
realworld.run.linux.x64.checked.mch 13,051,281 -6
FullOpts (-6 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 47,800,900 +0
benchmarks.run_tiered.linux.x64.checked.mch 3,637,734 +0
realworld.run.linux.x64.checked.mch 12,662,399 -6

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,098,432 contexts (926,221 MinOpts, 1,172,211 FullOpts).

MISSED contexts: 138 (0.01%)

Overall (-151 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,730,756 +0
benchmarks.run_pgo.windows.x64.checked.mch 35,773,696 +0
benchmarks.run_tiered.windows.x64.checked.mch 12,546,772 +0
libraries.pmi.windows.x64.checked.mch 61,645,293 -16
libraries_tests.run.windows.x64.Release.mch 278,809,463 +2
realworld.run.windows.x64.checked.mch 13,946,185 -137
FullOpts (-151 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,730,393 +0
benchmarks.run_pgo.windows.x64.checked.mch 21,741,615 +0
benchmarks.run_tiered.windows.x64.checked.mch 3,451,035 +0
libraries.pmi.windows.x64.checked.mch 61,531,772 -16
libraries_tests.run.windows.x64.Release.mch 106,634,847 +2
realworld.run.windows.x64.checked.mch 13,559,576 -137

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

Throughput diffs for linux/x64 ran on windows/x64

MinOpts (-0.01% to 0.00%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch -0.01%
realworld.run.linux.x64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

tannergooding force-pushed the avx512-embed-mask branch 3 times, most recently from 48219f5 to 9c2a71f (January 30, 2024 04:46)
@ryujit-bot

Diff results for #97675

Assembly diffs

Assembly diffs for windows/x64 ran on linux/x64

Diffs are based on 2,098,432 contexts (926,221 MinOpts, 1,172,211 FullOpts).

MISSED contexts: 138 (0.01%)

Overall (-151 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,730,756 +0
benchmarks.run_pgo.windows.x64.checked.mch 35,773,696 +0
benchmarks.run_tiered.windows.x64.checked.mch 12,546,772 +0
libraries.pmi.windows.x64.checked.mch 61,645,293 -16
libraries_tests.run.windows.x64.Release.mch 278,809,463 +2
realworld.run.windows.x64.checked.mch 13,946,185 -137
FullOpts (-151 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x64.checked.mch 8,730,393 +0
benchmarks.run_pgo.windows.x64.checked.mch 21,741,615 +0
benchmarks.run_tiered.windows.x64.checked.mch 3,451,035 +0
libraries.pmi.windows.x64.checked.mch 61,531,772 -16
libraries_tests.run.windows.x64.Release.mch 106,634,847 +2
realworld.run.windows.x64.checked.mch 13,559,576 -137

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch -0.01%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
realworld.run.linux.x64.checked.mch -0.01%
FullOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Throughput diffs for windows/x86 ran on windows/x86

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x86.checked.mch +0.01%


@ryujit-bot

Diff results for #97675

Assembly diffs

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,249,675 contexts (981,298 MinOpts, 1,268,377 FullOpts).

MISSED contexts: 134 (0.01%)

Overall (-6 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 69,142,945 +0
benchmarks.run_tiered.linux.x64.checked.mch 15,896,118 +0
realworld.run.linux.x64.checked.mch 13,051,281 -6
FullOpts (-6 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 47,800,900 +0
benchmarks.run_tiered.linux.x64.checked.mch 3,637,734 +0
realworld.run.linux.x64.checked.mch 12,662,399 -6

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,227,722 contexts (987,923 MinOpts, 1,239,799 FullOpts).

MISSED contexts: 138 (0.01%)

Overall (-151 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 47,041,738 +0
benchmarks.run.windows.x64.checked.mch 8,730,756 +0
benchmarks.run_pgo.windows.x64.checked.mch 35,773,696 +0
benchmarks.run_tiered.windows.x64.checked.mch 12,546,772 +0
libraries.pmi.windows.x64.checked.mch 61,645,293 -16
libraries_tests.run.windows.x64.Release.mch 278,809,463 +2
realworld.run.windows.x64.checked.mch 13,946,185 -137
FullOpts (-151 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 28,550,689 +0
benchmarks.run.windows.x64.checked.mch 8,730,393 +0
benchmarks.run_pgo.windows.x64.checked.mch 21,741,615 +0
benchmarks.run_tiered.windows.x64.checked.mch 3,451,035 +0
libraries.pmi.windows.x64.checked.mch 61,531,772 -16
libraries_tests.run.windows.x64.Release.mch 106,634,847 +2
realworld.run.windows.x64.checked.mch 13,559,576 -137

Throughput diffs

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.01% to -0.00%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch -0.01%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
realworld.run.linux.x64.checked.mch -0.01%
FullOpts (-0.01% to -0.00%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Throughput diffs for windows/x86 ran on windows/x86

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x86.checked.mch +0.01%

kunalspathak (Member) left a comment

looks good overall. I am expecting to see diffs similar to what you have highlighted in PR description, but most of them are changing the registers. what makes those diffs?

[image attachment]

HW_Flag_EmbRoundingCompatible = 0x10000000,

// The intrinsic is an embedded masking incompatible intrinsic
HW_Flag_EmbMaskingIncompatible = 0x20000000,
Member

likewise this? should it be under #ifdef FEATURE_HW_INTRINSICS?

Member Author

This is in the hwintrinsics header, so it's already effectively under FEATURE_HW_INTRINSICS.

@tannergooding
Member Author

I am expecting to see diffs similar to what you have highlighted in PR description, but most of them are changing the registers. what makes those diffs?

The diffs you highlighted are from a mistake in one of the earlier commits that I fixed this morning and I expect them to go away in the very latest.

I wouldn't really expect any diffs to show up in our current code, since we avoided several of the patterns while the support didn't exist and it only applies to addps right now. In the follow-up PR that expands this to the full table, I would expect a few additional diffs to show up, after which we can take explicit advantage of the support in managed code with a few targeted refactorings.

@ryujit-bot

Diff results for #97675

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

Throughput diffs for linux/x64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch -0.01%
realworld.run.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%

Throughput diffs for windows/x86 ran on linux/x86

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x86.checked.mch +0.01%

@ryujit-bot

Diff results for #97675

Throughput diffs

Throughput diffs for linux/x64 ran on linux/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%

Throughput diffs for osx/arm64 ran on linux/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.osx.arm64.checked.mch +0.01%

@tannergooding
Member Author

@dotnet/runtime-infrastructure, @MichalStrehovsky I'm seeing some failures here like the following:

relocation R_X86_64_PLT32 cannot refer to absolute symbol: tls_InlinedThreadStatics

It's unclear what's causing this and I don't see any issues elsewhere. The only reference to that variable I can find is in the aot ILCompiler: src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/TlsRootNode.cs

But of course there aren't any changes to that or related areas in this PR. Any help would be appreciated.

@ryujit-bot

Diff results for #97675

Throughput diffs

Throughput diffs for linux/x64 ran on linux/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/x86 ran on windows/x86

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x86.checked.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%

@kunalspathak
Member

But of course there aren't any changes to that or related areas in this PR. Any help would be appreciated.

https://github.com/dotnet/runtime/pull/97675/files#diff-125fb0b9396b0e85ee82c6d592cdf1750778a7cff38ad382f7c2966ef588dce1R780 might have affected it. With the bits overriding, it might be that we are not setting the _idTlsGD correctly and hence not adding the required relocation which is what the error is complaining about.

@ryujit-bot

Diff results for #97675

Throughput diffs

Throughput diffs for windows/x86 ran on windows/x86

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x86.checked.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%

@tannergooding
Member Author

Thanks for the help, Kunal! The issue was that a check was slightly wrong, so idIsTlsGD wasn't being checked for an instruction where it should have been.

@ryujit-bot

Diff results for #97675

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Throughput diffs for windows/x64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
smoke_tests.nativeaot.windows.x64.checked.mch +0.01%

Throughput diffs for windows/x86 ran on linux/x86

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x86.checked.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%

@ryujit-bot

Diff results for #97675

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Throughput diffs for windows/x64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
smoke_tests.nativeaot.windows.x64.checked.mch +0.01%

Throughput diffs for windows/x86 ran on windows/x86

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.x86.checked.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%


kunalspathak (Member) left a comment

LGTM

tannergooding merged commit cd460db into dotnet:main on Jan 31, 2024
136 of 139 checks passed
tannergooding deleted the avx512-embed-mask branch on January 31, 2024 05:21
bool IsEmbMaskOp()
{
bool result = (gtFlags & GTF_HW_EM_OP) != 0;
assert(!result || (gtOper == GT_HWINTRINSIC));
Member

This assert is invalid: the same bit used by GTF_HW_EM_OP is used by other flags for other GenTree types (e.g., GTF_OVERFLOW).

You could have:

assert(gtOper == GT_HWINTRINSIC);

if you only allow HWINTRINSIC nodes to call this. Or, you need it to be dynamic:

if (gtOper == GT_HWINTRINSIC)
{
    result = (gtFlags & GTF_HW_EM_OP) != 0;
}
else
{
    result = false;
}
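
The aliasing problem described in this comment can be reproduced in a small standalone sketch. `NodeSketch`, `OperSketch`, and the `_SKETCH` constants are hypothetical stand-ins; the 0x10000000 value for GTF_HW_EM_OP is the one quoted in the diff above, and the assumption that GTF_OVERFLOW shares that bit comes from this review comment rather than from code shown in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for GT_ADD and GT_HWINTRINSIC.
enum OperSketch { Oper_Add, Oper_HWIntrinsic };

constexpr std::uint32_t GTF_HW_EM_OP_SKETCH = 0x10000000; // value quoted in the diff
constexpr std::uint32_t GTF_OVERFLOW_SKETCH = 0x10000000; // assumed alias, per the review

struct NodeSketch
{
    OperSketch    oper;
    std::uint32_t flags;

    // Unguarded bit test: this is what trips the original assert when the
    // node is, for example, an overflow-checked add.
    bool rawBitSet() const
    {
        return (flags & GTF_HW_EM_OP_SKETCH) != 0;
    }

    // Oper-guarded form (the "dynamic" option suggested above): the bit is
    // only interpreted as GTF_HW_EM_OP on hwintrinsic nodes.
    bool isEmbMaskOp() const
    {
        return (oper == Oper_HWIntrinsic) && rawBitSet();
    }
};
```

With an overflow-checked add node, `rawBitSet()` is true while `isEmbMaskOp()` is false, which is the case the original assert did not account for.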

Member Author

Fixed with #98306

@github-actions github-actions bot locked and limited conversation to collaborators Mar 14, 2024