-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Vector{Size}<T>.AllBitsSet #33924
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
I believe we can use this in a couple places: runtime/src/libraries/System.Collections/src/System/Collections/BitArray.cs Lines 542 to 556 in 963b158
|
All of the failures seem to be either of those:
|
Would it be possible for the JIT to detect such cases and emit the |
|
So, it appears that this is happening because methods that can have different instructions depending on types need an extra runtime/src/coreclr/src/jit/hwintrinsic.cpp Line 284 in 24c4cb1
What does this really mean? |
CC. @briansull who added the initial VN support for HWIntrinsics in #31834 |
Also CC. @echesakovMSFT and @CarolEidt |
This prevents us from CSE-ing two different SIMD operations that would be implemented using different instructions. I will investigate the assert and provide guidance
|
else | ||
{ | ||
assert(varTypeIsIntegral(baseType) || !compiler->compSupports(InstructionSet_AVX)); | ||
emit->emitIns_SIMD_R_R_R(ins, attr, targetReg, targetReg, targetReg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is used for the comparison for float/double
when AVX isn't supported? Based on the instruction table, this is cmpps/cmppd
still; but I don't think that is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhhh, good catch.
I wonder how we'd change it here, though? hard-code the instruction here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to special-case one of the paths.
That's also possibly an interesting case for CSE... If the table gives a particular set of instructions but codegen can optionally special-case something further or treat it slightly differently, how should that be handled @briansull ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ValueNumber & CSE phase only uses the table to determine if the result type needs to be an input when generating the ValueNumber for the node.
If for two trees that have the same operation (i.e. GT_MUL or the same HW intrinsic) and all of its operands have the identical value numbers, then normally we would give the same value number.
For GT_CAST we incorporate the castto type as an extra operand to the value number.
I noticed that we also needed to do this for some SIMD and HW instrinsic nodes.
I determined that safest and easiest way to to do this was to examine the table of instructions and always incorporate the result type when there were two or more different valid instructions listed for a SIMD or HW instrinsic node.
This process is used for x86 and x64, I believe that we need to be more conservative on ARM64, so we always include an extra result type operand.
As as long as the table has two or more different instructions we will be good.
It would be bad to list all the same instructions or an illegal instruction and then use hand code logic to decide on the instruction. If you want to record that an entry relies upon hand coded logic I would recommend using a brkpt or nop instruction as a marker for this behavior in the table. We can add a check for this and assume that different instructions could be generated. I don't think that it should matter if AVX is supported or not when deciding if we need an extra result type operand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it should matter if AVX is supported or not when deciding if we need an extra result type operand.
This would be a case of, when not using VEX
encoding (SSE-SSE4.2) all types would use the same instruction. But when using the VEX
encoding (AVX+), float
/double
would use different instructions that are more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment that float/double may use different instructions depending on the encoding available. Also hardcoded the instruction inside the integer / non-VEX path.
I did some initial debugging and found that
|
The fix is: line 8569 in valuenum.cpp in function:
|
An interesting failure... |
@Gnbrkm41 that's a new phase I added recently, probably a missing safety check. Let me take a look. |
Might be simplest for now to disable throw helper merging in this case. Will keep looking. |
@Gnbrkm41 see if this patch fixes your failures. Haven't validated it yet -- trying to create a simple local repro, but no luck so far. index 8ba6d2f8cc7..846677f482f 100644
--- a/src/coreclr/src/jit/flowgraph.cpp
+++ b/src/coreclr/src/jit/flowgraph.cpp
@@ -25869,6 +25869,14 @@ void Compiler::fgTailMergeThrows()
// and there is less jumbled flow to sort out later.
for (BasicBlock* block = fgLastBB; block != nullptr; block = block->bbPrev)
{
+ // Workaround: don't consider try entry blocks as candidates
+ // for merging; if the canonical throw is later in the same try,
+ // we'll create invalid flow.
+ if ((block->bbFlags & BBF_TRY_BEG) != 0)
+ {
+ continue;
+ }
+
// For throw helpers the block should have exactly one statement....
// (this isn't guaranteed, but seems likely)
Statement* stmt = block->firstStmt(); |
Ok, I can repro now. I'll put up a fix. |
Otherwise we may create a branch into the middle of a try. We could fix the transform, but if the first block of a try has a throw helper call, the rest of the try will subsequently be removed, so merging is not all that interesting. Addresses an issue that came up in dotnet#33924.
Otherwise we may create a branch into the middle of a try. We could fix the transform, but if the first block of a try has a throw helper call, the rest of the try will subsequently be removed, so merging is not all that interesting. Addresses an issue that came up in #33924.
Does this need an arm64 implementation? |
It will need one for |
@tannergooding your best bet is to use
will both give you a vectors with all bits set.
for only the bottom half of the vector. |
Note that I got slightly busy recently. I hope I have some time to work on intrinsifying ARM64, but I am not sure how quick can happen; Do you think it'll be okay if I can attempt the ARM part later (hopefully soon, next weekend?) and open a follow up PR? |
Co-Authored-By: Brian Sullivan <briansul@microsoft.com>
This reverts commit a708e533368cb6b8f71aa0ada3d931611dea1722.
* compExactlyDependsOn(isa) ... Should never be used in an assert.
* Hard-code ivals * Remove custom importation logic (and replace with HW_Category_SimpleSIMD) * Insert newlines
Just rebased, resolved conflicts and addressed feedbacks. I've locally checked that the changes for both ARM and xarch do in fact generate appropriate instructions and all the tests pass; Could I get a final review on this? Thanks! |
@@ -281,7 +281,9 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic, | |||
|
|||
if (!varTypeIsArithmetic(baseType)) | |||
{ | |||
assert((intrinsic == NI_Vector64_AsByte) || (intrinsic == NI_Vector128_As)); | |||
assert((intrinsic == NI_Vector64_AsByte) || (intrinsic == NI_Vector128_As) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@echesakovMSFT, not related to this PR, but I think the first check in this is wrong. It should be intrinsic == NI_Vector64_As
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks so, however, I don't see Vector64.As intrinsic in hwintrinsiclistarm64.h, I will update #33308 to include this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -281,7 +281,9 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic, | |||
|
|||
if (!varTypeIsArithmetic(baseType)) | |||
{ | |||
assert((intrinsic == NI_Vector64_AsByte) || (intrinsic == NI_Vector128_As)); | |||
assert((intrinsic == NI_Vector64_AsByte) || (intrinsic == NI_Vector128_As) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks so, however, I don't see Vector64.As intrinsic in hwintrinsiclistarm64.h, I will update #33308 to include this.
I did four attempts to re-run testing for this PR - runtime (Installer Build and Test coreclr FreeBSD_x64 Debug) keeps failing with
Everything else is green, even though is reported as "non finished" - https://dev.azure.com/dnceng/public/_build/results?buildId=621331&view=logs&j=41e34ca2-d347-5e65-d632-b45724e78141. Merging so it would not conflict with #35594. Thanks @Gnbrkm41 for contribution! |
Resolves #30659
Note: I strongly advise you to ignore the changes made in c9bd4a9 "Auto-generate tests": They are automatically generated from the template and adds lots of LoCs.