-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep correct fieldSeq
for 0-offset fields.
#32085
Conversation
bd792b2
to
e1b2351
Compare
Hmm, I suspect that the "proper" way to do this is to add the address to the "zero offset field map". But then I happen to think that this kind of side maps are a terrible idea and that the it's best to keep this information in the IR, in special nodes -
Maybe this will also fix 13548? |
Thanks, have not seen that yet. I think yes. Link #13548. |
I think I do not understand, could you give me an example? Is it like
0-level in eval is not required, as non-CSE flag. The main issue that this PR solves is CSE when INDs have non-compatible types, but identical trees under IND nodes, after this change it can't happen because one tree will have |
Basically - an unary operator that also contains an
|
src/coreclr/src/jit/morph.cpp
Outdated
// Add the fieldSeq offset for all fields, even for 0 offset. | ||
// We will read these field seq when we ask for `CORINFO_CLASS_HANDLE` for these trees. | ||
// We can use `fieldSeq` from the dst because fieldHnd for src and dst must match. | ||
GenTreeIntCon* fldOffset = new (this, GT_CNS_INT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you could shorten this a bit by using the gtNewIconNode(unsigned fieldOffset, FieldSeqNode* fieldSeq)
overload I added a while ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That helped me to find another case where we did not create if the offset was zero:
runtime/src/coreclr/src/jit/morph.cpp
Line 6295 in 71d2911
GenTree* fldOffsetNode = new (this, GT_CNS_INT) GenTreeIntCon(TYP_INT, fldOffset, fieldSeq); |
the strange thing is that it was created with TYP_INT
instead of TYP_I_IMPL
. probably does't matter, but I will check the diffs.
src/coreclr/src/jit/lower.cpp
Outdated
// We should not access it after lowering, so we can drop it now. | ||
GenTree* zero = nullptr; | ||
GenTree* value = nullptr; | ||
if (op1->IsCnsIntOrI() && (op1->AsIntCon()->IconValue() == 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsIntegralConst(0)
?
src/coreclr/src/jit/lower.cpp
Outdated
zero = op1; | ||
value = op2; | ||
} | ||
else if (op2->IsCnsIntOrI() && (op2->AsIntCon()->IconValue() == 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically it's enough to check only op2
for constants because gtSetEvalOrder
tries to put constants in the second op.
src/coreclr/src/jit/lower.cpp
Outdated
|
||
if (zero != nullptr) | ||
{ | ||
LIR::Use use; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a "use" variable, you may want to reuse it.
I have checked how it works with "zero offset field map" and you are right, overall results are better. The only downside is that
The first indirection is done on
It is almost 3am, but my brain says that |
Working late? Not good. Does that lclvar contain a pointer (byref) to a local variable perhaps? VN attempts to track such pointers, including the associated field sequence, using The interesting part is that |
Clean up the existing code.
And don't add it as `ADD(,0)` in another.
e1b2351
to
e820fa8
Compare
PTAL @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments
@@ -4498,12 +4504,43 @@ bool Lowering::TryCreateAddrMode(GenTree* addr, bool isContainable) | |||
// Arguments: | |||
// node - the node we care about | |||
// | |||
void Lowering::LowerAdd(GenTreeOp* node) | |||
// Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to be in the (missing) function header for
GenTree* Lowering::LowerNode(GenTree* node)
GenTree* next = LowerAdd(node->AsOp()); | ||
if (next != nullptr) | ||
{ | ||
return next; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is missing a proper function header comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see you capture 2-3 examples of regressions, and file new (or annotate existing) issues to ensure that they are addressed going forward.
{ | ||
// Append the zero field sequences | ||
zeroFieldSeq = GetFieldSeqStore()->Append(existingZeroOffsetFldSeq, zeroFieldSeq); | ||
} | ||
// Transfer the annotation to the new GT_ADDR node. | ||
fgAddFieldSeqForZeroOffset(op1, zeroFieldSeq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an assert here that zeroFieldSeq != nullptr
or is that self-evident from the fact that isZeroOffset
is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add it, but it looks self-evident.
Both values are initialized a few lines above with:
bool isZeroOffset = GetZeroOffsetFieldMap()->Lookup(tree, &zeroFieldSeq);
so isZeroOffset == true
gurantees zeroFieldSeq != nullptr
.
The problem with that code was that the condition was written before fgAddFieldSeqForZeroOffset
was introduced. When fgAddFieldSeqForZeroOffset
was created that logic was encapsulated there, but the condition was not deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification
@sandreenko do you see any systematic fix for the general issue of losing these zero offset annotations? This is not the first time we've run into this. Or is there at least some way to write a checker? |
Yes, so there are 2 types of regressions:
before:
def:
N007 ( 20, 14) CSE #4 (def)[000300] --CXG------- \--* IND long
N006 ( 18, 12) [000294] --CXG--N---- \--* ADD byref
N004 ( 17, 11) CSE #3 (def)[000295] --CXG------- +--* IND ref
N003 ( 15, 9) [000296] --CXG--N---- | \--* ADD byref $282
N001 ( 14, 5) CSE #2 (def)[000297] H-CXG------- | +--* CALL help r2r_ind byref HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE $280
N002 ( 1, 4) [000298] ------------ | \--* CNS_INT int 984 Fseq[MaxDaylightDelta] $44
N005 ( 1, 1) [000299] ------------ \--* CNS_INT long 8 Fseq[#FirstElem] $340
use: after: previous use: so before both indirections had VN
|
That is a great question, as I discussed with Mike above, I think we could have However, my coming changes will help with that as well, we will have real And yes, I would like to refactor |
FieldSeqNode* zeroOffsetFldSeq = nullptr; | ||
if (GetZeroOffsetFieldMap()->Lookup(srcAddr, &zeroOffsetFldSeq)) | ||
{ | ||
fieldSeqVN = | ||
vnStore->FieldSeqVNAppend(fieldSeqVN, vnStore->VNForFieldSeq(zeroOffsetFldSeq)); | ||
// Check that the zero offset field seq was attached for `srcAddr`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an x86 assert failure, that Brian helped me to debug and fix, the PR was updated.
I have also run SPMI tests over pri1 and internal collections and they have not shown any other asserts.
The previous code was added before TFS so I don't know the context, but I suspect it was dead code because we were checking this GetZeroOffsetFieldMap
for srcAddr
twice and it was either both true
and then the assert should happen, of both false
and we did not pass the check.
The failures are infra. |
This reverts commit 169c24e.
…31834) * Added ValueNumbering support for GT_SIMD and GT_HWINTRINSIC tree nodes * Allow SIMD and HW Intrinsics to be CSE candidates * Correctness fix for optAssertionPropMain - Zero out the bbAssertionIn values, as these can be referenced in RangeCheck::MergeAssertion and this is shared state with the CSE phase: bbCseIn * Improve the VNFOA_ArityMask * Update to use the new TARGET macros * Include node type when value numbering SIMDIntrinsicInit Mutate the gloabl heap when performing a HW_INTRINSIC memory store operation Printing of SIMD constants only support 0 * Disable CSE's for some special HW_INTRINSIC categories * Code review feedback * Record csdStructHnd; // The class handle, currently needed to create a SIMD LclVar in PerformCSE * Instead of asserting on a struct handle mismatch, we record it in csdStructHndMismatch and avoid making the candidate into a CSE * Fix the JITDUMP messages to print the CseIndex * add check for (newElemStructHnd != NO_CLASS_HANDLE) * Additional checks for SIMD struct types when setting csdStructHnd Added Mismatched Struct Handle assert in ConsiderCandidates * fix GenTreeSIMD::OperIsMemoryLoad for ARM64 Removed ismatched Struct Handle assert * Fix the printing of BitSets on Linux, change the printf format specifier * Added check for simdNode->OperIsMemoryLoad()) to fgValueNumberSimd Added bool OperIsMemoryLoad() to GenTreeSIMD, returns true for SIMDIntrinsicInitArray Added valuenumfuncs.h to src/coreclr/src/jit/CMakeLists.txt * Avoid calling gtGetStructHandleIfPresent to set csdStructHnd when we have a GT_IND node * Fix check for (newElemStructHnd != hashDsc->csdStructHnd) * added extra value number argument VNF_SimdType for Most SIMD operations added VNF_SimdType // A value number function to compose a SIMD type added vnDumpSimdType * Added bool methods vnEncodesResultTypeForSIMDIntrinsic and vnEncodesResultTypeForHWIntrinsic these return true when a SIMD or HW Instrinsic will use an extra arg to record the result type during value numbering Changes the ValueNumFuncDef to set the arity to zero when a -1 value is passed in Updated InitValueNumStoreStatics to fixup the arity of SIMD or HW Instrinsic that have an extra arg to record the result type Allow a type mismatchj when we have a GT_BLK as the lhs of an assignment, as it is used to zero out Simd structs * Fix for SIMD_WidenLo arg count * fix typo * Fix x86 build breaks Fix SIMD_WidenHi * Added method header comment for vnEncodesResultTypeForHWIntrinsic Added & VNFOA_ArityMask when assigning to vnfOpAttribs[] * Codereview feedback and some more comments * fix typo * Moved the code that sets the arg count for the three SIMD intrinsics * clang-format * Adjust CSE for SIMD types that are live across a call * Proposed fix for #32085 * Revert "Proposed fix for #32085" This reverts commit 169c24e. * Added better comments for optcse SIMD caller saved register heuristics * Added CONFIG_INTEGER: JitDisableSimdVN, Default 0, ValueNumbering of SIMD nodes and HW Intrinsic nodes enabled If 1, then disable ValueNumbering of SIMD nodes If 2, then disable ValueNumbering of HW Intrinsic nodes If 3, disable both SIMD and HW Intrinsic nodes * Moved JitDisableSimdVN from DEBUG to RETAIL
Note: The PR was rewritten, the body description has changed.
There are two changes in that PR:
e3f0fe0 Adds missing zero-offset field seq, this is a correctness fix;
e820fa8 Optimize
ADD(val, 0)
simple cases to avoidLEA(val, 0)
creation, fixes #13548.Diffs for SPC x64:
Diffs for FX libraries x64:
The diffs for FX libaries x64 and their analysis.
The improved methods are now optimizing
ADD(val, 0)
that we were creating for dst in the past (and keep creating), note there was nooffset != 0
check thereruntime/src/coreclr/src/jit/morph.cpp
Line 10344 in 25c092a
like:
The regressed method now can't cse trees like:
with
because codeGen sees that one access the struct
#FirstElem
and the other access its field#FirstElem._ticks
.In this case, it was not dangerous to CSE them in the past, but soon, with struct indirs, we would get a CSE assert from
IsCompatibleType
when the struct is a struct and its field is, for example, long.The another case is that now assertion propogation keeps tree like:
000194
is useless becauseIND
can't fail, because we have000187
, but after the change they have different VN, sooptAssertionIsNonNull
doesn't recognize that. I filled #32248 to track that, I believe there were preexisting cases like this.The commit
ADD(val, 0)
catches some cases revealed by #1735 inCompiler::fgMorphSmpOp(GenTree* tree, MorphAddrContext* mac)
forGT_ADD
, when ourcns1 + ns2 == 0
, but we don't delete that tree there. The whole method could be improved, there is a PR that makes it better #656, probably after that we will have time to refactor and clean up that logic.All these changes are preparing for actual struct indirections and
return struct
,struct call
changes,. From my experience, such changes always cause regressions somewhere, so I would like to merge them in as many separate PRs as possible to be able to find which change caused which regression faster.