
JIT: Remove TYP_BLK and TYP_LCLBLK #83036

Merged
21 commits merged on Mar 9, 2023


jakobbotsch
Member

@jakobbotsch jakobbotsch commented Mar 6, 2023

This PR allows TYP_STRUCT locals to have block layouts and replaces uses
of TYP_BLK and TYP_LCLBLK with such locals instead.

There is still an invariant that any struct parameter local (even SIMD) has a non-block layout.

This is a precursor to #83005. I hit several cases where we are unable to create a proper local for a tree we wish to extract. This unification should greatly simplify those cases.

@ghost ghost assigned jakobbotsch Mar 6, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 6, 2023
@ghost

ghost commented Mar 6, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.


@jakobbotsch jakobbotsch changed the title Jit: Remove TYP_BLK and TYP_LCLBLK JIT: Remove TYP_BLK and TYP_LCLBLK Mar 6, 2023
@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Member Author

Need to look at the TP regressions. One thing I noticed while debugging an earlier bug is that the change sometimes causes us to newly set compFloatingPointUsed when we have a new 16-byte struct local.

@jakobbotsch
Member Author

@TIHan I now have a SuperFileCheck error because a check for "add" is matching the "add" in "addr-exposed" here:

;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+00H] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"

Can we do something to make this less fragile? E.g. make sure SuperFileCheck starts matching from the actual instructions.

@jakobbotsch
Member Author

jakobbotsch commented Mar 7, 2023

@TIHan I added def1b9f on top of this PR, does that look ok?

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@@ -988,7 +988,7 @@ class LocalAddressVisitor final : public GenTreeVisitor<LocalAddressVisitor>
 {
     isWide = endOffset.Value() > m_compiler->lvaLclExactSize(lclNum);
 
-    if (varDsc->TypeGet() == TYP_BLK)
+    if ((varDsc->TypeGet() == TYP_STRUCT) && varDsc->GetLayout()->IsBlockLayout())
Member Author


@SingleAccretion Do you think removing this requires any downstream changes now that these are TYP_STRUCT?
I'll probably try that in a follow-up.

Contributor


> Do you think removing this requires any downstream changes now that these are TYP_STRUCT?

I wouldn't expect any.

@TIHan
Contributor

TIHan commented Mar 7, 2023

> I added def1b9f on top of this PR, does that look ok?

I think that looks fine. In the future, I would like the output to look like what Disasmo produces.

@jakobbotsch
Member Author

A large part of the linux-x64 and win-x64 MinOpts regressions comes from more instances of MSVC emitting extra instructions because field offsets/struct sizes changed. For example, in LinearScan::processBlockStartLocations the following diff causes some of the TP regression:

  mov     ecx, r8d
- lea     rdx, [rcx+rcx*2]
- add     rdx, rdx
- mov     rax, [rsi+rdx*8+130h]
+ lea     rdx, [rcx+6]
+ lea     rdx, [rdx+rdx*2]
+ add     rdx, rdx
+ mov     rax, [rsi+rdx*8]
  mov     dword ptr [rsi+rcx*4+0E6Ch], 0FFFFFFFFh

In the base, the code needs to compute rsi+0x130+rcx*0x30. In the diff, it needs to compute rsi+0x120+rcx*0x30, and MSVC factors the latter as rsi+(rcx+6)*0x30. That requires one fewer 4-byte displacement but one more instruction.

If I just re-add the types to typelist.h but change nothing else, the TP diff for linux-x64 looks like the following. There is still a bit to investigate here.

Overall (-0.10% to -0.06%)

| Collection | PDIFF |
| --- | ---: |
| benchmarks.run.linux.x64.checked.mch | -0.10% |
| coreclr_tests.run.linux.x64.checked.mch | -0.10% |
| libraries.crossgen2.linux.x64.checked.mch | -0.06% |
| libraries.pmi.linux.x64.checked.mch | -0.08% |
| libraries_tests.pmi.linux.x64.checked.mch | -0.09% |

MinOpts (-0.08% to +0.10%)

| Collection | PDIFF |
| --- | ---: |
| benchmarks.run.linux.x64.checked.mch | +0.07% |
| coreclr_tests.run.linux.x64.checked.mch | -0.08% |
| libraries.crossgen2.linux.x64.checked.mch | +0.10% |
| libraries.pmi.linux.x64.checked.mch | +0.03% |
| libraries_tests.pmi.linux.x64.checked.mch | -0.02% |

FullOpts (-0.12% to -0.06%)

| Collection | PDIFF |
| --- | ---: |
| benchmarks.run.linux.x64.checked.mch | -0.10% |
| coreclr_tests.run.linux.x64.checked.mch | -0.12% |
| libraries.crossgen2.linux.x64.checked.mch | -0.06% |
| libraries.pmi.linux.x64.checked.mch | -0.08% |
| libraries_tests.pmi.linux.x64.checked.mch | -0.09% |
Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| benchmarks.run.linux.x64.checked.mch | 69,965,659,855 | 69,897,813,135 | -0.10% |
| coreclr_tests.run.linux.x64.checked.mch | 870,059,317,941 | 869,156,156,323 | -0.10% |
| libraries.crossgen2.linux.x64.checked.mch | 92,827,519,641 | 92,767,420,531 | -0.06% |
| libraries.pmi.linux.x64.checked.mch | 223,266,554,539 | 223,087,466,673 | -0.08% |
| libraries_tests.pmi.linux.x64.checked.mch | 512,049,243,371 | 511,580,122,471 | -0.09% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| benchmarks.run.linux.x64.checked.mch | 931,863,189 | 932,503,923 | +0.07% |
| coreclr_tests.run.linux.x64.checked.mch | 372,458,454,974 | 372,167,038,865 | -0.08% |
| libraries.crossgen2.linux.x64.checked.mch | 1,455,816 | 1,457,266 | +0.10% |
| libraries.pmi.linux.x64.checked.mch | 1,323,016,103 | 1,323,368,558 | +0.03% |
| libraries_tests.pmi.linux.x64.checked.mch | 6,266,291,782 | 6,264,893,799 | -0.02% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| benchmarks.run.linux.x64.checked.mch | 69,033,796,666 | 68,965,309,212 | -0.10% |
| coreclr_tests.run.linux.x64.checked.mch | 497,600,862,967 | 496,989,117,458 | -0.12% |
| libraries.crossgen2.linux.x64.checked.mch | 92,826,063,825 | 92,765,963,265 | -0.06% |
| libraries.pmi.linux.x64.checked.mch | 221,943,538,436 | 221,764,098,115 | -0.08% |
| libraries_tests.pmi.linux.x64.checked.mch | 505,782,951,589 | 505,315,228,672 | -0.09% |

@jakobbotsch jakobbotsch force-pushed the remove-TYP_BLK-TYP_LCLBLK branch from de8729d to ee37405 Compare March 8, 2023 12:28
@jakobbotsch
Member Author

To get rid of the last tier-0 TP regressions, I have added a fast path for the 0-sized block layout in ClassLayoutTable. This layout is now used in all non-x86 compilations because we always allocate a local for the outgoing arg area, and that local starts out 0-sized. With the old types kept in typelist.h (to avoid the issue described above), the TP diffs are now:

linux-x64

Overall (-0.11% to -0.09%)

| Collection | PDIFF |
| --- | ---: |
| benchmarks.run.linux.x64.checked.mch | -0.11% |
| coreclr_tests.run.linux.x64.checked.mch | -0.11% |
| libraries.crossgen2.linux.x64.checked.mch | -0.09% |
| libraries.pmi.linux.x64.checked.mch | -0.10% |
| libraries_tests.pmi.linux.x64.checked.mch | -0.10% |

MinOpts (-0.09% to +0.01%)

| Collection | PDIFF |
| --- | ---: |
| benchmarks.run.linux.x64.checked.mch | +0.01% |
| coreclr_tests.run.linux.x64.checked.mch | -0.09% |
| libraries.pmi.linux.x64.checked.mch | -0.01% |
| libraries_tests.pmi.linux.x64.checked.mch | -0.04% |

FullOpts (-0.13% to -0.09%)

| Collection | PDIFF |
| --- | ---: |
| benchmarks.run.linux.x64.checked.mch | -0.11% |
| coreclr_tests.run.linux.x64.checked.mch | -0.13% |
| libraries.crossgen2.linux.x64.checked.mch | -0.09% |
| libraries.pmi.linux.x64.checked.mch | -0.10% |
| libraries_tests.pmi.linux.x64.checked.mch | -0.10% |
Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| benchmarks.run.linux.x64.checked.mch | 69,966,074,409 | 69,891,791,239 | -0.11% |
| coreclr_tests.run.linux.x64.checked.mch | 870,062,049,661 | 869,075,076,696 | -0.11% |
| libraries.crossgen2.linux.x64.checked.mch | 92,826,908,435 | 92,746,736,476 | -0.09% |
| libraries.pmi.linux.x64.checked.mch | 223,266,647,360 | 223,052,613,939 | -0.10% |
| libraries_tests.pmi.linux.x64.checked.mch | 512,050,738,854 | 511,525,077,882 | -0.10% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| benchmarks.run.linux.x64.checked.mch | 931,862,723 | 931,941,927 | +0.01% |
| coreclr_tests.run.linux.x64.checked.mch | 372,458,577,223 | 372,120,977,615 | -0.09% |
| libraries.crossgen2.linux.x64.checked.mch | 1,455,812 | 1,455,839 | +0.00% |
| libraries.pmi.linux.x64.checked.mch | 1,323,016,237 | 1,322,875,530 | -0.01% |
| libraries_tests.pmi.linux.x64.checked.mch | 6,266,294,731 | 6,263,864,779 | -0.04% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| benchmarks.run.linux.x64.checked.mch | 69,034,211,686 | 68,959,849,312 | -0.11% |
| coreclr_tests.run.linux.x64.checked.mch | 497,603,472,438 | 496,954,099,081 | -0.13% |
| libraries.crossgen2.linux.x64.checked.mch | 92,825,452,623 | 92,745,280,637 | -0.09% |
| libraries.pmi.linux.x64.checked.mch | 221,943,631,123 | 221,729,738,409 | -0.10% |
| libraries_tests.pmi.linux.x64.checked.mch | 505,784,444,123 | 505,261,213,103 | -0.10% |

win-x64

Overall (-0.10% to -0.07%)

| Collection | PDIFF |
| --- | ---: |
| aspnet.run.windows.x64.checked.mch | -0.09% |
| aspnet_block.run.windows.x64.checked.mch | -0.08% |
| benchmarks.run.windows.x64.checked.mch | -0.10% |
| coreclr_tests.run.windows.x64.checked.mch | -0.10% |
| libraries.crossgen2.windows.x64.checked.mch | -0.07% |
| libraries.pmi.windows.x64.checked.mch | -0.09% |
| libraries_tests.pmi.windows.x64.checked.mch | -0.10% |

MinOpts (-0.08% to +0.09%)

| Collection | PDIFF |
| --- | ---: |
| aspnet.run.windows.x64.checked.mch | -0.03% |
| aspnet_block.run.windows.x64.checked.mch | -0.04% |
| benchmarks.run.windows.x64.checked.mch | +0.02% |
| coreclr_tests.run.windows.x64.checked.mch | -0.08% |
| libraries.crossgen2.windows.x64.checked.mch | +0.09% |
| libraries.pmi.windows.x64.checked.mch | +0.03% |
| libraries_tests.pmi.windows.x64.checked.mch | -0.01% |

FullOpts (-0.12% to -0.07%)

| Collection | PDIFF |
| --- | ---: |
| aspnet.run.windows.x64.checked.mch | -0.10% |
| aspnet_block.run.windows.x64.checked.mch | -0.10% |
| benchmarks.run.windows.x64.checked.mch | -0.10% |
| coreclr_tests.run.windows.x64.checked.mch | -0.12% |
| libraries.crossgen2.windows.x64.checked.mch | -0.07% |
| libraries.pmi.windows.x64.checked.mch | -0.09% |
| libraries_tests.pmi.windows.x64.checked.mch | -0.10% |
Details

All contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| aspnet.run.windows.x64.checked.mch | 140,787,762,012 | 140,667,277,838 | -0.09% |
| aspnet_block.run.windows.x64.checked.mch | 28,215,789,206 | 28,192,078,468 | -0.08% |
| benchmarks.run.windows.x64.checked.mch | 55,787,662,362 | 55,731,376,584 | -0.10% |
| coreclr_tests.run.windows.x64.checked.mch | 821,076,469,899 | 820,228,225,485 | -0.10% |
| libraries.crossgen2.windows.x64.checked.mch | 123,851,236,737 | 123,761,636,764 | -0.07% |
| libraries.pmi.windows.x64.checked.mch | 234,465,680,122 | 234,259,307,860 | -0.09% |
| libraries_tests.pmi.windows.x64.checked.mch | 507,084,866,061 | 506,593,442,035 | -0.10% |

MinOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| aspnet.run.windows.x64.checked.mch | 27,529,625,490 | 27,521,424,725 | -0.03% |
| aspnet_block.run.windows.x64.checked.mch | 6,648,603,275 | 6,645,824,165 | -0.04% |
| benchmarks.run.windows.x64.checked.mch | 427,797,388 | 427,867,047 | +0.02% |
| coreclr_tests.run.windows.x64.checked.mch | 367,078,708,881 | 366,791,259,496 | -0.08% |
| libraries.crossgen2.windows.x64.checked.mch | 1,712,892 | 1,714,351 | +0.09% |
| libraries.pmi.windows.x64.checked.mch | 1,337,808,295 | 1,338,145,808 | +0.03% |
| libraries_tests.pmi.windows.x64.checked.mch | 5,371,179,311 | 5,370,410,143 | -0.01% |

FullOpts contexts:

| Collection | Base # instructions | Diff # instructions | PDIFF |
| --- | ---: | ---: | ---: |
| aspnet.run.windows.x64.checked.mch | 113,258,136,522 | 113,145,853,113 | -0.10% |
| aspnet_block.run.windows.x64.checked.mch | 21,567,185,931 | 21,546,254,303 | -0.10% |
| benchmarks.run.windows.x64.checked.mch | 55,359,864,974 | 55,303,509,537 | -0.10% |
| coreclr_tests.run.windows.x64.checked.mch | 453,997,761,018 | 453,436,965,989 | -0.12% |
| libraries.crossgen2.windows.x64.checked.mch | 123,849,523,845 | 123,759,922,413 | -0.07% |
| libraries.pmi.windows.x64.checked.mch | 233,127,871,827 | 232,921,162,052 | -0.09% |
| libraries_tests.pmi.windows.x64.checked.mch | 501,713,686,750 | 501,223,031,892 | -0.10% |

Note that libraries.crossgen2 has only 15 MinOpts contexts, so I don't think it is a representative sample.

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch jakobbotsch marked this pull request as ready for review March 8, 2023 16:52
@jakobbotsch
Member Author

jakobbotsch commented Mar 8, 2023

The Fuzzlyn failures repro on main too (one of them is #83140, another looks like a related issue, and the last one does not). I'll open issues for them.

Diffs. See #83036 (comment) for the explanation of the linux-x64 and win-x64 MinOpts TP regressions.

cc @dotnet/jit-contrib PTAL @SingleAccretion @EgorBo @AndyAyersMS

Comment on lines +7060 to +7066
// JIT32 encoder cannot handle GS cookie at fp+0 since NO_GS_COOKIE == 0.
// Add some padding if it is the last allocated local.
if ((lvaGSSecurityCookie != BAD_VAR_NUM) && (lvaGetDesc(lvaGSSecurityCookie)->GetStackOffset() == stkOffs))
{
    lvaIncrementFrameSize(TARGET_POINTER_SIZE);
    stkOffs -= TARGET_POINTER_SIZE;
}
Member Author

@jakobbotsch jakobbotsch Mar 8, 2023


I hit this in some cases where we create a local for a small constant-sized stackalloc. We emit a GS cookie check for those, but now that the stackalloc local is a TYP_STRUCT, we are sometimes able to eliminate it and not allocate any stack space for it, which leaves the GS cookie at fp+0.

Comment on lines -323 to -329
#if FEATURE_FIXED_OUT_ARGS

/* Is this the dummy variable representing GT_LCLBLK ? */
needSlot |= (lclNum == lvaOutgoingArgSpaceVar);

#endif // FEATURE_FIXED_OUT_ARGS

Member Author


This is no longer necessary since lvaOutgoingArgSpaceVar is now a normal address-exposed struct local and is handled above.

Comment on lines -9053 to -9064
case TYP_BYREF:
    if (isZeroed)
    {
        // LclVars of TYP_BYREF can be zero-inited.
        initVal = vnStore->VNForByrefCon(0);
    }
    else
    {
        // Here we have uninitialized TYP_BYREF
        initVal = vnStore->VNForFunc(typ, VNF_InitVal, vnStore->VNForIntCon(lclNum));
    }
    break;
Member Author


The handling for this case seemed to be identical to the default case below, so we can remove the entire switch in favor of the default case.

Contributor

@SingleAccretion SingleAccretion left a comment


Great change!

src/coreclr/jit/layout.cpp (outdated, resolved)
src/coreclr/jit/lclvars.cpp (outdated, resolved)
src/coreclr/jit/morph.cpp (outdated, resolved)
Member

@EgorBo EgorBo left a comment


:shipit:

@jakobbotsch
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Member Author

jakobbotsch commented Mar 9, 2023

The GC stress failures are eventpipe test timeouts that are preexisting.

@jakobbotsch jakobbotsch merged commit 5c7e6d6 into dotnet:main Mar 9, 2023
@jakobbotsch jakobbotsch deleted the remove-TYP_BLK-TYP_LCLBLK branch March 9, 2023 13:09
@ghost ghost locked as resolved and limited conversation to collaborators Apr 8, 2023
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
4 participants