-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant folding for RuntimeInformation.IsOSPlatform(OSPlatform) #83308
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsPartially fixes #738 where this API was acknowledged as not JIT friendly which I took personally 😄 This PR is based on previous PRs to make this happen, namely #78736 and #80431 static bool IsLinux() => RuntimeInformation.IsOSPlatform(OSPlatform.Linux); Old codegen (pre .NET 8.0): ; Assembly listing for method Program:IsLinux():bool
4883EC28 sub rsp, 40
FF15FE642200 call [System.Runtime.InteropServices.OSPlatform:get_Linux()]
488BC8 mov rcx, rax
FF1545A02300 call [System.OperatingSystem:IsOSPlatform(System.String):bool]
90 nop
4883C428 add rsp, 40
C3 ret
; Total bytes of code 25 New codegen (I'm on Windows): ; Assembly listing for method Program:IsLinux():bool
33C0 xor eax, eax
C3 ret
; Total bytes of code 3 Unfortunately, this only works if the operating system is different from current, otherwise the codegen is: ; Assembly listing for method Program:IsLinux():bool
48B85C9272B42C020000 mov rax, 0x22CB472925C
48BA2000200020002000 mov rdx, 0x20002000200020
480B10 or rdx, qword ptr [rax]
48B8770069006E006400 mov rax, 0x64006E00690077
4833C2 xor rax, rdx
48BA629272B42C020000 mov rdx, 0x22CB4729262
48B92000200020002000 mov rcx, 0x20002000200020
480B0A or rcx, qword ptr [rdx]
48BA64006F0077007300 mov rdx, 0x730077006F0064
4833D1 xor rdx, rcx
480BC2 or rax, rdx
0F94C0 sete al
0FB6C0 movzx rax, al
C3 ret
; Total bytes of code 82 It happens because of some phase ordering issue in JIT, namely JIT unrolls Also, this only works for Tier1, so it won't be folded for TC=0/AggressiveOptimization and crossgen/NativeAOT (NativeAOT needs #83043 for that) We recommend users the new APIs we introduced e.g. UPD: there are some improvements from using
|
A couple of diffs The main change is to use GT_ARR_LEN for string length during unrolling. That is then folded to a constant in VN after we fold Ideally,
(for immutable objets) @dotnet/jit-contrib PTAL, simple change. |
src/coreclr/jit/morph.cpp
Outdated
BasicBlock* remainderBlock = fgSplitBlockAfterStatement(block, stmt); | ||
BasicBlockFlags propagateFlagsToRemainder = block->bbFlags & BBF_GC_SAFE_POINT; | ||
// Conservatively mark all blocks as having IDX_LEN since we don't know where exactly it went | ||
BasicBlockFlags propagateFlagsToAll = block->bbFlags & BBF_HAS_IDX_LEN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use BBF_COPY_PROPAGATE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use
BBF_COPY_PROPAGATE
That will decrease throughput at this moment for no benefit - e.g. we'll mark 4 blocks with "has nullcheck" and will have to visit them all in earlyprop. Ideally, QmarkGenTree should store BB flags it needs to propagate IMO. I propose we add flags on demand here since we use QMARK only in a few specific cases (none of them need to propagate flags atm except string-unrolling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How bad is it?
If that's the case I think we should have comment about it. Also, can you add asserts that check for GT_NULLCHECK
, GT_MDARR_LENGTH
, GT_ALLOCOBJ
at least then (can use gtTreeContainsOper
for this)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add asserts that check for GT_NULLCHECK, GT_MDARR_LENGTH, GT_ALLOCOBJ at least then (can use gtTreeContainsOper for this)?
It will assert anyway if we have a block without e.g. HAS_NULLCHECK with an actuall nullcheck, so no need in an extra assert - if someone adds a new kind of qmark with a need to propagate a new block flag (prior my PR it was never needed) they will hit it. The extra regression from BBF_COPY_PROPAGATE
is another +0.02% or so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will leave it up to you -- although I don't think 0.02% is that much to have this be always correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not urgent, let's have a CI run with BBF_COPY_PROPAGATE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one suggestion
Partially fixes #738 where this API was acknowledged as not JIT friendly which I took personally 😄 This PR is based on previous PRs to make this happen, namely #78736 and #80431
Old codegen (pre .NET 8.0):
New codegen (I'm on Windows):
Unfortunately, this only works if the operating system is different from current, otherwise the codegen is:
It happens because of some phase ordering issue in JIT, namely JIT unrolls
string.Equals("Windows", staticReadonlyField)
early in importer and then is unable to fold whatever vectorizer produced -- it can be done actually, but it needs a new JIT-EE API - I might add it separately if it's worth itAlso, this only works for Tier1, so it won't be folded for TC=0/AggressiveOptimization and crossgen/NativeAOT (NativeAOT needs #83043 for that)
We recommend users the new APIs we introduced e.g.
OperatingSystem.IsWindows
which don't need any JIT help, but it seems we still have plenty of users of the old API, see https://grep.app/search?q=IsOSPlatform&words=true&filter[lang][0]=C%23UPD: there are some improvements from using
GT_ARR_LEN
instead ofGT_IND
inimportervectorization.cpp
for string's length. It's needed so VN can foldGT_ARR_LEN(FROZEN_OBJ)
later.