-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable CSE for floating-point constants #44419
Conversation
src/coreclr/src/jit/gentree.cpp
Outdated
{ | ||
var_types targetType = tree->TypeGet(); | ||
level = 0; | ||
#if defined(TARGET_XARCH) | ||
/* We use fldz and fld1 to load 0.0 and 1.0, but all other */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to your change, but this isn't accurate anymore. fldz
and fld1
are from the x87 stack and we almost exclusively use SSE/SSE2 now (modulo some ABI things for 32-bit).
Now its that we use xorps
for 0
and load values for everything else.
This is just CSE of constants right? What is the behavior for floating-point values that compare equal but whose bitwise values are different ( |
The ValueNumbers of the two constants have to be the same, We don't use floating point compare to determine if two constant are equal. Value numbering already has to properly handle floating point constants properly. Tracing through the value number implementation, It eventually uses this function in jithashtable.h
|
8c088cd
to
f45d38b
Compare
// Auto-generated message 69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others. |
f45d38b
to
2dbc6d2
Compare
@briansull anything holding this up other than a review? |
2dbc6d2
to
b62e1fa
Compare
This is ready for checkin again: |
ARM32:
ARM64:
|
src/coreclr/jit/morph.cpp
Outdated
// but fgMorphCastIntoHelper sets the return type wrong | ||
// leading to problems if we try to CSE this call | ||
// | ||
oper->gtType = TYP_DOUBLE; // CORINFO_HELP_LNG2DBL actually returns a TYP_DOUBLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an issue tracking fixing this so that LGN2DBL
properly sets TYP_DOUBLE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I will change the comment to be more clear here.
For this case the tree's type is TYP_FLOAT and the call to fgMorphCastIntoHelper
just leaves the return type as is. It assumes that the caller knows what he is doing. Since this is the code that is deciding to use this helper call and with my change we are now just fixing the return type to be correct.
src/coreclr/jit/morph.cpp
Outdated
// | ||
oper->gtType = TYP_DOUBLE; // CORINFO_HELP_LNG2DBL actually returns a TYP_DOUBLE | ||
|
||
// Add a Cast to TYP_FLOAT, nop on x86 using FPU stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we use the FPU stack anywhere except for returns on 32-bit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -3485,22 +3495,6 @@ bool Compiler::optIsCSEcandidate(GenTree* tree) | |||
return false; | |||
} | |||
|
|||
#ifdef TARGET_X86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need diffs for x86/x64 as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X64 Asm Diffs:
Top file regressions (bytes):
239 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.01% of base)
123 : Microsoft.VisualBasic.Core.dasm (0.03% of base)
99 : System.Speech.dasm (0.03% of base)
75 : System.Runtime.Numerics.dasm (0.10% of base)
46 : System.Drawing.Common.dasm (0.01% of base)
7 : System.Net.Http.WinHttpHandler.dasm (0.01% of base)
4 : System.Private.Xml.dasm (0.00% of base)
Top file improvements (bytes):
-61 : System.Private.CoreLib.dasm (-0.00% of base)
-4 : System.Data.Common.dasm (-0.00% of base)
-4 : Microsoft.CSharp.dasm (-0.00% of base)
-1 : System.Net.Http.dasm (-0.00% of base)
X86 Asm Diffs:
Top file regressions (bytes):
31 : System.Drawing.Primitives.dasm (0.10% of base)
24 : Microsoft.VisualBasic.Core.dasm (0.01% of base)
18 : System.ComponentModel.TypeConverter.dasm (0.01% of base)
11 : System.Drawing.Common.dasm (0.00% of base)
11 : System.Linq.Expressions.dasm (0.00% of base)
10 : Microsoft.CodeAnalysis.CSharp.dasm (0.00% of base)
4 : System.Private.DataContractSerialization.dasm (0.00% of base)
Top file improvements (bytes):
-41 : System.Speech.dasm (-0.01% of base)
-27 : System.Private.CoreLib.dasm (-0.00% of base)
-13 : System.Data.Common.dasm (-0.00% of base)
-9 : FSharp.Core.dasm (-0.00% of base)
-3 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
-2 : System.Linq.Parallel.dasm (-0.00% of base)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Update the gtCostEx and gtCostSz values of float and double constants for Arm and Arm64 Fix the gtCosts for 16-bit ARM32 immediate values
b62e1fa
to
f101bb5
Compare
/azp run runtime-coreclr jitstress |
Azure Pipelines successfully started running 1 pipeline(s). |
All JitStress failures are due to |
This is ready for checkin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run SPMI diffs (on benchmarks, in particular)?
@@ -353,7 +353,29 @@ GenTree* Compiler::fgMorphCast(GenTree* tree) | |||
} | |||
else if (((tree->gtFlags & GTF_UNSIGNED) == 0) && (srcType == TYP_LONG) && varTypeIsFloating(dstType)) | |||
{ | |||
return fgMorphCastIntoHelper(tree, CORINFO_HELP_LNG2DBL, oper); | |||
oper = fgMorphCastIntoHelper(tree, CORINFO_HELP_LNG2DBL, oper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this part of this change or some other fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is required. CSE will assert because without this we can make a CSE with mismatched types (TYP_FLOAT and TYP_DOUBLE)
@@ -485,19 +486,21 @@ unsigned Compiler::optValnumCSE_Index(GenTree* tree, Statement* stmt) | |||
assert(vnStore->IsVNConstant(vnLibNorm)); | |||
key = vnStore->CoercedConstantValue<size_t>(vnLibNorm); | |||
|
|||
// We don't shared small offset constants when we require a reloc | |||
// We don't share small offset constants when we require a reloc | |||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain why not? Is it a legality thing or a profitability thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be incorrect to CSE,,
You can't eliminate a reloc using a CSE and adding an offset to a different constant.
How many diffs are there in benchmarks? Can you share out the details so we can cross-correlate with perf improvements we might see in the lab? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good.
Would still like to see a benchmarks diff summary. Presumably you have the data lying around already?
PerfScore Benchmark Improvements:
|
No description provided.