Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable CSE for floating-point constants #44419

Merged
merged 2 commits into from
Feb 27, 2021

Conversation

briansull
Copy link
Contributor

No description provided.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 9, 2020
{
var_types targetType = tree->TypeGet();
level = 0;
#if defined(TARGET_XARCH)
/* We use fldz and fld1 to load 0.0 and 1.0, but all other */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to your change, but this isn't accurate anymore. fldz and fld1 are from the x87 stack and we almost exclusively use SSE/SSE2 now (modulo some ABI things for 32-bit).

Now its that we use xorps for 0 and load values for everything else.

@tannergooding
Copy link
Member

This is just CSE of constants right? What is the behavior for floating-point values that compare equal but whose bitwise values are different (-0.0 vs +0.0) or values who are bitwise equivalent but compare inequal (double.NaN or any of the NaN representations in general)?

@briansull
Copy link
Contributor Author

briansull commented Nov 10, 2020

The ValueNumbers of the two constants have to be the same, We don't use floating point compare to determine if two constant are equal. Value numbering already has to properly handle floating point constants properly.

Tracing through the value number implementation, It eventually uses this function in jithashtable.h

template <typename T>
struct JitLargePrimitiveKeyFuncs : public JitKeyFuncsDefEquals<T>
{
    static unsigned GetHashCode(const T val)
    {
        // A static cast when T is a float or a double converts the value (i.e. 0.25 converts to 0)
        //
        // Instead we want to use all of the bits of a float to create the hash value
        // So we cast the address of val to a pointer to an equivalent sized unsigned int
        // This allows us to read the actual bit representation of a float type
        //

@ViktorHofer
Copy link
Member

// Auto-generated message

69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts.

To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others.

@AndyAyersMS
Copy link
Member

@briansull anything holding this up other than a review?

@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Feb 8, 2021
@JulieLeeMSFT JulieLeeMSFT linked an issue Feb 8, 2021 that may be closed by this pull request
@briansull
Copy link
Contributor Author

This is ready for checkin again:
@dotnet/jit-contrib PTAL
@tannergooding

@briansull
Copy link
Contributor Author

ARM32:

Top file regressions (bytes):
         294 : System.Data.Common.dasm (0.03% of base)
Top file improvements (bytes):
       -3200 : System.Private.CoreLib.dasm (-0.10% of base)
        -496 : Microsoft.VisualBasic.Core.dasm (-0.12% of base)
        -344 : System.Runtime.Numerics.dasm (-0.49% of base)
        -170 : System.Drawing.Common.dasm (-0.05% of base)
         -98 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
         -28 : FSharp.Core.dasm (-0.00% of base)
         -26 : System.Net.Http.dasm (-0.00% of base)
         -20 : System.Private.Xml.dasm (-0.00% of base)
         -18 : System.Drawing.Primitives.dasm (-0.05% of base)
         -14 : Microsoft.CSharp.dasm (-0.00% of base)
         -12 : System.Linq.Parallel.dasm (-0.00% of base)
         -10 : System.Speech.dasm (-0.00% of base)
         -10 : System.Net.Http.WinHttpHandler.dasm (-0.02% of base)
          -6 : Newtonsoft.Json.dasm (-0.00% of base)

ARM64:

Top file regressions (bytes):
          16 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.00% of base)
          12 : System.Runtime.Numerics.dasm (0.01% of base)
           8 : Microsoft.VisualBasic.Core.dasm (0.00% of base)
           8 : System.Speech.dasm (0.00% of base)

Top file improvements (bytes):
        -412 : System.Private.CoreLib.dasm (-0.01% of base)
         -20 : System.Data.Common.dasm (-0.00% of base)
         -12 : System.Drawing.Common.dasm (-0.00% of base)
          -4 : Microsoft.CSharp.dasm (-0.00% of base)

// but fgMorphCastIntoHelper sets the return type wrong
// leading to problems if we try to CSE this call
//
oper->gtType = TYP_DOUBLE; // CORINFO_HELP_LNG2DBL actually returns a TYP_DOUBLE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an issue tracking fixing this so that LGN2DBL properly sets TYP_DOUBLE?

Copy link
Contributor Author

@briansull briansull Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I will change the comment to be more clear here.

For this case the tree's type is TYP_FLOAT and the call to fgMorphCastIntoHelper just leaves the return type as is. It assumes that the caller knows what he is doing. Since this is the code that is deciding to use this helper call and with my change we are now just fixing the return type to be correct.

//
oper->gtType = TYP_DOUBLE; // CORINFO_HELP_LNG2DBL actually returns a TYP_DOUBLE

// Add a Cast to TYP_FLOAT, nop on x86 using FPU stack
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use the FPU stack anywhere except for returns on 32-bit...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -3485,22 +3495,6 @@ bool Compiler::optIsCSEcandidate(GenTree* tree)
return false;
}

#ifdef TARGET_X86
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need diffs for x86/x64 as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X64 Asm Diffs:

Top file regressions (bytes):
         239 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.01% of base)
         123 : Microsoft.VisualBasic.Core.dasm (0.03% of base)
          99 : System.Speech.dasm (0.03% of base)
          75 : System.Runtime.Numerics.dasm (0.10% of base)
          46 : System.Drawing.Common.dasm (0.01% of base)
           7 : System.Net.Http.WinHttpHandler.dasm (0.01% of base)
           4 : System.Private.Xml.dasm (0.00% of base)

Top file improvements (bytes):
         -61 : System.Private.CoreLib.dasm (-0.00% of base)
          -4 : System.Data.Common.dasm (-0.00% of base)
          -4 : Microsoft.CSharp.dasm (-0.00% of base)
          -1 : System.Net.Http.dasm (-0.00% of base)

X86 Asm Diffs:

Top file regressions (bytes):
          31 : System.Drawing.Primitives.dasm (0.10% of base)
          24 : Microsoft.VisualBasic.Core.dasm (0.01% of base)
          18 : System.ComponentModel.TypeConverter.dasm (0.01% of base)
          11 : System.Drawing.Common.dasm (0.00% of base)
          11 : System.Linq.Expressions.dasm (0.00% of base)
          10 : Microsoft.CodeAnalysis.CSharp.dasm (0.00% of base)
           4 : System.Private.DataContractSerialization.dasm (0.00% of base)

Top file improvements (bytes):
         -41 : System.Speech.dasm (-0.01% of base)
         -27 : System.Private.CoreLib.dasm (-0.00% of base)
         -13 : System.Data.Common.dasm (-0.00% of base)
          -9 : FSharp.Core.dasm (-0.00% of base)
          -3 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
          -2 : System.Linq.Parallel.dasm (-0.00% of base)

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Update the gtCostEx and gtCostSz values of  float and double constants for Arm and Arm64
Fix the gtCosts for 16-bit ARM32 immediate values
@briansull
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@briansull
Copy link
Contributor Author

All JitStress failures are due to
#48727

@briansull
Copy link
Contributor Author

This is ready for checkin
@dotnet/jit-contrib PTAL

@briansull
Copy link
Contributor Author

Sample Diff with a CSE

image

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run SPMI diffs (on benchmarks, in particular)?

@@ -353,7 +353,29 @@ GenTree* Compiler::fgMorphCast(GenTree* tree)
}
else if (((tree->gtFlags & GTF_UNSIGNED) == 0) && (srcType == TYP_LONG) && varTypeIsFloating(dstType))
{
return fgMorphCastIntoHelper(tree, CORINFO_HELP_LNG2DBL, oper);
oper = fgMorphCastIntoHelper(tree, CORINFO_HELP_LNG2DBL, oper);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of this change or some other fix?

Copy link
Contributor Author

@briansull briansull Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required. CSE will assert because without this we can make a CSE with mismatched types (TYP_FLOAT and TYP_DOUBLE)

@@ -485,19 +486,21 @@ unsigned Compiler::optValnumCSE_Index(GenTree* tree, Statement* stmt)
assert(vnStore->IsVNConstant(vnLibNorm));
key = vnStore->CoercedConstantValue<size_t>(vnLibNorm);

// We don't shared small offset constants when we require a reloc
// We don't share small offset constants when we require a reloc
//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain why not? Is it a legality thing or a profitability thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be incorrect to CSE,,
You can't eliminate a reloc using a CSE and adding an offset to a different constant.

@briansull
Copy link
Contributor Author

briansull commented Feb 26, 2021

Here is a diff from the SPMI Benchmarks where we hoist a Load of a double constant out of a loop using a CSE

image

@AndyAyersMS
Copy link
Member

How many diffs are there in benchmarks?

Can you share out the details so we can cross-correlate with perf improvements we might see in the lab?

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

Would still like to see a benchmarks diff summary. Presumably you have the data lying around already?

@briansull
Copy link
Contributor Author

PerfScore Benchmark Improvements:

Top method improvements (percentages):
       -4.30 (-16.20% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - <>c:<.cctor>b__3_4(Vector):Color:this
       -4.30 (-16.20% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - <>c:<.cctor>b__3_7(Vector):Color:this
      -76.85 (-9.26% of base) : Bytemark\Bytemark\Bytemark.dasm -Fourier:DoFPUTransIteration(System.Double[],System.Double[],int):long
     -142.40 (-7.43% of base) : Bytemark\Bytemark\Bytemark.dasm - NeuralJagged:randomize_wts()
       -7.40 (-7.30% of base) : Benchstones\BenchF\NewtR\NewtR\NewtR.dasm - Benchstone.BenchF.NewtR:Bench():bool
     -142.00 (-6.66% of base) : Bytemark\Bytemark\Bytemark.dasm - Neural:randomize_wts()
       -7.40 (-6.01% of base) : Benchstones\BenchF\Secant\Secant\Secant.dasm - Benchstone.BenchF.Secant:Bench():bool
       -4.30 (-5.95% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - Color:Brightness():float:this
       -5.80 (-5.84% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:SqrtDoubleTest()
      -77.80 (-5.07% of base) : Benchstones\BenchF\Lorenz\Lorenz\Lorenz.dasm - Benchstone.BenchF.Lorenz:Bench():bool
     -243.40 (-5.04% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm -Algorithms.ScalarFloatRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
       -4.90 (-5.03% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:AbsSingleTest()
       -4.90 (-5.03% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:AbsDoubleTest()
     -243.00 (-5.02% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.ScalarFloatRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
     -243.40 (-5.02% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.ScalarDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
     -243.00 (-5.00% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.ScalarDoubleRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
      -13.47 (-4.95% of base) : Benchstones\BenchF\Bisect\Bisect\Bisect.dasm - Benchstone.BenchF.Bisect:Inner(byref,byref,byref,byref)
      -33.80 (-4.80% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - Color:ChangeHue(float):this
      -27.30 (-4.65% of base) : Benchstones\BenchF\NewtE\NewtE\NewtE.dasm - Benchstone.BenchF.NewtE:Bench():bool
       -5.80 (-4.42% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:SqrtSingleTest()
     -107.80 (-4.24% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedWithADT>b__0(int):this (2 methods)
     -107.40 (-4.23% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass5_0:<RenderMultiThreadedNoADT>b__0(int):this (2 methods)
       -4.90 (-4.18% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:RoundSingleTest()
       -4.90 (-4.18% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:CeilingSingleTest()
       -4.90 (-4.18% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:FloorSingleTest()
       -4.90 (-4.18% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:CeilingDoubleTest()
       -4.90 (-4.18% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:FloorDoubleTest()
       -4.90 (-4.18% of base) : Math\Functions\Functions\Functions.dasm - Functions.MathTests:RoundDoubleTest()
       -6.30 (-3.53% of base) : Benchstones\BenchF\Bisect\Bisect\Bisect.dasm - Benchstone.BenchF.Bisect:Bench():bool
     -213.57 (-3.40% of base) : Benchstones\BenchF\DMath\DMath\DMath.dasm - Benchstone.BenchF.DMath:Bench(int):bool
       -1.90 (-3.34% of base) : SciMark\SciMark\SciMark.dasm - SciMark2.FFT:num_flops(int):double
      -18.25 (-2.67% of base) : Burgers\Burgers\Burgers.dasm - Burgers:Main():int
      -31.65 (-2.13% of base) : Burgers\Burgers\Burgers.dasm - Burgers:GetCalculated2(int,int,double,double,double,System.Double[]):System.Double[]
      -31.25 (-2.13% of base) : Burgers\Burgers\Burgers.dasm - Burgers:GetCalculated0(int,int,double,double,double,System.Double[]):System.Double[]
      -31.25 (-2.10% of base) : Burgers\Burgers\Burgers.dasm - Burgers:GetCalculated1(int,int,double,double,double,System.Double[]):System.Double[]
       -1.90 (-1.71% of base) : FractalPerf\FractalPerf\FractalPerf.dasm - FractalPerf.Fractal:.ctor(double,double,double,double):this
      -79.60 (-1.50% of base) : Benchstones\BenchF\LLoops\LLoops\LLoops.dasm - Benchstone.BenchF.LLoops:Init():this
       -3.10 (-1.38% of base) : SciMark\SciMark\SciMark.dasm - SciMark2.kernel:measureSOR(int,double,SciMark2.Random):double
      -49.30 (-1.19% of base) : Bytemark\Bytemark\Bytemark.dasm - NeuralJagged:read_data_file():this
       -5.10 (-1.16% of base) : Bytemark\Bytemark\Bytemark.dasm - Fourier:Run():double:this
      -49.30 (-0.92% of base) : Bytemark\Bytemark\Bytemark.dasm - Neural:read_data_file():this
      -43.20 (-0.85% of base) : BenchmarksGame\mandelbrot\mandelbrot-2\mandelbrot-2.dasm - BenchmarksGame.Mandelbrot_2:DoBench(int,System.IO.MemoryStream,bool)
      -59.70 (-0.85% of base) : Bytemark\Bytemark\Bytemark.dasm - Huffman:DoHuffIteration(System.Byte[],System.Byte[],System.Byte[],int,int,huff_node[]):long
       -2.55 (-0.84% of base) : SIMD\RayTracer\RayTracer\RayTracer.dasm - RayTracer:Shade(ISect,Scene,int):Color:this
       -1.40 (-0.69% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.VectorDoubleStrictRenderer:RenderMultiThreadedWithADT(float,float,float,float,float):this
       -1.40 (-0.69% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.VectorDoubleStrictRenderer:RenderMultiThreadedNoADT(float,float,float,float,float):this
       -1.40 (-0.69% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.VectorDoubleRenderer:RenderMultiThreadedNoADT(float,float,float,float,float):this
       -1.40 (-0.69% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.VectorDoubleRenderer:RenderMultiThreadedWithADT(float,float,float,float,float):this
      -28.13 (-0.56% of base) : Burgers\Burgers\Burgers.dasm - Burgers:GetCalculated3(int,int,double,double,double,System.Double[]):System.Double[]
       -8.60 (-0.54% of base) : Benchstones\BenchF\Whetsto\Whetsto\Whetsto.dasm - Benchstone.BenchF.Whetsto:Bench():bool
       -2.30 (-0.48% of base) : Benchstones\BenchF\Regula\Regula\Regula.dasm - Benchstone.BenchF.Regula:Inner(byref,byref,double,double,int,byref)
       -5.50 (-0.41% of base) : BenchmarksGame\n-body\n-body-3\n-body-3.dasm - BenchmarksGame.NBodySystem:Energy():double:this
       -4.50 (-0.36% of base) : Bytemark\Bytemark\Bytemark.dasm - ByteMark:bench_with_confidence(int,byref,byref,byref):int
      -53.70 (-0.34% of base) : Benchstones\BenchF\FFT\FFT\FFT.dasm - Benchstone.BenchF.FFT:FastFourierT(System.Double[],System.Double[],int)
       -1.40 (-0.02% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.VectorDoubleRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this
       -1.40 (-0.02% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - Algorithms.VectorDoubleStrictRenderer:RenderSingleThreadedNoADT(float,float,float,float,float):this

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Allow GT_CNS_DBL to be a CSE candidate
6 participants