[Outdated] Implementation of CSE for GT_CNS_INT benefits ARM64. #36831

Closed
wants to merge 30 commits into from


briansull
Contributor

No description provided.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 21, 2020
@briansull
Contributor Author

briansull commented May 21, 2020

Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 protononjit.dll
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -590112 (-1.13% of base)
    diff is an improvement.
Top file improvements (bytes):
     -329316 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-6.71% of base)
      -63604 : Microsoft.CodeAnalysis.VisualBasic.dasm (-1.04% of base)
      -59336 : Microsoft.CodeAnalysis.CSharp.dasm (-1.05% of base)
      -13328 : System.Private.Xml.dasm (-0.30% of base)
      -12288 : System.Data.Common.dasm (-0.77% of base)
      -11120 : System.Private.DataContractSerialization.dasm (-1.04% of base)
      -10736 : System.Private.CoreLib.dasm (-0.20% of base)
       -8736 : System.DirectoryServices.dasm (-1.54% of base)
       -7976 : System.Text.Json.dasm (-1.33% of base)
       -7352 : System.Linq.Parallel.dasm (-1.02% of base)
       -6264 : System.Management.dasm (-1.28% of base)
       -4372 : System.Data.OleDb.dasm (-1.10% of base)
       -3532 : System.Net.Http.dasm (-0.43% of base)
       -3348 : System.Data.Odbc.dasm (-1.24% of base)
       -2996 : Microsoft.CodeAnalysis.dasm (-0.15% of base)
       -2996 : System.Text.RegularExpressions.dasm (-1.12% of base)
       -2812 : System.Collections.Immutable.dasm (-0.60% of base)
       -2744 : System.ComponentModel.TypeConverter.dasm (-0.80% of base)
       -2700 : System.DirectoryServices.AccountManagement.dasm (-0.72% of base)
       -2508 : System.Net.Security.dasm (-0.79% of base)
       -2344 : System.Reflection.MetadataLoadContext.dasm (-0.82% of base)
       -2204 : Newtonsoft.Json.dasm (-0.24% of base)
       -2004 : System.IO.Compression.dasm (-1.82% of base)

Top file regressions (bytes):
        1444 : System.Linq.dasm (0.42% of base)
        1388 : System.ComponentModel.Composition.dasm (0.41% of base)
        1324 : System.CodeDom.dasm (0.54% of base)
        1272 : System.Security.Cryptography.Xml.dasm (0.61% of base)

Top method improvements (percentages):
         -72 (-33.33% of base) : System.Private.DataContractSerialization.dasm - System.Runtime.Serialization.ExtensionDataReader:Reset():this
         -88 (-30.56% of base) : System.Private.DataContractSerialization.dasm - System.Runtime.Serialization.Json.XmlJsonWriter:InitializeWriter():this
         -72 (-28.57% of base) : System.Net.Requests.dasm - System.Net.CommandStream:InitCommandPipeline(System.Net.WebRequest,System.Net.CommandStream+PipelineEntry[],bool):this
        -164 (-27.89% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceProcesses:FastSerialization.IFastSerializable.ToStream(FastSerialization.Serializer):this
         -48 (-27.27% of base) : System.Windows.Extensions.dasm - System.Media.SoundPlayer:CleanupStreamData():this
        -128 (-26.89% of base) : System.Private.CoreLib.dasm - System.HashCode:Combine(int,int,int,int,int,int,int,int):int
         -72 (-26.47% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlBaseWriter:SetOutput(System.Xml.XmlStreamNodeWriter):this
         -56 (-26.42% of base) : System.IO.Pipes.dasm - System.IO.Pipes.PipeStream:Init(int,int,int):this
         -48 (-26.09% of base) : System.IO.FileSystem.dasm - System.IO.FileSystemInfo:Init(long):this
         -32 (-25.81% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlBaseReader:MoveToNode(XmlNode):this
         -84 (-25.61% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.UnhandledTraceEvent:.ctor():this
         -56 (-25.45% of base) : System.Security.Cryptography.X509Certificates.dasm - System.Security.Cryptography.X509Certificates.X509Certificate2:Reset():this
         -24 (-25.00% of base) : System.DirectoryServices.AccountManagement.dasm - System.DirectoryServices.AccountManagement.RangeRetriever:Reset():this
         -24 (-25.00% of base) : System.Private.Xml.Linq.dasm - System.Xml.Linq.XNodeReader:Close():this
         -24 (-25.00% of base) : System.ServiceProcess.ServiceController.dasm - System.ServiceProcess.ServiceController:Refresh():this
         -48 (-24.49% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceLoadedModules:CheckClassInvarients():this
        -144 (-24.00% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.ImmutableArrayExtensions:HasAnyErrors(System.Collections.Immutable.ImmutableArray`1[__Canon]):bool (2 methods)
        -104 (-23.64% of base) : System.Private.CoreLib.dasm - System.HashCode:Combine(int,int,int,int,int,int,int):int

Top method improvements (bytes):
       -6816 (-15.32% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrPrivateTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
       -6700 (-15.85% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.ClrTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
       -5980 (-10.15% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.KernelTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
       -5720 (-20.50% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.AspNet.AspNetTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
       -1948 (-20.24% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Clr.ClrRundownTraceEventParser:EnumerateTemplates(System.Func`3[[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.EventFilterResponse, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]],System.Action`1[[Microsoft.Diagnostics.Tracing.TraceEvent, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):this
       -1784 (-19.70% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlBinaryReader:ReadNode():bool:this
       -1368 (-2.21% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.CtfTraceEventSource:InitEventMap():System.Collections.Generic.Dictionary`2[[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.Diagnostics.Tracing.ETWMapping, Microsoft.Diagnostics.Tracing.TraceEvent, Version=2.0.49.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]
       -1300 (-4.15% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.RegexCompiler:GenerateOneCode():this
       -1228 (-4.75% of base) : System.Private.Xml.dasm - System.Xml.Xsl.IlGen.XmlILMethods:.cctor()

}
else
{
// Arm64 allows any arbitrary 16-bit constant to be loaded into a register halfword
Member

HWIntrinsics recently got some support in lowering where we will generate better code if all inputs to a NI_Vector128_Create call are constant.
Is there any handling that should be added here to help account for that?

Contributor Author

No, because the Lower phase runs after the CSE phase.

@briansull briansull force-pushed the cse-const-shared branch 2 times, most recently from e192f27 to 61527dd Compare June 23, 2020 01:00
case GT_CNS_STR:
// Uses movw/movt
costSz = 7;
costEx = 3;
goto COMMON_CNS;

case GT_CNS_LNG:
Member

Wondering why this is needed since the PR is for Arm64? Isn't GT_CNS_LNG never created for 64-bit JIT?

Member

Sorry, I worded this poorly. I meant to say the PR title implies this is just Arm64 work and should likely be updated if it's also touching arm32.

Contributor Author

It currently enables CSE of constants across all platforms.
On ARM64 we get the largest benefits.
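
One plausible reason ARM64 benefits most follows from the comment in the diff above: a register is filled one 16-bit halfword at a time, so a wide constant can take several instructions. The sketch below estimates that cost. `CountMovInstructions` is a hypothetical helper, not a JIT function, and it assumes one `movz`/`movk` per non-zero halfword while ignoring `movn` and bitmask-immediate encodings that a real emitter may choose.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the JIT): estimate how many movz/movk
// instructions are needed to materialize a 64-bit constant on ARM64.
// Assumption: one instruction per non-zero 16-bit halfword; movn and
// bitmask-immediate forms are deliberately ignored.
int CountMovInstructions(uint64_t value)
{
    int count = 0;
    for (int shift = 0; shift < 64; shift += 16)
    {
        // Extract the halfword at this position and count it if non-zero.
        if (((value >> shift) & 0xFFFF) != 0)
        {
            count++;
        }
    }
    // Zero still needs one instruction (for example, a move from xzr).
    return (count == 0) ? 1 : count;
}
```

For the handle constant 0x7ff988f552a0 that appears in the RegexRedux_5 listing in this conversation, the helper returns 3, matching the `movz`/`movk`/`movk` sequence there. Each reuse of a CSE'd constant replaces that sequence with at most a single register move.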

@briansull briansull changed the title from "[WIP] Implementation of CSE for GT_CNS_INT benefits ARM64." to "Implementation of CSE for GT_CNS_INT benefits ARM64." Jun 30, 2020
@@ -2705,6 +2705,8 @@ void Compiler::fgInitArgInfo(GenTreeCall* call)
size_t addrValue = (size_t)call->gtEntryPoint.addr;
GenTree* indirectCellAddress = gtNewIconHandleNode(addrValue, GTF_ICON_FTN_ADDR);
indirectCellAddress->SetRegNum(REG_R2R_INDIRECT_PARAM);
// Don't attempt to CSE this constant
indirectCellAddress->SetDoNotCSE();
Member

@kunalspathak kunalspathak Jun 30, 2020

It is interesting that we are not CSEing indirect cell addresses.

We can perhaps CSE the following code pattern:

        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
        AA0003EF          mov     x15, x0
       ...

into something like:

        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
                          mov   xR, x0     ; store x0 in some register xR
        D63F0000          blr     x0
        ...
                          mov   x0, xR     ; retrieve xR into x0
        D63F0000          blr     x0
        AA0003EF          mov     x15, x0
       ...

With this, we can get an improvement of 8 bytes plus the elimination of one memory access.
I wrote an asm analyzer to find out how many addresses are CSE candidates, and the number is huge. From what I noticed, it would be a little over 2 MB of size reduction.

Processed 191816 methods. Found 29246 methods containing 259123 groups.

Details: cse-candidates.txt

Update: Certainly not for this PR.

@BruceForstall , @CarolEidt , @AndyAyersMS

Member

AnalyzeAsm code that I used to find out CSE candidates: dotnet/jitutils#273
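
A minimal sketch of this kind of scan, for illustration only. It is not the jitutils AnalyzeAsm implementation, and `CountRepeatedRelocGroups` is a hypothetical name. It assumes a "group" means a RELOC address that is loaded via `adrp` more than once within one method's listing, and it does not check whether the register is clobbered in between.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch (not the jitutils tool): count the RELOC addresses
// that are materialized with adrp more than once in one method's listing.
// Assumption: each such repeated address is one CSE candidate "group".
std::size_t CountRepeatedRelocGroups(const std::vector<std::string>& asmLines)
{
    const std::string marker = "[RELOC #";
    std::map<std::string, int> uses; // address text -> number of adrp loads

    for (const std::string& line : asmLines)
    {
        if (line.find("adrp") == std::string::npos)
        {
            continue; // only adrp lines start an address materialization
        }
        std::size_t start = line.find(marker);
        if (start == std::string::npos)
        {
            continue;
        }
        start += marker.size();
        std::size_t end = line.find(']', start);
        if (end == std::string::npos)
        {
            continue; // malformed operand, skip it
        }
        uses[line.substr(start, end - start)]++;
    }

    std::size_t groups = 0;
    for (const auto& entry : uses)
    {
        if (entry.second > 1)
        {
            groups++;
        }
    }
    return groups;
}
```

Fed the two `adrp x11, [RELOC #0x1de756537c0]` sequences from the pattern above, this reports one group.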

Contributor Author

@briansull briansull Jun 30, 2020

I added this limitation because it was causing an assert failure in LSRA.
I will re-check this to see if it still happens and add a comment if so.

Contributor Author

@briansull briansull Jul 1, 2020

The indirection of r11 is never materialized in our current IR, so we can't create a CSE for these indirections:

Generating: N011 (  2,  8) [000039] -------N----        t39 =    CNS_INT(h) long   0xd1ffab1e ftn REG x11 $140
IN0002:             adrp    x11, [RELOC #0xd1ffab1e]
IN0003:             add     x11, x11, #0
                                                              /--*  t39    long   
Generating: N013 (???,???) [000144] ------------       t144 = *  PUTARG_REG long   REG x11
                                                              /--*  t144   long   arg0 in x11
Generating: N015 ( 16, 11) [000005] --C---------         t5 = *  CALL help r2r_ind ref    HELPER.CORINFO_HELP_READYTORUN_NEW REG x0 $1c0
IN0004:             ldr     x0, [x11]
Call: GCvars=0000000000000000 {}, gcrefRegs=180000 {x19 x20}, byrefRegs=0000 {}
IN0005:             blr     x0

Contributor Author

@briansull briansull Jul 2, 2020

If CSE of constants is enabled for arm32, we will hit an assert when we CSE this address (so I will add an #ifdef to prevent this on Arm32):

Assertion failed 'candidates != candidateBit' in 'System.Array:ConvertAll(System.__Canon[],System.Converter`2[__Canon,__Canon]):System.__Canon[]' during 'LSRA build intervals' (IL size 64)

Member

Sure. Is there a follow-up issue to investigate this ARM failure?

@kunalspathak
Member

    stubAddrArg  = gtNewIconHandleNode(addr, GTF_ICON_FTN_ADDR);

Should we also mark this as SetDoNotCSE()?


Refers to: src/coreclr/src/jit/morph.cpp:8397 in 99b1da8.

@briansull
Contributor Author

briansull commented Jul 1, 2020

Sample improvement in BenchmarksGame.RegexRedux_5:
We create two CSEs for constants: cse2 in x24 and cse3 in x19.
These are examples of shared-offset CSEs, as they generate multiple address values that differ by a small constant:

        D28A5418          movz    x24, #0x52a0
        F2B11EB8          movk    x24, #0x88f5 LSL #16
        F2CFFF38          movk    x24, #0x7ff9 LSL #32
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST

        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST

        D28C6313          movz    x19, #0x6318
        F2B118F3          movk    x19, #0x88c7 LSL #16
        F2CFFF33          movk    x19, #0x7ff9 LSL #32
        F9000C13          str     x19, [x0,#24]
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[Int32]):System.Threading.Tasks.Task`1[Int32]

        91002261          add     x1, x19, #8
        F9000C01          str     x1, [x0,#24]

        91004261          add     x1, x19, #16
        F9000C01          str     x1, [x0,#24]

**Before**
; Total bytes of code 1588, prolog size 28, PerfScore 440.80, (MethodHash=1582adb1) for method BenchmarksGame.RegexRedux_5:Bench(System.IO.TextReader,bool):int
**After**
; Total bytes of code 1376, prolog size 28, PerfScore 395.60, (MethodHash=1582adb1) for method BenchmarksGame.RegexRedux_5:Bench(System.IO.TextReader,bool):int
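
The shared-offset idea can be sketched as follows. This is an illustration under stated assumptions, not the grouping logic from this PR: `SharedConst` and `GroupSharedConstants` are hypothetical names, and the rule used here (a constant joins a group when its distance from the group's base fits the 12-bit unsigned immediate of an ARM64 `add`) is assumed for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical types and names, for illustration only.
struct SharedConst
{
    uint64_t              base;    // value materialized once (movz/movk)
    std::vector<uint64_t> offsets; // each use is then "add xN, base, #offset"
};

// Assumed rule: sort the constants, then let each one join the current
// group if (value - base) fits an ARM64 add immediate (0..4095);
// otherwise it starts a new group with itself as the base.
std::vector<SharedConst> GroupSharedConstants(std::vector<uint64_t> values,
                                              uint64_t maxOffset = 4095)
{
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());

    std::vector<SharedConst> groups;
    for (uint64_t v : values)
    {
        if (!groups.empty() && (v - groups.back().base) <= maxOffset)
        {
            groups.back().offsets.push_back(v - groups.back().base);
        }
        else
        {
            groups.push_back(SharedConst{v, {0}});
        }
    }
    return groups;
}
```

With the five address values from the excerpt above (0x7ff988f552a0 and that value plus 552, and 0x7ff988c76318 plus 0, 8, and 16), this yields two groups, which corresponds to the two base registers x24 and x19 in the listing.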

Here is the new Jitted code generated for this method:

; Assembly listing for method BenchmarksGame.RegexRedux_5:Bench(System.IO.TextReader,bool):int
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T14] (  4,  4   )     ref  ->  x19         class-hnd
;  V01 arg1         [V01,T15] (  3,  3   )    bool  ->  x20        
;  V02 loc0         [V02,T01] ( 15, 14.50)     ref  ->  x21         class-hnd exact
;  V03 loc1         [V03,T58] (  2,  1.50)     int  ->  x22         ld-addr-op
;  V04 loc2         [V04,T31] (  3,  2.50)     ref  ->  x23         class-hnd
;  V05 loc3         [V05,T33] (  3,  2   )     ref  ->  x25         class-hnd
;  V06 loc4         [V06,T34] (  3,  2   )     ref  ->  x26         class-hnd
;  V07 loc5         [V07,T35] (  3,  2   )     ref  ->  x27         class-hnd
;  V08 loc6         [V08,T36] (  3,  2   )     ref  ->  x28         class-hnd
;  V09 loc7         [V09,T37] (  3,  2   )     ref  ->  [fp+0x28]   class-hnd
;  V10 loc8         [V10,T38] (  3,  2   )     ref  ->  [fp+0x20]   class-hnd
;  V11 loc9         [V11,T39] (  3,  2   )     ref  ->  [fp+0x18]   class-hnd
;  V12 loc10        [V12,T40] (  3,  2   )     ref  ->  [fp+0x10]   class-hnd
;  V13 loc11        [V13,T41] (  3,  2   )     ref  ->  x19         class-hnd
;  V14 loc12        [V14,T32] (  4,  2   )     int  ->   x0         ld-addr-op
;# V15 OutArgs      [V15    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;  V16 tmp1         [V16,T16] (  2,  4   )     ref  ->  x21         class-hnd exact "NewObj constructor temp"
;  V17 tmp2         [V17,T04] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V18 tmp3         [V18,T05] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V19 tmp4         [V19,T06] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V20 tmp5         [V20,T07] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V21 tmp6         [V21,T08] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V22 tmp7         [V22,T09] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V23 tmp8         [V23,T10] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V24 tmp9         [V24,T11] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V25 tmp10        [V25,T12] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V26 tmp11        [V26,T13] (  4,  8   )     ref  ->   x0         class-hnd exact "NewObj constructor temp"
;  V27 tmp12        [V27,T02] ( 11, 11   )     ref  ->  x20         class-hnd exact "dup spill"
;  V28 tmp13        [V28,T19] (  3,  3   )     ref  ->  x24         class-hnd "impAppendStmt"
;  V29 tmp14        [V29,T20] (  3,  3   )     ref  ->  x20         class-hnd "impAppendStmt"
;  V30 tmp15        [V30,T21] (  3,  3   )     ref  ->  x25         class-hnd "impAppendStmt"
;  V31 tmp16        [V31,T22] (  3,  3   )     ref  ->  x26         class-hnd "impAppendStmt"
;  V32 tmp17        [V32,T23] (  3,  3   )     ref  ->  x20         class-hnd "impAppendStmt"
;  V33 tmp18        [V33,T24] (  3,  3   )     ref  ->  x20         class-hnd "impAppendStmt"
;  V34 tmp19        [V34,T25] (  3,  3   )     ref  ->  x28         class-hnd "impAppendStmt"
;  V35 tmp20        [V35,T26] (  3,  3   )     ref  ->  x27         class-hnd "impAppendStmt"
;  V36 tmp21        [V36,T27] (  3,  3   )     ref  ->  x19         class-hnd "impAppendStmt"
;  V37 tmp22        [V37,T28] (  3,  3   )     ref  ->  x19         class-hnd "impAppendStmt"
;  V38 tmp23        [V38,T42] (  2,  2   )     ref  ->  x20         class-hnd "Local could be aliased or is pinned"
;  V39 tmp24        [V39,T29] (  3,  3   )     ref  ->  x19         class-hnd "impAppendStmt"
;  V40 tmp25        [V40,T17] (  2,  4   )     ref  ->  x19         class-hnd "Inlining Arg"
;* V41 tmp26        [V41    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V42 tmp27        [V42    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V43 tmp28        [V43    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V44 tmp29        [V44    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V45 tmp30        [V45    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V46 tmp31        [V46    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V47 tmp32        [V47    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V48 tmp33        [V48    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V49 tmp34        [V49    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;* V50 tmp35        [V50    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;  V51 tmp36        [V51,T18] (  2,  4   )     ref  ->   x0         "argument with side effect"
;  V52 tmp37        [V52,T43] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V53 tmp38        [V53,T44] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V54 tmp39        [V54,T45] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V55 tmp40        [V55,T46] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V56 tmp41        [V56,T47] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V57 tmp42        [V57,T48] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V58 tmp43        [V58,T49] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V59 tmp44        [V59,T50] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V60 tmp45        [V60,T51] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V61 tmp46        [V61,T52] (  2,  2   )     ref  ->   x3         "argument with side effect"
;  V62 tmp47        [V62,T53] (  2,  2   )     ref  ->  x21         "argument with side effect"
;  V63 tmp48        [V63,T54] (  2,  2   )     ref  ->  x22         "argument with side effect"
;  V64 tmp49        [V64,T55] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V65 tmp50        [V65,T56] (  2,  2   )     ref  ->   x1         "argument with side effect"
;  V66 cse0         [V66,T57] (  3,  1.50)     ref  ->  x21         "CSE - moderate"
;  V67 cse1         [V67,T30] (  3,  3   )     ref  ->  x19         "CSE - moderate"
;  V68 cse2         [V68,T00] ( 20, 19.50)    long  ->  x24         "CSE - aggressive"
;  V69 cse3         [V69,T03] ( 11, 11   )    long  ->  x19         "CSE - aggressive"
;
; Lcl frame size = 32

G_M21070_IG01:
        A9B87BFD          stp     fp, lr, [sp,#-128]!
        A90353F3          stp     x19, x20, [sp,#48]
        A9045BF5          stp     x21, x22, [sp,#64]
        A90563F7          stp     x23, x24, [sp,#80]
        A9066BF9          stp     x25, x26, [sp,#96]
        A90773FB          stp     x27, x28, [sp,#112]
        910003FD          mov     fp, sp
        AA0003F3          mov     x19, x0
        2A0103F4          mov     w20, w1
						;; bbWeight=1    PerfScore 7.50
G_M21070_IG02:
        D29F8B00          movz    x0, #0xfc58
        F2B11E20          movk    x0, #0x88f1 LSL #16
        F2CFFF20          movk    x0, #0x7ff9 LSL #32
        94000000          bl      CORINFO_HELP_NEWSFAST
        AA0003F5          mov     x21, x0
        AA1303E0          mov     x0, x19
        F9400261          ldr     x1, [x19]
        F9402821          ldr     x1, [x1,#80]
        F9401021          ldr     x1, [x1,#32]
        D63F0020          blr     x1
        910022AE          add     x14, x21, #8
        AA0003EF          mov     x15, x0
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        F94006B3          ldr     x19, [x21,#8]
        B9400A76          ldr     w22, [x19,#8]
        D2863600          movz    x0, #0x31b0
        F2A20000          movk    x0, #0x1000 LSL #16
        F2C03740          movk    x0, #442 LSL #32
        F9400000          ldr     x0, [x0]
        94000000          bl      System.Text.RegularExpressions.RegexCache:GetOrAdd(System.String):System.Text.RegularExpressions.Regex
        D2860C02          movz    x2, #0x3060
        F2A20002          movk    x2, #0x1000 LSL #16
        F2C03742          movk    x2, #442 LSL #32
        F9400042          ldr     x2, [x2]
        AA1303E1          mov     x1, x19
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Text.RegularExpressions.Regex:Replace(System.String,System.String):System.String:this
        910022AE          add     x14, x21, #8
        AA0003EF          mov     x15, x0
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        D289B400          movz    x0, #0x4da0
        F2B11EA0          movk    x0, #0x88f5 LSL #16
        F2CFFF20          movk    x0, #0x7ff9 LSL #32
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        D28C6313          movz    x19, #0x6318
        F2B118F3          movk    x19, #0x88c7 LSL #16
        F2CFFF33          movk    x19, #0x7ff9 LSL #32
        F9000C13          str     x19, [x0,#24]
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[Int32]):System.Threading.Tasks.Task`1[Int32]
        AA0003F7          mov     x23, x0
        D28A5418          movz    x24, #0x52a0
        F2B11EB8          movk    x24, #0x88f5 LSL #16
        F2CFFF38          movk    x24, #0x7ff9 LSL #32
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        91002261          add     x1, x19, #8
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        AA0003F9          mov     x25, x0
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        91004261          add     x1, x19, #16
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        AA0003FA          mov     x26, x0
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        91006261          add     x1, x19, #24
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        AA0003FB          mov     x27, x0
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        91008261          add     x1, x19, #32
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        AA0003FC          mov     x28, x0
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        9100A261          add     x1, x19, #40
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        F90017A0          str     x0, [fp,#40]	// [V09 loc7]
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        9100C261          add     x1, x19, #48
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
						;; bbWeight=1    PerfScore 92.00
G_M21070_IG03:
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        F90013A0          str     x0, [fp,#32]	// [V10 loc8]
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        9100E261          add     x1, x19, #56
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        F9000FA0          str     x0, [fp,#24]	// [V11 loc9]
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        91010261          add     x1, x19, #64
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        F9000BA0          str     x0, [fp,#16]	// [V12 loc10]
        AA1803E0          mov     x0, x24
        94000000          bl      CORINFO_HELP_NEWSFAST
        9100200E          add     x14, x0, #8
        AA1503EF          mov     x15, x21
        94000000          bl      CORINFO_HELP_ASSIGN_REF
        91012261          add     x1, x19, #72
        F9000C01          str     x1, [x0,#24]
        AA0003E1          mov     x1, x0
        9108A300          add     x0, x24, #552
        94000000          bl      System.Threading.Tasks.Task:Run(System.Func`1[__Canon]):System.Threading.Tasks.Task`1[__Canon]
        AA0003F3          mov     x19, x0
        72001E9F          tst     w20, #255
        54001120          beq     G_M21070_IG06
						;; bbWeight=1    PerfScore 27.00
G_M21070_IG04:
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F8          mov     x24, x0
        F9400FA0          ldr     x0, [fp,#24]
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1803E0          mov     x0, x24
        F9400302          ldr     x2, [x24]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F4          mov     x20, x0
        AA1903E0          mov     x0, x25
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1403E0          mov     x0, x20
        F9400282          ldr     x2, [x20]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F9          mov     x25, x0
        AA1A03E0          mov     x0, x26
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1903E0          mov     x0, x25
        F9400322          ldr     x2, [x25]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003FA          mov     x26, x0
        F94017A0          ldr     x0, [fp,#40]
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1A03E0          mov     x0, x26
        F9400342          ldr     x2, [x26]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F4          mov     x20, x0
        F94013A0          ldr     x0, [fp,#32]
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1403E0          mov     x0, x20
        F9400282          ldr     x2, [x20]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F4          mov     x20, x0
        AA1C03E0          mov     x0, x28
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1403E0          mov     x0, x20
        F9400282          ldr     x2, [x20]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003FC          mov     x28, x0
        AA1B03E0          mov     x0, x27
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1C03E0          mov     x0, x28
        F9400382          ldr     x2, [x28]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003FB          mov     x27, x0
        AA1303E0          mov     x0, x19
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1B03E0          mov     x0, x27
        F9400362          ldr     x2, [x27]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
						;; bbWeight=0.50 PerfScore 70.25
G_M21070_IG05:
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F3          mov     x19, x0
        F9400BA0          ldr     x0, [fp,#16]
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[__Canon][System.__Canon]:get_Result():System.__Canon:this
        AA0003E1          mov     x1, x0
        AA1303E0          mov     x0, x19
        F9400262          ldr     x2, [x19]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F3          mov     x19, x0
        2A1603E0          mov     w0, w22
        94000000          bl      System.Number:Int32ToDecStr(int):System.String
        AA0003F4          mov     x20, x0
        F94006A0          ldr     x0, [x21,#8]
        B9400800          ldr     w0, [x0,#8]
        D2863701          movz    x1, #0x31b8
        F2A20001          movk    x1, #0x1000 LSL #16
        F2C03741          movk    x1, #442 LSL #32
        F9400035          ldr     x21, [x1]
        AA1503F6          mov     x22, x21
        94000000          bl      System.Number:Int32ToDecStr(int):System.String
        AA0003E3          mov     x3, x0
        AA1503E2          mov     x2, x21
        AA1603E0          mov     x0, x22
        AA1403E1          mov     x1, x20
        94000000          bl      System.String:Concat(System.String,System.String,System.String,System.String):System.String
        AA0003E1          mov     x1, x0
        AA1303E0          mov     x0, x19
        F9400262          ldr     x2, [x19]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        94000000          bl      System.Console:get_Out():System.IO.TextWriter
        AA0003F3          mov     x19, x0
        AA1703E0          mov     x0, x23
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[Int32][System.Int32]:get_Result():int:this
        94000000          bl      System.Number:Int32ToDecStr(int):System.String
        AA0003E1          mov     x1, x0
        AA1303E0          mov     x0, x19
        F9400262          ldr     x2, [x19]
        F9404042          ldr     x2, [x2,#128]
        F9400C42          ldr     x2, [x2,#24]
        D63F0040          blr     x2
        1400002B          b       G_M21070_IG07
						;; bbWeight=0.50 PerfScore 33.50
G_M21070_IG06:
        910AC300          add     x0, x24, #688
        D2800121          mov     x1, #9
        94000000          bl      CORINFO_HELP_NEWARR_1_OBJ
        AA0003F4          mov     x20, x0
        AA1403E0          mov     x0, x20
        F9400FA2          ldr     x2, [fp,#24]	// [V11 loc9]
        52800001          mov     w1, #0
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        AA1903E2          mov     x2, x25
        52800021          mov     w1, #1
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        AA1A03E2          mov     x2, x26
        52800041          mov     w1, #2
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        F94017A2          ldr     x2, [fp,#40]	// [V09 loc7]
        52800061          mov     w1, #3
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        F94013A2          ldr     x2, [fp,#32]	// [V10 loc8]
        52800081          mov     w1, #4
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        AA1C03E2          mov     x2, x28
        528000A1          mov     w1, #5
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        AA1B03E2          mov     x2, x27
        528000C1          mov     w1, #6
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        AA1303E2          mov     x2, x19
        528000E1          mov     w1, #7
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        F9400BA2          ldr     x2, [fp,#16]	// [V12 loc10]
        52800101          mov     w1, #8
        94000000          bl      CORINFO_HELP_ARRADDR_ST
        AA1403E0          mov     x0, x20
        94000000          bl      System.Threading.Tasks.Task:WaitAll(System.Threading.Tasks.Task[])
						;; bbWeight=0.50 PerfScore 16.25
G_M21070_IG07:
        AA1703E0          mov     x0, x23
        B940001F          ldr     wzr, [x0]
        94000000          bl      System.Threading.Tasks.Task`1[Int32][System.Int32]:get_Result():int:this
						;; bbWeight=1    PerfScore 4.50
G_M21070_IG08:
        A94773FB          ldp     x27, x28, [sp,#112]
        A9466BF9          ldp     x25, x26, [sp,#96]
        A94563F7          ldp     x23, x24, [sp,#80]
        A9445BF5          ldp     x21, x22, [sp,#64]
        A94353F3          ldp     x19, x20, [sp,#48]
        A8C87BFD          ldp     fp, lr, [sp],#128
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 7.00

; Total bytes of code 1376, prolog size 28, PerfScore 395.60, (MethodHash=1582adb1) for method BenchmarksGame.RegexRedux_5:Bench(System.IO.TextReader,bool):int

// Default 0, enable the CSE of Constants, including nearby offsets. (only for ARM64)
// If 1, disable all the CSE of Constants
// If 2, enable the CSE of Constants but don't combine with nearby offsets. (only for ARM64)
// If 3, enable the CSE of Constants including nearby offsets. (all platforms)
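A rough sketch of what "combine with nearby offsets" could mean, assuming one constant is kept in a register and a neighbouring constant is rebuilt from it with an add-immediate, as the `add x0, x24, #552` and `add x0, x24, #688` instructions in the listing above suggest. The 12-bit window and the names `kMaxAddImm` and `CanShareBase` are illustrative assumptions, not identifiers from this PR.

```cpp
#include <cstdint>

// Assumed window: an unshifted ARM64 add-immediate holds an unsigned 12-bit value.
constexpr uint64_t kMaxAddImm = 0xFFF;

// True when `value` can be formed as `base + imm` with a single add-immediate,
// so the two constants could share one CSE'd base register.
constexpr bool CanShareBase(uint64_t base, uint64_t value)
{
    return (value >= base) && ((value - base) <= kMaxAddImm);
}

static_assert(CanShareBase(0x10000, 0x10000 + 552), "offset 552 fits");
static_assert(CanShareBase(0x10000, 0x10000 + 688), "offset 688 fits");
static_assert(!CanShareBase(0x10000, 0x10000 + 0x2000), "too far from base");
```

Under this model the two offsets seen in the listing would share a base, while a constant more than 4095 bytes away would need its own materialization.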
Member

@kunalspathak kunalspathak Jul 1, 2020

(all platforms)

Just for completeness, add "(all platforms)". I haven't checked the other places yet, but does this mean we will not have the "CSE + nearby offset" optimization on other platforms by default?

Contributor Author

@briansull briansull Jul 1, 2020

Yes, I have disabled CSE of constants by default for non-ARM64 platforms.

see optcse.cpp lines 726-742

    bool disableConstCSE = false;

    int configValue = JitConfig.JitDisableConstCSE();

    // all platforms - disable CSE of constant values when config is 1
    if (configValue == 1)
    {
        disableConstCSE = true;
    }
#if !defined(TARGET_ARM64)
    // non-ARM64 platforms - also disable CSE of constant values when config is 0 or 2
    if ((configValue == 0) || (configValue == 2))
    {
        disableConstCSE = true;
    }
#endif
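
The snippet above reduces to a small truth table. The following is a minimal sketch of it as a pure function, assuming a hypothetical `targetIsArm64` flag in place of the `TARGET_ARM64` preprocessor check; `ConstCSEDisabled` is an illustrative name and not part of the JIT.

```cpp
// Mirrors the quoted logic: config 1 disables constant CSE everywhere,
// and config 0 or 2 also disables it on non-ARM64 targets.
constexpr bool ConstCSEDisabled(int configValue, bool targetIsArm64)
{
    return (configValue == 1) ||
           (!targetIsArm64 && ((configValue == 0) || (configValue == 2)));
}

static_assert(!ConstCSEDisabled(0, true), "ARM64 default: enabled");
static_assert(ConstCSEDisabled(0, false), "non-ARM64 default: disabled");
static_assert(ConstCSEDisabled(1, true), "config 1 disables on ARM64");
static_assert(!ConstCSEDisabled(3, false), "config 3 enables on all platforms");
```

Written this way, it is easy to see that value 3 is the only documented setting that turns constant CSE on for non-ARM64 targets.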

@TamarChristinaArm
Contributor

That looks awesome @briansull, great improvement! An unrelated question: is CORINFO_HELP_ARRADDR_ST void? If so, we could avoid repeatedly setting x0 before the calls, provided the JIT knows the value is still live.

#endif // TARGET_ARM64

// All Platforms - also allow to combine with nearby offsets, when config is 3
if (configValue == 3)
Member

configValue

Are you considering adding a JITStress pipeline or something equivalent to test this config value?

Contributor Author

I wasn't planning on adding that. I want to be able to experiment with enabling this for the other platforms.

@briansull
Contributor Author

@TamarChristinaArm Yes, CORINFO_HELP_ARRADDR_ST has no return value, but it follows the normal calling convention, which kills the incoming argument values in x0, w1 and x2. So they have to be reloaded at each call site.

@TamarChristinaArm
Contributor

> @TamarChristinaArm Yes CORINFO_HELP_ARRADDR_ST has no return value, but it follows the normal calling convention, which kills the incoming argument values in x0, w1 and x2. So they have to be reloaded at each call site.

@briansull Ah ok, I wasn't sure if you guys strictly followed the platform calling convention for internal calls too.

Ugh.. sorry, I should stop responding near bed-time :) It's an argument register, so there is no guarantee it would ever be live after a call. Sorry for the noise :)

@briansull briansull force-pushed the cse-const-shared branch 2 times, most recently from cf42eab to 5092e9c Compare July 6, 2020 15:56
@briansull
Contributor Author

@dotnet/jit-contrib PTAL

@briansull
Contributor Author

Tests, including JitStress=1, are all passing
except for one failure caused by a known issue:
pauseonstart test failures #38847

@briansull
Contributor Author

Will rebase to resolve conflicts.
I want to add the above tests to the priority zero category to increase our coverage of floating-point optimizations.

…GT_CNS_STR

ARM64
      -11528 : System.Private.CoreLib.dasm (-0.10% of base)
252 total methods with Code Size differences (222 improved, 30 regressed), 20696 unchanged.
X64
         886 : System.Private.CoreLib.dasm (0.03% of base)
229 total methods with Code Size differences (16 improved, 213 regressed), 20918 unchanged.
…ested loop

We also will record that the block has a call if the duplicated code contains a call.
… specific register,

so clear the RegNum if it was set in the original expression.
Add support for COMPLUS_JitDisableConstCSE
Const CSE may create an assignment node for the target of a calli: indirect call
…he definition, sets csdConstDefValue and csdConstDefVN

Implemented shared consts CSE's with a 16-bit offset for x64, can be enabled using COMPLUS_JitDisableConstCSE=3
Update JitDisableConstCSE to support various options 1-5
…tions

Coverage for Floating point CSEs, conditional branch AssertionProp and NaN handling
@briansull
Contributor Author

@dotnet/jit-contrib PTAL

Contributor

@sandreenko sandreenko left a comment

I think we should avoid using magic because it is very hard to read and understand.

Nit: I think the review could be faster if you created several PRs out of this: one each for magic division, correctness fixes, and CSE optimizations.

// Also set the value number on the relop.
if (curAssertion->assertionKind == OAK_EQUAL)
// set foldResult to either 0 or 1
bool foldResult = assertionKindIsEqual;
Contributor

Is it a correctness fix? Were we getting wrong VN pair for != in the past?

Contributor Author

Yes, this was assigning the wrong constant value number. I found similar code
It occasionally happened in the framework DLLs and the beq_r8 test cases.
Our code has two chances to fold the expression, the first one will fold compares with two constant arguments,
which is the common case. If that doesn't fold then it tries folding using the value numbers and that path was wrong and caused the beq_r8 tests to fail with my CSE changes.
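
To illustrate why the value-number path has to account for both the assertion kind and the compare operator, here is a minimal sketch. `RelOp` and `FoldRelop` are hypothetical names, and the handling of `!=` shown here is just the logical consequence of the assertion, not code copied from the fix.

```cpp
// Hypothetical model: the compare being folded is either == or !=.
enum class RelOp { Eq, Ne };

// assertionKindIsEqual is true for an "operands are equal" assertion
// (OAK_EQUAL in the diff) and false for a "not equal" assertion.
// Returns the constant (false = 0, true = 1) that the compare folds to.
constexpr bool FoldRelop(RelOp oper, bool assertionKindIsEqual)
{
    return (oper == RelOp::Eq) ? assertionKindIsEqual : !assertionKindIsEqual;
}

static_assert(FoldRelop(RelOp::Eq, true), "a == b folds to 1 when a == b is known");
static_assert(!FoldRelop(RelOp::Ne, true), "a != b folds to 0 when a == b is known");
static_assert(FoldRelop(RelOp::Ne, false), "a != b folds to 1 when a != b is known");
```

The value number attached to the folded node has to match this constant in all four cases; a mismatch in any one of them would produce the kind of wrong result described above.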

@@ -4947,6 +4955,12 @@ GenTree* Compiler::optVNConstantPropOnJTrue(BasicBlock* block, GenTree* test)
//
Compiler::fgWalkResult Compiler::optVNConstantPropCurStmt(BasicBlock* block, Statement* stmt, GenTree* tree)
{
// Don't perform const prop on expressions marked with GTF_DONT_CSE
if (!tree->CanCSE())
Contributor

Was it unprofitable or incorrect to do?

Contributor Author

Doing a constant prop here would replace the CSE LclVar with the original constant,
essentially undoing the CSE of the constant.

@@ -6353,6 +6356,19 @@ class Compiler
// number, this will reflect it; otherwise, NoVN.
};

#if defined(TARGET_XARCH)
Contributor

Should this block live in target.h?

@@ -2853,6 +2861,19 @@ struct GenTreeOp : public GenTreeUnOp
assert(oper == GT_NOP || oper == GT_RETURN || oper == GT_RETFILT || OperIsBlk(oper));
}

// returns true if we will use MagicNumber multiplication for this node.
bool usesMagicNumberDivision(Compiler* comp);
Contributor

Nit: we prefer to start function names with an uppercase letter, unless they are created with a prefix or live in an old class that does not follow the current coding convention, so UsesMagicNumberDivision, CheckMagicNumberDivision, etc.

Contributor Author

Done as part of #39021

@@ -922,6 +928,8 @@ struct GenTree
#define GTF_OVERFLOW 0x10000000 // Supported for: GT_ADD, GT_SUB, GT_MUL and GT_CAST.
// Requires an overflow check. Use gtOverflow(Ex)() to check this flag.

#define GTF_DIV_USE_MAGIC 0x80000000 // GT_DIV -- Uses MagicNumber multiplication to compute this division by a constant
Contributor

I am strongly against introducing any "magic" into our source base. Use_Reciprocal_Mul_Or_Shift, or something similar to the old DIV_BY_CONST\DIV_USE_NO_DIV, looks better to me in the long term.

Contributor

Are the commits about this magic division related to the implementation of CSE, so that they belong in this PR, or could they be extracted?

Contributor Author

@briansull briansull Jul 9, 2020

I added this code to prevent regressions. It is a big win to perform a division by a constant using reciprocal multiplication. There were cases where we would CSE the constant and replace it with a CSE local variable, thus preventing the optimization. I didn't invent the term "magic number division"; it was already in use in our code base. I will try to come up with a new term to use for the GTF flag.

I could extract that part of the change and check it in separately.
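
For readers unfamiliar with the technique, here is a minimal worked example of division by a constant via reciprocal multiplication, using the well-known multiplier for unsigned 32-bit division by 3. `DivideBy3` is an illustrative helper, not a function in the JIT. It also shows why the divisor must remain visible as a constant: the multiplier and shift are derived from it at compile time.

```cpp
#include <cstdint>

// 0xAAAAAAAB == (2^33 + 1) / 3, so (x * 0xAAAAAAAB) >> 33 equals x / 3
// for every 32-bit unsigned x; the 64-bit product cannot overflow.
constexpr uint32_t DivideBy3(uint32_t x)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xAAAAAAABull) >> 33);
}

static_assert(DivideBy3(0) == 0, "zero");
static_assert(DivideBy3(100) == 33, "truncates like integer division");
static_assert(DivideBy3(0xFFFFFFFFu) == 0xFFFFFFFFu / 3, "largest input");
```

If the `3` were replaced by a CSE local before this transformation ran, the compiler would see a division by a variable and would have to fall back to a real divide instruction.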

Contributor Author

I factored out all of the prerequisite code changes for this into a new PR

#39021

@briansull
Contributor Author

This PR has been replaced by #39096

@briansull briansull changed the title Implementation of CSE for GT_CNS_INT benefits ARM64. [Outdated] Implementation of CSE for GT_CNS_INT benefits ARM64. Jul 10, 2020
@stephentoub
Member

> This PR has been replaced by #39096

Can this be closed then?

@briansull briansull closed this Jul 15, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

7 participants