Fix regression with tail.callvirt transformation to helper call #45527

Merged on Dec 10, 2020 (5 commits)

@echesakov echesakov commented Dec 3, 2020

This change fixes the following issue:

  1. The JIT transforms tail. callvirt #<generic virtual method> into a helper-based tail call.
  2. The this argument of the call is computed by another call that has side effects.
  3. The JIT is required to compute the target function pointer address and pass it to IL_STUB_StoreTailCallArgs. Since the call is virtual, that address is computed with the CORINFO_HELP_VIRTUAL_FUNC_PTR helper call, and the JIT needs to pass the this value to the helper.
  4. During that transformation the JIT cloned the tree that corresponds to this and used the cloned tree to compute the target function pointer address.
  5. As a consequence, that tree, along with its side effects, is evaluated twice.
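
The steps above can be modelled outside the JIT. The following is a minimal Python sketch, not JIT code: FuncGetter, get and run mirror the names used in the repro, while resolve_target, lower_by_cloning and lower_by_spilling are hypothetical stand-ins for the helper call and for the old and new shapes of the transformation.

```python
class Func:
    """Stand-in for the object whose virtual Run() is tail called."""

    def run(self):
        return "ran"


class FuncGetter:
    """Computing `this` for the virtual call has a visible side effect."""

    def __init__(self):
        self.get_calls = 0

    def get(self):
        self.get_calls += 1  # the side effect that must happen exactly once
        return Func()


def resolve_target(this_obj):
    """Hypothetical stand-in for CORINFO_HELP_VIRTUAL_FUNC_PTR: needs `this`."""
    return type(this_obj).run


def lower_by_cloning(this_expr):
    """Old shape: the `this` expression is evaluated once per use."""
    this_obj = this_expr()                # original tree
    target = resolve_target(this_expr())  # cloned tree, evaluated again
    return target(this_obj)


def lower_by_spilling(this_expr):
    """New shape: evaluate `this` once into a temp and reuse the temp."""
    tmp = this_expr()
    target = resolve_target(tmp)
    return target(tmp)


def count_get_calls(lowering):
    """Run one lowering and report how many times Get() was executed."""
    getter = FuncGetter()
    lowering(getter.get)
    return getter.get_calls


print("cloning :", count_get_calls(lower_by_cloning))
print("spilling:", count_get_calls(lower_by_spilling))
```

In this model the cloning variant calls get twice and the spilling variant calls it once.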

This reproduces with the following IL code:

call       instance class Runtime_45250.Program/Func`1<!!0> Runtime_45250.Program/FuncGetter::Get<string>()
tail.
callvirt   instance void class Runtime_45250.Program/Func`1<string>::Run()

The IL snippet corresponds to the C# code below, with the small modification of adding the tail. prefix:

static void Run(FuncGetter funcGetter)
{
    funcGetter.Get<string>().Run();
}

Originally the issue was reproduced only on arm. However, that is because on other platforms the JIT would use fast tail calls for such cases. If the environment sets COMPlus_FastTailCalls=0, then the issue reproduces everywhere that portable tail call helpers are used. To prevent fast tail calls and always get helper-based tail calls, all the test cases I am adding contain localloc in the caller method.

Before:

In fgMorphTailCallViaHelpers a tree that corresponds to the above-described problem:

fgMorphTailCallViaHelpers (before):
               [000007] --CXG-------              *  CALLV ind void   Func`1[__Canon][System.__Canon].Run
               [000005] --C-G------- this in rcx  \--*  CALL      ref    FuncGetter.Get
               [000004] ------------ this in rcx     +--*  LCL_VAR   ref    V00 arg0         
               [000006] H----------- arg1            \--*  CNS_INT(h) long   0x7ff9f44a2558 method

would be transformed into a call to dispatcher and an IL stub:

fgMorphTailCallViaHelpers (after):
               [000020] --CXG+------              *  COMMA     void  
               [000007] --CXG+------              +--*  CALL      void   ILStubClass.IL_STUB_StoreTailCallArgs
               [000031] -ACXG-----L- arg0 SETUP   |  +--*  ASG       ref   
               [000030] D------N----              |  |  +--*  LCL_VAR   ref    V04 tmp3         
               [000005] --CXG+------              |  |  \--*  CALL      ref    FuncGetter.Get
               [000004] -----+------ this in rcx  |  |     +--*  LCL_VAR   ref    V00 arg0         
               [000006] H----+------ arg1 in rdx  |  |     \--*  CNS_INT(h) long   0x7ff9f44a2558 method
               [000034] -ACXG-----L- arg1 SETUP   |  +--*  ASG       long  
               [000033] D------N----              |  |  +--*  LCL_VAR   long   V05 tmp4         
               [000019] --CXG+------              |  |  \--*  CALL help long   HELPER.CORINFO_HELP_VIRTUAL_FUNC_PTR
               [000026] -ACXG-----L- arg0 SETUP   |  |     +--*  ASG       ref   
               [000025] D------N----              |  |     |  +--*  LCL_VAR   ref    V03 tmp2         
               [000014] --CXG+------              |  |     |  \--*  CALL      ref    FuncGetter.Get
               [000015] -----+------ this in rcx  |  |     |     +--*  LCL_VAR   ref    V00 arg0         
               [000016] H----+------ arg1 in rdx  |  |     |     \--*  CNS_INT(h) long   0x7ff9f44a2558 method
               [000027] ------------ arg0 in rcx  |  |     +--*  LCL_VAR   ref    V03 tmp2         
               [000017] H----+------ arg1 in rdx  |  |     +--*  CNS_INT(h) long   0x7ff9f44a2960 token
               [000018] H----+------ arg2 in r8   |  |     \--*  CNS_INT(h) long   0x7ff9f44a27d0 token
               [000032] ------------ arg0 in rcx  |  +--*  LCL_VAR   ref    V04 tmp3         
               [000035] ------------ arg1 in rdx  |  \--*  LCL_VAR   long   V05 tmp4         
               [000009] --CXG+------              \--*  CALL      void   System.Runtime.CompilerServices.RuntimeHelpers.DispatchTailCalls
               [000013] -----+------ arg0 in rcx     +--*  ADDR      long  
               [000012] ----G+-N----                 |  \--*  LCL_VAR   long  (AX) V02 ReturnAddress
               [000011] H----+------ arg1 in rdx     +--*  CNS_INT(h) long   0x7ff9f4168490 ftn
               [000010] -----+------ arg2 in r8      \--*  CNS_INT   long   0

During that transformation a tree [000005] was cloned and

  1. used to compute the target function pointer address with CORINFO_HELP_VIRTUAL_FUNC_PTR and
  2. passed as an argument to IL_STUB_StoreTailCallArgs

After:
The cloning is replaced with spilling the tree to a local before the call to IL_STUB_StoreTailCallArgs:

fgMorphTailCallViaHelpers (after):
               [000022] -ACXG+------              *  COMMA     void  
               [000035] -ACXG-------              +--*  COMMA     void  
               [000015] -ACXG+------              |  +--*  ASG       ref   
               [000014] D----+-N----              |  |  +--*  LCL_VAR   ref    V03 tmp2         
               [000005] --CXG+------              |  |  \--*  CALL      ref    FuncGetter.Get
               [000004] -----+------ this in rcx  |  |     +--*  LCL_VAR   ref    V00 arg0         
               [000006] H----+------ arg1 in rdx  |  |     \--*  CNS_INT(h) long   0x7ff9f4102558 method
               [000007] --CXG+------              |  \--*  CALL      void   ILStubClass.IL_STUB_StoreTailCallArgs
               [000029] -ACXG-----L- arg1 SETUP   |     +--*  ASG       long  
               [000028] D------N----              |     |  +--*  LCL_VAR   long   V04 tmp3         
               [000020] --CXG+------              |     |  \--*  CALL help long   HELPER.CORINFO_HELP_VIRTUAL_FUNC_PTR
               [000017] -----+------ arg0 in rcx  |     |     +--*  LCL_VAR   ref    V03 tmp2         
               [000018] H----+------ arg1 in rdx  |     |     +--*  CNS_INT(h) long   0x7ff9f4102960 token
               [000019] H----+------ arg2 in r8   |     |     \--*  CNS_INT(h) long   0x7ff9f41027d0 token
               [000030] ------------ arg1 in rdx  |     +--*  LCL_VAR   long   V04 tmp3         
               [000016] -----+------ arg0 in rcx  |     \--*  LCL_VAR   ref    V03 tmp2         
               [000009] --CXG+------              \--*  CALL      void   System.Runtime.CompilerServices.RuntimeHelpers.DispatchTailCalls
               [000013] -----+------ arg0 in rcx     +--*  ADDR      long  
               [000012] ----G+-N----                 |  \--*  LCL_VAR   long  (AX) V02 ReturnAddress
               [000011] H----+------ arg1 in rdx     +--*  CNS_INT(h) long   0x7ff9f3dc8490 ftn
               [000010] -----+------ arg2 in r8      \--*  CNS_INT   long   0
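
The difference between the two dumps can be illustrated on a toy expression tree. This is a sketch under stated assumptions: the node classes below (Call, LclVar, Asg, Comma) only borrow their names from the dump and are not the JIT's GenTree types, and morph_with_clone and morph_with_spill are hypothetical helpers written for this illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Call:
    name: str
    args: list = field(default_factory=list)


@dataclass
class LclVar:
    name: str


@dataclass
class Asg:
    dst: LclVar
    src: object


@dataclass
class Comma:
    first: object
    second: object


def evaluate(node, env, log):
    """Evaluate a toy tree, appending the name of every executed call to log."""
    if isinstance(node, Call):
        args = [evaluate(arg, env, log) for arg in node.args]
        log.append(node.name)
        return (node.name, tuple(args))  # opaque result value
    if isinstance(node, LclVar):
        return env[node.name]
    if isinstance(node, Asg):
        env[node.dst.name] = evaluate(node.src, env, log)
        return None
    if isinstance(node, Comma):
        evaluate(node.first, env, log)
        return evaluate(node.second, env, log)
    raise TypeError(f"unknown node: {node!r}")


def this_tree():
    """The tree that computes `this`: a call with a side effect."""
    return Call("FuncGetter.Get", [LclVar("arg0")])


def morph_with_clone(this_node):
    """Old shape: a deep copy of the `this` tree feeds the helper call."""
    target = Call("VIRTUAL_FUNC_PTR", [copy.deepcopy(this_node)])
    return Call("StoreTailCallArgs", [this_node, target])


def morph_with_spill(this_node):
    """New shape: COMMA(ASG(tmp, this), call), with both uses reading tmp."""
    tmp = LclVar("tmp")
    target = Call("VIRTUAL_FUNC_PTR", [tmp])
    return Comma(Asg(tmp, this_node), Call("StoreTailCallArgs", [tmp, target]))


def executed_calls(tree):
    log = []
    evaluate(tree, {"arg0": "getter"}, log)
    return log


print(executed_calls(morph_with_clone(this_tree())))
print(executed_calls(morph_with_spill(this_tree())))
```

In the cloned shape the log contains FuncGetter.Get twice; in the spilled shape it appears once, ahead of the helper call and the stub call.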

In addition to the regression test, I updated the more_tailcalls.cs/more_tailcalls.il test suite to include the failing pattern.

Before:

BEGIN EXECUTION
 "F:\echesako\git\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe" more_tailcalls.dll
Static non-generic: OK in 217 ms
Static non-generic small: OK in 201 ms
Static non-generic retbuf: OK in 202 ms
Static non-generic long: OK in 198 ms
Static non-generic S16: OK in 203 ms
Static void: OK in 202 ms
Instance non-generic: OK in 215 ms
Instance non-generic retbuf: OK in 210 ms
Abstract class non-generic: OK in 202 ms
Abstract class non-generic retbuf: OK in 208 ms
Interface non-generic: OK in 211 ms
Interface non-generic retbuf: OK in 206 ms
Static calli: OK in 204 ms
Static calli retbuf: OK in 203 ms
Instance calli: OK in 217 ms
Instance calli retbuf: OK in 217 ms
Static calli without args: OK in 5 ms
calli to an instance method on a value type: OK in 4 ms
calli to an instance method on a value type with explicit this: OK in 3 ms
Value type instance call: OK in 201 ms
Instance with GC: OK in 6 ms
Count up with heap int: OK in 225 ms
Count up with byref to heap: OK in 199 ms
Static generic string: OK in 8 ms
Static generic object: OK in 0 ms
Static generic int: OK in 3 ms
Static generic 2 string object: OK in 5 ms
Static generic 2 string int: OK in 5 ms
Static generic 1 string: OK in 5 ms
Static generic 1 object: OK in 0 ms
Static generic 1 int: OK in 4 ms
Static generic 0: OK in 3 ms
Instance generic 4: OK in 9 ms
Virtual instance generic 4: OK in 12 ms
Interface generic 4: OK in 10 ms
Interface generic forward G: OK in 6 ms
Interface generic 0: OK in 10 ms
Interface generic without generics on method: OK in 5 ms
Abstract generic with generic on method 1: OK in 3 ms
Abstract generic with generic on method 2: OK in 2 ms
Abstract generic without generic on method 1: OK in 1 ms
Abstract generic without generic on method 2: OK in 1 ms
Instantiating stub direct: OK in 845 ms
Virtual call where computing "this" has side effects: FAIL (expected 1, got 2)
One or more failures in tailcall-via-help test
Expected: 100
Actual: 1
END EXECUTION - FAILED
FAILED

After:

BEGIN EXECUTION
 "F:\echesako\git\runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\corerun.exe" more_tailcalls.dll
Static non-generic: OK in 210 ms
Static non-generic small: OK in 207 ms
Static non-generic retbuf: OK in 207 ms
Static non-generic long: OK in 201 ms
Static non-generic S16: OK in 204 ms
Static void: OK in 207 ms
Instance non-generic: OK in 205 ms
Instance non-generic retbuf: OK in 220 ms
Abstract class non-generic: OK in 214 ms
Abstract class non-generic retbuf: OK in 206 ms
Interface non-generic: OK in 216 ms
Interface non-generic retbuf: OK in 206 ms
Static calli: OK in 209 ms
Static calli retbuf: OK in 206 ms
Instance calli: OK in 208 ms
Instance calli retbuf: OK in 211 ms
Static calli without args: OK in 5 ms
calli to an instance method on a value type: OK in 4 ms
calli to an instance method on a value type with explicit this: OK in 4 ms
Value type instance call: OK in 202 ms
Instance with GC: OK in 6 ms
Count up with heap int: OK in 224 ms
Count up with byref to heap: OK in 199 ms
Static generic string: OK in 8 ms
Static generic object: OK in 0 ms
Static generic int: OK in 3 ms
Static generic 2 string object: OK in 6 ms
Static generic 2 string int: OK in 5 ms
Static generic 1 string: OK in 5 ms
Static generic 1 object: OK in 0 ms
Static generic 1 int: OK in 4 ms
Static generic 0: OK in 3 ms
Instance generic 4: OK in 10 ms
Virtual instance generic 4: OK in 10 ms
Interface generic 4: OK in 11 ms
Interface generic forward G: OK in 7 ms
Interface generic 0: OK in 9 ms
Interface generic without generics on method: OK in 5 ms
Abstract generic with generic on method 1: OK in 3 ms
Abstract generic with generic on method 2: OK in 2 ms
Abstract generic without generic on method 1: OK in 1 ms
Abstract generic without generic on method 2: OK in 1 ms
Instantiating stub direct: OK in 875 ms
Virtual call where computing "this" has side effects: OK in 6 ms
All tailcall-via-help succeeded
Expected: 100
Actual: 100
END EXECUTION - PASSED
PASSED

I also validated that the F# program that was originally misbehaving in #45250 now executes correctly:

F:\echesako\Runtime_45250>%CORE_ROOT%\CoreRun.exe bin\Release\netcoreapp5.0\Runtime_45250.dll
u_tyar_constraints - entry
u_tyar_constraints - exit
u_list_revi - enter
u_tyar_constraint: 1
u_list_revi - exit

@@ -0,0 +1,10 @@
<Project Sdk="Microsoft.NET.Sdk.IL">
Member

Since we cannot represent this tail call pattern in C#, would it make sense to use an fsproj instead of (or in addition to) ilproj? The JIT test infra already supports F# (common props, dependencies etc. are set), e.g.: https://github.com/dotnet/runtime/blob/fae35941e16310d815460475810f069578e6e774/src/tests/JIT/Directed/tailcall/mutual_recursion.fsproj

Contributor Author

I could add the F# test, but I am not sure how much value the test would bring. We added many tests written in IL in the past that were designed to specifically expose some of the tail. call issues found during JIT stress testing.

The original F# program in #45250 was failing during execution of some internal F# function (invokeFast2) and it would be harder to debug that program and reason about the issue in comparison with the constructed minimal test repro.

@erozenfeld suggested extending the https://github.com/dotnet/runtime/blob/master/src/tests/JIT/Directed/tailcall/more_tailcalls.cs test suite to include the failing pattern, so I would say this should be our approach for increasing test coverage of such tail call scenarios.

@echesakov echesakov changed the title Add regression test for #45250 Fix #45250 Dec 4, 2020
@echesakov echesakov force-pushed the Runtime_45250 branch 2 times, most recently from 37313cd to 6388f91 Compare December 8, 2020 04:44
@ViktorHofer
Member

// Auto-generated message

69e114c which was merged 12/7 removed the intermediate src/coreclr/src/ folder. This PR needs to be updated as it touches files in that directory which causes conflicts.

To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others.

@echesakov
Contributor Author

The three failures in jitstress pipelines are #45326

@echesakov echesakov marked this pull request as ready for review December 9, 2020 18:44
@echesakov
Contributor Author

@dotnet/jit-contrib This is ready for review, please take a look.

@JulieLeeMSFT We should consider back-porting this to 5.0 (see dotnet/fsharp#10454)

@JulieLeeMSFT
Member

We should consider back-porting this to 5.0 (see dotnet/fsharp#10454)

Does this impact all architectures?
What is the risk level?

12/11 is code complete date for 5.0.2.

Member

@AndyAyersMS AndyAyersMS left a comment

LGTM.

@echesakov
Contributor Author

I ran jit-diff with pmi on framework libraries on win-x64.

There is little point in comparing the changes in the "usual" way, since fast tail calls would be used in most places instead of helper-based tail calls:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 51604912
Total bytes of diff: 51604910
Total bytes of delta: -2 (-0.00% of base)
    diff is an improvement.

Top file improvements (bytes):
          -2 : FSharp.Core.dasm (-0.00% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 267 unchanged.

Top method improvements (bytes):
          -1 (-0.78% of base) : FSharp.Core.dasm - action@647-12[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this (2 methods)
          -1 (-0.78% of base) : FSharp.Core.dasm - ContinueWithPostOrQueue@661[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this (2 methods)

Top method improvements (percentages):
          -1 (-0.78% of base) : FSharp.Core.dasm - action@647-12[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this (2 methods)
          -1 (-0.78% of base) : FSharp.Core.dasm - ContinueWithPostOrQueue@661[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this (2 methods)

2 total methods with Code Size differences (2 improved, 0 regressed), 340826 unchanged.
Found 1 files with textual diffs.

The saved 2 bytes of code size come from spilling this to a temp before the call to StoreArgsStub:

@@ -981831,33 +981832,33 @@ G_M35118_IG01:
 G_M35118_IG02:
 00000F mov      rdx, gword ptr [rcx+8]
 000013 add      rdx, 32
-000017 mov      r8, gword ptr [rdx]
-00001A mov      rsi, gword ptr [r8+8]
-00001E vmovupd  ymm0, ymmword ptr[rcx+16]
-000023 vmovupd  ymmword ptr[rsp+20H], ymm0
-000029 mov      rcx, gword ptr [rdx]
-00002C mov      rcx, gword ptr [rcx+8]
-000030 mov      rdx, 0xD1FFAB1E
-00003A mov      r8, 0xD1FFAB1E
-000044 call     CORINFO_HELP_VIRTUAL_FUNC_PTR
-000049 mov      r8, rax
-00004C lea      rdx, bword ptr [rsp+20H]
-000051 mov      rcx, rsi
-000054 call     ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,System.Numerics.Vector`1[Single],long)
-000059 lea      rcx, bword ptr [rsp+58H]
-00005E lea      r8, bword ptr [rsp+48H]
-000063 mov      rdx, 0xD1FFAB1E
-00006D call     System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
-000072 mov      rax, gword ptr [rsp+48H]
-						;; bbWeight=1    PerfScore 23.00
+000017 mov      rdx, gword ptr [rdx]
+00001A mov      rdx, gword ptr [rdx+8]
+00001E mov      rsi, rdx
+000021 vmovupd  ymm0, ymmword ptr[rcx+16]
+000026 vmovupd  ymmword ptr[rsp+20H], ymm0
+00002C mov      rcx, rdx
+00002F mov      rdx, 0xD1FFAB1E
+000039 mov      r8, 0xD1FFAB1E
+000043 call     CORINFO_HELP_VIRTUAL_FUNC_PTR
+000048 mov      r8, rax
+00004B lea      rdx, bword ptr [rsp+20H]
+000050 mov      rcx, rsi
+000053 call     ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,System.Numerics.Vector`1[Single],long)
+000058 lea      rcx, bword ptr [rsp+58H]
+00005D lea      r8, bword ptr [rsp+48H]
+000062 mov      rdx, 0xD1FFAB1E
+00006C call     System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
+000071 mov      rax, gword ptr [rsp+48H]
+						;; bbWeight=1    PerfScore 19.50
 G_M35118_IG03:
-000077 vzeroupper 
-00007A add      rsp, 80
-00007E pop      rsi
-00007F ret      
+000076 vzeroupper 
+000079 add      rsp, 80
+00007D pop      rsi
+00007E ret      
 						;; bbWeight=1    PerfScore 2.75
 
-; Total bytes of code 128, prolog size 15, PerfScore 42.25, instruction count 29 (MethodHash=e22976d1) for method action@647-12[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this
+; Total bytes of code 127, prolog size 15, PerfScore 38.65, instruction count 29 (MethodHash=e22976d1) for method action@647-12[Vector`1][System.Numerics.Vector`1[System.Single]]:Invoke(Microsoft.FSharp.Core.Unit):Microsoft.FSharp.Control.AsyncReturn:this

To get more representative results, I set COMPlus_FastTailCalls=0 and re-ran the comparison:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 52264033
Total bytes of diff: 52262064
Total bytes of delta: -1969 (-0.00% of base)
    diff is an improvement.

Top file improvements (bytes):
       -1969 : FSharp.Core.dasm (-0.06% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 267 unchanged.

Top method improvements (bytes):
         -86 (-12.13% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long,System.__Canon):long (2 methods)
         -56 (-13.83% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon (2 methods)
         -49 (-8.52% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int16,Int64][System.Int16,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int16,__Canon],short,long,System.__Canon):long (2 methods)
         -48 (-8.41% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long,System.__Canon):long (2 methods)
         -48 (-8.41% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Byte,Int64][System.Byte,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Byte,__Canon],ubyte,long,System.__Canon):long (2 methods)
         -48 (-8.41% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int64,Int64][System.Int64,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int64,__Canon],long,long,System.__Canon):long (2 methods)
         -47 (-8.29% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int32,Int64][System.Int32,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int32,__Canon],int,long,System.__Canon):long (2 methods)
         -43 (-7.35% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Double,Int64][System.Double,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Double,__Canon],double,long,System.__Canon):long (2 methods)
         -25 (-7.67% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Byte,Int64][System.Byte,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Byte,__Canon],ubyte,long):System.__Canon (2 methods)
         -24 (-7.34% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int16,Int64][System.Int16,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int16,__Canon],short,long):System.__Canon (2 methods)
         -22 (-6.69% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Double,Int64][System.Double,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Double,__Canon],double,long):System.__Canon (2 methods)
         -21 (-3.12% of base) : FSharp.Core.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeFunction(System.Type,Microsoft.FSharp.Core.FSharpFunc`2[[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]):System.Object (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Byte][System.Byte]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Int16][System.Int16]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Int32][System.Int32]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Double][System.Double]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Vector`1][System.Numerics.Vector`1[System.Single]]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Int64][System.Int64]:System.IDisposable.Dispose():this (2 methods)
         -21 (-5.74% of base) : FSharp.Core.dasm - convFunc@1150-1[__Canon][System.__Canon]:Invoke(System.__Canon,System.Type):System.String:this (2 methods)
         -19 (-5.94% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon (2 methods)

Top method improvements (percentages):
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Byte][System.Byte]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Int16][System.Int16]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Int32][System.Int32]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Double][System.Double]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Vector`1][System.Numerics.Vector`1[System.Single]]:System.IDisposable.Dispose():this (2 methods)
         -21 (-16.28% of base) : FSharp.Core.dasm - get_Publish@138-4[Int64][System.Int64]:System.IDisposable.Dispose():this (2 methods)
         -56 (-13.83% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon (2 methods)
         -86 (-12.13% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long,System.__Canon):long (2 methods)
         -49 (-8.52% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int16,Int64][System.Int16,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int16,__Canon],short,long,System.__Canon):long (2 methods)
         -48 (-8.41% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long,System.__Canon):long (2 methods)
         -48 (-8.41% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Byte,Int64][System.Byte,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Byte,__Canon],ubyte,long,System.__Canon):long (2 methods)
         -48 (-8.41% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int64,Int64][System.Int64,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int64,__Canon],long,long,System.__Canon):long (2 methods)
         -47 (-8.29% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int32,Int64][System.Int32,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int32,__Canon],int,long,System.__Canon):long (2 methods)
         -25 (-7.67% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Byte,Int64][System.Byte,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Byte,__Canon],ubyte,long):System.__Canon (2 methods)
         -43 (-7.35% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Double,Int64][System.Double,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Double,__Canon],double,long,System.__Canon):long (2 methods)
         -24 (-7.34% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int16,Int64][System.Int16,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int16,__Canon],short,long):System.__Canon (2 methods)
         -22 (-6.69% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Double,Int64][System.Double,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Double,__Canon],double,long):System.__Canon (2 methods)
         -19 (-5.94% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon (2 methods)
         -19 (-5.94% of base) : FSharp.Core.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int64,Int64][System.Int64,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int64,__Canon],long,long):System.__Canon (2 methods)
         -21 (-5.74% of base) : FSharp.Core.dasm - convFunc@1150-1[__Canon][System.__Canon]:Invoke(System.__Canon,System.Type):System.String:this (2 methods)

322 total methods with Code Size differences (322 improved, 0 regressed), 340506 unchanged.
Found 1 files with textual diffs.

An example of the code changes with fast tail calls disabled:

@@ -24515,41 +24507,35 @@ G_M51115_IG08:
 0000C2 mov      rax, qword ptr [rdi]
 0000C5 mov      rax, qword ptr [rax+72]
 0000C9 call     gword ptr [rax+32]Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon][System.__Canon,System.__Canon]:Invoke(System.__Canon):System.__Canon:this
-0000CC mov      r14, rax
-0000CF mov      rcx, rdi
-0000D2 mov      rdx, rbx
-0000D5 mov      rax, qword ptr [rdi]
-0000D8 mov      rax, qword ptr [rax+72]
-0000DC call     gword ptr [rax+32]Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon][System.__Canon,System.__Canon]:Invoke(System.__Canon):System.__Canon:this
-0000DF mov      rdi, rax
-0000E2 mov      rcx, rsi
-0000E5 mov      rdx, 0xD1FFAB1E
-0000EF call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
-0000F4 mov      rdx, rax
-0000F7 mov      rcx, rdi
-0000FA mov      r8, 0xD1FFAB1E
-000104 call     CORINFO_HELP_VIRTUAL_FUNC_PTR
-000109 mov      r8, rax
-00010C mov      rcx, r14
-00010F mov      rdx, rbp
-000112 call     ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,long,long)
-000117 lea      rcx, bword ptr [rsp+68H]
-00011C lea      r8, bword ptr [rsp+28H]
-000121 mov      rdx, 0xD1FFAB1E
-00012B call     System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
-000130 mov      rax, gword ptr [rsp+28H]
-						;; bbWeight=0.50 PerfScore 11.88
+0000CC mov      rdi, rax
+0000CF mov      rcx, rsi
+0000D2 mov      rdx, 0xD1FFAB1E
+0000DC call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
+0000E1 mov      rdx, rax
+0000E4 mov      rcx, rdi
+0000E7 mov      r8, 0xD1FFAB1E
+0000F1 call     CORINFO_HELP_VIRTUAL_FUNC_PTR
+0000F6 mov      r8, rax
+0000F9 mov      rdx, rbp
+0000FC mov      rcx, rdi
+0000FF call     ILStubClass:IL_STUB_StoreTailCallArgs(System.Object,long,long)
+000104 lea      rcx, bword ptr [rsp+68H]
+000109 lea      r8, bword ptr [rsp+28H]
+00010E mov      rdx, 0xD1FFAB1E
+000118 call     System.Runtime.CompilerServices.RuntimeHelpers:DispatchTailCalls(long,long,long)
+00011D mov      rax, gword ptr [rsp+28H]
+						;; bbWeight=0.50 PerfScore 8.00
 G_M51115_IG09:
-000135 add      rsp, 64
-000139 pop      rbx
-00013A pop      rbp
-00013B pop      rsi
-00013C pop      rdi
-00013D pop      r14
-00013F ret      
+000122 add      rsp, 64
+000126 pop      rbx
+000127 pop      rbp
+000128 pop      rsi
+000129 pop      rdi
+00012A pop      r14
+00012C ret      
 						;; bbWeight=0.50 PerfScore 1.88
 
-; Total bytes of code 320, prolog size 27, PerfScore 70.31, instruction count 87 (MethodHash=52ff3854) for method Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon
+; Total bytes of code 301, prolog size 27, PerfScore 64.54, instruction count 81 (MethodHash=52ff3854) for method Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon

The second call to Invoke is removed and the result of the first call is stored in rdi.
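
The byte counts quoted in the diffs above can be cross-checked with a few lines of arithmetic. This is only a sanity check written for this thread: size_delta is a hypothetical helper, and the sketch assumes the reported percentages are the byte delta relative to the base size, formatted to two decimals; it is not taken from the jit-diff tool itself.

```python
def size_delta(base_bytes, diff_bytes):
    """Return the byte delta and its percentage of the base size."""
    delta = diff_bytes - base_bytes
    percent = 100.0 * delta / base_bytes
    return delta, f"{percent:.2f}%"


# InvokeFast example with fast tail calls disabled: 320 -> 301 bytes.
print(size_delta(320, 301))
# action@647-12 example from the default run: 128 -> 127 bytes.
print(size_delta(128, 127))
# Totals from the run with COMPlus_FastTailCalls=0.
print(size_delta(52264033, 52262064))
```

Under that assumption the three results agree with the figures in the reports: -19 bytes (-5.94%), -1 byte (-0.78%), and -1969 bytes, which is a small enough fraction of the total to be displayed as -0.00%.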

I attached the files with collected diffs.

normal.txt
noFastTailCalls.txt

@echesakov
Contributor Author

echesakov commented Dec 9, 2020

@JulieLeeMSFT Please find the answers below:

Does this impact all architectures?

Yes, although it is more likely to fail on arm32 since that platform does not support fast tail calls. In other words, the alternative (and usually preferred) way of transforming tail-prefixed calls is not available on that platform. The same is true on x86, but we use a different tail call helper mechanism there (except when localloc is used in the caller, in which case the JIT switches to portable helper-based tail calls). The fact that the added test failed everywhere also shows that this affects all platforms.
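
The platform behaviour described in this answer can be summarised as a small decision function. This is a simplified Python sketch based only on the statements in this thread: tail_call_mechanism, affected_by_regression and their parameter names are hypothetical, and the real JIT checks more conditions than the three modelled here.

```python
PORTABLE_HELPER = "portable helper-based tail call"
X86_JIT_HELPER = "x86-specific tail call helper"
FAST_TAIL_CALL = "fast tail call"


def tail_call_mechanism(arch, caller_uses_localloc=False, fast_tail_calls_enabled=True):
    """Pick the mechanism for an explicit tail call in this simplified model."""
    if arch == "x86":
        # x86 has its own helper mechanism, except when the caller uses localloc.
        return PORTABLE_HELPER if caller_uses_localloc else X86_JIT_HELPER
    if arch == "arm32":
        # No fast tail calls on arm32, so the portable helpers are always used.
        return PORTABLE_HELPER
    # Elsewhere fast tail calls are used unless disabled
    # (COMPlus_FastTailCalls=0) or ruled out by localloc in the caller.
    if fast_tail_calls_enabled and not caller_uses_localloc:
        return FAST_TAIL_CALL
    return PORTABLE_HELPER


def affected_by_regression(arch, **kwargs):
    """Only the portable helper path goes through the transformation fixed here."""
    return tail_call_mechanism(arch, **kwargs) == PORTABLE_HELPER


for arch in ("x64", "arm32", "x86"):
    print(arch, "| default:", affected_by_regression(arch),
          "| localloc in caller:", affected_by_regression(arch, caller_uses_localloc=True))
```

In this model every architecture is affected once the caller contains localloc, which is consistent with the added tests failing everywhere.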

I am in the middle of collecting the code diffs for win-arm32. If the diffs turn out to be the same as what I saw on win-x64 with fast tail calls disabled, that would prove that FSharp.Core.dll works incorrectly on that platform in 5.0.

What is the risk level?

Moderate, due to the difficulty of triggering the failing pattern in the testing. Even with the added tests we might've missed some other edge cases.

12/11 is code complete date for 5.0.2.

Just to confirm my understanding of what that means - this is the date when the changes must go in to release/5.0 branch?

@AndyAyersMS
Member

What is the risk level?

Moderate, due to the difficulty of triggering the failing pattern in the testing. Even with the added tests we might've missed some other edge cases.

I don't see this change as particularly risky, and it is fixing a serious codegen bug that is a 5.0 regression. Seems like an obvious candidate for porting to 5.0.

@erozenfeld
Member

Does this impact all architecture?

Yes, although it has more chances to fail on arm32 since the platform does not support fast tail calls.

Just to be pedantic: this doesn't affect x86 since we use a different helper call mechanism on that architecture.

Member

@erozenfeld erozenfeld left a comment

LGTM

@echesakov
Contributor Author

Just to be pedantic: this doesn't affect x86 since we use a different helper call mechanism on that architecture.

I thought so as well until my test failed on x86 and I discovered that

runtime/src/coreclr/jit/morph.cpp

Lines 18980 to 18992 in 8e29c00

bool Compiler::fgCanTailCallViaJitHelper()
{
#ifndef TARGET_X86
    // On anything except X86 we have no faster mechanism available.
    return false;
#else
    // The JIT helper does not properly handle the case where localloc was used.
    if (compLocallocUsed)
        return false;
    return true;
#endif
}

the JIT switches to portable helpers on x86 when localloc is used in the caller, and the test uses localloc to prevent fast tail calls.
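
To make the platform discussion concrete, here is a simplified model in C++ of how the tail call mechanism might be selected, based only on what is said in this thread. It is a sketch under stated assumptions, not the JIT's actual logic: `TailCallKind`, `CallerInfo`, and `chooseTailCallKind` are hypothetical names, and `fastTailCallPossible` is treated as an input (false on arm32 or when `COMPlus_FastTailCalls=0` is set) rather than computed.

```cpp
// Hypothetical classification of the mechanisms mentioned in the discussion.
enum class TailCallKind
{
    FastTailCall,
    X86JitHelper,
    PortableHelper
};

// Hypothetical summary of the caller; not a JIT data structure.
struct CallerInfo
{
    bool targetIsX86;
    bool fastTailCallPossible; // assumed input, e.g. false on arm32
    bool locallocUsed;
};

// Mirrors the shape of fgCanTailCallViaJitHelper quoted above.
bool canTailCallViaJitHelper(const CallerInfo& caller)
{
    if (!caller.targetIsX86)
    {
        return false; // only x86 has the dedicated JIT helper
    }
    if (caller.locallocUsed)
    {
        return false; // the JIT helper does not handle localloc
    }
    return true;
}

// Assumed order: fast tail call first, then the x86 helper, then portable helpers.
TailCallKind chooseTailCallKind(const CallerInfo& caller)
{
    // Per the PR description, localloc in the caller rules out a fast tail call.
    if (caller.fastTailCallPossible && !caller.locallocUsed)
    {
        return TailCallKind::FastTailCall;
    }
    if (canTailCallViaJitHelper(caller))
    {
        return TailCallKind::X86JitHelper;
    }
    return TailCallKind::PortableHelper;
}
```

Under this model, an x86 caller that uses localloc ends up on the portable helper path, which is consistent with the added test failing on x86.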

@echesakov
Contributor Author

echesakov commented Dec 9, 2020

I finished collecting the diffs on win-arm. All the code changes occur in FSharp.Core.dll:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 43520
Total bytes of diff: 43118
Total bytes of delta: -402 (-0.92% of base)
    diff is an improvement.

Top file regressions (bytes):
           2 : 1494.dasm (0.93% of base)
           2 : 1479.dasm (1.25% of base)
           2 : 1481.dasm (0.49% of base)
           2 : 1501.dasm (0.85% of base)

Top file improvements (bytes):
         -24 : 530.dasm (-9.16% of base)
         -20 : 1486.dasm (-2.99% of base)
         -20 : 485.dasm (-8.77% of base)
         -20 : 469.dasm (-8.77% of base)
         -20 : 6862.dasm (-5.78% of base)
         -18 : 5862.dasm (-14.52% of base)
         -18 : 1974.dasm (-0.13% of base)
         -16 : 3454.dasm (-4.40% of base)
         -16 : 516.dasm (-6.90% of base)
         -16 : 544.dasm (-6.90% of base)
         -14 : 501.dasm (-6.31% of base)
         -14 : 435.dasm (-6.31% of base)
         -14 : 1487.dasm (-6.54% of base)
         -12 : 5860.dasm (-10.71% of base)
         -10 : 5602.dasm (-2.21% of base)
         -10 : 1488.dasm (-2.31% of base)
         -10 : 5600.dasm (-2.37% of base)
         -10 : 9631.dasm (-8.77% of base)
         -10 : 4959.dasm (-2.37% of base)
         -10 : 4961.dasm (-2.21% of base)

72 total files with Code Size differences (68 improved, 4 regressed), 130 unchanged.

Top method regressions (bytes):
           2 ( 0.93% of base) : 1494.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeUnion(Microsoft.FSharp.Reflection.UnionCaseInfo,System.Object[],Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object
           2 ( 1.25% of base) : 1479.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeRecord(System.Type,System.Object[],Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object
           2 ( 0.49% of base) : 1481.dasm - Microsoft.FSharp.Reflection.FSharpValue:GetRecordFields(System.Object,Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object[]
           2 ( 0.85% of base) : 1501.dasm - Microsoft.FSharp.Reflection.FSharpValue:GetExceptionFields(System.Object,Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object[]

Top method improvements (bytes):
         -24 (-9.16% of base) : 530.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon
         -20 (-2.99% of base) : 1486.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeFunction(System.Type,Microsoft.FSharp.Core.FSharpFunc`2[[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]):System.Object
         -20 (-8.77% of base) : 485.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int16,Int64][System.Int16,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int16,__Canon],short,long):System.__Canon
         -20 (-8.77% of base) : 469.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Byte,Int64][System.Byte,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Byte,__Canon],ubyte,long):System.__Canon
         -20 (-5.78% of base) : 6862.dasm - convFunc@1150-1[__Canon][System.__Canon]:Invoke(System.__Canon,System.Type):System.String:this
         -18 (-14.52% of base) : 5862.dasm - get_Publish@138-4[Byte][System.Byte]:System.IDisposable.Dispose():this
         -18 (-0.13% of base) : 1974.dasm - Microsoft.FSharp.Quotations.FSharpExpr:GetLayout(bool):Microsoft.FSharp.Text.StructuredPrintfImpl.Layout:this
         -16 (-4.40% of base) : 3454.dasm - Microsoft.FSharp.Linq.RuntimeHelpers.Adapters:RewriteTupleType(System.Type,Microsoft.FSharp.Core.FSharpFunc`2[[Microsoft.FSharp.Collections.FSharpList`1[[System.Type, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], FSharp.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a],[Microsoft.FSharp.Collections.FSharpList`1[[System.Type, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], FSharp.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):System.Type
         -16 (-6.90% of base) : 516.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Double,Int64][System.Double,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Double,__Canon],double,long):System.__Canon
         -16 (-6.90% of base) : 544.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int64,Int64][System.Int64,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int64,__Canon],long,long):System.__Canon
         -14 (-6.31% of base) : 501.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int32,Int64][System.Int32,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int32,__Canon],int,long):System.__Canon
         -14 (-6.31% of base) : 435.dasm - Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon
         -14 (-6.54% of base) : 1487.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeTuple(System.Object[],System.Type):System.Object
         -12 (-10.71% of base) : 5860.dasm - get_Publish@138-4[__Canon][System.__Canon]:System.IDisposable.Dispose():this
         -10 (-2.21% of base) : 5602.dasm - BindResult@1471[Byte][System.Byte]:Invoke(Microsoft.FSharp.Control.AsyncActivation`1[Byte]):Microsoft.FSharp.Control.AsyncReturn:this
         -10 (-2.31% of base) : 1488.dasm - Microsoft.FSharp.Reflection.FSharpValue:GetTupleFields(System.Object):System.Object[]
         -10 (-2.37% of base) : 5600.dasm - BindResult@1471[__Canon][System.__Canon]:Invoke(Microsoft.FSharp.Control.AsyncActivation`1[__Canon]):Microsoft.FSharp.Control.AsyncReturn:this
         -10 (-8.77% of base) : 9631.dasm - mkDelayedSeq@471[__Canon][System.__Canon]:Invoke(Microsoft.FSharp.Core.Unit):System.Collections.Generic.IEnumerator`1[__Canon]:this
         -10 (-2.37% of base) : 4959.dasm - CreateAsyncResultAsync@511[__Canon][System.__Canon]:Invoke(Microsoft.FSharp.Control.AsyncActivation`1[__Canon]):Microsoft.FSharp.Control.AsyncReturn:this
         -10 (-2.21% of base) : 4961.dasm - CreateAsyncResultAsync@511[Byte][System.Byte]:Invoke(Microsoft.FSharp.Control.AsyncActivation`1[Byte]):Microsoft.FSharp.Control.AsyncReturn:this

Top method regressions (percentages):
           2 ( 1.25% of base) : 1479.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeRecord(System.Type,System.Object[],Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object
           2 ( 0.93% of base) : 1494.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeUnion(Microsoft.FSharp.Reflection.UnionCaseInfo,System.Object[],Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object
           2 ( 0.85% of base) : 1501.dasm - Microsoft.FSharp.Reflection.FSharpValue:GetExceptionFields(System.Object,Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object[]
           2 ( 0.49% of base) : 1481.dasm - Microsoft.FSharp.Reflection.FSharpValue:GetRecordFields(System.Object,Microsoft.FSharp.Core.FSharpOption`1[BindingFlags]):System.Object[]

Top method improvements (percentages):
         -18 (-14.52% of base) : 5862.dasm - get_Publish@138-4[Byte][System.Byte]:System.IDisposable.Dispose():this
         -12 (-10.71% of base) : 5860.dasm - get_Publish@138-4[__Canon][System.__Canon]:System.IDisposable.Dispose():this
         -24 (-9.16% of base) : 530.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon
         -10 (-8.77% of base) : 9631.dasm - mkDelayedSeq@471[__Canon][System.__Canon]:Invoke(Microsoft.FSharp.Core.Unit):System.Collections.Generic.IEnumerator`1[__Canon]:this
         -20 (-8.77% of base) : 485.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int16,Int64][System.Int16,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int16,__Canon],short,long):System.__Canon
         -20 (-8.77% of base) : 469.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Byte,Int64][System.Byte,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Byte,__Canon],ubyte,long):System.__Canon
         -16 (-6.90% of base) : 516.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Double,Int64][System.Double,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Double,__Canon],double,long):System.__Canon
         -16 (-6.90% of base) : 544.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int64,Int64][System.Int64,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int64,__Canon],long,long):System.__Canon
         -14 (-6.54% of base) : 1487.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeTuple(System.Object[],System.Type):System.Object
         -14 (-6.31% of base) : 501.dasm - Microsoft.FSharp.Core.FSharpFunc`2[Int32,Int64][System.Int32,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Int32,__Canon],int,long):System.__Canon
         -14 (-6.31% of base) : 435.dasm - Microsoft.FSharp.Core.FSharpFunc`2[__Canon,Int64][System.__Canon,System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[__Canon,__Canon],System.__Canon,long):System.__Canon
         -20 (-5.78% of base) : 6862.dasm - convFunc@1150-1[__Canon][System.__Canon]:Invoke(System.__Canon,System.Type):System.String:this
         -16 (-4.40% of base) : 3454.dasm - Microsoft.FSharp.Linq.RuntimeHelpers.Adapters:RewriteTupleType(System.Type,Microsoft.FSharp.Core.FSharpFunc`2[[Microsoft.FSharp.Collections.FSharpList`1[[System.Type, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], FSharp.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a],[Microsoft.FSharp.Collections.FSharpList`1[[System.Type, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], FSharp.Core, Version=5.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]):System.Type
          -4 (-3.51% of base) : 2143.dasm - Microsoft.FSharp.Control.AsyncActivation`1[Vector`1][System.Numerics.Vector`1[System.Single]]:CallContinuation(System.Numerics.Vector`1[Single]):Microsoft.FSharp.Control.AsyncReturn:this
          -2 (-3.23% of base) : 6599.dasm - expr@244:Invoke(Microsoft.FSharp.Quotations.FSharpExpr):Microsoft.FSharp.Text.StructuredPrintfImpl.Layout:this
         -20 (-2.99% of base) : 1486.dasm - Microsoft.FSharp.Reflection.FSharpValue:MakeFunction(System.Type,Microsoft.FSharp.Core.FSharpFunc`2[[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Object, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]):System.Object
         -10 (-2.37% of base) : 5600.dasm - BindResult@1471[__Canon][System.__Canon]:Invoke(Microsoft.FSharp.Control.AsyncActivation`1[__Canon]):Microsoft.FSharp.Control.AsyncReturn:this
         -10 (-2.37% of base) : 4959.dasm - CreateAsyncResultAsync@511[__Canon][System.__Canon]:Invoke(Microsoft.FSharp.Control.AsyncActivation`1[__Canon]):Microsoft.FSharp.Control.AsyncReturn:this
         -10 (-2.31% of base) : 1488.dasm - Microsoft.FSharp.Reflection.FSharpValue:GetTupleFields(System.Object):System.Object[]
         -10 (-2.21% of base) : 5602.dasm - BindResult@1471[Byte][System.Byte]:Invoke(Microsoft.FSharp.Control.AsyncActivation`1[Byte]):Microsoft.FSharp.Control.AsyncReturn:this

72 total methods with Code Size differences (68 improved, 4 regressed), 130 unchanged.
Found 202 files with textual diffs.

The PR does fix the regression of a function being called twice:

diff --git "a/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\base\\530.dasm" "b/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\diff\\530.dasm"
index ca9882b2331..5ecbf52374b 100644
--- "a/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\base\\530.dasm"
+++ "b/C:\\echesako\\git\\runtime\\artifacts\\spmi\\asm.windows.arm.Checked\\diff\\530.dasm"
@@ -5,24 +5,22 @@
 ; partially interruptible
 ; Final local variable assignments
 ;
-;  V00 TypeCtx      [V00,T02] (  6,  5   )     int  ->   r5        
-;  V01 arg0         [V01,T01] (  7,  5   )     ref  ->   r4         class-hnd
-;  V02 arg1         [V02,T00] ( 14,  8   )  struct (16) [sp+0x40]   do-not-enreg[SFA] multireg-arg double-align
+;  V00 TypeCtx      [V00,T01] (  6,  5   )     int  ->   r4        
+;  V01 arg0         [V01,T02] (  5,  4   )     ref  ->   r5         class-hnd
+;  V02 arg1         [V02,T00] ( 10,  6   )  struct (16) [sp+0x40]   do-not-enreg[SFA] multireg-arg double-align
 ;  V03 arg2         [V03    ] (  4,  2   )    long  ->  [sp+0x50]  
 ;  V04 loc0         [V04,T03] (  4,  3   )     ref  ->   r6         class-hnd
 ;* V05 loc1         [V05    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
 ;  V06 OutArgs      [V06    ] (  1,  1   )  lclBlk (20) [sp+0x00]   "OutgoingArgSpace"
 ;  V07 tmp1         [V07    ] (  2,  1   )     ref  ->  [sp+0x20]   do-not-enreg[X] must-init addr-exposed "Return value for tail call dispatcher"
 ;  V08 ReturnAddress[V08    ] (  2,  1   )     int  ->  [sp+0x3C]   do-not-enreg[X] addr-exposed "Return address"
-;  V09 tmp3         [V09,T06] (  2,  2   )     int  ->   r0         "argument with side effect"
+;  V09 tmp3         [V09,T05] (  2,  2   )     int  ->   r0         "argument with side effect"
 ;  V10 tmp4         [V10    ] (  2,  1   )     ref  ->  [sp+0x1C]   do-not-enreg[X] must-init addr-exposed "Return value for tail call dispatcher"
-;  V11 tmp5         [V11,T04] (  2,  2   )     ref  ->   r0         "argument with side effect"
-;  V12 tmp6         [V12,T05] (  2,  2   )     ref  ->   r6         "argument with side effect"
-;  V13 tmp7         [V13,T07] (  2,  2   )     int  ->  [sp+0x18]   "argument with side effect"
-;  V14 cse0         [V14,T08] (  2,  1   )     ref  ->   r0         "CSE - moderate"
-;  V15 cse1         [V15,T09] (  2,  1   )     ref  ->   r0         "CSE - moderate"
-;  V16 rat0         [V16,T10] (  2,  1   )     int  ->  [sp+0x50]   do-not-enreg[] V03.lo(offs=0x00) "field V03.lo (fldOffset=0x0)"
-;  V17 rat1         [V17,T11] (  2,  1   )     int  ->  [sp+0x54]   do-not-enreg[] V03.hi(offs=0x04) "field V03.hi (fldOffset=0x4)"
+;  V11 tmp5         [V11,T04] (  3,  3   )     ref  ->   r5         "tail call thisptr"
+;  V12 tmp6         [V12,T06] (  2,  2   )     int  ->   r0         "argument with side effect"
+;  V13 cse0         [V13,T07] (  2,  1   )     ref  ->   r0         "CSE - moderate"
+;  V14 rat0         [V14,T08] (  2,  1   )     int  ->  [sp+0x50]   do-not-enreg[] V03.lo(offs=0x00) "field V03.lo (fldOffset=0x0)"
+;  V15 rat1         [V15,T09] (  2,  1   )     int  ->  [sp+0x54]   do-not-enreg[] V03.hi(offs=0x04) "field V03.hi (fldOffset=0x4)"
 ;
 ; Lcl frame size = 44
 
@@ -35,12 +33,12 @@ G_M14368_IG01:
             str     r2, [sp+0x20]	// [V07 tmp1]
             str     r2, [sp+0x1c]	// [V10 tmp4]
             str     r0, [r11-0x14]
-            mov     r5, r0
-            mov     r4, r1
+            mov     r4, r0
+            mov     r5, r1
 						;; bbWeight=1    PerfScore 10.00
 G_M14368_IG02:
-            mov     r0, r5
-            mov     r1, r4
+            mov     r0, r4
+            mov     r1, r5
             movw    r3, 0xd1ff
             movt    r3, 0xd1ff
             blx     r3		// CORINFO_HELP_ISINSTANCEOFCLASS
@@ -50,7 +48,7 @@ G_M14368_IG02:
 						;; bbWeight=1    PerfScore 8.00
 G_M14368_IG03:
             mov     r0, r6
-            mov     r1, r5
+            mov     r1, r4
             movw    r2, 0xd1ff
             movt    r2, 0xd1ff
             movw    r3, 0xd1ff
@@ -94,35 +92,23 @@ G_M14368_IG05:
             ldr     r1, [sp+0x4c]	// [V02 arg1+0x0c]
             str     r0, [sp]	// [V06 OutArgs]
             str     r1, [sp+0x04]	// [V06 OutArgs+0x04]
-            mov     r0, r4
-            ldr     r1, [r4]
-            ldr     r1, [r1+44]
-            ldr     r1, [r1+16]
-            blx     r1		// hackishModuleName:hackishMethodName(System.Numerics.Vector`1[Single]):System.__Canon:this
-            mov     r6, r0
-            ldr     r2, [sp+0x40]	// [V02 arg1]
-            ldr     r3, [sp+0x44]	// [V02 arg1+0x04]
-            ldr     r0, [sp+0x48]	// [V02 arg1+0x08]
-            ldr     r1, [sp+0x4c]	// [V02 arg1+0x0c]
-            str     r0, [sp]	// [V06 OutArgs]
-            str     r1, [sp+0x04]	// [V06 OutArgs+0x04]
-            mov     r0, r4
-            ldr     r1, [r4]
+            mov     r0, r5
+            ldr     r1, [r5]
             ldr     r1, [r1+44]
             ldr     r1, [r1+16]
             blx     r1		// hackishModuleName:hackishMethodName(System.Numerics.Vector`1[Single]):System.__Canon:this
-            mov     r1, r5
+            mov     r5, r0
+            mov     r0, r5
+            mov     r1, r4
             movw    r2, 0xd1ff
             movt    r2, 0xd1ff
             movw    r3, 0xd1ff
             movt    r3, 0xd1ff
             blx     r3		// CORINFO_HELP_VIRTUAL_FUNC_PTR
-            str     r0, [sp+0x18]	// [V13 tmp7]
-            mov     r0, r6
-            ldr     r2, [sp+0x18]	// [V13 tmp7]
-            str     r2, [sp]	// [V06 OutArgs]
-            ldr     r2, [sp+0x50]	// [V16 rat0]
-            ldr     r3, [sp+0x54]	// [V17 rat1]
+            str     r0, [sp]	// [V06 OutArgs]
+            ldr     r2, [sp+0x50]	// [V14 rat0]
+            ldr     r3, [sp+0x54]	// [V15 rat1]
+            mov     r0, r5
             movw    r1, 0xd1ff
             movt    r1, 0xd1ff
             ldr     r1, [r1]
@@ -135,7 +121,7 @@ G_M14368_IG05:
             movt    r3, 0xd1ff
             blx     r3		// hackishModuleName:hackishMethodName()
             ldr     r0, [sp+0x1c]	// [V10 tmp4]
-						;; bbWeight=0.50 PerfScore 23.50
+						;; bbWeight=0.50 PerfScore 17.50
 G_M14368_IG06:
             add     sp, 44
             pop     {r4,r5,r6,r11,lr}
@@ -143,7 +129,7 @@ G_M14368_IG06:
             bx      lr
 						;; bbWeight=0.50 PerfScore 2.00
 
-; Total bytes of code 262, prolog size 22, PerfScore 87.20, instruction count 104 (MethodHash=8524c7df) for method Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon
+; Total bytes of code 238, prolog size 22, PerfScore 78.80, instruction count 92 (MethodHash=8524c7df) for method Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:InvokeFast(Microsoft.FSharp.Core.FSharpFunc`2[Vector`1,__Canon],System.Numerics.Vector`1[Single],long):System.__Canon
 ; ============================================================
 
 Unwind Info:
@@ -155,7 +141,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 131 (0x00083) Actual length = 262 (0x000106)
+  Function Length   : 119 (0x00077) Actual length = 238 (0x0000ee)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

@echesakov
Contributor Author

I spent the last couple of hours trying to create a test where a virtual tail call would require a null check.
I wasn't able to construct such an example. If the conditions are mutually exclusive, the transformation could be simplified slightly.
However, I decided not to change the logic and to add a TODO-Review comment instead, in case we decide to revisit the implementation later. I don't think this should block fixing the issue for 5.0.

@echesakov echesakov changed the title Fix #45250 Fix regression with tail.callvirt transformation to helper call Dec 10, 2020
@echesakov echesakov merged commit 42b685c into dotnet:master Dec 10, 2020
@echesakov echesakov deleted the Runtime_45250 branch December 10, 2020 04:40
@JulieLeeMSFT
Member

Just to confirm my understanding of what that means - this is the date when the changes must go in to release/5.0 branch?

That’s what I understand.

@ghost ghost locked as resolved and limited conversation to collaborators Jan 9, 2021
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Jit codegen for ARM32 produces incorrect behaviour
6 participants