Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Add VSD additional param for DIRECT calls. #16267

Merged
merged 4 commits into from
Feb 14, 2018

Conversation

sandreenko
Copy link

@sandreenko sandreenko commented Feb 8, 2018

The discussion was in #15910.
The PR was reverted because of #16201.

Fix #15150.

Sergey Andreenko added 3 commits February 6, 2018 10:55
expect many failures for arm in lower because we delete: // Change the call type, so we can add the extra indirection here, rather than in codegen
@sandreenko sandreenko added bug Product bug (most likely) area-CodeGen labels Feb 8, 2018
@sandreenko
Copy link
Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline

@sandreenko
Copy link
Author

sandreenko commented Feb 8, 2018

A job that shows the regression in System.Configuration.ConfigurationManager.Tests.

@sandreenko
Copy link
Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline

@sandreenko
Copy link
Author

sandreenko commented Feb 9, 2018

Performance and CQ results:
benchview did not show any impact of PR that was reverted;

CoreCLR jit-diff and spmi show code size regressions that are really improvements(I am not sure which call is faster), before:
call qword ptr [r11]System.Collections.Immutable.ImmutableDictionary2[__Canon,__Canon][System.__Canon,System.__Canon]:ContainsKey(ref):bool:this
after:
call [System.Collections.Immutable.ImmutableDictionary2[__Canon,__Canon][System.__Canon,System.__Canon]:ContainsKey(ref):bool:this]

because gtControlExpr is considered as contained after this fix, so it returns true for target->AsIndir()->Base()->isContainedIntOrIImmed() now and codegenxarch does an additional optimization in genCallInstruction, but the code size is bigger because we need an additional PC-relative offset.

IR before:

Generating: N2636 (  3, 10) [002054] ------------      t2054 =    CNS_INT(h) long   0x1f5b5e18998 ftn REG r11
IN0324:        lea      r11, [(reloc 0x1f5b5e18998)]
                                                             /--*  t2054  long
Generating: N2638 (  5, 12) [002055] -c----------      t2055 = *  IND       long   REG NA
                                                             /--*  t2051  ref    arg1 in rdx
                                                             +--*  t2052  ref    this in rcx
                                                             +--*  t2053  int    arg2 in r8
                                                             +--*  t2055  long   control expr
Generating: N2640 ( 54, 38) [000526] --CXG-------              *  CALLV stub void   System.Collections.Generic.ICollection`1[__Canon][System.__Canon].CopyTo
                                                        GC regs: 00000006 {rcx rdx} => 00000002 {rcx}
                                                        GC regs: 00000002 {rcx} => 00000000 {}
IN0325:        cmp      dword ptr [rcx], ecx
                                                        Call: GCvars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}
Added IP mapping: 0x04E6 CALL_INSTRUCTION (G_M38387_IG74,ins#25,ofs#138)
IN0326:        call     qword ptr [r11]System.Collections.Generic.ICollection`1[__Canon][System.__Canon]:CopyTo(ref,int):this

IR after:

Generating: N2648 (  3, 10) [001836] ------------      t1836 =    CNS_INT(h) long   0x143e35c8998 ftn REG r11
IN0323:        lea      r11, [(reloc 0x143e35c8998)]
                                                             /--*  t1836  long
Generating: N2650 (???,???) [002077] ------------      t2077 = *  PUTARG_REG long   REG r11
Generating: N2652 (  1,  1) [000525] ------------       t525 =    CNS_INT   int    0 REG r8
IN0324:        xor      r8d, r8d
                                                             /--*  t525   int
Generating: N2654 (???,???) [002078] ------------      t2078 = *  PUTARG_REG int    REG r8
Generating: N2656 (  3, 10) [002079] -c----------      t2079 =    CNS_INT(h) long   0x143e35c8998 ftn REG NA
                                                             /--*  t2079  long
Generating: N2658 (  5, 12) [002080] -c----------      t2080 = *  IND       long   REG NA
                                                             /--*  t2075  ref    arg2 in rdx
                                                             +--*  t2076  ref    this in rcx
                                                             +--*  t2077  long   arg1 in r11
                                                             +--*  t2078  int    arg3 in r8
                                                             +--*  t2080  long   control expr
Generating: N2660 ( 57, 49) [000526] --CXG-------              *  CALLV stub void   System.Collections.Generic.ICollection`1[__Canon][System.__Canon].CopyTo
                                                        GC regs: 00000006 {rcx rdx} => 00000002 {rcx}
                                                        GC regs: 00000002 {rcx} => 00000000 {}
IN0325:        cmp      dword ptr [rcx], ecx
                                                        Call: GCvars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}
Added IP mapping: 0x04E6 CALL_INSTRUCTION (G_M38387_IG74,ins#25,ofs#141)
IN0326:        call     [System.Collections.Generic.ICollection`1[__Canon][System.__Canon]:CopyTo(ref,int):this]

The important part is that long control expr is marked as contained after the fix.

Also superPmi for desktop shows some other small improvements, for example:

--       mov      rdx, gword ptr [rbx]
--       mov      r8, rdx
--       mov      r11d, dword ptr [r8+8]
++       mov      r8, gword ptr [rbx]
++       mov      rdx, r8
++       mov      r11d, dword ptr [rdx+8]
         cmp      esi, r11d
         jae      G_M11280_IG06
         movsxd   rax, esi
--       mov      r8, gword ptr [r8+8*rax+16]
++       mov      rdx, gword ptr [rdx+8*rax+16]
         cmp      edi, r11d
         jae      G_M11280_IG06
         movsxd   r11, edi
--       mov      r11, gword ptr [rdx+8*r11+16]
--       mov      rdx, r8
--       mov      r8, r11
++       mov      r8, gword ptr [r8+8*r11+16]
         lea      r11, [(reloc)]
         cmp      dword ptr [rcx], ecx
--       call     qword ptr [r11]hackishModuleName:hackishMethodName()
++       call     [hackishModuleName:hackishMethodName()]

so we save 2 moves there.

For big offsets, we still have an additional mov and gtControlExpr is not marked as contained, but I saw such examples only with desktop libraries:

       mov      r11, 0xD1FFAB1E
++     mov      rax, 0xD1FFAB1E
       cmp      dword ptr [rcx], ecx
--     call     gword ptr [r11]hackishModuleName:hackishMethodName():ref:this
++     call     gword ptr [rax]hackishModuleName:hackishMethodName():ref:this

@sandreenko
Copy link
Author

PTAL @dotnet/jit-contrib

@sandreenko
Copy link
Author

Also want to notice that our NonStandardArgs need to be rewritten because:

  1. for tail call we add VSD arguments in fgMorphTailCall and in fgMorphArgs they are not considered as non-standard, nobody knows how it affects us;
  2. nonStandardArgs array has reg field, that is useless, because GenTree* has gtRegNum;
  3. Size of nonStandardArgs is not connected with GetNonStandardAddedArgCount (it is why the regression appeared);
  4. We spend time doing linear seach in nonStandardArgs array to replace morphed arguments;
  5. We use nonStandardArgs only once to determinate is current argument standart or not ( isNonStandard = nonStandardArgs.FindReg(argx, &nonStdRegNum);) that we can do much faster, for example if we keep indexes of nonStandart args ( vector<bool> isThisIndexNonStandart;)

but it will be much easier to change when we delete legacy_backend.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoreCLR jit-diff and spmi show code size regressions that are really improvements(I am not sure which call is faster), before:
call qword ptr [r11]System.Collections.Immutable.ImmutableDictionary2[__Canon,__Canon][System.__Canon,System.__Canon]:ContainsKey(ref):bool:this
after:
call [System.Collections.Immutable.ImmutableDictionary2[__Canon,__Canon][System.__Canon,System.__Canon]:ContainsKey(ref):bool:this]

Presumably (?) the old version is smaller/faster, since call [r11] is smaller than call [rel32], and r11 has already been loaded up. Plus, the new version needs 2 relocs, one for the lea and one for the call. I'm not sure if this is bad or not.

// R11 = PInvoke cookie param
return 2;
}
// R11 = Virtual stub param
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit: it bothers me that registers are given in the comments, but I don't know which platform (or platforms) the registers refer to.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to change comments there in a separate PR.

@sandreenko sandreenko merged commit 3dd6cf1 into dotnet:master Feb 14, 2018
@sandreenko sandreenko deleted the armR2Rtest branch November 2, 2018 22:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen bug Product bug (most likely)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants