
Tiered JIT: redundant compilations #76402

Open
EgorBo opened this issue Sep 29, 2022 · 10 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@EgorBo
Member

EgorBo commented Sep 29, 2022

To record @AndyAyersMS's thoughts, I came up with a quick repro:

using System.Runtime.CompilerServices;
using System.Threading;

public class Program
{
    public static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            // Promote Test to Tier1
            Test();
            Thread.Sleep(16);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int Test()
    {
        return Property;
    }

    private static int Property => 42;
}

Run this code with DOTNET_JitDisasmSummary=1 on .NET 7.0 RC1 and it's going to print:

   ...
   4: JIT compiled Program:Main() [Tier0, IL size=27, code size=94]
   5: JIT compiled Program:Test():int [Tier0, IL size=6, code size=23]
   6: JIT compiled Program:get_Property():int [Tier0, IL size=3, code size=11]
   7: JIT compiled Program:Test():int [Tier1, IL size=6, code size=6]
   8: JIT compiled Program:get_Property():int [Tier1, IL size=3, code size=6]

get_Property was compiled twice (Tier0 and Tier1) despite being super trivial (like any auto-property, it's only 3 bytes of IL), so we wasted some time on it.

We should consider allowing inlining of very small methods in Tier0, but only if they're small and contain no control flow. Potentially, this might even improve the JIT's throughput, because call IR nodes are slow to process.
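Roughly, the gate would be something like the following Python sketch (an illustrative simulation, not actual RyuJIT code; the 8-byte cutoff and function names are assumptions):

```python
# Hypothetical Tier0 inlining gate; the 8-byte threshold is illustrative.
# Real IL branch/switch opcodes (ECMA-335): 0x2B..0x37 are short branches,
# 0x38..0x44 are long branches, 0x45 is switch.
BRANCH_OPCODES = set(range(0x2B, 0x45 + 1))

def should_inline_at_tier0(il_bytes, max_il_size=8):
    """Inline only tiny, straight-line callees at Tier0."""
    if len(il_bytes) > max_il_size:
        return False
    # Naive scan: treating every byte as an opcode over-approximates
    # control flow (operand bytes can collide with branch opcode values),
    # which is safe for a "give up early" heuristic.
    return not any(b in BRANCH_OPCODES for b in il_bytes)

# get_Property's body is just `ldc.i4.s 42; ret` -> 3 bytes, no branches.
print(should_inline_at_tier0(bytes([0x1F, 0x2A, 0x2A])))  # True
```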

category:cq
theme:tiering
skill-level:expert
cost:large
impact:medium

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 29, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 29, 2022
@ghost

ghost commented Sep 29, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@EgorBo EgorBo added this to the 8.0.0 milestone Sep 29, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 29, 2022
@EgorBo EgorBo self-assigned this Sep 29, 2022
@EgorBo
Member Author

EgorBo commented Sep 29, 2022

AvaloniaILSpy app, R2R=0, TC=1:

More than 3,000 methods with IL size <= 8 bytes made it to Tier1.

@EgorBo
Member Author

EgorBo commented Sep 29, 2022

In fact, most of the Tier1 compilations in that app (~11.5k methods made it to Tier1) are quite small:

[Histogram of Tier1 compilations; X axis is IL size]
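A histogram like this can be rebuilt from DOTNET_JitDisasmSummary output with a few lines of Python (a sketch assuming the log format shown earlier in this issue):

```python
import re
from collections import Counter

# Build an IL-size histogram from DOTNET_JitDisasmSummary lines,
# counting only Tier1 compilations (format as shown in this issue).
def il_size_histogram(log_lines):
    hist = Counter()
    for line in log_lines:
        m = re.search(r"\[([^,\]]+), IL size=(\d+)", line)
        if m and m.group(1).startswith("Tier1"):
            hist[int(m.group(2))] += 1
    return hist

log = [
    "5: JIT compiled Program:Test():int [Tier0, IL size=6, code size=23]",
    "7: JIT compiled Program:Test():int [Tier1, IL size=6, code size=6]",
    "8: JIT compiled Program:get_Property():int [Tier1, IL size=3, code size=6]",
]
print(sorted(il_size_histogram(log).items()))  # [(3, 1), (6, 1)]
```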

@EgorBo
Member Author

EgorBo commented Sep 30, 2022

Potential easy fix: Increase call-counting threshold for methods below 16 bytes (e.g. 30 -> 100) on the VM side.
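In sketch form (the 30 and 100 call counts and the 16-byte cutoff are the numbers from this comment, not a shipped policy):

```python
# Sketch of the proposed VM-side tweak: make the call-counting threshold
# depend on IL size, so tiny methods need more calls before promotion to
# Tier1. Numbers are from the comment above, not actual runtime defaults.
DEFAULT_THRESHOLD = 30
SMALL_IL_THRESHOLD = 100
SMALL_IL_CUTOFF = 16  # bytes of IL

def call_count_threshold(il_size):
    return SMALL_IL_THRESHOLD if il_size < SMALL_IL_CUTOFF else DEFAULT_THRESHOLD

print(call_count_threshold(3))    # 100: tiny auto-property stays at Tier0 longer
print(call_count_threshold(107))  # 30: normal-sized method promotes as usual
```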

@EgorBo
Member Author

EgorBo commented Oct 1, 2022

Did a quick prototype:

  1. Inline only methods with <= 8 bytes of IL
  2. Give up if an inlinee has any control flow (branches/switches)
  3. Experimented with other limitations, such as max inlining depth and number of locals
  4. Introduced a new VM API to get a method's IL size quickly (cached via hashtable)

The number of compilations was reduced by ~3,000, but startup time still slightly regressed anyway 😢
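Item 4 above can be sketched as a memoized lookup (hypothetical names; the dict stands in for the VM-side hashtable and resolve_il_size for reading the method header):

```python
# Cached IL-size lookup so the inliner can query callee sizes cheaply.
# These names are illustrative, not the actual VM API.
_il_size_cache = {}

def get_method_il_size(method_token, resolve_il_size):
    size = _il_size_cache.get(method_token)
    if size is None:
        size = _il_size_cache[method_token] = resolve_il_size(method_token)
    return size

resolve_calls = []
def slow_resolve(token):
    resolve_calls.append(token)  # pretend this walks method metadata
    return 3

print(get_method_il_size(0x06000001, slow_resolve))  # 3 (resolved)
print(get_method_il_size(0x06000001, slow_resolve))  # 3 (served from cache)
print(len(resolve_calls))  # 1: the method header was only read once
```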

@EgorBo
Member Author

EgorBo commented Oct 11, 2022

BingSNR:

[Histogram of method IL sizes for BingSNR]

The most common method IL size is 5-6 bytes.

@EgorBo
Member Author

EgorBo commented Oct 11, 2022

So we emit thousands of redundant call-counting stubs/precodes/methods.

19% of all methods made it to Tier1.

@EgorBo
Member Author

EgorBo commented Oct 12, 2022

For BingSNR, my fairly simple prototype lowers the overall number of "jitted functions" from 240k to 200k (and my prototype ignores calls inside simple calls, e.g. a chain of properties).

@EgorBo
Member Author

EgorBo commented Jun 9, 2023

Moving to Future, as my attempt to enable limited inlining in Tier0 actually slightly regressed startup.

@clamp03
Member

clamp03 commented Dec 2, 2024

@EgorBo Hi, I checked another redundant-compilation case, tested on ARM64 and RISC-V (Release build).
For Span.IndexerBench.CoveredIndex1 in the performance benchmarks, Span.IndexerBench:TestCoveredIndex1 is compiled twice with the same tiered-compilation configuration.

$ export DOTNET_JitDisasmSummary=1
$ ./corerun MicroBenchmarks.dll -i --filter Span.IndexerBench.CoveredIndex1 | grep "Span.IndexerBench:TestCoveredIndex1"
7475: JIT compiled Span.IndexerBench:TestCoveredIndex1(System.Span`1[ubyte],int,int) [Instrumented Tier0, IL size=107, code size=500]
7486: JIT compiled Span.IndexerBench:TestCoveredIndex1(System.Span`1[ubyte],int,int) [Tier1-OSR @0x5d with Dynamic PGO, IL size=107, code size=124]
9530: JIT compiled Span.IndexerBench:TestCoveredIndex1(System.Span`1[ubyte],int,int) [Instrumented Tier0, IL size=107, code size=500]
9532: JIT compiled Span.IndexerBench:TestCoveredIndex1(System.Span`1[ubyte],int,int) [Tier1-OSR @0x5d with Dynamic PGO, IL size=107, code size=124]
9742: JIT compiled Span.IndexerBench:TestCoveredIndex1(System.Span`1[ubyte],int,int) [Tier1 with Dynamic PGO, IL size=107, code size=180]

What I checked:

  • 7475 and 9530 generate the same machine code.
  • The optimization tier in the method's NativeCodeVersion is OptimizationTier0 for 7475 and OptimizationTier0Instrumentation for 9530.
    It seems 7475 is instrumented because of
    opts.jitFlags->Set(JitFlags::JIT_FLAG_BBINSTR);
    (Promote Tier0 methods with loops to InstrumentedTier0 #81051)
  • Compilation 9530 is triggered from the TieredCompilationManager background thread. It compiles with OptimizationTier0Instrumentation because the method's previous optimization tier was OptimizationTier0.

I think 9530 and 9532 are redundant compilations.
Is this intended behavior or a known issue? (I tried to find an existing issue, but I could not.)
And if I have misunderstood, could you explain?
Thank you.
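The sequence described above can be modeled as a tiny state machine (a simulation of my reading of the report, not the actual TieredCompilationManager logic): the first Tier0 body is jitted with JIT_FLAG_BBINSTR set but recorded as plain OptimizationTier0, so the background thread later schedules an instrumented Tier0 build that produces the same code.

```python
# Simulation of the suspected redundancy; state names mirror the report,
# the transition rule is a guess at the manager's behavior.
def jit_compile(method, tier, instrumented):
    method["versions"].append((tier, instrumented))

method = {"versions": []}

# 7475: jitted at Tier0; BBINSTR is set because the method has a loop,
# but the recorded tier is plain OptimizationTier0.
jit_compile(method, "OptimizationTier0", instrumented=True)
recorded_tier = "OptimizationTier0"

# Background thread: sees plain Tier0, so it schedules an instrumented
# Tier0 rebuild (9530) -- even though 7475 was already instrumented.
if recorded_tier == "OptimizationTier0":
    jit_compile(method, "OptimizationTier0Instrumentation", instrumented=True)

# Both versions were built with instrumentation: the second is redundant.
print([inst for _, inst in method["versions"]])  # [True, True]
```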
