Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PGO: Guarded isinst #55907

Closed
wants to merge 8 commits into from
Closed

PGO: Guarded isinst #55907

wants to merge 8 commits into from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jul 19, 2021

Closes #55325
Insert class probes for all isinst and castclass opcodes so we can later insert a fast path for a class that showed up in PGO data.

using System;
using System.Runtime.CompilerServices;
using System.Threading;

public interface IInterface
{
    void DoWork();
}

public class Impl : IInterface
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public virtual void DoWork() => Console.Write("DoWork");
}


public class Program
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Test(object o)
    {
        if (o is IInterface i)
            i.DoWork();

        /*
         The above ^ is optimized into:
        
           if (o != null && (o.GetType() == typeof(Impl) || ISINSTANCEOFINTERFACE(o, typeof(IInterface))))
           {
               o.DoWork();
           }
                
        */
    }

    static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            // Promote Test from tier0 to tier1
            Test(new Impl());
            Thread.Sleep(16);
        }
    }
}

Codegen for Test diff: https://www.diffchecker.com/4pbk6Dls (DOTNET_TieredPGO=1)
I'll check if it has any affect on TE benchmarks.

/cc @dotnet/jit-contrib @AndyAyersMS

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 19, 2021
@jakobbotsch
Copy link
Member

Codegen for Test diff: https://www.diffchecker.com/CoL2hV21 (DOTNET_TieredPGO=1)

Looks like there is a superfluous null-check in the hot path now, can we get rid of it?

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2021

Codegen for Test diff: https://www.diffchecker.com/CoL2hV21 (DOTNET_TieredPGO=1)

I'll see what I can do there, but for now I'm mostly interested in perf numbers because without proper benefits these additional class probes will be a useless overhead for tier0.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2021

E.g. in this case:
image

it inserts a fast-path for ObjectDefaultComparer<Message> in a benchmark that serializes a Message struct a lot: https://www.diffchecker.com/2NnaKYKC

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2021

Btw, we already even in Default mode expand casts to non-sealed classes with a quick fast-path that can be wrong - PGO can help here as well, let me add CORINFO_HELP_CHKCASTCLASS support. see sharplab.io

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2021

Plaintext-MVC benchmark:
image

called from

module Microsoft.AspNetCore.Mvc.Core <<Microsoft.AspNetCore.Mvc.Core!Microsoft.AspNetCore.Mvc.Routing.ControllerRequestDelegateFactory+<>c__DisplayClass10_0::<CreateRequestDelegate>b__0(class [Microsoft.AspNetCore.Http.Abstractions]Microsoft.AspNetCore.Http.HttpContext)>>

module Microsoft.AspNetCore.Mvc.Core <<Microsoft.AspNetCore.Mvc.Core!Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker::InvokeNextActionFilterAwaitedAsync()>>

It seems this benchmark is improved by this PR:

Base RPS PR RPS diff
2850989 2924180 2.57%
^ (I've ran both 8 times in a row using crank's `--iterations 8`)

@jkotas
Copy link
Member

jkotas commented Jul 19, 2021

we already even in Default mode expand casts to non-sealed classes with a quick fast-path that can be wrong

What do you mean that it can be wrong? I do not see anything wrong with it.

@jkotas
Copy link
Member

jkotas commented Jul 19, 2021

What is our test coverage for these PGO-specific expansions?

@AndyAyersMS
Copy link
Member

What is our test coverage for these PGO-specific expansions?

There are nightly runtime outerloop and weekend libraries tests under PGO:

We also run TE scenarios nightly via ASP.NET team's harness.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2021

we already even in Default mode expand casts to non-sealed classes with a quick fast-path that can be wrong

What do you mean that it can be wrong? I do not see anything wrong with it.

@jkotas By "wrong" I mean the fast path is not taken, and even creates some overhead in cases where object is actually a subclass. Example (Default mode, Main):
image

We have a fast path for A in TestMethod but it won't be taken since I actually pass an instance of B. In my PR this problem is fixed - A is replaced with likelyClass which is B.

@AndyAyersMS
Copy link
Member

@EgorBo any idea how many class probes this adds? Seems like (as we discussed elsewhere) many of these will be redundant with existing class probes, and I wonder whether we should hold off on this until we have some idea how to better address/exploit that redundancy.

Note we currently need to produce the same schema for both static and dynamic PGO, so if we enable this for dynamic PGO we'll also enable it for static PGO and thus potentially increase the size of the profile data in SPC and similar.

@jkotas
Copy link
Member

jkotas commented Jul 19, 2021

There are nightly runtime outerloop and weekend libraries tests under PGO:

Good. Do we have an idea whether the hot paths in these tests provide good enough functional coverage? Should we be adding these optional PGO-specific expansions to JIT stress or something like that so that we get larger functional matrix coverage for them?

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2021

@EgorBo any idea how many class probes this adds? Seems like (as we discussed elsewhere) many of these will be redundant with existing class probes, and I wonder whether we should hold off on this until we have some idea how to better address/exploit that redundancy.

Note we currently need to produce the same schema for both static and dynamic PGO, so if we enable this for dynamic PGO we'll also enable it for static PGO and thus potentially increase the size of the profile data in SPC and similar.

Not sure yet, I was mostly interested in perf numbers, I've ran most of the TE benchmarks, a few of them look slightly improved, but I expected a bit more. I still see various CastHelpers in the hot paths, but I suspect those are connected with shared generics and this PR just gives up when it sees them. So I probably will close it for now, will try to guess a likelyclass from virtual calls inside blocks dominated by isinst/casts in future.

PS: Plaintext-MVC is definitely improved by 2.3%, I've just had another round of 8 iterations of Main vs PR and the improvements are stable again.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2021

Good. Do we have an idea whether the hot paths in these tests provide good enough functional coverage? Should we be adding these optional PGO-specific expansions to JIT stress or something like that so that we get larger functional matrix coverage for them?

There was an idea to invoke "Main", say, 100 times in the test runner for runtime tests (the ones without static states) or maybe even in tests for Libs. Prototype: #52874 Currently a very little of them make it to tier1 without TC=0.

@AndyAyersMS
Copy link
Member

Do we have an idea whether the hot paths in these tests provide good enough functional coverage?

Until recently the jit had more or less the same logic with static and dynamic PGO. Static PGO data is pervasive in our assemblies now so we get some PGO coverage all the time just from that.

Coverage for dynamic PGO alone is not great (and similarly, it must be said, for tiered compilation in general). Last I looked in coreclr outlerloop only about 2500 methods make it to Tier1 with default tiering options. Libraries may fare better ... at least the orchestration code probably gets a good workout. For TE runs we get ~10K methods jitted with dynamic PGO:

   10034   22.82%  HAS_DYNAMIC_PROFILE
    1751    3.98%  HAS_STATIC_PROFILE
     543    1.24%  HAS_LIKELY_CLASS
    3010    6.85%  HAS_CLASS_PROFILE
   11785   26.81%  HAS_EDGE_PROFILE
   11785   26.81%  HAS_PGO

I have an experimental harness I started working up which repeatedly invokes main on tests with pauses to let tiering catch up once we hit enough invocations (similar in spirit to the assembly load context harness) to try and get more methods to tier up and hence give us a way to increase coverage of PGO only functionality. But it seems like we'll need some sort of annotation on tests themselves beyond what we already have for the ALC harness. I was also trying to find a good auto-exclude policy (eg if second iteration fails, assume test is not compatible.). This testing would be fairly slow so would also end up being night/weekend sort of thing.

There are ways to drive tests with out of band PGO data (sideloaded text format pgo) but I haven't tried leveraging this much during testing. It's not clear how durable the PGO data would be.

We also have some stress randomization capabilities within PGO in the jit, but they're not enabled in testing yet. We have not tried running PGO + JitStress or PGO + GcStress. We probably should (at least the latter).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PGO: Optimize isinst with class probes
4 participants