Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PGO: class profile details we need to get right #48549

Open
1 of 4 tasks
AndyAyersMS opened this issue Feb 20, 2021 · 3 comments
Open
1 of 4 tasks

PGO: class profile details we need to get right #48549

AndyAyersMS opened this issue Feb 20, 2021 · 3 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 20, 2021

  • If we can't record a class because it's collectible, we still need to consider putting something into the table. Otherwise we may overstate the likelihood of other entries. There should be some well-known invalid handle we can store there. Fixed via Use unknown placeholder for collectible class typehandle #48707.
  • Allow the sampling strategy to vary depending on use case. Dynamic PGO will want to use a small fixed window so the buffer has some recent observation bias; Static PGO should use the count value so it is not overly biased towards recent samples.
  • Would be nice to come up with a scheme where table size could vary dynamically. Current default of 8 is likely too large as many sites will either never be hit or be monomorphic, but smaller fixed sizes will prevent understanding the distribution of classes at more complex sites. Ideally perhaps the table size starts at 1 (or even 0) and then grows somehow. Engineering all this seems tricky as part of the current appeal is that the storage needs are fixed at jit time. Also the probing must be concurrency safe without locking and we can't have too much extra overhead.
  • Look critically at the class profile merging done by dotnet-pgo. See notes below.

For the latter we should also consider enabling devirtualization (at least the analysis, if not the transformation) at Tier0 as there's no need to probe the class at a virtual or interface site if we know what the class must be. Need to experiment to see what fraction of class probes fall into that category.

category:cq
theme:profile-feedback

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 20, 2021
@AndyAyersMS AndyAyersMS added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels Feb 20, 2021
@AndyAyersMS AndyAyersMS added this to the 6.0.0 milestone Feb 20, 2021
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Feb 24, 2021
instead of null, so that we won't over-estimate likelihood of other types at a
call site where we see both collectible and non-collectible types.

Addresses part of dotnet#48549.
AndyAyersMS added a commit that referenced this issue Mar 3, 2021
instead of null, so that we won't over-estimate likelihood of other types at a
call site where we see both collectible and non-collectible types.

Addresses part of #48549.
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 14, 2021

Type Histogram Merging

From the 21301.4 optimization data collection, we have 4 .mibc files with data for System.Collections.Generic.Dictionary`2<System.__Canon,System.__Canon>.FindValue(__Canon). We want to merge these into a combined histogram.

Looking at the offset 52 class probe we see (summarizing details below)

scenario table size count
TechEmpower 8 45573
Orchard Core 8 19598
Hello World 40 72561
First Time 8 21558

What should the merged table look like? Today we simply concatenate all the tables, producing a table of size 64, and sum all the counts, giving a net count of 159290. We then histogram the table and end up determining that the dominant entry is System.RuntimeType, with a a likelihood of 26/64 = 0.43.

However, on a count-weighted basis, we arguably should would estimate the likelihood of System.RuntimeType as

[ 45573 * (8/8) + 19598 * (7/8) + 72561 * (11/32) + 21558 * (0/8) ] / 159290 = 24943 / 159290 = 0.156

So because System.RuntimeType is common in the lighter-weight scenarios, we appear to overstate its overall likelihood.

Also note that since the default table size in the runtime is 8, we can guess that the hello world .mibc is already the result of a merge, and presumably we'd end up with yet another value, if we'd done a count-weighted merge instead of a concatenate-merge for the Hello World collection.

What would a count-weighted merge look like? If we want to keep the .mibc entries in the current table format, the correct approach seems to be some kind of probabilistic resampling of the individual tables.

Since we do N-way merges as a series of pairwise merges, let's look at the two table case first. Suppose we have tables Ti with counts Ci and sizes Si. Then a count-weighted merging algorithm would be something like the following:

double P0 = Table0.Count / (Table0.Count + Table1.Count);
random R = ...
table Merge = ...
  Merge.Count = C0 + C1;
  Merge.Size = S0 + S1;     

for (i = 0; i < Table0.Size + Table1.Size; i++)
{
   double pi = R.NextDouble(0, 1);
   if (pi < P0)
   {
      int s0 = R.NextInt(0, Table0.Size);
      Merge[i] = Table0[s0];
   }
   else 
   {
      int s1 = R.NextInt(0, Table1.Size);
      Merge[i] = Table1[s1];
   }
}

We have some flexibility here in choosing the merged table size, but it does not seem unreasonable to make it larger than either input table, to try and better represent the full diversity of the merge results.

This algorithm randomly samples from each table proportional to the count of observations in the table. To keep this semi-deterministic, the random seed would need to be some joint property of the tables, perhaps the hash of the method name augmented by the IL offset of the probe and the total count, or something similar.

However, this won't pass the associativity test; that is merging N entries will be order sensitive. If we have three tables, merge(merge(T0, T1), T2) != merge(T0, merge(T1, T2)). This seems undesirable (though the "errors" here should be random).

To have a fully general associative merge we'd need to form actual histograms and then merge them on a weight-adjusted basis. This requires a new schema type, analogous to GetLikelyClass but extended to represent all possible results.

;; techempower

    {
      "ILOffset": 52,
      "InstrumentationKind": "TypeHandleHistogramIntCount",
      "Other": -2147483648,
      "Data": 45573
    },
    {
      "ILOffset": 52,
      "InstrumentationKind": "TypeHandleHistogramTypeHandle",
      "Other": -2147483648,
      "Data": [
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType"
      ]
    },

;; orchard core

    {
      "ILOffset": 52,
      "InstrumentationKind": "TypeHandleHistogramIntCount",
      "Other": -2147483648,
      "Data": 19598
    },
    {
      "ILOffset": 52,
      "InstrumentationKind": "TypeHandleHistogramTypeHandle",
      "Other": -2147483648,
      "Data": [
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[S.P.CoreLib]System.RuntimeType",
        "[Castle.Core]Castle.DynamicProxy.Generators.CacheKey"
      ]
    },

;; hello world
{
"ILOffset": 52,
"InstrumentationKind": "TypeHandleHistogramIntCount",
"Other": -2147483648,
"Data": 72561
},
{
"ILOffset": 52,
"InstrumentationKind": "TypeHandleHistogramTypeHandle",
"Other": -2147483648,
"Data": [
"[S.P.CoreLib]System.RuntimeType",
"[S.P.CoreLib]System.RuntimeType",
"[S.P.CoreLib]System.RuntimeType",
"[S.P.CoreLib]System.RuntimeType",
"[System.Linq.Expressions]System.Linq.Expressions.Expression11<System.Func2<object,object>>",
"[S.P.CoreLib]System.RuntimeType",
"[S.P.CoreLib]System.RuntimeType",
"[S.P.CoreLib]System.RuntimeType",
"[NuGet.Frameworks]NuGet.Frameworks.NuGetFramework",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[NuGet.Frameworks]NuGet.Frameworks.NuGetFramework",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"UnknownType(0)",
"UnknownType(0)",
"UnknownType(0)",
"UnknownType(0)",
"UnknownType(0)",
"UnknownType(0)",
"UnknownType(0)",
"UnknownType(0)",
"[Microsoft.CodeAnalysis.CSharp]Microsoft.CodeAnalysis.CSharp.Syntax.AttributeSyntax",
"[Microsoft.CodeAnalysis.CSharp]Microsoft.CodeAnalysis.CSharp.CodeGen.DummyLocal",
"[S.P.CoreLib]System.RuntimeType",
"[Microsoft.CodeAnalysis.CSharp]Microsoft.CodeAnalysis.CSharp.CodeGen.DummyLocal",
"[S.P.CoreLib]System.RuntimeType",
"[Microsoft.CodeAnalysis.CSharp]Microsoft.CodeAnalysis.CSharp.CodeGen.DummyLocal",
"[S.P.CoreLib]System.RuntimeType",
"[S.P.CoreLib]System.RuntimeType",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[NuGet.Frameworks]NuGet.Frameworks.NuGetFramework",
"[NuGet.Frameworks]NuGet.Frameworks.NuGetFramework",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
"[NuGet.Frameworks]NuGet.Frameworks.NuGetFramework",
"[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation"
]
},

;; first time

    {
      "ILOffset": 52,
      "InstrumentationKind": "TypeHandleHistogramIntCount",
      "Other": -2147483648,
      "Data": 21558
    },
    {
      "ILOffset": 52,
      "InstrumentationKind": "TypeHandleHistogramTypeHandle",
      "Other": -2147483648,
      "Data": [
        "[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
        "[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
        "[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
        "[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
        "[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
        "[NuGet.Frameworks]NuGet.Frameworks.NuGetFramework",
        "[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation",
        "[Microsoft.Build]Microsoft.Build.Construction.XmlDocumentWithLocation"
      ]
    },

@jakobbotsch -- FYI in case you want to look into class profile merging.

@AndyAyersMS
Copy link
Member Author

I think we are good enough now for .NET 6 -- we can revisit if we get evidence that the table size, merging, or sampling strategies are causing particular problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Development

No branches or pull requests

1 participant