Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional filter capabilities to dotnet-pgo tool. #89853

Conversation

lateralusX
Copy link
Member

@lateralusX lateralusX commented Aug 2, 2023

Mono adopted PGO in .net7 as a replacement for a solution called profiled AOT in mono/mono, primarly used by Android SDK. In the replaced solution there was a concept of a "stop trigger", meaning that the user could collect methods up to a stop trigger (a method name) meaning that the profiled AOT image would only include methods up to that point and nothing more.

When using EventPipe and nettrace there is limited ability to get the same fine grained control over what methods that ends up in the nettrace file. dotnet-monitor includes ways to stop tracing if it hits for example a specific method, but due to the nature of EventPipe, there could still be additional methods added to the trace when closing session. dotnet-trace currently don't offer any ability to do something similar, but if implemented, it would probably come with similar limitations as dotnet-monitor.

Adding better filter capabilities to dotnet-pgo would add additional capabilities, giving users more control on what methods that gets included into the generated mibc file. That would give Mono's profiled AOT better control to include methods up to a stop trigger.

dotnet-pgo already had capabilities to include events based on timestamp interval. This commit extends that to select the lower/upper timestamp based on a regular expression matching methods. This commit also adds additional filter capabilities to include/exclude methods based on regular expressions, giving better control on what methods to include/exclude in the generated mibc file.

The additional filter capabilities could be used by Android SDK to replace the stop trigger features used in old profiled AOT solution. Android team could then profile the startup of a MAUI application and use generated nettrace file with dotnet-pgo passing in --exclude-events-after-method .*Main.* --exclude-methods .*Main.*, to include methods up to executing Main method in the generated mibc file.

The new filter capabilities also add ways to create a mibc file only including/exclude methods from a specific assemblies, namespaces or a combination of all.

Mono adopted PGO in .net7 as a replacement for a
solution called profiled AOT in mono/mono used by Android
SDK. In the replaced solution there was a concept of
a stop trigger, meaning that the user could collect methods
up to a stop trigger (a method name) meaning that the
profiled AOT image would only include methods up to that point.

When using EventPipe and nettrace there is limited ability to
get the same fine grained control over what methods that ends
up in the nettrace file. dotnet-monitor includes ways to stop
tracing if it hits for example a specific method, but due to the
nature of EventPipe, there could still be additional methods added
to the trace when closing session. dotnet-trace currently don't offer
any ability to do something similar, but if implemented, it
would probably come with the same limitations as dotnet-monitor.

Adding better filter capabilities to dotnet-pgo would add additional
capabilities, giving users more control on what methods that gets
included into the generated mibc file. That would give Mono's
profiled AOT better control to include methods up to a stop trigger.

dotnet-pgo alread had capabilities to include events based on timestamp
interval. This commit extends that to select the lower/upper timestamp
based on a regular expression matching methods. This commit also
adds capabilities to add a method include/exclude filters using regular
expression, giving users fine grained control on what methods to
include/exclude in the generated mibc file.

The additional filter capabilities could be used by Android SDK to
for example create a mibc file including all methods up to Main,
replacing the stop trigger features used in old profiled AOT solution.
Copy link
Member

@am11 am11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@am11 am11 requested a review from jakobbotsch August 3, 2023 15:44
Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I would expect that users should emit their own events that they can use to correlate when their scenarios start and stop running. This is for instance the approach that Benchmark.NET uses (via its WorkloadActual/Start and WorkloadActual/Stop events).
Looking for when a method starts jitting is not going to be a reliable way to correlate it with the execution of that method (at least for coreclr), though it might work in some cases.

I'm likewise not sure that it makes sense to use GetR2REntryPoint events to make this correlation. Does it? Is this event lazily fired the first time a R2R method starts executing?

Just to be clear: I'm not against taking the changes if it addresses a Mono scenario, but in that case we need to document some of the limitations of these command-line options. I would like to get @davidwrighton's opinion on it.

@lateralusX
Copy link
Member Author

lateralusX commented Aug 4, 2023

Generally, I would expect that users should emit their own events that they can use to correlate when their scenarios start and stop running. This is for instance the approach that Benchmark.NET uses (via its WorkloadActual/Start and WorkloadActual/Stop events). Looking for when a method starts jitting is not going to be a reliable way to correlate it with the execution of that method (at least for coreclr), though it might work in some cases.

I'm likewise not sure that it makes sense to use GetR2REntryPoint events to make this correlation. Does it? Is this event lazily fired the first time a R2R method starts executing?

Just to be clear: I'm not against taking the changes if it addresses a Mono scenario, but in that case we need to document some of the limitations of these command-line options. I would like to get @davidwrighton's opinion on it.

I don't see that there are any limitations in the command line options as is, they are just giving the user additional capabilities to filter the methods that ends up in the generated mibc file, extending the filter capabilities that we already have in --exclude-events-before and --exclude-events-after command line options. The new command line options introduced here just gives user ability to instead use a method in the nettrace file as markers for the before/after timestamps used to filter what methods that will end up in the mibc file. The --include-methods and --exclude-methods gives users even more power to include/exclude methods based on regexp in the mibc file and extending the filter capabilities of the of the tool even further.

For solutions like Mono's profiled AOT, they use dotnet-trace (or dotnet-monitor) to collect a nettrace file -> dotnet-pgo to create the mibc file/files -> Mono AOT compiler to AOT compile all methods included in the mibc files. Since all is using generic tooling, doing custom events as part of that solution would mean injecting custom tooling into the toolchain, but the profile AOT scenario is just about what methods that has been jitted/loaded by the runtime up to some selected point in time events already existing and handled by the pgo toolchain. But again, the command line arguments implemented in this PR is not related to any specific solutions or scenarios, just extending the filter capabilities of the dotnet-pgo tooling.

The Mono solution we are replacing is https://github.com/mono/mono/blob/main/mono/profiler/aot.c and the Android SDK team is using their own variant of it here, https://github.com/jonathanpeppers/android-profiled-aot. We can replace all parts of that solution with generic net-trace, dotnet-pgo (with extended filtering in this PR) and Mono AOT compiler accepting mibc files as profiles (implemented in .net7) making it into a generic Mono profiled AOT solution build on generic .net runtime and diagnostics tooling without any need for custom tooling.

@jakobbotsch
Copy link
Member

I don't see that there are any limitations in the command line options as is, they are just giving the user additional capabilities to filter the methods that ends up in the generated mibc file, extending the filter capabilities that we already have in --exclude-events-before and --exclude-events-after command line options. The new command line options introduced here just gives user ability to instead use a method in the nettrace file as markers for the before/after timestamps used to filter what methods that will end up in the mibc file.

What I mean by limitations is that exclude-events-before-method and exclude-events-after-method gave me the impression that the trace contains an event that indicates when a method was executed. It is not the case (on coreclr) -- things like R2R, tiering, inlining, multicore JIT will affect when and whether these "jitting started" events are fired by the runtime. I don't think the user has a reasonable way to predict the exact timings/ordering of the firing of these JIT events. I would be less concerned if we renamed the CLI options to exclude-events-before-jitting-method and exclude-events-after-jitting-method to make it more clear, but I would still prefer to push users towards firing their own events to indicate when their scenarios are running.

Since all is using generic tooling, doing custom events as part of that solution would mean injecting custom tooling into the toolchain

PerfView has a powerful mechanism to describe custom events to indicate starting/stopping points (search for StopOnEtwEvent in http://htmlpreview.github.io/?https://github.com/Microsoft/perfview/blob/main/src/PerfView/SupportFiles/UsersGuide.htm). It could be nice to model a solution against this.

I'm still a bit confused about R2RGetEntryPoint. Does Mono fire this event?

@lateralusX
Copy link
Member Author

I'm still a bit confused about R2RGetEntryPoint. Does Mono fire this event?

No, but didn't see any reason to exclude them from a generic filter implementation.

@lateralusX
Copy link
Member Author

PerfView has a powerful mechanism to describe custom events to indicate starting/stopping points (search for StopOnEtwEvent in http://htmlpreview.github.io/?https://github.com/Microsoft/perfview/blob/main/src/PerfView/SupportFiles/UsersGuide.htm). It could be nice to model a solution against this.

These are tools in the toolchain are automated, generating a complete different artifact as result, converting it into a mibc file passed to aot compiler. The toolchain is cross platform and can't rely on Windows only UI tooling.

@lateralusX
Copy link
Member Author

. I would be less concerned if we renamed the CLI options to exclude-events-before-jitting-method and exclude-events-after-jitting-method to make it more clear,

We could rename them to be more clear and only apply that filter mechanism to JitStart event if that makes things simpler/clearer. The --include-methods, --exclude-method on the other hand are straightforward and could apply to bith JIT and R2R methods.

@jakobbotsch
Copy link
Member

I'm still a bit confused about R2RGetEntryPoint. Does Mono fire this event?

No, but didn't see any reason to exclude them from a generic filter implementation.

I'm not sure the way these events are fired correlates with any form of ordering -- coreclr may simply fire the event for all methods in a R2R image as soon as it is loaded. I need to double check.

These are tools in the toolchain are automated, generating a complete different artifact as result, converting it into a mibc file passed to aot compiler. The toolchain is cross platform and can't rely on Windows only UI tooling.

I didn't mean to switch the tooling to use PerfView. I meant we could introduce a similar powerful mechanism in dotnet-pgo, allowing a flexible syntax to specify the start/stopping points that would support custom events fired by the customer. But I understand if that's a bit more work than you'd like on your plate. :-)

We could rename them to be more clear and only apply that filter mechanism to JitStart event if that makes things simpler/clearer. The --include-methods, --exclude-method on the other hand are straightforward and could apply to bith JIT and R2R methods.

That sounds good to me. I would be fine with the changes if we do that.

@lateralusX
Copy link
Member Author

I didn't mean to switch the tooling to use PerfView. I meant we could introduce a similar powerful mechanism in dotnet-pgo, allowing a flexible syntax to specify the start/stopping points that would support custom events fired by the customer. But I understand if that's a bit more work than you'd like on your plate. :-)

I see, we could potentially do that long term, we also have plans to include similar powerful features into dotnet-trace directly, that would be more inline with the perfview feature.

@lateralusX
Copy link
Member Author

I'm not sure the way these events are fired correlates with any form of ordering -- coreclr may simply fire the event for all methods in a R2R image as soon as it is loaded. I need to double check.

Jupp, these events can fire in different order based on multithreading, what gets called, on Mono we will fire them when we JIT methods or load from AOT image, and that happens when we are about to execute a specific method.

@lateralusX
Copy link
Member Author

That sounds good to me. I would be fine with the changes if we do that.

OK, great, lets do it like that then, I will rename the before/after to include jitting and only apply it to jitted events. The more generic include/exclude methods can be applied to both jitted and R2R methods.

@lateralusX
Copy link
Member Author

lateralusX commented Aug 4, 2023

@jakobbotsch So to summarize what we discussed and actions to get this PR merged:

  • Will rename the --exclude-events-before-method, --exclude-events-after-method to --exclude-events-before-jitting-method and --exclude-events-after-jitting-method and only apply it to jitted method events.
  • If we are guaranteed that events are sorted by timestamp in the .etlx file (.nettrace doesn't guarantee that) we could drop custom sorting and apply filter logic directly when iterating methods.
  • Keep --include-methods/--exclude-methods as is since they are generic add lots of flexibility to decide what methods you include/exclude in the created mibc file. and mirrors parts of whats currently done by the Android SDK team in a bunch of custom tooling.

Sounds good?

@jakobbotsch
Copy link
Member

@jakobbotsch So to summarize what we discussed and actions to get this PR merged:

  • Will rename the --exclude-events-before-method, --exclude-events-after-method to --exclude-events-before-jitting-method and --exclude-events-after-jitting-method and only apply it to jitted method events.
  • If we are guaranteed that events are sorted by timestamp in the .etlx file (.nettrace doesn't guarantee that) we could drop custom sorting and apply filter logic directly when iterating methods.
  • Keep --include-methods/--exclude-methods as is since they are generic add lots of flexibility to decide what methods you include/exclude in the created mibc file. and mirrors parts of whats currently done by the Android SDK team in a bunch of custom tooling.

Sounds good?

Sounds great to me. Thanks!

@lateralusX
Copy link
Member Author

@jakobbotsch So to summarize what we discussed and actions to get this PR merged:

  • Will rename the --exclude-events-before-method, --exclude-events-after-method to --exclude-events-before-jitting-method and --exclude-events-after-jitting-method and only apply it to jitted method events.
  • If we are guaranteed that events are sorted by timestamp in the .etlx file (.nettrace doesn't guarantee that) we could drop custom sorting and apply filter logic directly when iterating methods.
  • Keep --include-methods/--exclude-methods as is since they are generic add lots of flexibility to decide what methods you include/exclude in the created mibc file. and mirrors parts of whats currently done by the Android SDK team in a bunch of custom tooling.

Sounds good?

Sounds great to me. Thanks!

@jakobbotsch, PR has been updated.


if (includeMethods != null || excludeMethods != null)
{
string methodAsString = method.ToString();
Copy link
Member

@jakobbotsch jakobbotsch Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string is generally meant for debugging, it goes through DebugNameFormatter which will produce method names that users may not be familiar with (e.g. I believe types will be module prefixed in the format "[module name]type name"). Would it be better to introduce a custom formatter, reuse the TypeString formatter, or to reuse the methodNameFromEventDirectly that we are already using to print the warnings above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there expectation for users to be more specific than to use a regular expression for specifying which method to exclude events after? I was trying to get a local test loop going on Android to see the structural differences between a MethodDesc.ToString() and the methodNameFromEventDirectly but its taking me some time to get it working locally.

I can switch it to methodNameFromEventDirectly as that should still be able to match a regex provided by the user. Is it usually easier to match the methodNameFromEventDirectly form than to match the MethodDesc.ToString() form even though it is passed through the DebugNameFormatter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the way DebugNameFormatter formats names is an internal detail of the managed type system, so I do not think user-facing tools should be using this for any purpose but diagnostics. We do not want situations where we change this detail in the future and break something that took a dependency on the exact way that namespaces or class names are formatted by this formatter.

I looked again and realize methodNameFromEventDirectly is not actually a field that comes directly from the event, but a concatenation of various fields. I think a simpler concatenation like e.MethodNamespace + "." + e.MethodName would be better to do the matching against. I think MethodNamespace includes the class name, but you may want to double check.

Copy link
Member

@mdh1418 mdh1418 Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also suggests that e.MethodNamespace will also contain the class name https://github.com/microsoft/perfview/blob/766a08ca8373d273d55da8b07147e7068e4d5f58/src/TraceEvent/Samples/21_ObserveJitEvents.cs#L126-L132. (haven't found the source code for such fields/classes yet)

Maybe methodNameFromEventDirectly would allow for more flexibility as it will allow users to also specify between different overloaded forms of the same method? It would also more closely align with a GetMethodName in Perfview https://github.com/microsoft/perfview/blob/766a08ca8373d273d55da8b07147e7068e4d5f58/src/TraceEvent/Computers/TraceManagedProcess.cs#L4170

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea, but then we should match the PerfView name exactly, methodNameFromEventDirectly does not match that today. We should also update documentation of the CLI option to refer to the fact that matching works against names that are formatted the same as PerfView (I would also add a note that this includes parameters, so that users can realize they cannot use e.g. *MyClass* to match all methods defined on that class). Do you know where exactly this name is displayed in the PerfView UI?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the user scenarios, so I dont know if it is likely that one would want to only include methods with a certain set of parameters and not include the others. Maybe we can start out with having the most customizable option first (allowing users to include/exclude particular overloaded methods) and later if we know users will never want to do that we can pursue the simpler format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also noticing that the excludeEventsBeforeJittingMethod and excludeEventsAfterJittingMethod are based off of method.ToString(). It would probably be better to maintain parity for those as well. And on more deliberation and offline discussion, maybe keeping the simpler format is better for now. Also, do you happen to know if MethodDesc.ToString() will contain the parameter information or not @jakobbotsch ?

Copy link
Member

@mdh1418 mdh1418 Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it looks like it does

sb.Append('(');
try
{
sb.Append(Signature.ToString(includeReturnType: false));
}
catch
{
sb.Append("Unknown");
}
sb.Append(')');

Maybe it would be better to stick with @lateralusX's original intent of the regex matching.

I'm not very familiar with these types, would e.MethodNamespace + "." + e.MethodName + paramsArgs be the same values that are held by the resulting MethodDesc discovered by idParser.ResolveMethoID(e.MethodID...)? If not, then it might not be the right "substitution"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also noticing that the excludeEventsBeforeJittingMethod and excludeEventsAfterJittingMethod are based off of method.ToString(). It would probably be better to maintain parity for those as well.

Yes, the code above should be changed in the same way.

We should not use MethodDesc.ToString() for anything except diagnostics when it is user facing like this. It should have a dedicated formatter then if we do not want to use the name from the event. I think you can base it on TypeString, except without including instantiations.

I'm not very familiar with these types, would e.MethodNamespace + "." + e.MethodName + paramsArgs be the same values that are held by the resulting MethodDesc discovered by idParser.ResolveMethoID(e.MethodID...)? If not, then it might not be the right "substitution"

Yes, they represent the same thing. dotnet-pgo has a managed implementation of the type system that replicates the native implementation of the type system that produced the strings in the event. Classes and methods are loaded into the type system based on events in the trace.

Copy link
Member

@mdh1418 mdh1418 Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use e.MethodNamespace + "." + e.MethodName + paramsArgs as the method's name for regex matching, formatted after the method name in PerfView.

Also updated the documentation in the options' descriptions

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks ok to me. Do you want this in .NET 8, or can it wait for .NET 9?

@mdh1418
Copy link
Member

mdh1418 commented Aug 8, 2023

@jakobbotsch we want to get this into .NET 8. @lateralusX is out, so I can help push this through after addressing your feedback if it looks mostly complete.

@mdh1418 mdh1418 self-requested a review August 8, 2023 14:17
@jakobbotsch
Copy link
Member

@jakobbotsch we want to get this into .NET 8. @lateralusX is out, so I can help push this through after addressing your feedback if it looks mostly complete.

Sounds good to me.

Comment on lines 1440 to 1448
if (e.TimeStampRelativeMSec > excludeEventsBefore && excludeEventsBeforeJittingMethod != null && excludeEventsBeforeJittingMethod.IsMatch(perfviewMethodName))
{
excludeEventsBefore = e.TimeStampRelativeMSec;
}

if (e.TimeStampRelativeMSec < excludeEventsAfter && excludeEventsAfterJittingMethod != null && excludeEventsAfterJittingMethod.IsMatch(perfviewMethodName))
{
excludeEventsAfter = e.TimeStampRelativeMSec;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the behavior here on multiple matches the right one? In particular excluding events before the last match instead of before the first match seems odd to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RIght, it would be problematic especially if the method matching had a timestamp order of beforeMethodMatch1 < afterMethodMatch < beforeMethodMatch2. And worst case, if the two matches wer the same, it would be impossible for the user to have a specific enough regex to only match the first (temporally) method.

Are methods in the enumerator of p.EventsInProcess.ByEventType<MethodJittingStartedTraceData>() guaranteed to be ordered temporally?

Maybe it would be better to keep a list of the matches and choose the first temporal match for excludeEventsBeforeJittingMethod and the last temporal match for excludeEventsAfterJittingMethod

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are methods in the enumerator of p.EventsInProcess.ByEventType() guaranteed to be ordered temporally?

Yes, so we can just have a bool or something to keep track of whether we have already assigned excludeEventsBefore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for excludeEventsAfterJittingMethod we would need to still keep a list of the matches for a scenario like afterMethodMatch1 < beforeMethodMatch < afterMethodMatch2 in case the user wanted to use the bounds [beforeMethodMatch, afterMethodMatch2].

Just realizing, if these events are for jitting methods, is it the case that we would only JIT a particular method once? Meaning no two events will have the same name? Not sure exactly how these events are collected. If thats the case, then maybe we can have it such that the user needs to be more specific with their Regex matching if there happens to be multiple matches?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to trace an app and sent over the .nettrace to my windows machine. It looks like caching does happen and a method will only be found once in MethodJittingStartedTraceData? So it seems like users can be more specific with their regular expression matching, but will opt to take the first "before" match and the last "after" match to avoid invalid bounds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for excludeEventsAfterJittingMethod we would need to still keep a list of the matches for a scenario like afterMethodMatch1 < beforeMethodMatch < afterMethodMatch2 in case the user wanted to use the bounds [beforeMethodMatch, afterMethodMatch2].

I don't think this is a scenario we need or should to try to handle. As long as we give a warning or error (which I believe we already do) the user can fix the regex to be specific enough. I would just make sure we pick the largest range that corresponds to the matches.

Just realizing, if these events are for jitting methods, is it the case that we would only JIT a particular method once? Meaning no two events will have the same name? Not sure exactly how these events are collected. If thats the case, then maybe we can have it such that the user needs to be more specific with their Regex matching if there happens to be multiple matches?

coreclr will JIT the same method multiple times at different optimization levels when tiered compilation is enabled, so there can be multiple "JIT started" events for the same method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added two doubles to track the earliest excludeEventsBeforeJittingMethod match and latest excludeEventsAfterJittingMethod match

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for addressing my feedback.

@jakobbotsch jakobbotsch merged commit 6d9baea into dotnet:main Aug 10, 2023
104 of 106 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Sep 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants