-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow analyzers to selectively opt into generated code analysis #39758
Comments
Have we already confirmed that the performance issues cannot be addressed by making the analyzer itself faster? As an alternative, could the analyzer request that generated code be analyzed, but have a quick check in the analyzer itself to see if the current file is the expected generated code and return without performing the analysis if it's not a generated Razor file? |
dotnet/aspnetcore#17001 is about making those analyzers faster by switching to operations. However, in the general case it is probably a good idea to be able to limit them to the files they're specifically interested in (i.e. Razor pages). Various code generation schemes used by users can create massive amounts of generated code, and as fast as we can make those analyzers it would be better for them not to process that at all... |
Couldthis be done by having a SemanticModelStartAction? Clients could register, see if the file was one they wanted to operate on, and skip ones they didn't care about. |
I agree with @sharwell - there can be extremely large number of variations of what can be classified as generated code, and none of the heuristics or algorithms we employ in the driver can satisfy every client. The analyzer driver uses a simple heuristic for extremely common cases for generated code files. We do not guarantee that this logic will not have any breaking changes in future, so no analyzer should rely on these heuristics for its own functional or performance characteristics. If the analyzer wants to have very specific logic on what classifies as generated code, and especially something that is performance sensitive, it should register to analyze all generated code in compilation and have a checks upfront to determine it's own generated code value per file and decide to bail out or not. Note that you can potentially cache the roslyn/src/Compilers/Core/Portable/DiagnosticAnalyzer/DiagnosticAnalysisContext.cs Lines 468 to 478 in f828d26
As an aside, we should probably consider implementing #7578 for convenience of authors. |
I'm not sure why we need to think about variations, heuristics or complexity - the proposal here is simply to add a single string tag that would be emitted by code generators and opted into by analyzers shipped with those code generators. I also hope that the current ConfigureGeneratedCodeAnalysis logic is guaranteed in the future, and if so I'm not sure why the additional overload wouldn't be. I'd venture to say the following... In general, any analyzer that does want to process some generated code, is going to want to process only the specific generated code it was designed for; it ships as part of the same product that also generates the code that it needs to analyze (e.g. ASP.NET). In other words, the |
And so on and so forth. Heck, what happens if you yourself realize later that your tag wasn't sufficient? |
Can you explain why the build in filtering is insufficient? Why can't you just write the code that actually just does the filtering when you look at the document? You can then define whatever system you want, and you can change/augment that system in teh future. Analyzers are already designed to handle this well. |
One can always ask this question about any feature - we could have used the same logic to not implement the existing ConfigureGeneratedCodeAnalysis mechanism. The mechanism was also added despite the fact that analyzers could implement their own filtering themselves - because it's commonly needed. We have a clear scenario with a problem - a project which generates code and wants to deliver an analyzer to process that code. This seems like a well-defined case which probably plagues most (if not all) analyzers out there that don't opt out from all generated code. |
Right... and you're the second group here... so it's being asked of your feature :D
Right. The 80%+ case was handled. At this point, it seems better to just have the analyzer do the simple custom filtering. It feels like it would literally just be a case of getting the leading trivia of the doc and checking for your tag.
You're the first to mention it. Seems like it may just be niche :) |
Analyzing generated code is something most analyzers don't want to do, because a user didn't write it. Normally an error in generated code means the user needs to file a bug on the team that wrote the generator, not change their code. With code generators coming in as a feature this may become more common. |
To be clear (and also to repeat myself), I am not saying analyzers should process generated code. I'm saying that in the case they do need to analyze generated code, they will almost always want to limit their analysis of only the generated code they're interested in. Since not doing so is a perf pit of failure, and since it should be done by (almost) all analyzers (which don't opt out of all generated code), it seems reasonable to have a built-in way to filter, to prevent every analyzer from having to do it again and again. |
Right, I took what you said to mean this. Sorry for being overly general in my statements. I was just trying to provide some context.
That was not the scenario we were originally thinking about when we added this option. The concern was that certain diagnostics needed to consider all use sites, even generated code, in order to know if something is correct. The security analyzers that ship with the SDL, for example, need to look at everything and will flag generated code as having security issues.
I guess my pushback here is that there is nothing that the roslyn analyzer driver can do inherently better than an analyzer author. We could just agree on a convention and copy syntax trivia check into the driver. But that code is going to look identical to what your analyzer would do. So if perf is a concern then I would need to see some measurements since we would just be saving some virtual dispatch overhead at that point. Every file in a compilation is parsed everytime so asking an analyzer to parse a comment is not likely to be a perf bottleneck. |
@jmarolf sure thing. Regarding there being nothing that the analyzer driver can do better than an analyzer author, isn't that argument true of the general ConfigureGeneratedCodeAnalysis as currently implemented? The point here isn't really to save a virtual dispatch, but to provide a simple, accessible/discoverable filter mechanism that would hopefully increase the chances of analyzer authors actually using it and thinking about the issue. We have one good data point here - the ASP.NET analyzer writers didn't implement their own filter, so I'm assuming many others won't either, adversely impacting build perf in various scenarios. And a simple, clear API may help mitigate that. But if people on the team don't consider there's enough value, it can be closed. |
There are a lot of heuristics that we apply here that would not be obvious to a new author and so we felt it was necessary to do this once in a standard way.
right, my point was just that they could. There is nothing stopping an analyzer author from seeing a performance problem, profiling it, and then doing a quick up-front check to improve performance. Still, I agree that is likely not going to be the majority of users.
I think generators and the like are going to force us to deal with this one way or the other. This sort of thing is going to become more common. My alternate proposal would be that perhaps we add an api that allows analyzers to register a filter expression on the file name. /// <summary>
/// Configure analysis mode of generated code for this analyzer.
/// Non-configured analyzers will default to an appropriate default mode for generated code.
/// It is recommended for the analyzer to invoke this API with the required <see cref="GeneratedCodeAnalysisFlags"/> setting.
/// </summary>
/// <param name="searchPattern">
/// The search string to match against the names of files in path.
/// This parameter can contain a combination of valid literal path and wildcard (* and ?) characters,
/// but it doesn't support regular expressions
/// </param>
public void ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags analysisMode, string? searchPattern = null) public override void Initialize(AnalysisContext context)
{
context.EnableConcurrentExecution();
context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.Analyze, searchPattern: "*.Razor.cs");
context.RegisterSymbolAction(AnalyzeSymbol, SymbolKind.NamedType);
} |
@jmarolf thanks for being open to this. The issue with a filter expression on the file name is that the name may not be enough to identify the type/source of the generated file. A tag within the comment seems like a more flexible solution, as code generators can always be made to emit it, whereas filenames may be more constrained etc. |
I believe this feature request can be handled more cleanly by using the editorconfig options for specifying generated code. We added such an option in #38859, which implements #38674 (comment). All the code generators can emit an editorconfig file with generated_code = true, and this will be respected by the analyzer driver and soon also by compiler for nullable analysis (#39051). We also plan to respect it in IDE’s GeneratedCodeRecognitionService and potentially all tooling based on editorconfig if editorconfig/editorconfig#399 gets implemented. For analyzers that need more fine grained distinction between different kinds of generated files (such as razor vs EF migration mentioned here), they can define their own editorconfig option, say generated_code_kind, and only process files that do not have generated_code = true OR have generated_code_kind of their preference. IMO this is a much declarative, cleaner and scalable solution rather then sprinkling such generated code annotations within generated file contents and adding detection heuristics for these annotations in different parts of the ecosystem. |
I think the thing that's problematic in this case is that Razor is generated code. Razor is kinda unique (holla at me in any other language fits in this category) in that unreadable, horrible, generated code, and user-code coexists in the same C# file. Razor's output is used in debugging, and properly highlights the correct line number in The IDE currently uses the "is generated code" knowledge for a bunch of different features and behaviors with the assumption that "is generated code" applies to the whole file. That's desirable for some of these features and undesirable for others. We largely mark Razor's output as generated code because that gives non-broken behavior in visual studio. We end up writing analyzers for Razor code because Razor's code generation phase is separate from Roslyn. We don't have semantic information for instance about the symbols and expressions that are in the user's code. There are things that we can only do by writing analyzers - these support making out programming model more robust, and help prevent common mistakes that are specific to our programming model. So I guess I'm wondering if we added the feature being proposed here - would anyone other than Razor use it? If we're taking another crack at source generators, does the problem I described above (mixed user-code and generated code int he same file) start appearing in other contexts? |
@mavasani I didn't know about the ability to specify generated code via editorconfig, that's nice! It could indeed be a good possibility to introduce filtering via this mechanism. @rynowak good question - I'm assuming that ASP.NET isn't the only one doing generated code analysis, but I may be wrong... |
@mavasani I always equated @roji and @rynowak I think when we add source generators this becomes a mainstream thing that we want to do. I don't know what the generators api is going to look like but I think we can assume that it will mirror the analyzer and codefix apis in some respects. Which would mean that each generator would have an id of some kind that it would register with the compilation process. Going back again to that old well of adding configurations via the /// <summary>
/// Configure analysis to only be triggered for the output of the given code generators.
/// </summary>
/// <param name="generatorIds">
/// The set of code generator ids whose output should tigger analysis.
/// </param>
public void ConfigureAnalysisForCodeGeneratorOutputs(params string[] generatorIds) public override void Initialize(AnalysisContext context)
{
context.EnableConcurrentExecution();
context.ConfigureAnalysisForCodeGeneratorOutputs(RazorCodeGenerator.Id);
context.RegisterSymbolAction(AnalyzeSymbol, SymbolKind.NamedType);
} With the intent being that now this analyzer is only called on files/syntax that is generated by the given code generator. |
@jmarolf that API proposal looks great to me! I guess the questions above are more about how/where generator IDs are defined (i.e. in the file in the |
based on initials discussions and #12303 I assume the API will look something like this: public class RazorCodeGenerator : SourceGenerator
{
private const string Text = @"
partial class C
{
public void DoNothing() {}
}";
public override string CodeGeneratorId => "Razor.Code.Generator";
public override void Execute(SourceGeneratorContext context)
{
var syntaxtree = CSharpSyntaxTree.ParseText(Text);
context.AddCompilationUnit("C.DoNothing", syntaxtree);
}
} This the types being references looking something like: public class SourceGeneratorContext
{
public ImmutableArray<SyntaxNode> MatchingTypes { get; }
public Compilation Compilation { get; }
public void AddCompilationUnit(string name, SyntaxTree syntaxTree) { }
}
public abstract class SourceGenerator
{
public abstract string CodeGeneratorId { get; }
public abstract void Execute(SourceGeneratorContext context);
} |
@jmarolf Why can't the source generators drop in an editorconfig with the generated files? |
@mavasani they could. I tend to think of |
I think editorconfig is a way to declaratively associate preferences or semantics to the source code they apply to. It seems completely reasonable to be produced by code generators for source code they produce as they have more knowledge of the code then the end users who are just packaging the generated code in their project. My primary concern with your approach is we are inventing a new mechanism and API to convey additional semantics about parts of code for a specific feature. We already have a communication mechanism/stream for such code specific preferences and semantics from the authors of the code to the tools analyzing the code, so we don’t need to invent one for every such feature request. The new API/overload seems to add unnecessary complexity and overhead for majority of analyzer authors who don’t care about this feature or different flavors of generated code. I’d rather prefer this knowledge be baked into analyzers that want to make fine grained decisions about kinds of generated code they want to analyze. |
FWIW the convention of having
I think this is the case for all of the proposals above, regardless of where the generator ID value actually comes from. An analyzer doesn't need to know or care about the API - they may simply opt-in to filter via the new method. Otherwise they can remain oblivious to the whole thing. |
The team views identifying generated files as a heuristic and does not want an implicit API contract about how they are discovered. I personally don't like this approach because it makes the code generator and the analyzer tightly coupled. What if I want to analyze a generated file but I do not have access to the source for the generator to update it to output
Thats fair, I am coming from the perspective of wanting to do this feature request in the first place instead of using one of the many other mechanisms we have as a workaround. Its always acceptable to say that the effort doesn't justify the benefit. The main reason I don't like analyzers configured via editorconfig that are deployed via nuget package is that it is very hard for new developers to debug. The most common complaint I hear from analyzer authors starting out is "why wasn't my code called in this case?". Determining that now means I need them to paste a combination of:
Yes that is just 2 things, but one of them you can attach the debugger to and determine what went wrong. It can take a while for me to recognize when an Overall, parsing the comments at the top of the file in your analyzer is not going to incur any performance overhead and is immediately available so I would do that. Adding a new API (whatever that would be) would need to first have lots of people asking for it. I suspect that when the source generators feature ships with C# this may cause enough people to need this that we would do it. |
Indeed. I've measured this myself. Since the trees are already available to the analyzer it's on the order of microseconds (or less) to just get the token and check it. When doing this for a feature that checked for copyrights, i think testing all of roslyn took a couple of ms. |
tl;dr Allow analyzers to process only certain types of generated code but not others (depending on who generated it).
The problem
Analyzers can currently opt into or out of processing generated code for perf reasons via
ConfigureGeneratedCodeAnalysis
. However, at the moment this is a coarse yes/no flag.In dotnet/aspnetcore#16922, some ASP.NET analyzers really are supposed to process ASP.NET-generated Razor files. However, as a result they also process EF core migration files, severely impacting build times. We'd ideally want the analyzer to process only the Razor files.
Proposal
Roslyn already recognizes
<auto-generated />
in a comment on the beginning of source files. We could extend this with an optionaltype
attribute:// <auto-generated type="Razor" />
On the analyzer side, we could add an overload of
ConfigureGeneratedCodeAnalysis
that accepts a string type parameter and a GeneratedCodeAnalysisFlags. The current setting would continue to act as a general default, with the new overload type-specific overload allowing finer-grained opt-in (or opt-out)./cc @agocke @jaredpar @rainersigwald @mvasani @rynowak @Pilchie @ajcvickers
The text was updated successfully, but these errors were encountered: