Allow for a way to ensure some source generators run before / after others #57239

Open
pranavkm opened this issue Oct 19, 2021 · 103 comments

@pranavkm
Contributor

We briefly discussed this during 6.0 / C# 10 but ultimately punted it since the ask came in too late. We've had issues where user-written source generators can no longer observe the output of the source generated by the Razor source generator.

For 6, the solution we gave our users is to turn off the source generator and fall back to our 2-phase compilation process. We'd made some progress discussing this internally, but filing this for C# 11 consideration.

@pranavkm pranavkm added the Concept-API and Feature Request labels Oct 19, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the Area-Compilers and untriaged labels Oct 19, 2021
@pranavkm pranavkm removed the Concept-API, Area-Compilers, and untriaged labels Oct 19, 2021
@bugproof

I really want to use this feature with Blazor, but it seems like I have to rely on AOP frameworks instead.

@jcouv
Member

jcouv commented Mar 1, 2022

Another reported instance of this problem is using the RegexGenerator in a Blazor project.

@MarkFl12

Yeah, I've been trying to build a source generator to help with Blazor projects, but I think what's happening is that it needs to run before the generator that transforms the .razor files, so that by the time their generator runs, the code I've generated (which those files use) is available.

@jcouv jcouv added this to the Backlog milestone Mar 17, 2022
@RyoukoKonpaku

+1 for this.

We've been using generators for Blazor to generate a strongly typed class that contains all routes of the application, so routing isn't stringly typed and is much less prone to human error. This relies on us finding route attributes on the files generated by the Razor compiler. The current workaround for us is to define the RouteAttribute in a code-behind so our source generator can see it, but ideally it'd be nice to keep using the @page directive for this.
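For anyone hitting the same wall, here is a minimal sketch of that code-behind workaround (names and namespace are made up; the point is that the attribute lives in user source, where another generator can see it):

// Counter.razor.cs (code-behind for Counter.razor)
using Microsoft.AspNetCore.Components;

namespace MyApp.Pages
{
    // Declaring the route here instead of via @page in the .razor file keeps the
    // attribute visible to our own source generator, which never sees Razor's output.
    [Route("/counter")]
    public partial class Counter
    {
    }
}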

@Blackclaws

I think we might want to get some traction, or at least some ideas going, on how this might be achieved.

In general I see that there are at least two ways this could work.

  1. Re-run generators on the output of all other generators until a steady state is achieved

This would probably be the easiest way to implement this. However, it already sounds like a really bad idea if you build two generators that always trigger each other, leading to an infinite loop. It would, however, allow generators to use each other, with the possibility of terminating after a maximum number of round trips.

  2. Actually order generators properly.

The way this could work is by introducing, for example, an interface:

public interface IOrderedGenerator
{
    IEnumerable<string> GetBeforeGenerators();
    IEnumerable<string> GetAfterGenerators();
}

The returned strings would be understood as assembly names.

This would allow a generator to declare both before and after targets.

  • Before generators are relevant for generators that themselves generate attributes that are understood by other generators.
  • After generators make sense in the context of generators like the Razor source generator that create code from additional files.

This method would also make it easy to detect loops, allow parallel execution of generators that aren't part of the same dependency graph, etc.
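
To make the proposal concrete, a purely hypothetical generator opting into such an interface might look like the sketch below (neither IOrderedGenerator nor the assembly name exists today; both are placeholders for the idea above):

using System;
using System.Collections.Generic;

// Hypothetical: a route-table generator declaring that it wants to run after the
// Razor generator, and that nothing needs to run after it.
public sealed class RouteTableGenerator : IOrderedGenerator
{
    public IEnumerable<string> GetBeforeGenerators()
        => Array.Empty<string>();

    public IEnumerable<string> GetAfterGenerators()
        => new[] { "Some.Razor.Generator.Assembly" }; // placeholder assembly name
}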

@sanchez

sanchez commented Sep 6, 2022

Sorry, bit new to the thread, but wouldn't it be possible to just take the order of execution as a list set in the consuming csproj? It could just be a bunch of namespaces, etc. Then the responsibility is on the consumer and on developer documentation.

Additionally, different libraries could read in this list and "validate" their ordering, returning warnings if it's not as expected.

What I don't like about my suggestion is that it pollutes the csproj more, but to me it makes sense: if I wanted to know or manage my source generators, one of the first places I would look is the csproj.

@Youssef1313
Member

Sorry, bit new to the thread, but wouldn't it be possible to just take the order of execution as a list set in the consuming csproj? It could just be a bunch of namespaces, etc. Then the responsibility is on the consumer and on developer documentation.

Additionally, different libraries could read in this list and "validate" their ordering, returning warnings if it's not as expected.

What I don't like about my suggestion is that it pollutes the csproj more, but to me it makes sense: if I wanted to know or manage my source generators, one of the first places I would look is the csproj.

This will prevent possible parallelization, I think

Suppose GeneratorA should run after GeneratorB, but GeneratorC doesn't have any dependencies.
It should be possible to start running both GeneratorA and GeneratorC in parallel.

@sanchez

sanchez commented Sep 6, 2022

True. I hadn't considered that. You could group/nest sources though, right?

<Generators>
  <Step>
    <Source Type="GeneratorA" />
    <Source Type="GeneratorB" />
  </Step>
  <Step>
    <Source Type="GeneratorC" />
  </Step>
</Generators>

Each step would be parallel within itself but sequential relative to adjacent steps. So A and B run in parallel together, and C runs after.

Edit: sorry, I realized I messed up your example a bit. Here A and B have no dependencies but C does, for example.

@Blackclaws

Blackclaws commented Sep 6, 2022

Sorry, bit new to the thread, but wouldn't it be possible to just take the order of execution as a list set in the consuming csproj? It could just be a bunch of namespaces, etc. Then the responsibility is on the consumer and on developer documentation.

Additionally, different libraries could read in this list and "validate" their ordering, returning warnings if it's not as expected.

What I don't like about my suggestion is that it pollutes the csproj more, but to me it makes sense: if I wanted to know or manage my source generators, one of the first places I would look is the csproj.

The problem, I think, is that this firmly puts the burden on the developer using the generator to know what it generates and how the dependencies between generators might play out.

I think the pit of success is rather that generators themselves define whether they handle output from other generators or generate output that other generators (possibly dependencies) consume. A developer doesn't really want to worry about these specific details and just expects to be able to use source generators in a way that lets them build on one another without needing an extra step.

Also, this would introduce another place where ordering needs to be validated, etc.

@CyrusNajmabadi
Member

I'm not seeing any mechanism for this sort of system to be fast enough. Just having a single level of generators today, is enormously impactful for perf. Even for things that used to be fast (like creating a ref-assembly) we now have to fully bind and create the initial compilation so we can run generators. If we then allow generators to run before/after, and we expect those generators to understand the semantics of that prior generated code, I do not see how we could do that without just huge perf hits.

@juarezasjunior

I liked the idea of the dev specifying the order in the .csproj. But sometimes it could be tedious, especially if the dev is using a package that they don't know has code being generated.
What about if we have both options? The default behavior stays the same as we have now: the generators referenced in the .csproj execute without a defined order. But if we want them to execute following an order, we could specify the order in the .csproj as mentioned by @sanchez #57239 (comment)?

@phil-scott-78
Contributor

I'm not sure ordering would ever be possible. Take the Razor generator. There are a decent number of generators that would need to look at the code in the razor files while also letting those razor files consume the generated code. I think that leaves multi-pass runs as the only option.

I suspect, maybe with hopeless optimism, that if this multi-pass behavior were restricted to incremental generators, the output could be examined rather quickly to determine if more runs are required, with a failsafe to bail if recursion is detected.
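
As a rough sketch of that multi-pass idea (toy types, not the real Roslyn driver): run every generator over the current set of sources, add whatever is new, and stop once a pass adds nothing or a pass budget is exhausted.

using System;
using System.Collections.Generic;
using System.Linq;

// Toy model: a "generator" maps the current sources to extra sources it wants to add.
delegate IEnumerable<string> ToyGenerator(IReadOnlyCollection<string> sources);

static class MultiPassDriver
{
    const int MaxPasses = 10; // failsafe against generators that keep feeding each other

    public static HashSet<string> RunToFixedPoint(
        IReadOnlyList<ToyGenerator> generators, IEnumerable<string> initialSources)
    {
        var sources = new HashSet<string>(initialSources);
        for (var pass = 0; pass < MaxPasses; pass++)
        {
            // Snapshot the inputs so every generator sees a stable view for this pass.
            var snapshot = sources.ToList();
            var addedAny = false;
            foreach (var produce in generators)
                foreach (var text in produce(snapshot))
                    addedAny |= sources.Add(text);

            if (!addedAny)
                return sources; // steady state reached
        }
        throw new InvalidOperationException("Generators did not converge; likely a cycle.");
    }
}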

@Blackclaws

I'm not seeing any mechanism for this sort of system to be fast enough. Just having a single level of generators today, is enormously impactful for perf. Even for things that used to be fast (like creating a ref-assembly) we now have to fully bind and create the initial compilation so we can run generators. If we then allow generators to run before/after, and we expect those generators to understand the semantics of that prior generated code, I do not see how we could do that without just huge perf hits.

Performance is secondary, kind of. This is a feature that might have a decent performance hit, but as long as it's something that generators themselves have to implement in order for that hit to matter, I think the problem isn't that big.

I liked the idea of the dev specifying the order in the .csproj. But sometimes it could be tedious, especially if the dev is using a package that they don't know has code being generated. What about if we have both options? The default behavior stays the same as we have now: the generators referenced in the .csproj execute without a defined order. But if we want them to execute following an order, we could specify the order in the .csproj as mentioned by @sanchez #57239 (comment)?

I think specifically that the default option being no order doesn't make a lot of sense. If I already know, as the author of a generator, that it will generate output that another generator has to process, I want to be able to declare this in advance.

I'm not sure ordering would ever be possible. Take the Razor generator. There are a decent number of generators that would need to look at the code in the razor files while also letting those razor files consume the generated code. I think that leaves multi-pass runs as the only option.

I suspect, maybe with hopeless optimism, that if this multi-pass behavior were restricted to incremental generators, the output could be examined rather quickly to determine if more runs are required, with a failsafe to bail if recursion is detected.

I think for razor files specifically this is a difficult situation. Looking at the generated code, I don't think there is a reason why Razor itself would have this sort of circular dependency on the output of the source generator. It might be there, though, and since this is definitely one of the most visible use cases for chaining generators, we definitely have to think about it.

The razor source generator might itself be split into two parts though. The first part generates a file that just has all the attributes and declares a partial class, and the second part implements the actual rendering pipeline. That way you could hook into the middle of the razor generator.

@HavenDV

HavenDV commented Mar 13, 2024

@Cybrosys For some cases, the following approach may work: move part of the project into a library where one of the generators runs, then apply the second generator to the project that uses the library.
Of course, you can also run a multi-level compilation by adding a little MSBuild code, but this cannot be applied as a general solution because it would create huge performance problems.

@Cybrosys

Cybrosys commented Mar 13, 2024

@Cybrosys For some cases, the following approach may work: move part of the project into a library where one of the generators runs, then apply the second generator to the project that uses the library.

Now this was a very interesting workaround. Only caveat for multiple projects is that it messes up the project folder structure depending on how things are organized 😄

@CyrusNajmabadi
Member

Is there a tracking issue for making the compilation incremental?

Compilations are already incremental in all the ways that are safe to do so based on the design of the language. Tons of changes though simply cannot be incremental, and have deep implications on what must be reanalyzed.

SGs also have a viable incremental path that we recommend. However, even with this path, the cost of doing incremental compilation is still very high, and it is continually a problem for the performance of higher layers of our stack.

It has been a while since this appeared in the discussion and I didn't re-read, but I think the compilation being replaced every build was a major source of the performance issues?

The compilation is replaced, but it happens in an incremental fashion. That said, incremental doesn't mean we still meet the perf needs of all the layers that sit on top of the compiler.

It's a solved problem making object graphs immutable while still allowing incremental updates from the latest version, similar to how VS TextBuffers allow edits only on the newest version. I think that would be sufficient for Roslyn as it likely doesn't need to make new compilations based on anything but the previous compilation.

We already do this.

(For people who don't know how that works, it's usually called "persistent data structures" rather than "immutable", and it works by reserving a variable in each object to record how a future version changed, instead of immediately making a copy when a new version is required.

We already do this.

I think the main paper explaining this technique was called "Making Data Structures Persistent", but I may be mistaken; it's been a while since I've looked into this topic.)

You may be thinking of "Purely Functional Data Structures" by Chris Okasaki. I can assure you, we're already using those designs and techniques. But we still have extreme latency goals as well, and even then we have trouble.

Sometimes it's not possible to get high flexibility and high performance.

I understand if you have different priorities at the moment, but rejecting that this is a solvable problem seems ... wrong?

No one has rejected it as a solvable problem. What's been said is that until we have a design that solves the problem, it's much more difficult to get something in here due to the devastating impact it could have.

@PeterHevesi

PeterHevesi commented Jul 24, 2024

I have found this awesome tool called Metalama, based on Roslyn (they have their own fork of Roslyn), that can do AOP and, besides that, can also generate code (new fields and properties, new methods, and even classes), so many existing Roslyn generators could easily be rewritten using Metalama.

And you can specify the order of those "Metalama aspects" there, without needing any additional compilations!

It cannot solve the problem with .NET's built-in Roslyn generators, as all "Metalama aspects" run after Roslyn source generators.
But at least while using Metalama, you can specify the order between those "aspects" without killing performance!

PS: I think Microsoft should buy Metalama and make it a free part of .NET :D
It already provides aspects, which are much better than the current state of Roslyn interceptors, and it already provides the ability to do ordering!

@alrz
Member

alrz commented Jul 25, 2024

@CyrusNajmabadi

I'm not seeing any mechanism for this sort of system to be fast enough. Just having a single level of generators today, is enormously impactful for perf. Even for things that used to be fast (like creating a ref-assembly) we now have to fully bind and create the initial compilation so we can run generators. If we then allow generators to run before/after, and we expect those generators to understand the semantics of that prior generated code, I do not see how we could do that without just huge perf hits.

I think only the immutable syntax/operation tree generated by a previous generator is sufficient to write a source generator based on another generator's output. Would that have a performance cost compared to the way they are currently processed?

@weltkante
Contributor

weltkante commented Jul 26, 2024

I think only the immutable syntax/operation tree generated by a previous generator is sufficient to write a source generator based on another generator's output

Some (many?) generators eventually access the semantic model, which isn't calculated incrementally, and that is the expensive part of chaining generators. This was mentioned further up in the discussion.

@b-straub

For me this discussion is going somewhat off-topic. While it might be desirable to chain arbitrary SGs together, I would be more than happy if my SG could use the increasing number of runtime SGs, such as the one for JSON. Especially for AOT-aware development this is a major issue.

It's one thing to implement something experimental that might turn out not to be the right path for the future, but it's something totally different to increasingly use this feature across the entire runtime while still complaining about the limitations of the approach.

@alrz
Member

alrz commented Jul 26, 2024

I think only the immutable syntax/operation tree generated by a previous generator ..

Some (many?) generators eventually access the semantic model, which isn't calculated incrementally,

I think that would be the semantic model from the current compilation (which is and stays immutable), not of the generated code from another generator.

I would drop "operation tree" from my previous comment if that is indeed where this gets problematic, only the syntax should be sufficient as the 'common model' between generators. If you're building on top of another, you're already coupled to its exact output, so there's no reason to do semantic analysis on the generated code which almost certainly will have a predefined structure.

For example, imagine you want to add some partial definitions on types generated by another generator. The only thing you would need is the syntax; the semantics are completely predictable at that point.
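
As a concrete, if simplified, sketch of that syntax-only approach (assumes a reference to Microsoft.CodeAnalysis.CSharp; IMyMarker is a hypothetical interface):

using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class SyntaxOnlyChaining
{
    public static void Main()
    {
        // Pretend this is the text another generator produced.
        const string generatedByOther =
            "namespace Routes { public partial class CounterRoute { } }";

        var tree = CSharpSyntaxTree.ParseText(generatedByOther);

        foreach (var cls in tree.GetRoot().DescendantNodes().OfType<ClassDeclarationSyntax>())
        {
            // Emit a companion partial declaration purely from syntax; no semantic model involved.
            Console.WriteLine(
                $"namespace Routes {{ public partial class {cls.Identifier.Text} : IMyMarker {{ }} }}");
        }
    }
}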

@weltkante
Contributor

weltkante commented Jul 26, 2024

only the syntax should be sufficient as the 'common model' between generators. If you're building on top of another, you're already coupled to its exact output, so there's no reason to do semantic analysis on the generated code which almost certainly will have a predefined structure.

You often need names bound to something to know their meaning; most commonly for types, for example to check if the type returned by a property is a class or struct, or to check for attributes on the type. There is a lot that you can't do with just syntax. There's a reason why even the built-in code generators require the semantic model.

For example, imagine you want to add some partial definitions on types generated by another generator. The only thing you would need is the syntax; the semantics are completely predictable at that point.

You probably need to know the return type. Sure, it's predictable, but then you replicate the binding of names to symbols that the semantic model does.

If you only need to copy/paste the name of the return type, sure, you can work around that and use the name as it was spelled in the original syntax tree, but that quickly runs into issues with name collisions due to using directives and aliases. It makes the code generator fragile not to use the semantic model to normalize names.

But even beyond that, if it's anything more complicated than a text-template code generator, it will want to check whether the type is a value or reference type, whether an interface is implemented, or whether certain attributes are set. Lots of things that require the semantic model need to be checked when you have a generator that operates on a semantic level rather than just copy/pasting some names into a predefined template.
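
For contrast, here is a small sketch of the kind of question that needs the semantic model rather than syntax alone (assumes the Microsoft.CodeAnalysis.CSharp package; the Widget type is made up):

using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class SemanticModelLookup
{
    public static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(
            "class Widget { public System.DateTime Created { get; set; } }");

        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

        var model = compilation.GetSemanticModel(tree);
        var property = tree.GetRoot().DescendantNodes()
            .OfType<PropertyDeclarationSyntax>()
            .Single();

        // The syntax only gives us the text "System.DateTime"; the semantic model
        // tells us it is a struct, which a serializer generator typically needs to know.
        var symbol = (IPropertySymbol)model.GetDeclaredSymbol(property)!;
        Console.WriteLine($"{symbol.Name}: IsValueType = {symbol.Type.IsValueType}");
    }
}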

@alrz
Member

alrz commented Jul 26, 2024

for example to check if the type returned by a property is a class or struct

For anything like that, if it comes from the generator, you should be able to determine it based on how the generator works in the first place; if it comes from the source, you can rely on the root compilation that is already available. And as you pointed out, names will be emitted verbatim from the syntax.

Checking for name collisions or attributes may be a little more involved considering we only have the syntax, but it is not downright impossible. BTW, we're talking about attributes or names coming from the generator, in which case you will have some understanding of the structure you're working with.

@weltkante
Contributor

You clearly have a specific scenario in mind, i.e. your own use case, but I can only repeat that it's not true, not even for the code generators in the framework, nor for the ones I have written myself.

The code generator needs to look at the attributes defined on a type. This type may or may not come from another assembly. This type has your code generator's attributes (or some well-known ones from someone else) on it, but to look at them you need to resolve the type, i.e. replicate the functionality of the name binder and semantic model. Think of marshalling attributes if you don't know any examples.

Struct vs. class is important for storage, serialization, and marshalling scenarios because they need to create different code. Again, the type you need to know this about can be either in the current compilation unit or in a referenced assembly, so you need it resolved.

Often a code generator needs to look at all properties of a type. People (including me) have joined in here because the missing chaining of source generators means e.g. generators for serialization won't see properties from other generators.

All that requires a semantic model, either the in-box one or a replication of part of its logic inside the source generator. Just operating on text (i.e. the syntax tree) isn't sufficient.

@codymullins

There's a lot of discussion going on, which is great, but I'd love a clear understanding of what specifically needs to happen for this item to be under active investigation. As it stands, it feels like this issue is not a priority at all for the .NET team, while the community continually circles back to wanting this feature to be expanded. @CyrusNajmabadi or another maintainer, if you could help us summarize where we are here, that would be wonderful.

@CyrusNajmabadi
Member

Nothing has changed here. The rules and restrictions around generators have been clearly delineated and we have no plans on changing them currently.

@codymullins

codymullins commented Aug 2, 2024

Thanks. So should this be marked as "Closed, won't do"?

@CyrusNajmabadi
Member

Thanks. So should this be marked as "Closed, won't do"?

Not for me. Not planning to do anything now doesn't mean we won't do something in the future.

@kzu
Contributor

kzu commented Aug 2, 2024

My advice to anyone coming here: the team does not seem interested in solving this.
The only workaround possible is to re-run the generators that should be aware of your code, from your own generator.

For example:

  1. Orleans (thankfully!) created a separate package with its CodeGenerator, which you can invoke from your own source generator. Their own generator mostly just calls out to it anyway, so you can copy/paste a bit and be done.
  2. Razor provides the language package which you can also invoke manually like so:
        // Assumes locals available in the generator: 'texts' (the additional-file paths),
        // 'compilation' (the current Compilation), and 'project' (build properties exposing
        // ProjectDirectory and RootNamespace).
        var generated = new HashSet<string>(texts.Where(x => x.Path.EndsWith(".g.cs")).Select(x => x.Path));
        var razors = new HashSet<string>(texts.Where(x => x.Path.EndsWith(".razor")).Select(x => x.Path));
        var options = compilation.SyntaxTrees.FirstOrDefault()?.Options as CSharpParseOptions;

        var fs = RazorProjectFileSystem.Create(project.ProjectDirectory);
        var engine = RazorProjectEngine.Create(
            RazorConfiguration.Default, fs,
            builder => builder.SetRootNamespace(project.RootNamespace));

        var trees = fs.EnumerateItems(project.ProjectDirectory)
            // Filter out files we find that aren't razor files currently passed in for generation for whatever reason, via targets
            .Where(x => razors.Contains(Path.GetFullPath(x.CombinedPath)))
            .Select(x => CSharpSyntaxTree.ParseText(engine.Process(x).GetCSharpDocument().GeneratedCode,
                path: Path.GetFullPath(x.CombinedPath), options: options))
            .ToImmutableArray();

        // A compilation that now also contains the C# that Razor would generate.
        var razored = compilation.AddSyntaxTrees(trees);

(this allows you to inspect the types that razor would generate, since you cannot run after it and see its output).

Pretty sure for the JSON generator you could end up doing something similar, or eventually just invoke their generator manually by loading the assembly from the analyzer assemblies you can pass to your generator as additional files and do a bit of reflection, etc.
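
If you go that route, a rough sketch (a hypothetical helper; it assumes you have already located the other package's analyzer assembly on disk) could use the public GeneratorDriver API to run whatever generators the assembly exposes:

using System;
using System.Collections.Generic;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class ForeignGeneratorRunner
{
    // Runs the generators found in another analyzer assembly against 'compilation'
    // and returns the expanded compilation that includes their output.
    public static Compilation RunForeignGenerators(string analyzerAssemblyPath, Compilation compilation)
    {
        var assembly = Assembly.LoadFrom(analyzerAssemblyPath);

        var generators = new List<ISourceGenerator>();
        foreach (var type in assembly.GetTypes())
        {
            if (type.IsAbstract || type.GetCustomAttribute<GeneratorAttribute>() is null)
                continue;

            var instance = Activator.CreateInstance(type);
            if (instance is IIncrementalGenerator incremental)
                generators.Add(incremental.AsSourceGenerator());
            else if (instance is ISourceGenerator classic)
                generators.Add(classic);
        }

        var driver = CSharpGeneratorDriver.Create(generators);
        driver.RunGeneratorsAndUpdateCompilation(compilation, out var expanded, out _);
        return expanded;
    }
}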

And so on. Eventually, all of this will become so slow when everyone is doing it that they might come back to this and finally see the importance of the use case. And then they will have to go chasing everyone who was forced to work around this incomplete design and beg them to stop doing so and use the "improved design" then. Pretty sure there will be analyzers trying to stop you from just getting the job done, before they just give up and fix it. :)

It's not really hard to imagine the myriad scenarios where this is necessary. I alone ran into the two above within about a year. It's inevitable.

@CyrusNajmabadi
Member

CyrusNajmabadi commented Aug 2, 2024

My advice to anyone coming here: the team does not seem interested in solving this.

The team is interested. But no one (internal or external) has been able to find a viable solution. We can't move forward without an actual design that will work.

before they just give up and fix it. :)

How do you suggest fixing it? We have no magic bullet at our disposal.

@kzu
Contributor

kzu commented Aug 3, 2024

I'm confident that the day one source-generated feature in the BCL depends on another source-generated feature, a viable solution will be discovered. I'm just hoping that happens sooner than later, but I know it will happen.

@CyrusNajmabadi
Member

I'm just hoping that happens sooner than later,

If a suitable design can be presented, we'd be very interested. That is true whether that design comes from BCL partners, or the community. Neither has been able to come up with a way to address the problems presented so far.

@kzu
Contributor

kzu commented Aug 3, 2024

I'm pretty sure when the heat comes from BCL partners, "a suitable design will present itself" 😉

@CyrusNajmabadi
Member

I'm pretty sure when the heat comes from BCL partners, "a suitable design will present itself" 😉

That hasn't happened so far. The opposite happened. We created a limited but workable design, and we pushed the entirety of the BCL onto it. That's what v2 generators were, exactly because it's not tenable to try to build on top of a bad foundation.

@CyrusNajmabadi
Member

If a suitable design is presented, we will consider it. But so far that hasn't happened. We have not come up with one. Our partners have not come up with one. The community has not come up with one. People need to stop thinking there's some magic bullet here, or that we're not doing it because we don't see the value. We're not doing it because no one knows how. If that changes we can take next steps.

@dotnet dotnet locked and limited conversation to collaborators Aug 3, 2024