[Proposal] enable code generating extensions to the compiler #5561

Closed
mattwar opened this issue Sep 29, 2015 · 190 comments
@mattwar
Contributor

mattwar commented Sep 29, 2015

Often when writing software, we find ourselves repeatedly typing similar logic over and over again, each time just different enough from the last to make generalizing it into an API impractical. We refer to this type of code as boilerplate, the code we have to write around the actual logic we want to have, just to make it work with the language and environment that we use.

One way of avoiding writing boilerplate code is to have the computer generate it for us. After all, computers are really good at that sort of thing. But in order for the computer to generate code for us, it has to have some input to base it on. Typical code generators are design-time tools that we work with outside of our codebase, and that generate source that we include with it. These tools usually prefer their input to be XML or JSON files that we either manipulate manually or edit through some WYSIWYG editor that lets us drag, drop and click it into existence. Other tools are build-time tools that get run by our build system just before our project is built, but they too are driven by external inputs like XML and JSON files that we must manipulate separately from our code.

These solutions have their merits, but they are often intrusive, requiring us to structure our code in particular ways that allow the merging of the generated code to work well with what we’ve written. The biggest drawback is that these tools require entire facets of our codebase to be defined in another language outside of the code we use to write our primary logic.

Some solutions, like post-build rewriters, do a little better in this regard, because they operate directly on the code we’ve written, adding new logic into the assembly directly. However, they too have their drawbacks. For instance, post-build rewriters can never introduce new types and APIs for our code to reference, because they come too late in the process. So they can only change the code we wrote to do something else. Even worse, assembly rewriters are very difficult to build because they must work at the level of the IL or assembly language, doing the heavy lifting to re-derive the context of our code that was lost during compilation, and to generate new code as IL and metadata without the luxury of having a compiler to do it. For most folks, choosing this technique to build tools to reduce boilerplate code is typically a non-starter.

Yet the biggest sin of all is that all of these solutions require us to manipulate our nearly unfathomable build system, and in fact require us to have a build system in the first place. Who really wants to do that? Am I right?

Proposal: Code Injectors

Code injectors are source code generators that are extensions to the compiler you are using, as you are using it. When the compiler is instructed to compile the source code you wrote, code injectors are given a chance to examine your code and add new code that gets compiled in along with it.

When you type your code into an editor or IDE, the compiler can be engaged to provide feedback that includes the new code added by the code generators. Thus, it is possible to have the compiler respond to your work and introduce new code as you type that you can directly make use of.

You write a code injector similarly to how you write a C# and VB diagnostic analyzer today. You may choose to think of code injectors as analyzers that instead of reporting new diagnostics after examining the source code, augment the source code by adding new declarations.

You define a class in an assembly that gets loaded by the compiler when it is run to compile your code. This could easily be the same assembly you have used to supply analyzers. This class is initialized by the compiler with a context that you can use to register callbacks into your code when particular compilation events occur.

For example, ignoring namespaces for a moment, this contrived code injector gives every class defined in source a new constant field called ClassName that is a string containing the name of the class.

[CodeInjector(LanguageNames.CSharp)]
public class MyInjector : CodeInjector
{
    public override void Initialize(InitializationContext context)
    {
        context.RegisterSymbolAction(InjectCodeForSymbol);
    }

    public void InjectCodeForSymbol(SymbolInjectionContext context)
    {
        if (context.Symbol.TypeKind == TypeKind.Class)
        {
            context.AddCompilationUnit($@"partial class {context.Symbol.Name}
                      {{ public const string ClassName = ""{context.Symbol.Name}""; }}");
        }
    }
}

This works because of the existence of the C# and VB partial class language feature.
Of course, not all code injectors need to be in the business of adding members to the classes you wrote, and especially not adding members to all the classes you wrote indiscriminately. Code injectors can add entirely new declarations, new types and APIs that are meant to simply be used by your code, not to modify your code.
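
To make the partial-class mechanism concrete, here is a minimal sketch of what the compiler would effectively see once the contrived injector above has run. The `Customer` class and its `Describe` method are made up for illustration, and the second declaration is only an assumption about what the injector's output would look like; the proposal does not define it.

```csharp
// Hand-written part: it can already refer to ClassName,
// even though the author never declared it.
public partial class Customer
{
    public string Describe() => $"I am a {ClassName}";
}

// Hypothetical injected part: roughly the compilation unit the
// contrived injector would add for a class named Customer.
public partial class Customer
{
    public const string ClassName = "Customer";
}
```

Because both declarations are merged into one type, `new Customer().Describe()` compiles and uses the injected constant without any change to the hand-written file.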

Yet, the prospect of having code injectors modify the code you wrote enables many compelling scenarios that wouldn’t be possible otherwise. A companion proposal for the C# and VB languages #5292 introduces a new feature that makes it possible to have code generators not only add new declarations/members to your code, but also to augment the methods and properties you wrote too.

Now, you can get rid of boilerplate logic like all that INotifyPropertyChanged code you need just to make data binding work. (Or is that so last decade that I need a better example?)
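
For readers who have not written it recently, this is the kind of INotifyPropertyChanged boilerplate being referred to. It is a sketch of the common hand-written pattern, not output of any generator; the `ViewModel` and `Property` names are placeholders.

```csharp
using System.ComponentModel;
using System.Runtime.CompilerServices;

public class ViewModel : INotifyPropertyChanged
{
    private int _property;

    public event PropertyChangedEventHandler PropertyChanged;

    // Every bindable property repeats this backing-field-plus-notify shape.
    public int Property
    {
        get => _property;
        set
        {
            if (_property == value) return;   // no notification when nothing changed
            _property = value;
            OnPropertyChanged();              // compiler supplies "Property" as the name
        }
    }

    protected void OnPropertyChanged([CallerMemberName] string name = null) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(name));
}
```

The setter body is what a generator would ideally write for you, leaving only `public int Property { get; set; }` in the hand-written source.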

Subjects not covered in this proposal but open for discussion too

  1. Ordering of Injectors – this concerns the order of injectors, which are run first etc., and the order of the new sources as presented to the compiler. This is of interest to the supersede feature proposed in [Proposal] add supersede modifier to enable more tool generated code scenarios. #5292
  2. Callback events – beyond callbacks for type symbols declared in source, what other callback patterns would be useful for code generators, keeping in mind that these will likely need to be invoked by the IDE as well.
  3. Duplicates – when multiple injection event handlers lead to the generation of the same source code, generating it only once, and being smart about it.
  4. Recursion - Can generated code trigger additional code injection events for the newly injected declarations? I’d rather the answer be No, since this will make the system much simpler.
  5. More?
@daveaglick
Contributor

Looking forward to seeing where this discussion goes. The last time I remember metaprogramming being discussed, or more generally compile-time hooks, it sounded like the team wanted a little more time to see how things shook out (#98 (comment)). Hopefully now that it's 8 months later and the library and tooling are in the wild, it's a good time to revisit.

I personally really like the idea of using code diagnostics and fixes to drive this functionality. We already have great tooling around developing and debugging them, and Roslyn already knows how to apply them. Once developed, there's also already a mechanism for explicitly applying them to target code to see how they'll look.

I can envision a variety of use cases for this concept, from simple one-off fixes to things like entire AOP frameworks based on applying code fixes in the presence of attributes that guide their behavior.

@sharwell
Member

I wrote the code generation portions for the C# targets of ANTLR 3 and ANTLR 4 using a pattern similar to XAML. Two groups will shudder at this statement: developers working on MSBuild itself, and the ReSharper team. For end users working with Visual Studio, the experience is actually quite good. There are some interesting limitations in the current strategy.

Reference limitations

It is possible for C# code to reference types and members defined in the files which are generated from ANTLR grammars. In fact, from the moment the rules are added to the grammar (even without saving the file), the C# IntelliSense engine is already aware of the members which will be generated for these rules.

However, the code generation step itself cannot use information from the other C# files in the project. Fortunately for ANTLR, we don't need the ability to do this because the information required to generate a parser is completely contained within the grammar files.

Undocumented build and IntelliSense integrations

The specific manner in which a code generator integrates with the IntelliSense engine (using the XAML generator pattern) is undocumented. This led to a complete lack of support for the code completion functionality described in the previous section in other IDEs and even in ReSharper.

@ghord

ghord commented Oct 1, 2015

Let's deal with the problem of the undefined order of such transformations. In normal OO code, this is conceptually similar to the decorator pattern. Take a look at this code:

var logger = new CachingLogger(new TimestampedLogger(new ConsoleLogger()));

vs

var logger = new TimestampedLogger(new CachingLogger(new ConsoleLogger()));

The nice thing is that we have to manually specify the order of wrapping the class. This seems like the most obvious answer: Let the programmer specify the order.
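
A small, self-contained sketch of why the wrapping order is observable. The behaviors are assumptions made for illustration: `CachingLogger` is taken to mean "drop messages already seen", `TimestampedLogger` takes an injected clock so the result is deterministic, and `MemoryLogger` stands in for `ConsoleLogger` so the output can be inspected.

```csharp
using System;
using System.Collections.Generic;

public interface ILogger { void Log(string message); }

// Stand-in for ConsoleLogger that records lines instead of printing them.
public sealed class MemoryLogger : ILogger
{
    public List<string> Lines { get; } = new List<string>();
    public void Log(string message) => Lines.Add(message);
}

// Prefixes each message with the current value of the supplied clock.
public sealed class TimestampedLogger : ILogger
{
    private readonly ILogger _inner;
    private readonly Func<string> _clock;
    public TimestampedLogger(ILogger inner, Func<string> clock) { _inner = inner; _clock = clock; }
    public void Log(string message) => _inner.Log($"{_clock()} {message}");
}

// Forwards a message only the first time it is seen.
public sealed class CachingLogger : ILogger
{
    private readonly ILogger _inner;
    private readonly HashSet<string> _seen = new HashSet<string>();
    public CachingLogger(ILogger inner) { _inner = inner; }
    public void Log(string message)
    {
        if (_seen.Add(message)) _inner.Log(message);
    }
}

public static class DecoratorOrderDemo
{
    // Logs the same message twice and reports how many lines reach the sink.
    public static int CountLines(bool cacheOutermost)
    {
        var sink = new MemoryLogger();
        int tick = 0;
        Func<string> clock = () => $"t{++tick}";
        ILogger logger = cacheOutermost
            ? new CachingLogger(new TimestampedLogger(sink, clock))
            : (ILogger)new TimestampedLogger(new CachingLogger(sink), clock);
        logger.Log("saved");
        logger.Log("saved");
        return sink.Lines.Count;
    }
}
```

With the cache outermost, the duplicate is dropped before it is stamped, so one line is written. With the timestamp outermost, the two stamped messages differ, so the cache lets both through. The same kind of difference could arise between two code transformations applied in different orders.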

I think this would be simple to do cleanly with attributes, which would be somehow wired up to cause these transformations:

[INotifyPropertyChanged, LogPropertyAccess]
public class ViewModel
{
   //gets change notification and logging
   public int Property { get; set; } 
}

They could be specified at assembly/class/member level to represent all kinds of transformation scope.

The problem is that until now, attributes have served only as metadata; now they would modify source code instead. Maybe user-defined class modifiers would be better:

public notify logged class ViewModel
{
    public int Property { get; set; }
}

Where notify and logged are user defined somehow.

@mattwar
Contributor Author

mattwar commented Oct 1, 2015

@ghord That's a good idea. If we limit the code generators to only working on symbols with custom attributes explicitly specified, we can order the code generators by the order of the attributes as specified in the source.

@daveaglick
Contributor

@mattwar @ghord While I think the use of attributes to guide the code generation process could work (it's worked well for PostSharp, for example), I'd love to see a more general solution that isn't directly tied to a specific language syntax or feature. That's why I mentioned being able to apply analyzers and code fixes as a possible approach.

The way I would envision this working is that the compiler would be supplied with a list of analyzers and code fixes to automatically apply just before actual compilation. It would work as if the user had manually gone through the code and applied all of the specified code fixes by hand before compiling.

Benefits

I suspect that this could be achieved with a minimal amount of changes to existing Roslyn, at least functionality-wise (though it may take some serious refactoring - I have no idea). Of course, the compiler would need a mechanism for specifying the analyzers and code fixes and applying them during compilation. Note the following:

  • We already have the concept of diagnostics to describe those portions of code we're interested in changing. Perhaps a new DiagnosticSeverity is needed to indicate potential code generation, or maybe a DiagnosticSeverity of Hidden could be used with some other indication. Regardless, diagnostics can be used to identify where code generation should take place.
  • We also already have the concept of analyzers to figure out where such diagnostics should apply.
  • We also already have the concept of code fixes to specify, in as flexible a way as possible, what should be done to the code.
  • The distinction between analyzers, diagnostics, and code fixes is a nice separation that could also be leveraged for code generation. For example, I could write a code generation code fix that would fix up some built-in diagnostic on every build.
  • Visual Studio already has support for developing analyzers, diagnostics, and code fixes. We can scaffold them with a template, debug them on real code, etc.
  • Visual Studio also already has support for applying analyzers and code fixes so that code generation authors and users can see exactly where their code generation will apply and can preview what it will do (and even apply it beforehand if needed).

There would also be some synergy with this approach between existing authors of conventional analyzers and code fixes and those intended to be used for code generation. Existing code fixes could also be adapted or possibly applied wholesale during the code generation stage (if specified). The tooling and process would be the same so skills could be leveraged for either.

Challenges

I do see the following questions or complications with this approach:

  • How to supply the analyzers and code fixes to the compiler? Would it be a command-line argument? An external file? Something in the .config file (or equivalent)?
  • There's still the problem of ordering. What if one code fix changes the code in such a way that the next one to be run no longer applies? How could that be reported to the user? Would chaining too many code fixes together create unexpected behavior (though I think this is a challenge with any automatic code generation process)?
  • Debugging. How can the newly generated and/or changed code be debugged? How to make sure that the .pdb or other debugging artifacts can still trace back to the original code?
  • Analyzers and code fixes would have to be totally decoupled from Visual Studio. My understanding right

An example that uses attributes

Getting back to the use of attributes, one of the big examples of this approach that I've been thinking about is using it to build out a full AOP framework similar to what PostSharp does. In this case, an analyzer would be written that looks for the presence of specific attributes as defined in a referenced support library. When it finds them, it would output diagnostics that a code fix would then act on. The code fix would then apply whatever code generation is appropriate for the attribute.

My favorite PostSharp aspect is OnMethodBoundaryAspect, which allows you to execute code defined in your aspect attribute class before method entry and after method exit. Something similar could be constructed by having a code fix inject calls to methods contained in a class derived from a specific attribute for any method that has said attribute applied to it.

You could potentially build up an entire AOP framework by creating analyzers and code fixes that act on pre-defined attributes and their derivatives. The point, though, is that you wouldn't have to. The code generation capability could be as flexible and general as analyzers and code fixes themselves, which because they directly manipulate the syntax tree can do just about anything.

@MgSam

MgSam commented Oct 1, 2015

Very happy to see a proposal for this on the table. Are attributes how you envision applying a Code Injector? I think specifying the syntax for applying them is needed in the proposal.

Is the idea that CodeInjections all take place prior to build so that you can see and possibly interact with the members it generates? If so, I think being able to interact with the generated code is another huge benefit that you should mention in your proposal. When using PostSharp, anything you have it generate doesn't exist until build time, so you can't reference any of it in your code.

@ghord The problem with your proposal on ordering is that you might not define the injections in the same place. For example, you could have a code-injection attribute on a class NotifyPropertyChanged and then a code-injection attribute on a method Log. Which one should be applied first? I think you need a way of explicitly specifying an overall ordering when you invoke a CodeInjector (probably just an integer).
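
One way the "just an integer" idea could look, as a sketch only. `InjectionOrderAttribute`, the two injector classes and `InjectorScheduler` are all hypothetical names invented here; nothing like them is defined by the proposal or by Roslyn.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute carrying an explicit, global ordering value.
[AttributeUsage(AttributeTargets.Class)]
public sealed class InjectionOrderAttribute : Attribute
{
    public InjectionOrderAttribute(int order) { Order = order; }
    public int Order { get; }
}

// Hypothetical injectors; lower Order runs first.
[InjectionOrder(20)] public sealed class LogInjector { }
[InjectionOrder(10)] public sealed class NotifyPropertyChangedInjector { }

public static class InjectorScheduler
{
    // Sorts injector types by their declared order. Types without the
    // attribute go last, and ties are broken by name so the result is stable.
    public static string[] OrderedNames(params Type[] injectors) =>
        injectors
            .OrderBy(t => t.GetCustomAttribute<InjectionOrderAttribute>()?.Order ?? int.MaxValue)
            .ThenBy(t => t.Name, StringComparer.Ordinal)
            .Select(t => t.Name)
            .ToArray();
}
```

Under these assumptions, the order depends only on the injectors themselves, not on where their attributes happen to be applied in the user's code.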

@paulomorgado

Using a property on an attribute to specify order is not new to the framework:

DataMemberAttribute.Order Property

But I think that, if order is important, then there's something wrong.

Notifying property change is something that the consumers of an object expect, not something that the object expects of itself. So, as long as it's done, the order doesn't matter.

Logging is the same thing. If you want to log the notification, then that is not logging the object but logging the notification extension.

Is there any compelling example where one extension influences the other and order matters, and it can still be considered a good architecture?

@MgSam

MgSam commented Oct 2, 2015

@paulomorgado Yes, there are any number of use cases. For example, you want to have some authorization code run before some caching code. PostSharp has several documentation pages about ordering.

@MrJul
Contributor

MrJul commented Oct 2, 2015

Rather than ordering at use site, why not let injectors specify their dependencies using something akin to those PostSharp attributes @MgSam is linking to? (Or OrderAttribute in VS.) Depending on the order of the attributes at the use site seems very brittle to me and prevents using them at different scopes.

@ghord

ghord commented Oct 2, 2015

There are some issues with attributes which we will have to overcome for this to work:

  1. Partial classes and members in separate files: which part's attributes take priority?
  2. Assembly attributes: which file's attributes take priority?

We could make the order alphabetical by file name. I'm pretty sure that in 99% of cases the order won't matter, but leaving undefined behavior like this in the language is very dangerous: an application could crash or not depending on the order in which the transformations are applied.

@paulomorgado

@ghord, what in this proposal influences assembly attributes?

@Inverness

I think code generation support for the compiler would be fantastic. I'd love to be able to do something similar to what PostSharp provides.

PostSharp's more limited free version, and the requirement to submit an application to get an open source project license makes me unwilling to look at it for anything but larger projects at work that we would invest money in.

I'd like to be able to have great AOP tools for everyday/hobby projects without additional hassle.

@daveaglick For debugging, if code generation is only happening after things are sent to the compiler, wouldn't inserting line directives into the syntax tree preserve the integrity of the debugging experience?

I did make a syntax tree rewriter for Roslyn to implement a simple method boundary aspect. I used a Roslyn fork to get this hooked in during compile time. Line directives ensured there was no issue with debugging. It was an interesting experience and an example of something I'd like to be able to do without jumping through hoops.

One issue I had, though, was that working at the syntax tree stage deprived me of information that is only available at the bound tree stage. Is there a way to know about type relationship information at this point? When you see an attribute on a class, how will you know that it subclasses MethodBoundaryAspect or whatever?

@AdamSpeight2008
Contributor

Is this like F#'s type providers?

@mattwarren

It's great to see this being proposed, I remember asking a while back if this was being considered.

I think that it should be possible to have the modified source code written out to a temp folder, to make debugging easier. Either by default or controllable via a flag.

I also think that having to apply an attribute to the parts of the source code that can be re-written is a nice idea, as it makes the feature less magical and easier to reason about.

@mattwarren

@AdamSpeight2008 I don't think so, I see this feature more as a compiler step that lets you modify code before it's compiled. But crucially this isn't meant to be seen by the person who wrote the code, it happens in the background when the compiler runs.

My understanding of type providers is that they integrate more into the IDE and help you when you are writing code that works against a particular data source (by providing IntelliSense, generating types that match the contents of a live database, etc.).

@Pilchie removed the Area-IDE label Dec 4, 2015
@mausch

mausch commented Dec 21, 2016

@Inverness https://github.com/StackExchange/StackExchange.Precompilation already does just that.

@Dennis-Petrov

Another option we discussed was to simply move this out of the core compiler. Instead move it to say the MSBuild pipeline

Debugging of generated code inside IDE is important. As far as I understand, this will be unavailable in case of MSBuild.

@davidfowl
Member

Another option we discussed was to simply move this out of the core compiler. Instead move it to say the MSBuild pipeline. The same generator experience can be defined in MSBuild but it doesn't have the same expectations around an IDE experience.

I would start with that but still have a hook at the csc level (similar to analyzers). That makes it run at the "right" time. This is essentially what compile modules did (ignored the IDE 😄 because it was hard.).

@Antaris

Antaris commented Dec 22, 2016

^^ saying that though, you could debug them by throwing in a Debugger.Launch and attaching Visual Studio - wasn't perfect, but was doable.

@jaredpar
Member

Debugging of generated code inside IDE is important. As far as I understand, this will be unavailable in case of MSBuild.

That should work fine actually. The MSBuild approach would augment the compilation with additional files. As this isn't done in the compiler these files would need to reside physically on disk (likely in the obj folder). Hence debugging would just work.

@alrz
Member

alrz commented Dec 22, 2016

Could something like CallerMemberNameAttribute be implemented by generators? I believe caller info attributes are just a matter of code generation and can be done outside of the compiler. However, that needs "inspecting" the code, rather than adding a compilation unit and replacing members at the declaration site, i.e. replace/original. I'm sure a lot more interesting AOP scenarios can be implemented with generators if said API exists.

@jaredpar
Member

@alrz

Could something like CallerMemberNameAttribute be implemented by generators?

There are two types of generators to consider:

  1. Augmenting: this simply adds additional source files, or possibly references, to the compilation.
  2. Modifying: this can modify any aspect of the compilation including source, references, etc ...

A modifying generator would be able to implement CallerMemberNameAttribute. It has the ability to modify the source file where the user added the attribute and hence can add the necessary information.

An augmenting generator would not be able to. It can only add source files hence can't modify the file authored by the user where the [CallerMemberName] annotation was used.
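
For context, a short sketch of what the built-in `[CallerMemberName]` does today, next to the explicit argument that a modifying generator would have to write into the call site to get the same effect. `Tracer` and `CallSites` are illustrative names; only the attribute itself is a real API.

```csharp
using System.Runtime.CompilerServices;

public static class Tracer
{
    // The compiler fills in 'caller' at each call site that omits it.
    public static string Foo([CallerMemberName] string caller = null) => caller;
}

public static class CallSites
{
    // Argument omitted: the compiler substitutes the enclosing member's name.
    public static string MyMember() => Tracer.Foo();

    // Argument written out: the shape a source rewriter would have to
    // produce by editing the user's own call expression.
    public static string Rewritten() => Tracer.Foo("MyMember");
}
```

Both methods return `"MyMember"`. The point of the distinction above is that the second form requires changing text inside the user's file, which an augmenting generator cannot do.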

Note that when I've discussed generators on this thread I've mostly been talking about an augmenting generator. Those IDE problems I've discussed for augmenting generators pale in comparison to the challenges faced by a modifying generator. How for instance do you design a rational IDE experience around a plugin that can virtually erase the keystroke you are currently typing in the emitted binary? It's quite daunting and likely there is no sane possible experience.

These problems are why the compiler team eventually took on a two-pronged solution: augmenting generators + language features to make generators more powerful. The latter has been done in the past (think partial types and methods). The original / replaces model extended that to allow a lot more flexibility.

@alrz
Member

alrz commented Dec 22, 2016

@jaredpar

How for instance do you design a rational IDE experience around a plugin that can virtually erase the keystroke you are currently typing in the emitted binary?

It modifies the AST that is handed to the emitter to generate the final binary. I think this shouldn't go back and forth within the same assembly boundary. For this particular scenario, modifying the invocation wouldn't even affect other parts of the code, so there is no need to know what has been changed. I agree that in any other cases that need dramatic changes to member declarations, replace/original can do a better job.

@jaredpar
Member

@alrz

It modifies the AST that is handed to the emitter to generate the final binary

Sure but what does Intellisense say? How does debugging work? The final syntax tree will be, possibly, very different than what lives in your source repo. When you F5 and step into that file what happens?

@alrz
Member

alrz commented Dec 22, 2016

@jaredpar Right, the only observable thing for CallerMemberNameAttribute usages in debugging is the value passed to the method. However, if that was not a constant it wouldn't work well in debugging.

@m0sa

m0sa commented Dec 22, 2016

@alrz it also affects all call sites, e.g.

Foo(); Bar();

where

public static string Foo([CallerMemberName] string caller = null);

So when you're debugging the first snippet, and your breakpoint is on the Foo(); invocation, after step-over - F10 , you need to land on Bar();, but if the rewritten source line is

Foo("myMember"); Bar();

Visual Studio would highlight Member, inside the string... That's why you'd also need to add #line directives, which is not the most straightforward thing to do. And this is just the tip of the iceberg...
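
A minimal sketch of the #line technique, assuming a rewriter that wants its emitted code attributed to a position in the original file. The file name `Original.cs` and line 200 are arbitrary, and `[CallerLineNumber]` is used here only as a convenient way to observe which line the compiler attributes to a call.

```csharp
using System.Runtime.CompilerServices;

public static class LineMapDemo
{
    // Returns the line number the compiler attributes to the call site.
    public static int CallerLine([CallerLineNumber] int line = 0) => line;

    public static int RewrittenCall()
    {
        // Pretend the next statement was produced by a rewriter and should
        // be reported as line 200 of the user's original file.
#line 200 "Original.cs"
        return CallerLine();
#line default
    }
}
```

With the directive in place, `RewrittenCall()` reports line 200 rather than the physical line of the rewritten file. Note that #line remaps lines only, which is why the column-level highlighting problem described above needs more care than a single directive.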


@jaredpar I think having something like a structured representation of the source map when rewriting SourceTrees (either on the SourceTree, or on the entire Compilation) would solve a lot of problems we face RE proper debugging support. Having dealt with it in both C# and JavaScript, I must say I think the JS SourceMap approach is superior to what we currently have in C#. I think it shouldn't be too difficult to adjust the map automatically for unmodified parts of the SourceTree on modifications, since we already have a structured representation for changes.

@vbcodec

vbcodec commented Dec 22, 2016

@jaredpar
Augmenting code is pretty limited and has a high cost (creating generators and significantly increasing the amount of code).
A better way is to move from replace/original + generators to a pluggable solution driven by attributes. This will allow us to intercept methods, properties and events and call provided code. Additionally, after implementing #6671, it will be possible to dig into methods and track / modify default code execution. For example:

[MyLoopMonitor()]
foreach (string x in coll)
{
    ...
}

Where MyLoopMonitor class can track and modify x variable and finish loop enumeration at any time (revolution).

@mcintyre321

mcintyre321 commented Jan 24, 2017

@mattwar There is/was an interesting but little-known project called Genuilder, which hooked into a pre-compile event in MSBuild and passed the source tree to custom C# generators (which are in standalone DLLs), which could then output extra source code. Here's a repo using it: Magic.

The classes would get picked up by VS and you had working intellisense for generated code within the same project (R# intellisense wouldn't pick up the generated classes though).

@Dennis-Petrov

Just an idea, talking about modifying generators, as @jaredpar classified them.

What if there were some "special" type of project item, which the user can edit as a regular C# source file, using IntelliSense, code analyzers, syntax highlighting, etc., but which is able to run generators to get the actual source code to compile? It's some sort of "advanced T4 template".

A generator must not be allowed to modify the user-typed source of this project item directly; modifications must be applied to the output. Also, the debugger must step into the actual, modified source code. This could solve problems with the IDE experience, since there won't be any "on-the-fly" modifications of user-typed code. Also, the user can see what he gets from the generators, which decreases the level of "magic" brought by one generator or another.

There are some things to think about in the context of breakpoints: the user can put a breakpoint on a line that will be absent from the output, but this could be solved by disabling such breakpoints.

What do you think about that?

@kzu
Contributor

kzu commented Jan 24, 2017

Which is more or less what this does: https://github.com/AArnott/CodeGeneration.Roslyn

/cc @AArnott

@reduckted

There's also Scripty, which is similar: https://github.com/daveaglick/Scripty

@Dennis-Petrov

Dennis-Petrov commented Jan 25, 2017

I'm almost sure that there are a number of similar 3rd party tools.

The basic problem is that they are 3rd party tools. The probability of them being abandoned is rather high. Moreover, if these are non-commercial projects for their contributors, I'd be afraid to bring them into real projects. E.g., both mentioned projects have fewer than 70 commits. This is negligibly small. Compare that with the number of commits in active and popular projects like Autofac: https://github.com/autofac/Autofac.

Remember Code Contracts?

IMHO, tools with impact like this should be an official part of .NET ecosystem.

@per-samuelsson

@CyrusNajmabadi

Thanks for all the time you spent this December patiently explaining your POV on this. It's certainly of value for stakeholders such as me and my company, trying to assess the likelihood of source generators being realized some time soon.

If you'd find a minute or two to spare at some point, I'd highly appreciate your view on my recent question here.

@per-samuelsson

If you'd find a minute or two to spare at some point, I'd highly appreciate your view on my recent question here.

Wow, I didn't even have the time to post that before you did. 😄 🚀. Thanks a lot.

@gafter
Member

gafter commented Mar 27, 2017

This is now tracked at dotnet/csharplang#107. It is championed by @mattwar.
