Support generating non-source files #57608

Open
geeknoid opened this issue Nov 5, 2021 · 117 comments

@geeknoid
Member

geeknoid commented Nov 5, 2021

We have a few use cases where we'd like to leverage source generators to produce non-source files. Basically, we want to use source generators to generate secondary build outputs. For example:

  • We annotate data models with data classification attributes to denote PII and we produce a CSV file that enumerates the types and members in the project and their data classification. You can then take this CSV file and use it when doing privacy audits.

  • We use source generators to produce a bunch of code around metrics. We'd like to emit a schema file describing the effective shape of the metrics being produced by the code. This would probably be in the form of an output JSON file.

We have a few other scenarios in the wings which could leverage this approach.
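As a rough illustration of the first scenario, the CSV content itself is cheap to build once the generator has collected its data. The sketch below is an assumption-laden example, not the actual generator: the `ClassificationCsv` class, the `Build` method, and the `Type,Member,Classification` column layout are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns collected classification data into CSV text.
public static class ClassificationCsv
{
    public static string Build(
        IEnumerable<(string Type, string Member, string Classification)> rows)
    {
        var sb = new StringBuilder();
        sb.Append("Type,Member,Classification\n"); // assumed column layout
        foreach (var (type, member, classification) in rows)
        {
            sb.Append(Escape(type)).Append(',')
              .Append(Escape(member)).Append(',')
              .Append(Escape(classification)).Append('\n');
        }
        return sb.ToString();
    }

    // Quotes a field when it contains a comma, quote, or line break,
    // doubling any embedded quotes (RFC 4180 style).
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

The hard part is not producing this string but getting it onto disk as a tracked build output, which is what the rest of this issue is about.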

Unfortunately, source generation is currently limited to C# sources. Trying to emit files with other extensions doesn't work. As a result, our source generator currently contains this horrible hack:

        public void Execute(GeneratorExecutionContext context)
        {
            // ... generate the CSV file ...

            context.CancellationToken.ThrowIfCancellationRequested();

            /// Adding the workaround to generate reports via File writes since <see cref="GeneratorExecutionContext.AddSource(string, CodeAnalysis.Text.SourceText)"/>
            /// has an underlying check for `.cs` files and automatically adds the `.cs` suffix if the provided filename has some other filetype.
            /// Refer <see href="https://github.com/dotnet/roslyn/blob/v3.8.0/src/Compilers/Core/Portable/SourceGeneration/AdditionalSourcesCollection.cs#L63">AppendExtensionIfRequired</see>.
            _directory ??= FileUtilHelpers.GetOutputDirectoryForGeneratedFile(typeof(Generator), context.Compilation.Assembly, context.Compilation.Options.OptimizationLevel);
            _ = Directory.CreateDirectory(_directory);

            // Write properties to CSV file.
            File.WriteAllText(Path.Combine(_directory, _propertiesFileName), properties);

            // Write log methods to CSV file.
            File.WriteAllText(Path.Combine(_directory, _logMethodsFileName), logMethods);
        }

Could we get first-class support for producing more than .cs files?

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Nov 5, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ryzngard
Contributor

ryzngard commented Nov 5, 2021

Are these outputs also included in the final compilation somehow, or are they just independent artifacts of the build?

@geeknoid
Member Author

geeknoid commented Nov 5, 2021

They are secondary outputs of the build process, just like the .xml doc file produced for XML doc comments. If C# didn't support XML doc comments, you could in fact implement them fully as a source generator.

Ideally, the output files would be produced in the same directory as the final output assembly, rather than being buried somewhere in the obj directory. But I can live with the obj directory output.

@ryzngard ryzngard added Area-Compilers New Feature - Source Generators Source Generators Feature Request and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Nov 5, 2021
@lsoft

lsoft commented Nov 11, 2021

Probably something related? #52677

@vpenades

vpenades commented Nov 24, 2021

I agree with this proposal. In my case I would need something as simple as this:

void AddContent(string fileName, byte[] content); // add file as a content item

Notice that in my case, I also need the output file to be considered a Content item, so it also needs to handle CopyToOutputDirectory, packing to NuGet as a content file, etc., and it should not be limited to text files.

In the grand scheme of things, what I would like to have is a framework similar to source generators, but for content files: basically a "content generator". I am not sure if this can be achieved by simply adding an "AddContent" method, or whether it would require a completely separate, more specialised framework.

A practical use case scenario from which we could benefit: We have a localization framework that parses the source code looking for specific localization attributes. Then it queries our database for localized strings, and writes an XML file containing the attribute keys and the localized strings. This is now running as a command line tool, which is nasty.

@AaronRobinsonMSFT
Member

This is also of interest to the Interop team and the DllImport source generator.

We would like to emit artifacts that could be used in other languages and likely used in a post-compile MSBuild Target. Roslyn could give us a Stream instance to use or we could call an API that takes a Stream. Either approach works for the data stream. The name of the output artifact file would also be important. The specific path isn't interesting coming out of the compiler, since we would move/copy it to another location, but the name would be helpful.

From within MSBuild, it would be helpful to either have a convention of a new set of Items (e.g., <GeneratorName>Artifacts) that we could then consume. I would caution against requesting an Item name from the generator for the typical name conflict issues.

/cc @dotnet/interop-contrib @chsienki @jaredpar @kg

@chsienki
Contributor

chsienki commented Jan 26, 2022

@AaronRobinsonMSFT This is basically the design we landed on for the feature :)

We actually have a (somewhat out of date) PR to implement this on the analyzer side #49046 but it's been stuck in limbo due to lack of prioritization.

Feel free to bug @jaredpar if this is something you think is important to the success of the DllImport generator.

@stephentoub
Member

stephentoub commented Feb 24, 2022

As a +1, I'd also like this for the RegexGenerator. I'd like to be able to output non-compiled files that a developer could browse to in Solution Explorer to be able to provide additional diagnostic information about the regular expressions being used. Imagine, for example, if in addition to outputting RegexGenerator.g.cs that included the source code, it could also output a .dgml file containing a graph of the DFA for the regex, and a developer could simply double-click that .dgml file to have it opened in VS to visualize the shape of their regex. Today the best I can do here seems to be emitting that dgml content as a comment embedded in the .cs file, at which point a developer needs to find it, copy/paste it to a separate file, and then open that file they created (and I need to somehow escape the content in a way that it doesn't break the .cs compilation but is still valid dgml).

@jaredpar jaredpar added this to the C# 11.0 milestone Feb 24, 2022
@ChrML

ChrML commented May 24, 2022

This would also be really useful for projects like Swagger. It could have a content generator that would output OpenAPI data for the API controllers in the project at compile-time rather than runtime. Thus avoiding reflecting classes and parsing the XML file at runtime.

Even better if the API let you choose if the content should be added as a file next to the build output, or if it should be embedded as a part of the assembly dll file.

@jaredpar
Member

@ChrML in the Swagger case how would you use the output though:

  • Have an MSBuild task that copied it somewhere?
  • Tell users where it is on disk and let them consume it at will?
  • etc ...

@ChrML

ChrML commented May 24, 2022

@ChrML in the Swagger case how would you use the output though:

  • Have an MSBuild task that copied it somewhere?
  • Tell users where it is on disk and let them consume it at will?
  • etc ...

The exact same way as any solution item marked as "Content". Except in this case the content is not a solution item, but dynamically provided during compile by the source generator.

As for the Swagger-like use-case there could be no change to the code for the library consumers. Internally all the runtime code could be almost-zero, and the library would just serve the compile-time generated OpenAPI file directly to the API.

Assuming of course that there's no dynamic logic involved and all metadata is available at compile-time. I would think this is the most common scenario.

@BertanAygun
Contributor

Adding my +1 as well for VisualStudio.Extensibility, we would love to have this natively supported since we generate both source files and also a metadata asset to be included as part of the extension build output. So being able to call a method like AddContent(string relativePath, ) which handles copying the file to the correct output location like a Content item would be great.

@kg
Contributor

kg commented Aug 11, 2022

Things like shader compilers also need to typically generate manifests or metadata in addition to shader binaries - I can imagine someone wanting to do a thing where you write shaders in C# or in string literals, and then a source generator would produce C# methods that expose access to the shader and then embed shader binaries and metadata for use at runtime. The alternative is manually pulling out manifest resources etc, which is error prone (especially since embedding manifest resources exposes you to msbuild's broken dependency tracking).

@MortenChristiansen

I would love this for generating SQL change scripts based on classes representing tables.

@CollinAlpert
Contributor

I'd also like this to generate Razor pages with Source Generators. Since it's probably too late for .NET 7, can anyone say if this is triaged for .NET 8?

@andrew-hampton

andrew-hampton commented Oct 28, 2022

Adding a +1 to this proposal. We have a few scenarios where we use targets to launch a separate executable that uses reflection to generate a content file and includes that content as part of the build. This process works, but can be fragile. Would be fantastic to be able to utilize the context within source generators to achieve this.

@DomenPigeon

@CyrusNajmabadi just wanted to thank you for advocating for the use of MSBuild, because of your comments I was forced to go check out how MSBuild actually works and what capabilities it has and I was amazed about how little I actually knew about it and how strong of a tool it is.

For the case I have mentioned above, now that I know what MSBuild is capable of, I don't need the source generators for it, because I can generate all the mirror JS files from C# via reflection.

The one thing that would still be hard, as far as I now understand MSBuild (as was mentioned above in some comment), is to generate files (C# or other) based on the syntax tree or semantic model, which would be hard to get inside a task but is easily accessible when writing source generators.

@CyrusNajmabadi
Member

The one thing that would still be hard, as far as I now understand MSBuild (as was mentioned above in some comment), is to generate files (C# or other) based on the syntax tree or semantic model, which would be hard to get inside a task but is easily accessible when writing source generators.

Correct. This is the area where SGs excel (and do things that would be extraordinarily difficult to do from outside the compiler). :-)

@Perksey
Member

Perksey commented Sep 10, 2023

Encountered a need for this today to wield Blazor's WASM tooling to generate a native P/Invoke trampoline to account for scenarios that the current P/Invoke tooling in Blazor doesn't handle very well.

However, I fully appreciate that analogues to this feature in other languages (such as the Rust proc_macro) also do not implement this, because fundamentally it doesn't make sense for an A->B compiler to also generate C to then be fed into another compiler. They're fundamentally different stages of the pipeline.

Perhaps a happy medium could be allowing the generation of a metadata file from a source generator to allow SGs to provide information about their execution retrospectively, to then be picked up by another tool, using an opt-in parameter on csc.

@BlackGad

You can generate a .cs file that holds the full JSON/XML content in a block comment, with an extension like foo.metadata.cs. Then, in a pre-build step (but after generation), cut the comment block delimiters and rename the file to .json/.xml.

Of course it is a trick, but if you really need this, it would work.
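A minimal sketch of the extraction half of this trick, assuming the generator wraps the payload in a single block comment. The `CommentPayload` class and `TryExtract` method are hypothetical names, not part of Roslyn or MSBuild, and the payload itself must not contain `*/`, or the generated .cs file would stop compiling.

```csharp
using System;

// Hypothetical helper for the "payload in a block comment" workaround.
public static class CommentPayload
{
    // Extracts the text between the first "/*" and the last "*/".
    // Returns false when the source has no complete block comment.
    public static bool TryExtract(string generatedSource, out string payload)
    {
        int start = generatedSource.IndexOf("/*", StringComparison.Ordinal);
        int end = generatedSource.LastIndexOf("*/", StringComparison.Ordinal);
        if (start < 0 || end < start + 2)
        {
            payload = string.Empty;
            return false;
        }

        payload = generatedSource.Substring(start + 2, end - start - 2).Trim();
        return true;
    }
}
```

A build step could read foo.metadata.cs, call `TryExtract`, and write the payload to foo.json with `File.WriteAllText`. That step still has to be wired into the build by hand, which is the cost of the trick.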

@mhmd-azeez

@Perksey I have been able to successfully do that by writing a custom MSBuild task:
extism/dotnet-pdk#4

@Kukkimonsuta

Would allowing generating non-source files only within RegisterImplementationSourceOutput allow this feature to go through? Because I think most/all use cases mentioned here would be fine with that.

Alternatively if we really must avoid writing files at all cost during this process, could there be a mechanism to push information back to msbuild so a msbuild task can pick it up and further process however/whenever needed? That would allow code inspection during compilation while allowing generating whatever non-code files based on that after compilation.

@jaredpar
Member

Would allowing generating non-source files only within RegisterImplementationSourceOutput allow this feature to go through?

No. This feature still has a number of open issues and likely requires changes to other products for us to ship. For example, non-source file outputs must be written to disk. That represents dynamic build outputs from the compilation task. That is not supported by MSBuild today and would lead to basic build correctness issues. Dealing with this is an active discussion topic between us and the MSBuild team.

This is a feature we're very much looking at for .NET 9 but there is not a simple fix.

@jaredpar
Member

Want to elaborate on the problem this presents for MSBuild. The purpose of this feature is to allow source generators to produce non-source file outputs. These outputs will be written to disk for later phases of the build to consume.

MSBuild tracks both inputs and outputs to targets. At a high level when evaluating a given target for a project MSBuild will determine if the inputs to the target are the same as the last evaluation and the outputs still exist, if so then it can skip the target entirely. This is how incremental builds, fast up to date, etc ... function in MSBuild. This means though that if you do a build of console.csproj, then say delete obj\net472\console.dll a rebuild will re-run the compiler because MSBuild can see the output file is missing hence the target is out of date.

MSBuild support though requires that all of the inputs and outputs are known at the time the target is evaluated. That is where the check occurs. The issue with non-source file outputs is they are determined dynamically by the generators based on the content of the Compilation they are given. That does not fit into MSBuild's existing tracking model. These would all be untracked outputs and features like rebuild, FUTD, etc ... would be broken.
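The check described above can be approximated in a few lines. This is a deliberately simplified model, not MSBuild's actual implementation: the `UpToDateModel` class and `CanSkip` method are hypothetical, and the file system is passed in as a dictionary of path to last-write time rather than read from disk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical model of a target's up-to-date check.
public static class UpToDateModel
{
    public static bool CanSkip(
        IReadOnlyList<string> declaredInputs,
        IReadOnlyList<string> declaredOutputs,
        IReadOnlyDictionary<string, DateTime> filesOnDisk)
    {
        // A target with nothing declared cannot be proven up to date.
        if (declaredInputs.Count == 0 || declaredOutputs.Count == 0)
        {
            return false;
        }

        // Any missing declared file forces the target to run.
        if (!declaredInputs.Concat(declaredOutputs).All(filesOnDisk.ContainsKey))
        {
            return false;
        }

        // Skip only when every output is at least as new as every input.
        DateTime newestInput = declaredInputs.Max(path => filesOnDisk[path]);
        DateTime oldestOutput = declaredOutputs.Min(path => filesOnDisk[path]);
        return oldestOutput >= newestInput;
    }
}
```

In this model, a file that a generator writes on its own never appears in `declaredOutputs`. Deleting it leaves `CanSkip` returning true, so the target is skipped and the file is never regenerated, which is the untracked-output problem.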

I understand that some contributors here may see that as an acceptable trade off but we do not. Really hard to push through a compiler feature that is fundamentally incompatible with our basic build principles. It's a recipe for customer confusion and lots of feedback issues.

There are a couple of ways this could be fixed:

  1. Break up the CoreCompile target into a series of targets: CoreCompileGetOutputs and CoreCompileCompile. The CoreCompileGetOutputs target runs the generators enough to get the output list and provides that to CoreCompileCompile, which runs the compiler. The issue with this approach is that CoreCompileGetOutputs must always run. It must be constructed in a way that it always fails the up to date check. That means generators run, even in a limited form, on rebuild scenarios. Not ideal.
  2. Change MSBuild such that it supports targets with dynamic outputs, and likely dynamic inputs. The CoreCompile target could be changed to take advantage of this and non-source file generators would fit naturally into that scenario.

At the moment (2) is the preferred approach and we're digging into its viability.

MSBuild tracking issue: dotnet/msbuild#701

@rainersigwald
Member

rainersigwald commented Oct 17, 2023

MSBuild support though requires that all of the inputs and outputs are known at the time the target is evaluated.

Nit: it's at execution time, just before executing the tasks inside the target (step 6 of the target order determination).

(It could have been at evaluation time but it's slightly more dynamic than that--if it was at evaluation, plan 1 wouldn't work.)

@Perksey
Member

Perksey commented Oct 17, 2023

This does feel like it may invoke a future feature request for RegisterIncrementalImplementationSourceOutput or something equivalent, as I don't think everyone will want to regenerate every time, but this sounds on the right track.

@riverar

riverar commented Oct 17, 2023

@jaredpar I remember being unconvinced that this feature was truly needed (in MSBuild). Did you or the team come up with scenarios that are driving this current work?

@kg
Contributor

kg commented Oct 17, 2023

For browser scenarios, CSP rejects attempts to dynamically generate code or run code that wasn't loaded from a trusted network source, so if you are (for example) generating C#/JS interop bindings, it is ideal if you can produce a paired .cs file and .js file from the same inputs in one go. Embedding the JS in a string literal would violate CSP, so the only alternatives are to have a separate tool that generates the .js files from the source metadata, or to embed the JS in the generated C# somehow and extract it with an msbuild task.

IIRC we ended up working around the lack of good solutions by using an interpreter instead of generating JS, even though it's slower. I don't know if we would actually rewrite our code to use this feature when it arrives (we were told to expect it a long time ago).

I also saw someone suggest an analyzer+source generator as a way to implement dotnet/csharplang#7529 which would require the generator be able to generate string tables as secondary outputs. (I don't think that is actually a solution for the use case, but it was suggested there.)

@jaredpar
Member

jaredpar commented Oct 17, 2023

@riverar

I remember being unconvinced that this feature was truly needed (in MSBuild).

Can't tell which feature you're speaking about here: non-source outputs in compiler or dynamic output tracking in MSBuild.

If it's non-source outputs, it's still not 100% clear if it's fully needed. There are many scenarios where it would be a strong benefit to the ecosystem. There are ways to work around this by just writing more MSBuild code but it would be a lot better integrated in the source generator pipeline. Essentially there is enough benefit here for us to be doing deeper investigation.

At the same time the "just write more MSBuild code" is not always viable. As long as the output file set depends on the input code then it's still a dynamic output problem. That pretty much has no other solution than a new MSBuild feature or always run targets.

If it's MSBuild tracking dynamic output, I outlined one of the more important scenarios above. Effectively a general build principle for us is that deleting build output should cause rebuild to re-execute compilation. Lacking MSBuild features or the targets trick I mentioned, this can't happen. That's a non-starter for us right now.

@eerhardt
Member

I just wanted to add a +1 to this scenario and link to the .NET Aspire Components use case for this: dotnet/aspire#1146.

.NET Aspire Components allow app settings to be configured in appsettings.json files. We added a feature to Visual Studio's JSON editor to support augmenting the JSON intellisense with JSON schema segments coming from referenced NuGet packages. For example, when a project references the Aspire.StackExchange.Redis package, and the dev opens the appsettings.json file, they get auto-completion and intellisense for the Redis options in VS (note we want to add this feature to VS Code as well):

image

The input to these JSON schema files is .NET Types, in this case the StackExchange.Redis ConfigurationOptions class. As we update this class, I'd want the JSON schema files to be automatically updated, instead of having to hand-craft the files as we do today. This seems like a perfect fit for a Roslyn Source Generator, if it supported outputting non-source files.

@CyrusNajmabadi
Member

@eerhardt why does this need to be a SG though? Nothing in the same app will consume that same json schema right? So this could just be a trivial tool that runs over the compilation as a post-step and produces the json schema file.

@BlackGad

@eerhardt why does this need to be a SG though? Nothing in the same app will consume that same json schema right? So this could just be a trivial tool that runs over the compilation as a post-step and produces the json schema file.

Because of the double syntax tree build, maybe? With a proper cache, an SG could do this automatically. It would also be convenient to encapsulate this logic in a separate NuGet package so the proj file stays untouched.

@CyrusNajmabadi
Member

Because of double syntax tree build maybe?)

  1. with a compiler server, this should be a non-issue.
  2. this motivates a different technology that is not source-generators. This would be something like a post-generator tech. i'm much more ok with that.

Note: i'm fairly sure people want a post-generator, since SGs can't see the results of other SGs. So if you wrote an SG here, for example, your json-schema could not include any information about things generated by other generators, which is not what i think people want.

@eerhardt
Member

@eerhardt why does this need to be a SG though? Nothing in the same app will consume that same json schema right? So this could just be a trivial tool that runs over the compilation as a post-step and produces the json schema file.

It doesn't need to be a SG. To unblock the scenario my plan is exactly as you suggest - a post-compilation step. It just feels more natural to do this during csc and using Roslyn Source Generators than it does to make custom MSBuild tasks/targets/etc.

@Khitiara

definitely seems like a general way to hook into the compiler to access syntax and the semantic model during other tasks/targets is what people actually need. can say that would handily cover my use-cases for this feature too, as long as that can be made to run early enough to create embedded resources

@CyrusNajmabadi
Member

CyrusNajmabadi commented Nov 30, 2023

definitely seems like a general way to hook into the compiler to access syntax and the semantic model during other tasks/targets is what people actually need.

Yeah, this is definitely possible, though we could likely make things better here by making the tasks easier to write. As an example, the roslyn IDE does this ourselves in order to generate LSIF files from the built compilations. If we could extract out the portion of that which handles getting the same data and then operating on that, that would likely be the most valuable thing here.

IMO, given a compiler server, you have the best of all worlds:

  1. the server keeps the data alive (compilations, references, trees) so you're not paying the expense multiple times.
  2. you run after all real source generation is done. so you see the complete view of the code. not the primordial view.
  3. you are just a normal msbuild task with proper inputs/outputs.
  4. you incur no costs deep in the compiler.
  5. you can use any generation mechanism. you're not dependent on things like ForAttributeWithMetadataName.
  6. You can generate binary files (or anything for that matter).
  7. You can be async, etc.
  8. You can run in any .net host. You're not limited to netstandard2.0

etc. etc.

@jaredpar
Member

jaredpar commented Nov 30, 2023

So this could just be a trivial tool that runs over the compilation as a post-step and produce the json schema file.

This may work but only because the file name is known before the build occurs. Thus it's not subject to the issues I listed before. That is the rare case though. Most non-source generators don't know their content until they inspect the code.

I generally agree though that most of these generators fall into post-compilation. Essentially they have no IDE / keystroke interaction. If we do this feature the interface will be geared to that.

There have been requests to generate resources that could be embedded into the compilation. That doesn't have a nice clean split. At the same time this request seems to be the minority use case right now.

definitely seems like a general way to hook into the compiler to access syntax and the semantic model during other tasks/targets is what people actually need.

Yeah, this is definitely possible, though we could likely make things better here by making the tasks easier to write.

We can do this but I would push back strongly on it. The msbuild task for translating build properties into a correct compilation is complex. Maintaining a compiler server that meets the reliability constraints we have for build is very hard. Duplicating that would be a decent effort. Even if it were easy though there is also the problem that many teams in the industry maintain copies of our .targets files. When you alter them like this they get out of sync with those targets and suddenly code that should compile stops compiling.

I think it makes a lot more sense to follow our existing generator model where such plugins run in the existing compiler infrastructure. Because they are purely tools of build (don't impact IDE experience) such generators won't come into play in the IDE space.

You can run in any .net host. You're not limited to netstandard2.0

jared starts laughing and ends by crying

@eerhardt
Member

eerhardt commented Dec 5, 2023

So this could just be a trivial tool that runs over the compilation as a post-step and produce the json schema file.

I took a stab at writing this today, and I don't think this is trivial. How exactly would you do this? What I tried:

  • Make an MSBuild Target/Task that hooks into TargetsTriggeredByCompilation
  • Reference Microsoft.CodeAnalysis.* assemblies in the Task
  • Pass @(ReferencePaths) and @(IntermediateAssembly) to the Task

The issue I immediately hit is that the Microsoft.CodeAnalysis.* assemblies are not loaded by MSBuild. So I need to find them somehow myself, either bring my own or have MSBuild pass the location and hook AssemblyLoadContext like Arcade does.

Is using a Task wrong here and instead should be making a .exe that does this instead? I would still have the problem of how to load the Microsoft.CodeAnalysis assemblies.

with a compiler server

How do you make a custom task or tool that uses a compiler server? Are there examples?

@jaredpar
Member

jaredpar commented Dec 5, 2023

I took a stab at writing this today, and I don't think this is trivial. How exactly would you do this?

I don't think it's realistic to do.

How do you make a custom task or tool that uses a compiler server? Are there examples?

You can't really. The only task that can access the compiler server is the Csc / Vbc tasks. That is unlikely to change.

I do not think hosting the compiler yourself is a realistic path forward. We've seen many teams attempt to do this over the years and it does not end well for them.

@lsoft

lsoft commented Dec 16, 2023

This article contains many use cases. Maybe that is enough now to generalize... I don't want to seem annoying, but let me ask please: is there any plan in the .NET 9 dev cycle to work on this topic?

If not, what about some kind of workaround: many of us would be happy to have a way to touch Compilation (SemanticModel, Document, SyntaxNode and other Roslyn stuff) efficiently somewhere in the build cycle (prebuild?). (#57608 (comment))
