Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype: Storing compilation outputs in Project #34698

Closed
wants to merge 1 commit into from

Conversation

tmat
Copy link
Member

@tmat tmat commented Apr 2, 2019

The EnC infrastructure needs to read data from the assembly and the PDB produced by the compiler.
Currently this information is not represented in Workspace layer, which is where we need to access it.

See #34371.

Proposal

Add new APIs that provide an access to the files that is abstracted from the file system, so that it allows projects to potentially compile in-memory and still provide the necessary data.

 public abstract class CompilationOutputs
 {
     /// <summary>
     /// Opens an assembly file produced by the compiler (corresponds to OutputAssembly build task parameter).
     /// </summary>
     public abstract Stream OpenOutputAssembly();

     /// <summary>
     /// Opens a reference assembly file produced by the compiler (corresponds to OutputRefAssembly build task parameter).
     /// </summary>
     public abstract Stream OpenOutputRefAssembly();

     /// <summary>
     /// Opens a PDB file produced by the compiler (corresponds to PdbFile build task parameter).
     /// </summary>
     public abstract Stream OpenPdbFile();

     /// <summary>
     /// Opens a documentation file produced by the compiler (corresponds to DocumentationFile build task parameter).
     /// </summary>
     public abstract Stream OpenDocumentationFile();

     public abstract override bool Equals(object other);
     public abstract override int GetHashCode();
 }

OpenXxx opens a new stream each time it's called. It's up to the caller to dispose the stream.
If the stream is backed by a file, the file is locked while the stream is being opened. The caller must make sure to close the stream as soon as it can to allow rebuild.

A file system specific implementation:

 public sealed class CompilationOutputFiles : CompilationOutputs, IEquatable<CompilationOutputFiles>
 {
     public string OutputAssemblyPath { get; }
     public string OutputRefAssemblyPath { get; }
     public string PdbFilePath { get; }
     public string DocumentationFilePath { get; }

     public CompilationOutputFiles(
         string outputAssemblyPath = null,
         string outputRefAssemblyPath = null,
         string pdbFilePath = null,
         string documentationFilePath = null);

     public CompilationOutputFiles WithOutputAssemblyPath(string path);
     public CompilationOutputFiles WithOutputRefAssemblyPath(string path);
     public CompilationOutputFiles WithPdbFilePath(string path);
     public CompilationOutputFiles WithDocumentationFilePath(string path);
 }

@tmat tmat requested a review from a team as a code owner April 2, 2019 22:51
@tmat
Copy link
Member Author

tmat commented Apr 2, 2019

@jasonmalinowski @DustinCampbell @CyrusNajmabadi @heejaechang PTAL.

The implementation is incomplete. Just wanted to put together a proposal.

@tmat
Copy link
Member Author

tmat commented Apr 2, 2019

/cc @jinujoseph

{
// The project doesn't produce a debuggable binary, we can't read it or it's not stored in a file on disk.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requirement for storing the file on disk is temporary. It will be removed in the new EnC manager implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we went ahead with this obviously add a comment to that effect.

@CyrusNajmabadi
Copy link
Member

Will do! Note: i haven't looked, but i'm always wary of things like public abstract Stream OpenOutputAssembly(); (just on a gut level, since streams represent mutable data). But i'm hoping you did sane stuff here and the 'shape' of hte compilation is still immutable, it's just possible for a "host" to be able to control how the compiler will actually end up streaming data.

@CyrusNajmabadi
Copy link
Member

What i would probably like are more involved lifetime docs on things like:

    /// <summary>
     /// Opens an assembly file produced by the compiler (corresponds to OutputAssembly build task parameter).
     /// </summary>
     public abstract Stream OpenOutputAssembly();

i.e. will this only ever be called once? if it could be called multiple times, what is the expected behavior. Should hte same instance of this type be passed to multiple compilation versions? what happens then in terms of them not stompin on each other? :)

I have no protest about anything in this PR. I'm just always scared of streams/IO and how to best represent them. So any and all clarity here would be appreciated!

@tmat
Copy link
Member Author

tmat commented Apr 2, 2019

Note: i haven't looked, but i'm always wary of things like public abstract Stream OpenOutputAssembly();

The API shape is the same as SourceReferenceResolver.OpenRead: http://source.roslyn.io/#Microsoft.CodeAnalysis/Compilation/SourceReferenceResolver.cs,45

An instance of SourceReferenceResolver is stored on CompilationOptions.

@tmat
Copy link
Member Author

tmat commented Apr 2, 2019

So any and all clarity here would be appreciated!

Sure. OpenXxx opens a new stream each time it's called. It's up to the caller to dispose the stream.
If the stream is backed by a file the file is locked while the stream is being opened. The caller must make sure to close the stream as soon as it can to allow rebuild.

@CyrusNajmabadi
Copy link
Member

The API shape is the same as SourceReferenceResolver.OpenRead: http://source.roslyn.io/#Microsoft.CodeAnalysis/Compilation/SourceReferenceResolver.cs,45

Oh ok! That makes me feel much better now. If this is following patterns that have worked well for us that def makes me feel more comfortable. Will try to review in the next day.

@CyrusNajmabadi
Copy link
Member

Sure. OpenXxx opens a new stream each time it's called. It's up to the caller to dispose the stream.
If the stream is backed by a file the file is locked while the stream is being opened. The caller must make sure to close the stream as soon as it can to allow rebuild.

Cool! Thanks for the info. It's not on you to necessarily add that info. but it probably would be useful to have this in the docs. The part about the caller having to dispose is especially relevant.

@tmat
Copy link
Member Author

tmat commented Apr 2, 2019

Agreed. This is just a quick prototype implementation to see how the API looks and is used, will definitely document in more details once we accept the design.

@@ -43,9 +44,14 @@ public void SetWin32Resource(string filename)
// This file is used only during emit. Since we no longer use our in-proc workspace to emit, we can ignore this value.
}

public CompilationOutputFiles CompilationOutputFiles
=> (CompilationOutputFiles)VisualStudioProject.CompilationOutputs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we ever expect VisualStudioProject to not be using a CompilationOutputFiles?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when redoing the project system code I was trying to keep VisualSudioProject fairly abstract and still making sure it doesn't assume:

  1. That projects necessarily exist on disk.
  2. They come from a particular source like MSBuild
  3. They're C# or VB.

Right now ASP.NET is doing some work where they'll be creating projects that represent in-memory compilation results, and so theirs might be using the base type in this if we enable E&C for that. But for the legacy C#/VB shims that wrap all that, it's safe to assume. This cast makes total sense to me, other than wondering out loud if Tomas wants to move it to AbstractLegacyProject.

@@ -53,8 +53,8 @@ internal sealed class VisualStudioProject
private string _filePath;
private CompilationOptions _compilationOptions;
private ParseOptions _parseOptions;
private CompilationOutputs _compilationOutputs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it just to be on the safe sid that this is the base type, and not the derived type? Or is this very intentional because you'll have a different impl sometimes? For example does EnC have it's own project type that would do things differently?

Copy link
Member Author

@tmat tmat Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intentional. I assume we may have projects that compile in-memory at some point (e.g. ASP.NET).

@heejaechang
Copy link
Contributor

If the stream is backed by a file, the file is locked while the stream is being opened. The caller must make sure to close the stream as soon as it can to allow rebuild.

can it copy it over to memory map file first time it is opened and give stream to it if it is read only? or is it writable as well? if it is read only, one can give out multiple stream at the same time from memory map file. we already have all code for it. and code to sync it to OOP.

http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Workspace/Host/TemporaryStorage/ITemporaryStorageService.cs,12

@heejaechang
Copy link
Contributor

if this become part of ProjectAttribute
http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Workspace/Solution/ProjectInfo.cs,383

synchronizing with OOP is straight forward since ProjectAttribute is currently the thing we consider data we need to sync between VS.

if we want it to live outside of ProjectAttribute, then it require a bit more changes in OOP sync code.

Copy link
Member

@jasonmalinowski jasonmalinowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not signing off since this is indeed a prototype and is obviously missing some polish (doc comments, etc.) but I think this is on the right direction. My only concern is the snapshot behavior here -- the implicit contract of stuff you get from Solution/Project/Document are snapshots, but the streams you get either reflect the current snapshot nor do they behave as snapshots. I think moving the "get me the options" API to a workspace service might be the right approach, leaving all the rest of the work in place. The API is still public and there's still plenty of layers of indirection available.

{
// The project doesn't produce a debuggable binary, we can't read it or it's not stored in a file on disk.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we went ahead with this obviously add a comment to that effect.

_mvid = ReadMvid(outputPath);
using (var outputAssemblyStream = outputFiles.OpenOutputAssembly())
{
if (outputAssemblyStream != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an API question: would there be needs for people to know if there is an output assembly without actually triggering opening a stream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add bool HasOutputAssembly etc. if needed

// Unlike OutputFilePath and OutputRefFilePath, the intermediate output path isn't represented in the workspace anywhere;
// thus, we won't mutate the solution. We'll still call ChangeProjectOutputPath so we have the rest of the output path tracking
// for any P2P reference conversion.
ChangeProjectOutputPath(ref _intermediateOutputFilePath, value, s => s, w => { });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ChangeProjectOutputPath was also implicitly hooking up the ability to convert a file path given to us from the project system to a proper project-to-project reference. I don't expect other projects to be referencing the obj but did it out of paranoia. If that's something we'll want to maintain we'll need to fix that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Why are we second guessing the project system on what references are p2p? msbuild knows. Shouldn't PS know as well?

@@ -121,11 +121,6 @@ public string BinOutputPath
}
}

internal string GetIntermediateOutputFilePath()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was CPS or F# not calling this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F# does not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPS does not have IVT.

string getAbsolutePath(string path)
=> (outputDirectory != null && path != null) ? Path.Combine(outputDirectory, path) : path;

// TODO: calculate default paths if not specified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you want to? Or just leave it at null? The defaults should be computed by the MSBuild that's feeding into this, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Need to look into this. I think all these are initialized when the project is created from project system, while they are optional on the csc.exe command line. We should probably require them to be initialized when we receive the data from project system.


namespace Microsoft.CodeAnalysis
{
public sealed class CompilationOutputFiles : CompilationOutputs, IEquatable<CompilationOutputFiles>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the "we have a base type, and then one that's file-specific".

string pdbFilePath = null,
string documentationFilePath = null)
{
string requireAbsolutePath(string path, string parameterName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not have a helper for this?

public abstract class CompilationOutputs
{
/// <summary>
/// Opens an assembly file produced by the compiler (corresponds to OutputAssembly build task parameter).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be worth calling out this is the obj thing, and might get further mutations? I admit it's a bit hard to write that since this this class is potentially pretty abstract and "obj path" isn't something really discussed here.

Copy link
Member Author

@tmat tmat Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling out this is the obj thing

We can give it as an example, but obj is a construct of msbuild.

might get further mutations

Yes. It should say the "current content of". Although it depends what we decide to do about snapshotting.

id, version, name, assemblyName, language,
filePath, outputFilePath, outputRefFilePath, defaultNamespace: null, compilationOptions, parseOptions,
documents, projectReferences, metadataReferences, analyzerReferences, additionalDocuments,
isSubmission, hostObjectType, hasAllInformation: true, CompilationOutputFiles.None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why CompilationOutputFiles.None versus a no-op implementation of the base?

@@ -469,6 +472,16 @@ public ProjectState UpdateParseOptions(ParseOptions options)
documentStates: docMap);
}

public ProjectState UpdateCompilationOutputs(CompilationOutputs outputs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a bit funky -- this doesn't quite follow the usual snapshot model in that CompilationOutputs's stream methods actually return that doesn't match up with the current snapshot but something older -- it's implicitly the result of the last time we built, right? It also means if somebody does observe the value, we won't necessarily be persisting it for later calls to ensure immutability.

I can think of a few options:

  1. Don't care, and just do nothing.
  2. Instead make an ICompilationOutputsWorkspaceService (which would still be public) and that just has a GetCompilationOutputs method that takes a ProjectId. That way it's no longer strictly tied to the snapshot and doesn't come with the guarantees thereof.

I'm thinking 2 would be my preference. The CompilationOutputs type you've got itself I think would stay unchanged and all the work already done up in VisualStudioProject.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from having OutputPath on Project? The CompilationOuputs instance is a an abstracted set of "paths".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more I agree with going with [2].

@jasonmalinowski
Copy link
Member

Actually had one more thought: making this a workspace service potentially makes other things like the remote workspace a bit saner, since that could just implement the service in a fairly independent way without having to special case how this gets passed around?

@tmat
Copy link
Member Author

tmat commented Apr 4, 2019

the implicit contract of stuff you get from Solution/Project/Document are snapshots

Some things are snapshot and some aren't - they are just "references" whose content is not snapshot. We already have OutputPath and OutputRefPath - neither of these are snapshot.

The proposed API allows us to snapshot if we need to (and we might). You can write implementation of CompilationOutput that wraps the implementation given us by the creator of the Project and implement snapshoting there. The wrapper would first check if there is already a snapshot and if not call the original implementation to get a stream, copy it to vm and then open a new stream from the vm.

@tmat
Copy link
Member Author

tmat commented Apr 4, 2019

I'll send another PR implementing the workspace service approach as suggested in #34698 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants