Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single file apps in .net 5 #90

Merged
merged 10 commits into from
Feb 20, 2020
Merged

Conversation

swaroop-sridhar
Copy link
Contributor

This change updates the design for single-file apps support, with upcoming changes in .net 5.
The .net core 3 design is summarized in design_3_0.md

This change updates the design for single-file apps support, with upcoming changes in .net 5.
The .net core 3 design is summarized in design_3_0.md
* Write meta-data headers and manifest that help locate the contents of the bundle.
* Set the bundle-indicator in the `AppHost` to the offset of the bundle-header.

## Implementation

The bundling tool is implemented as a library in the [Microsoft.NET.HostModel](https://www.nuget.org/packages/Microsoft.NET.HostModel/) package. This library is used by the SDK in order to publish a .net core app as a single-file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at it just recently. Is there a reason for why we have this split into Microsoft.NET.HostModel and a task that lives in the SDK? It may be useful to write down the reason for the split here. (I was expecting to have one .dll with the msbuild task that does everything.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-exhaustive list of reasons:

  1. msbuild tasks need to be multi-targeted as they must carry all their dependencies

  2. cli uses hostmodel to generate apphosts for global tools without running a build

  3. we can keep all the localized error messages in the sdk repo where we have loc infrastructure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it was deemed desirable to have reading and writing code in the same repo to more easily evolve the format and unit test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nguerrera. Some of these reasons are documented in the Bundler's readme file. I'll add the details here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the ReadMe file, now that a few repos have changed:
dotnet/runtime#32583

* ReadyToRun Assemblies
* On Linux, ReadyToRun assemblies are loaded directly from bundle. The various sections in the PE file are mapped at appropriate addresses, and offsets are fixed up.
* On MAC, ReadyToRun assemblies are loaded similar to Linux. However, Mojave hardened runtime doesn't allow executable mappings of a file. Therefore, the contents of an assembly are read from the bundle into pre-allocated executable memory.
* On Windows, due to certain limitations in memory mapping routines described below, ReadyToRun assemblies are loaded by memory mapping the file and copying sections to appropriate offsets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we can do better here by laying out the sections in the file such that this copying is not necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see you are covering this below.

* [MapViewOfFile]( https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-mapviewoffileex) can only map parts of a file aligned at [memory allocation granularity]( https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/ns-sysinfoapi-system_info), which is 64KB.
This means that:
* Each section within the assemblies should be aligned at 64KB – which is not guaranteed by the crossgen compiler.
* In order to memory-map one embedded assembly at a time, the assemblies in the bundle must be aligned at 64KB boundaries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime can be fixed to handle assemblies that do not start at 64kB boundaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that if we don't change the runtime, the binary size would be bloated by padding up to the 64k boundary for each assembly in the app/framework?
That would run counter to the idea to be able to create smaller single-file apps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime can be fixed to handle assemblies that do not start at 64kB boundaries.

Yes, I agree that the runtime can round-down to a 64KB boundary, and read the headers from an appropriate offset. But after the headers are processed, the individual sections themselves need to be 64 KB aligned. So, it is probably sufficient to state the limitation about section alignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that if we don't change the runtime, the binary size would be bloated by padding up to the 64k boundary for each assembly in the app/framework?
That would run counter to the idea to be able to create smaller single-file apps.

The current proposal doesn't involve 64KB alignment, but an extra copy step at run-time.

The 64 KB section alignment is a Windows limitation. So the suggested mitigation (likely as a subsequent step) is to use a composite R2R with aligned sections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But after the headers are processed, the individual sections themselves need to be 64 KB aligned.

They do not need to be. We have full control over the embedded binary format. We do not have to embed the exact copy of the PE file:

  • We can reformat the sections as we embed them, so that we can map the whole file into memory at once, and just flip the permissions on pages as necessary.
  • Also, for IL-only (ie no R2R), the runtime can use the flat mapping (we do that on Unix).

Copy link
Member

@jkotas jkotas Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, this is not a fundamental limitation. This is limited by amount of effort we can afford to put into this.


The binaries that are published in the project are expected to be handled transparently by the host. However, explicit access to the embedded files is useful in situations such as:
Proposed solution is: `:/<bundle-sub-path>/asm.dll`, where the `:` denotes the location within the bundle. The `:` is not a valid character in file-names on Windows. The `:` is already used as a path-separator on Unix systems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether this is a good idea. We are inventing a new custom path format here. People will pass this path to File.Exist or File.Open half of the time, and the app is still going to be broken. I do not think we will want to quirk all file I/O APIs to deal with this.

The most appropriate value value to return from Assembly.Location is null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Assembly.CodeBase?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null is a valid/documented return value of Assembly.Location, but pointing at the documentation and telling people that the NuGet package they're using has a bug because it doesn't check for null never helped much. Most apps don't check for the null, and returning null means that most apps will hit a NullReferenceException somewhere. People don't know how to deal with that when that exception happens in a NuGet package they don't have source code for. We have plenty of issues like that in the dotnet/corert repo (which already has this problem because it chose null).

Returning a custom path has the same problem - Assembly.Location is typically passed to the Path APIs because people either want the file part or the directory part. Most uses I've seen want the directory part because e.g. they're trying to scan for plugins or find a config file. Directory.GetFiles(Path.Combine(Path.GetDirectoryName(asm.Location), "*")) results in an exception that is similarly unhelpful to a NullReferenceException.

I'm wondering whether the best course of action would be to just make this throw with a message that has an fwlink to useful guidance. There can be AppContext switches documented at the fwlink that override the throw with a quirk to make the NuGet package work (return a fake location that combines AppContext.BaseDirectory with the module name, or return null).

Assembly.CodeBase is getting obsoleted in .NET 5 (see dotnet/runtime#31127). The appropriate behavior would be to throw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have trained people to handle Assembly.Location returning null gracefully for last 5 years, and number of NuGet packages made the required fixes already. I do not think getting everybody to change again is a good idea.

There are number of other APIs that are problematic for this mode. We plan to deal with these using analyzer that will warn about the use of these problematic APIs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We plan to deal with these using analyzer that will warn about the use of these problematic APIs.

@jkotas, mind adding details to dotnet/runtime#30740? @terrajobst is currently using that to seed our plans for what we're actually going to ship in the release.

Copy link
Member

@jkotas jkotas Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbomer @MichalStrehovsky and @vitek-karas are working on this. The current thinking is to have it in the same ILLinker-bases analyzer that detects problematic reflection APIs, etc. dotnet/runtime#30740 is about Roslyn-based analyzers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current thinking is to have it in the same ILLinker-bases analyzer that detects problematic reflection APIs, etc. dotnet/runtime#30740 is about Roslyn-based analyzers.

Are you saying that the detection will be part of the linker? While that might make sense for code sharing, it's not necessarily a great developer experience b/c Roslyn-based analyzers are running while editing and thus are providing rapid feedback. In cases where there are alternatives we might want to provide a code fixer, which will force some of the detection logic to be part of the IDE experience anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating between null and UNC options:

Establishing a path-like location for the the bundle contents is useful if the app explicitly wants to use Assembly.LoadFrom() on one of the bundled assemblies. Otherwise, Assembly.LoadFrom() will always be understood to load from the disk, which is one way to define the semantics.

The path notation is also useful to address content (ex: data files) bundled in the app directly. However, if we only support accessing content through extraction, or as PE resources, this use-case is not a motivation for UNC-like paths.

I'll change the semantics to return null.


However, explicit access to the embedded files is useful in situations such as:
- Reading data-files from the app's native code
- Open an assembly for reflection/inspection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing AssemblyExtensions.TryGetRawMetadata API for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AssemblyExtensions.TryGetRawMetadata() retrieves the meta-data for loaded assemblies, right? How about inspecting the assembly before loading?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to build a custom loader, attach the assemblies as managed resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of the case where there's an assembly in the app that's loaded through usual means in one context (say through TPA in default context) but needs to be inspected before loading into another context. All of this is achievable if the user can implement it themselves (ex: by overloading ALC.Load()) but is easier if they don't have to.
Anyway, I don't yet know of an app that falls into this use-case, so will not push on it.

The app may want to access certain embedded content for reading, rather than loading via host/runtime. For example: bundled payload/data files. In this case, the recommended strategy is to embed the content files within appropriate managed assemblies as PE resources, and access them through [resource handling APIs](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourceinfo?view=netcore-3.1).

However, explicit access to the embedded files is useful in situations such as:
- Reading data-files from the app's native code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not handled by GetManifestResourceStream ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we recommend to store everything as resource, then that approach should work for all code (managed or native). The new API doesn't add any new value - it just makes it so that it's possible to not use managed resources. Which we probably don't want people to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the content in resources can be accessed by native code (ex: the managed code can share it on platforms where Win32 resource handling APIs are not available).

- Reading data-files from the app's native code
- Open an assembly for reflection/inspection

Therefore, we propose adding an API similar to [GetManifestResourceStream](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourcestream?view=netcore-3.1) to obtain a stream corresponding to an embedded file. This is only a draft of the proposed APIs. The actual shape of the APIs will be decided via API review process.
Copy link
Member

@jkotas jkotas Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These APIs need more crisp scenarios. I do not see the real-world scenarios that they are for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add that we might want to ship without these APIs to force people toward the managed resources solution. The only scenario which would need the new APIs are cases where the app needs access to full assemblies - not sure if that's an important enough case to warrant a new API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KISS - keep it simple until we have a really good reason for needing new APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that not requiring the content to be embedded within PE files will lead to easier adaptation, because existing applications need fewer changes. Also, the way apps are built in the non-single-file case need not change at all.

But I agree that we can seek feedback from customers without them in previews, so I've removed the new APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the content accessing APIs, I think the IsBundle() API (which tells whether the app is run from a single-file or not) is worth including.

In general, apps should be agnostic to how they are published (single-file or many files). However, having the IsBundle() test helps customer work around situations where the single-file functionality may be missing.

Several customers have asked for this IsBundle() test in .net core 3.x. We've resisted from offering this API in 3.x, instead offering solutions/work-arounds for the specific customer situations -- sometimes not completely successfully.

For example, in 3.x AppContext.BaseDirectory points to the extraction directory. We have given other options to locate the AppHost (ex: Process.GetCurrentProcess().MainModule.FileName). This works in the single-file scenario, but not if the app is run through dotnet run in a non-single-file publish.

This specific situation of AppContext.BaseDirectory will be resolved in 5.0, but having a bundle-differentiator will likely give a chance for customers to circumvent other shortcomings until the resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsBundle() API

The problem with such API is that it is unclear what is the set of behaviors that it returns true for. I do not think we would be able to provide a good guidance for what to use it for, and maintain compatibility for how this API gets used as we evolve the runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of IsBundle() API was that it returns true if the host contains a bundle-header.

namespace System.Runtime.Loader
{
    public partial class Bundle
    {
        // Check whether an app is running from a single-file bundle
        public static bool IsBundle();
    }
}

The semantics, and the form of this API may need to evolve, along with the runtime.
For example:

  • If we have CoreRT style single-file apps, whether IsBundle() returns true?
  • If we support single-file plugins, whether we should have ``IsBundle(assembly)` ?

Copy link
Member

@jkotas jkotas Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsBundle() API was that it returns true if the host contains a bundle-header.

Given #90 (comment) , we may end up with bundle-header for every app in .NET 5 and so it would return true all the time based on this spec.

The APIs need to be described in terms of what people should use them for, not in terms of how they are implemented.


Most applications are expected to work without any changes. However, apps with a strong expectation about absolute location of dependent files may need to be made aware of bundling and extraction aspects of single-file publishing. No difference is expected with respect to debugging and analysis of apps.
* Normal publish: `dotnet publish`
* Published files: `HelloWorld.exe`, `HelloWorld.dll`, `HelloWorld.deps.json`, `HelloWorld.runtimeconfig.json`, `HelloWorld.pdb`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: There is a proposal from @richlander to get rid of HelloWorld.deps.json and to embed HelloWorld.runtimeconfig.json by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can't drop .deps.json in all cases (Framework assembly overrides for example). And we're discussing that ideally the "embedding" would be the same technology for single-file and non-single-file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, ,deps.json file is used for assembly resolution when loading plugins.


## Related Work
* Single-file publish: `dotnet publish -r win10-x64 --self-contained=false /p:PublishSingleFile=true`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Single-file publish: `dotnet publish -r win10-x64 --self-contained=false /p:PublishSingleFile=true`
* Single-file publish: `dotnet publish -r win-x64 --self-contained=false /p:PublishSingleFile=true`

Nit: I think we should be encouraging version-less RIDs in our examples.

* Published files: `HelloWorld.exe`, `HelloWorld.pdb`
* Single-file publish Windows: `dotnet publish -r win10-x64 /p:PublishSingleFile=true`
* Published files: `HelloWorld.exe`, `HelloWorld.pdb`, `coreclr.dll`, `clrjit.dll`, `clrcompression.dll`, `mscordaccore.dll`
* Single-file publish Windows with Extraction: `dotnet publish -r win10-x64 /p:PublishSingleFile=true /p:IncludeContentInSingleFile=true /p:ExtractContentsFromSingleFile=true`
Copy link
Member

@jkotas jkotas Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to also keep the publish with Extraction for Unix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are dropping it, we should explain why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option to ExtractContentsFromSingleFile is available for Unix too, mainly for user's own native binaries that may be included with the app. I did not include it here, because it doesn't apply to HelloWorld app.


Therefore, we propose adding an API similar to [GetManifestResourceStream](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourcestream?view=netframework-4.7.2#System_Reflection_Assembly_GetManifestResourceStream_System_String_) to obtain a stream corresponding to an embedded file.
`AppContext.BaseDirectory` will be the directory where the AppHost (the single-file bundle itself) resides.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to change this even for the self-extractor case, or is that one going to stay as is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's currently no "self-extractor" option in .Net 5.
For netcoreapp3.x single file apps, the AppContext.BaseDirectory will still be the extraction directory, as noted in design_3_0.md here.


Developers should be able to use the feature easily from Visual Studio. The feature will start with text based support -- by explicitly setting the `PublishSingleFile` property in the project file. In future, we may provide a single-file publish-profile, or other UI triggers.
* **Compression**: Currently the bundler does not compress the contents embedded at the end of the host binary. Compressing the bundled files and meta-data can significantly reduce the size of the single-file output (by about 30%-50% as determined by prototyping).
* **Single-file Plugins** Extended the above design to seamlessly support single-file publish for plugins.

## User Experience
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be at the start of the doc. The start of the doc should explain what we are building. The implementation details should be after that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'll change that, thanks

- Reading data-files from the app's native code
- Open an assembly for reflection/inspection

Therefore, we propose adding an API similar to [GetManifestResourceStream](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourcestream?view=netcore-3.1) to obtain a stream corresponding to an embedded file. This is only a draft of the proposed APIs. The actual shape of the APIs will be decided via API review process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one API that I think would make sense to add is an API that returns launching .exe path. It is clearly defined what it does and you can do that today using System.Diagnostics.Process, but it is show and comes with a lot of unnecessary dependencies. Some discussion on this is in dotnet/runtime#3704 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For single-exe in .net 5, the AppContext.BaseDirectory will point to the launching EXE path.
The launching EXE will be the app.exe within this directory.

However, there are other scenarios for launching the app (ex: dotnet run) where the launching EXE is not in the same directory as the app. So, is your recommendation to find the launching EXE in these cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppContext.BaseDirectory will point to the launching EXE path

It sounds good to me.

So, is your recommendation to find the launching EXE in these cases?

It depends what you need the launching .exe for. For example, if you would like to relaunch the app under different credentials or with extra command line arguments, you do want to use the path to dotnet.exe in your example.

This is only a draft of the proposed APIs. The actual shape of the APIs will be decided via API review process.
## Handling Content

The app may want to access certain embedded content for reading, rather than loading via host/runtime. For example: bundled payload/data files. In this case, the recommended strategy is to embed the content files within appropriate managed assemblies as PE resources, and access them through [resource handling APIs](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourceinfo?view=netcore-3.1).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The app may want to access certain embedded content for reading, rather than loading via host/runtime. For example: bundled payload/data files. In this case, the recommended strategy is to embed the content files within appropriate managed assemblies as PE resources, and access them through [resource handling APIs](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourceinfo?view=netcore-3.1).
The app may want to access certain embedded content for reading, rather than loading via host/runtime. For example: bundled payload/data files. In this case, the recommended strategy is to embed the content files within appropriate managed assemblies as resources, and access them through [resource handling APIs](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourceinfo?view=netcore-3.1).

| Property | Behavior when set to `true` |
| ------------------------------- | ------------------------------------------------------------ |
| `IncludeContentInSingleFile` | All published files (except symbol files) are embedded within the single-file bundle. |
| `IncludeSymbolsInSingleFile` | The symbol files (IL `.pdb` file, and the native `.ni.pdb` / `app.guid.map` files generated by ready-to-run compiler) are embedded within the symbol file. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is going to be useful only in combination with ExtractContentsFromSingleFile. Is that correct?

Or are we going to fix all tools that consume .ni.pdb, .map files, etc. to deal with them when they are embedded? It would be pretty hard to do, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the symbols are only useful when extracted.

Similarly IncludeContentInSingleFile is only useful with ExtractContentsFromSingleFile unless they are read through the new APIs that were proposed.

So, I have simplified the publish properties to remove ExtractContentsFromSingleFile option. Instead IncludeContentInSingleFile and IncludeSymbolsInSingleFile will imply extraction.

Copy link
Member

@jkotas jkotas Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the symbols are only useful when extracted.

That is true for unmanaged symbols (.ni.dll/.map). For managed symbols, we can do the work to map them from the bundle for the stacktraces. Or we can just tell people to use .pdbs embedded into IL image if they want to have managed symbols for the single file apps (ie no special work required for single file) to get stack traces with filenames and line numbers.

For the unmanaged symbols, I suspect that there would be number of other changes required. For example, .map files contain absolute file offset, so you would need to rewrite them as part of the embedding.

Similar on Windows, the sampling profilers depend on the file to be mapped as proper Windows module. So to make them work, I believe you would need to merge all R2R .PDBs into one that matches the single-file module - or something like that.

I think this paragraph about symbols needs to have customer scenarios: What is and is not going to work, and why it is important. Right now it just talks about implementation details that is not good enough.

BTW: I am not convinced that we need to be embedding and extracting the unmanaged symbols (.map/.ni.pdbs). The scenarios for it sounds pretty marginal. However, we should make sure that the unmanaged symbols are still usable after embeding - that is the important part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke with @mikem8361 about this.

Mike said that the the .map files store physical offsets that are interpretted wrt the module load. Today, R2R assemblies are loaded on Linux via mapping the files and adjusting offsets -- the .map files are expected to work correctly in this case. For single-file apps, R2R assemblies will again be loaded via MappedImageLayout (albeit after mapping from a different offset within the single-file bundle). So, the .map files are expected to work OK.

Mike also said that the Windows profiler should be able to handle the .ni.pdb files without the need to merge them into one. But I don't know much details about the profiler.

I agree that embedding symbol files within the bundle is not an important use case. IncludeSymbolsInSingleFile was provided in .net core 3 based on request by a few customers -- so I carried forward the option. But nobody has asked for embedding the .ni.pdb/ .map files yet.

Copy link
Member

@jkotas jkotas Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing the homework on this. I think this is what the plan should be:

  • IncludeSymbolsInSingleFile option is provided for compatibility with .NET 3 single-file mode. It extracts files to disk.
  • Embedded portable PDBs are the solution for proper no-extract-to-disk single file. This should work today already.
  • Unless we hear somebody asking for it with a good backing scenario, we will not do anything for embedding the .ni.pdb/ .map files

Thoughts?

| ------------------------------- | ------------------------------------------------------------ |
| `IncludeContentInSingleFile` | All published files (except symbol files) are embedded within the single-file bundle. |
| `IncludeSymbolsInSingleFile` | The symbol files (IL `.pdb` file, and the native `.ni.pdb` / `app.guid.map` files generated by ready-to-run compiler) are embedded within the symbol file. |
| `ExtractContentsFromSingleFile` | The files that cannot be processed directly from the bundle are extracted out to disk at startup. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the scenarios where I include everything (pdb or not) but don't extract? Only the cases where these are accessed from the app via new APIs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we consider the other default (if included extract by default) - that may lead to more successful application to random code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


### The Host
Once the single-file-publish tooling is added to the publish pipeline, other static binary transformation tools may need to adapt its presence. For example, tools like [Fody](https://github.com/Fody/Fody) that use `AfterBuild`/`AfterPublish` targets may need to adapt to expect the significantly different output generated by publishing to a single file. The goal in this case is to provide sufficient documentation and guidance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have the problem already in 3.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the work is provide documentation for dotnet publish page, once the documentation for 3.x is live.


* If the app needs extraction, extracts out the appropriate files as explained in [this document](extractor.md).

* Reads the `deps.json` file directly from the bundle and resolves dependencies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably changes to the dependency resolution and potentially even .deps.json - this needs a detailed analysis. Basically the hostpolicy must be able to "probe" in the bundle alongside disk probe paths - priority and handling of paths from .deps.json is something we will have to define.

Copy link
Contributor Author

@swaroop-sridhar swaroop-sridhar Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HostPolicy probing can continue as if the contents of the bundle were available alongside the app (using bundle_probe instead of file probing. I did not elaborate on this yet, since deps.json probing itself is still under discussion. I agree that this requires more explanation -- I'll think a little more about it.

* The `bundle_probe` function pointer is passed to the runtime (encoded as a string) through a property named `BUNDLE_PROBE`.

* The contents of the bundle are not included in `TRUSTED_PLATFORM_ASSEMBLIES`. Any additional assemblies on disk (ex: due to `additional-deps` /`AdditionalProbingPaths`) are included in `TRUSTED_PLATFORM_ASSEMBLIES`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also mention that NATIVE_DLL_SEARCH_DIRECTORIES are not affected (as in all native files are expected to be "External" or "Extracted") - this might need some "Tricks" for the few native dlls from CoreFX which will be bundled directly - for example .deps.json includes records for coreclr.dll, clrcompression.dll and so on.

We should also discuss what changes are going to happen for PLATFORM_RESOURCE_ROOTS - what happens if the bundle contains satellites and how are the probing paths affected and how (if at all) the runtime's satellite probing is affected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Tricks" for the few native dlls from CoreFX which will be bundled directly

We should get these linked into coreclr.dll to avoid need for these tricks. We are doing that for System.Globalization.Native already - dotnet/runtime#6928

Copy link
Contributor Author

@swaroop-sridhar swaroop-sridhar Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extraction directory (if any) will be added to NATIVE_DLL_SEARCH_DIRECTORIES.

The runtime probes the satellite assemblies in the bundle before looking in PLATFORM_RESOURCE_ROOTS. There is no change anticipated to PLATFORM_RESOURCE_ROOTS list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "tricks" I meant changes we made in the prototype so that [DllImport("clrcompression.dll")] continues to work even when clrcompression.dll is statically linked with the host. I don't think that will just magically work.
Another consideration is that hostpolicy is not aware of these changes, so maybe we will need to make it aware (so that it doesn't put a relative path to the NATIVE_DLL_SEARCH_DIRECTORIES as that may actually cause potential security issues).

* For exception stack trace source information, error string resources on Windows: `Microsoft.DiaSymReader.Native.amd64.dll`, `mscorrc.debug.dll`, `mscorrc.dll`

When targeting `win7` platform, several additional DLLs are necessary (`api-*.dll`) to handle API compatibility. These files must be alongside the AppHost for the app to even start execution.
Therefore, we propose that targeting `win7-*` should not be supported when publishing apps as a single-file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky one - please run this by PMs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richlander - What is the decision for Win7?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samsp-msft @richlander To state the constraint more explicitly, if we do support win7 targets, then the self-contained single-file publish for hello-world console app will be 47 files, instead of 224 files. These files cannot be bundled within the app and later extracted. Please see: dotnet/runtime#13356 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we should be able to build the apphost as a Win7 compatible executable without the additional dlls (after all Win10 can run Win7 exes just fine). The question is if that has other side effects - like increasing the size of the apphost, or turning on compatibility shims on Win10 and so on. In the extreme we could have a Win7 specific apphost.


### Dependency Resolution

In order to resolve assemblies, the embedded resources are probed first, followed by other probing paths encoded in `TRUSTED_PLATFORM_ASSEMBLIES`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order to resolve assemblies, the embedded resources are probed first, followed by other probing paths encoded in `TRUSTED_PLATFORM_ASSEMBLIES`.
In order to resolve assemblies, runtime will first query the bundle probe for it followed by exiting search in `TRUSTED_PLATFORM_ASSEMBLIES` followed by any existing additional probing.

The app may want to access certain embedded content for reading, rather than loading via host/runtime. For example: bundled payload/data files. In this case, the recommended strategy is to embed the content files within appropriate managed assemblies as PE resources, and access them through [resource handling APIs](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourceinfo?view=netcore-3.1).

However, explicit access to the embedded files is useful in situations such as:
- Reading data-files from the app's native code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we recommend to store everything as resource, then that approach should work for all code (managed or native). The new API doesn't add any new value - it just makes it so that it's possible to not use managed resources. Which we probably don't want people to do.

- Reading data-files from the app's native code
- Open an assembly for reflection/inspection

Therefore, we propose adding an API similar to [GetManifestResourceStream](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.assembly.getmanifestresourcestream?view=netcore-3.1) to obtain a stream corresponding to an embedded file. This is only a draft of the proposed APIs. The actual shape of the APIs will be decided via API review process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add that we might want to ship without these APIs to force people toward the managed resources solution. The only scenario which would need the new APIs are cases where the app needs access to full assemblies - not sure if that's an important enough case to warrant a new API.

* Otherwise, the symlink is created under `~/.cache/dotnet/<app>/<bundle-id>`
* In order to check for reuse on startup:
* If the symbolic link exists and isn't stale (points to a directory owned by the user, with correct permissions (`0700`), etc.), that's what it is used;
* If the symbolic link does not exist (or exists and is stale), it is removed, a new directory is created with `mkdtemp()`, and the link is re-created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lpereira worked on prototyping another alternate mechanism for extracting files on Linux (when necessary) using FUSE -- to extract to an in-memory file system. @lpereira can add additional details.

For example, to place some files in the publish directory but not bundle them in the single-file:
| Property | Behavior when set to `true` |
| ---------------------------- | ------------------------------------------------------------ |
| `IncludeContentInSingleFile` | All published files (except symbol files) are embedded within the single-file bundle. The files that cannot be processed directly from the bundle are extracted out to disk at startup. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do people find where these files are extracted to with the changed meaning of AppContext.BaseDirectory property?

Copy link
Contributor Author

@swaroop-sridhar swaroop-sridhar Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my proposal I originally had a new API GetContentRoot() API for locating the base directory of extracted files:

namespace System.Runtime.Loader
{
    public partial class Bundle
    {
        // Check whether an app is running from a single-file bundle
        public static bool IsBundle();
        
        // Get the location where contents of the bundle are extracted, if any
        public static string GetContentRoot();
    }
}

However, a different viewpoint is that the extraction is somewhat an implementation detail, and the app should not learn about it. That is, the contents of the extracted directory should be accessed via internal mechanisms. For example, native library loads through PInvoke or NativeLibrary.Load() API that search through known paths NATIVE_DLL_SEARCH_DIRECTORIES.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be fine with limiting the extraction to the unmanaged .dlls that cannot be embedded. In that case, it may better to use more specific name for this, e.g. IncludeNativeLibrariesInSingleFile.

@nguerrera
Copy link
Contributor

@dsplaisted @wli3


* If the assembly is loaded directly from the bundle, return empty-string
* If the assembly is extracted to disk, return the location of the extracted file.
Proposed solution is for `Assembly.Location` to return `null` for bundled assemblies, which is the default behavior for assemblies loaded from memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should actually be an empty string, not null, as per the docs (I came back to this while helping .NET Native customer with yet another issue caused by this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MichalStrehovsky, I'll fix this.


* The `bundle_probe` function pointer is passed to the runtime (encoded as a string) through a property named `BUNDLE_PROBE`.

* When probing for assemblies, the the [assembly resolution logic](https://github.com/dotnet/runtime/blob/4f9ae42d861fcb4be2fcd5d3d55d5f227d30e723/docs/design/features/assembly-conflict-resolution.md#probe-ordering) will treat bundled assemblies similar to assemblies in the app directory. The probe ordering will be in the order: servicing location, shared store, framework directory(s) from higher to lower, the *single-file bundle*, app directory, additional locations specified in runtimeconfig.dev.json file.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitek-karas it is interesting whether the "App directory" should be a part of the default probing paths at all, for single-files.

If the user chooses to not bundle certain assemblies, those files will be left beside the app in the publish directory. We'd like to be able to load these assemblies without having to set additionalProbingPaths

However, the single file app may live in a directory with many other apps, assemblies and random files. The app shouldn't pick up a different assembly simply because it lies beside the app and has a later version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The probing doesn't work that way. The algorithm goes in the order of probing paths and if it finds a match, it stops and returns. The host never opens the file (it only checks for existence), and so it doesn't read the file version (or assembly version).

There is some logic comparing versions, but it's solely based on versions written in .deps.json - and it only works to resolve conflicts between the app and the frameworks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also forgot to add: I do think we should probe the app's path for files which are left beside it (and for cases where the app uses plugins and such).
There's an interesting question what should happen for apps which are single-file and extracted. In that case the probing should probably look both in the extraction folder and in the app's .exe folder (in that order).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification @vitek-karas, I forgot about this detail -- we'd talked about this issue.

Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the probe ordering. Otherwise mostly nits.
The exact behavior of native dlls and hotpolicy is something we can figure out later on (whatever it will do doesn't change the design of the feature).

@@ -101,22 +127,26 @@ On Startup, the [host components](https://github.com/dotnet/core-setup/blob/mast

* Implements a bundle-probe function, which will be used by the runtime to probe contents of the bundle when resolving assemblies.

* Bundle probe has the following signature:
```C++
/// <summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: indentation


* The `bundle_probe` function pointer is passed to the runtime (encoded as a string) through a property named `BUNDLE_PROBE`.

* When probing for assemblies, the the [assembly resolution logic](https://github.com/dotnet/runtime/blob/4f9ae42d861fcb4be2fcd5d3d55d5f227d30e723/docs/design/features/assembly-conflict-resolution.md#probe-ordering) will treat bundled assemblies similar to assemblies in the app directory. The probe ordering will be in the order: servicing location, shared store, framework directory(s) from higher to lower, the *single-file bundle*, app directory, additional locations specified in runtimeconfig.dev.json file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated the doc with the correct probe order (as is in the code): https://github.com/dotnet/runtime/blob/master/docs/design/features/assembly-conflict-resolution.md. Please fix the text in this document to match (app goes before framework, bundle should go before app).

Nit: double the on the first line.


* The `bundle_probe` function pointer is passed to the runtime (encoded as a string) through a property named `BUNDLE_PROBE`.

* When probing for assemblies, the the [assembly resolution logic](https://github.com/dotnet/runtime/blob/4f9ae42d861fcb4be2fcd5d3d55d5f227d30e723/docs/design/features/assembly-conflict-resolution.md#probe-ordering) will treat bundled assemblies similar to assemblies in the app directory. The probe ordering will be in the order: servicing location, shared store, framework directory(s) from higher to lower, the *single-file bundle*, app directory, additional locations specified in runtimeconfig.dev.json file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also forgot to add: I do think we should probe the app's path for files which are left beside it (and for cases where the app uses plugins and such).
There's an interesting question what should happen for apps which are single-file and extracted. In that case the probing should probably look both in the extraction folder and in the app's .exe folder (in that order).

/// </summary>
bool bundle_probe(const char *path, int64_t *size, int64_t *offset);
```
* The assemblies on disk are listed in `TRUSTED_PLATFORM_ASSEMBLIES` using absolute paths. The assemblies to be loaded from the single-file bundle will be listed in `TRUSTED_PLATFORM_ASSEMBLIES` using relative paths (as a distinguishing convention).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it super clear to what the relative paths are "relative". I know it's a bit weird, but it's important. Not sure if we describe the fact that the bundle stores a directory structure and the paths are effectively relative to the bundle root. But we probably should. And maybe even mention how does that map to the app's layout when built without single-file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'll add a note about recording relative-paths of file names in the meta-data of the bundle.

Actually, since the contents of the bundle always mask the contents of the app-directory, we need not actually store relative paths in the bundle. We could equivalently store absolute paths relative to the app-directory (which will be AppContext.BaseDirectory. In the single-file prototype, this was a convenience, because many parts of the runtime don;t like to handle relative paths.

However, writing relative paths in the TPA list makes the contents coming from the bundle evident for visual inspection.

* The `bundle_probe` function pointer is passed to the runtime (encoded as a string) through a property named `BUNDLE_PROBE`.

* The contents of the bundle are not included in `TRUSTED_PLATFORM_ASSEMBLIES`. Any additional assemblies on disk (ex: due to `additional-deps` /`AdditionalProbingPaths`) are included in `TRUSTED_PLATFORM_ASSEMBLIES`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "tricks" I meant changes we made in the prototype so that [DllImport("clrcompression.dll")] continues to work even when clrcompression.dll is statically linked with the host. I don't think that will just magically work.
Another consideration is that hostpolicy is not aware of these changes, so maybe we will need to make it aware (so that it doesn't put a relative path to the NATIVE_DLL_SEARCH_DIRECTORIES as that may actually cause potential security issues).

@@ -1,62 +1,53 @@
# Single-file Publish

Design for publishing apps as a single-file in .Net Core 3.0
This document describes the design single-file apps in .net 5.0.
The design of single-file apps in .net core 3.0 can be found [here](3.0/bundler.md)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The design of single-file apps in .net core 3.0 can be found [here](3.0/bundler.md)
The design of single-file apps in .NET Core 3.0 can be found [here](3.0/bundler.md)

@@ -1,62 +1,53 @@
# Single-file Publish

Design for publishing apps as a single-file in .Net Core 3.0
This document describes the design single-file apps in .net 5.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This document describes the design single-file apps in .net 5.0.
This document describes the design single-file apps in .NET 5.0.


#### Goals

In .Net Core 3.0, we plan to implement a solution that
In .Net Core 5.0, we plan to implement a solution that:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In .Net Core 5.0, we plan to implement a solution that:
In .NET 5.0, we plan to implement a solution that:

* Usable with debuggers and tools: The single-file should be debuggable using the generated symbol file. It should also be usable with profilers and tools similar to a non-bundled app.

This feature-set is described as Stage 2 in the [staging document](staging.md), and can be improvised in further releases.
* Is widely compatible: Apps containing MSIL assemblies, ready-to-run assemblies, composite assemblies, native binaries, configuration files, etc. can be packaged into one executable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Is widely compatible: Apps containing MSIL assemblies, ready-to-run assemblies, composite assemblies, native binaries, configuration files, etc. can be packaged into one executable.
* Is widely compatible: Apps containing CIL assemblies, ready-to-run assemblies, composite assemblies, native binaries, configuration files, etc. can be packaged into one executable.

We need to determine the semantics of current APIs such as `Assembly.Location` that return the information about an assembly's location on disk.
* ReadyToRun Assemblies
* On Linux, ReadyToRun assemblies are loaded directly from bundle. The various sections in the PE file are mapped at appropriate addresses, and offsets are fixed up.
* On MAC, ReadyToRun assemblies are loaded similar to Linux. However, Mojave hardened runtime doesn't allow executable mappings of a file. Therefore, the contents of an assembly are read from the bundle into pre-allocated executable memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* On MAC, ReadyToRun assemblies are loaded similar to Linux. However, Mojave hardened runtime doesn't allow executable mappings of a file. Therefore, the contents of an assembly are read from the bundle into pre-allocated executable memory.
* On Mac, ReadyToRun assemblies are loaded similar to Linux. However, Mojave hardened runtime doesn't allow executable mappings of a file. Therefore, the contents of an assembly are read from the bundle into pre-allocated executable memory.

* Apps using managed, ready-to-run, native code
* Framework-dependent and self-contained apps
* Apps with content explicitly annotated for inclusion/exclusion in the single-file bundle
* MSIL files with embedded PDBs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* MSIL files with embedded PDBs
* CIL files with embedded PDBs

@@ -0,0 +1,136 @@
# Single-file Publish

The design of single-file apps in .net core 3.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The design of single-file apps in .net core 3.0.
The design of single-file apps in .NET Core 3.0.


### Existing API
* IL assemblies are loaded directly from the bundle.
* The portion of the single-file bundle containing the required assembly is memory mapped, and the contents are appropriately interpreted by the runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire bundle file can be quite large, and on 32-bit systems there might be not enough virtual address to do this. There should be a fallback.

Copy link
Contributor Author

@swaroop-sridhar swaroop-sridhar Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lpereira; Actually, here only the relevant portions of the bundle corresponding to the IL file are memory mapped -- not the entire bundle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32-bit systems there might be not enough virtual address to do this

The bundle sizes large enough to hit this problem should be a explicit non-goal.


* When probing for assemblies, the [host probing logic](https://github.com/dotnet/runtime/blob/master/docs/design/features/host-probing.md#probing-paths) will treat bundled assemblies similar to assemblies in the app directory. The probe ordering will be in the order: servicing location, the *single-file bundle*, app directory, framework directory(s) from higher to lower, shared store, additional specified probing paths.

* The assemblies on disk are listed in `TRUSTED_PLATFORM_ASSEMBLIES` using absolute paths. As a distinguishing convention, the assemblies to be loaded from the single-file bundle will be listed in `TRUSTED_PLATFORM_ASSEMBLIES` using relative paths (as noted in the [bundle meta-data](bundler.md#Bundle-Transformation)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative option would be to just use the probing function, and not include embedded assemblies at all.

Why is it a good idea to include the embedded assemblies on TRUSTED_PLATFORM_ASSEMBLIES? Do we expect that it will make more things "work"?

Note that TRUSTED_PLATFORM_ASSEMBLIES list is used by code out there, and the code generally expect the paths to be actual file paths. One example from many: https://github.com/genesisdotnet/genesis/blob/master/src/Genesis/GenesisGlobals.cs + https://github.com/genesisdotnet/genesis/blob/8cb94f1fe2242bd9bb6c7ebdee4178ad35d7de99/src/Executors/Input/Genesis.Input.SwaggerUrl/SwaggerUrlReader.cs#L282

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good reason to go either way. Your argument about code expecting to find the assemblies in the TPA can also be used to say that not including them at all is a problem as well. It's kind of a toss up what's worse, having "Wrong" file paths, or not having the files at all. I guess it depends on the app what effect that will have.

Given that the bundle stores a directory tree with files in it, it's in theory possible for the runtime to need to know where to look for the assembly in question. That would have to be provided by the hostpolicy (as a result of processing .deps.json). That said, I can't think of a case where this would actually happen though - managed assemblies should basically always remain in the "root', except for satellites which follow their own layout.

I searched github a bit and using the TPA as source of file paths (which are actually used as file paths) is by far the most common case. I found dozens of cases like that and only one where TPA was used just to list assembly names. Also, Roslyn actually has code in it which uses TPA to get file paths:
https://github.com/dotnet/roslyn/blob/e704ca635bd6de70a0250e34c4567c7a28fa9f6d/src/Scripting/Core/Hosting/Resolvers/RuntimeMetadataReferenceResolver.cs#L186

Given this I tend to agree that we should probably not list the bundle assemblies in TPA at all. It will still break these apps, but at least it will break them consistently. Returning relative paths (most likely just Assembly.dll without any directory) has a higher chance of accidentally working, but with the wrong file being loaded.

We should also track work to add detection of this pattern into the linker analyzer - basically AppContext.GetData("TRUSTED_PLATOFRM_ASSEMBLIES") should always be marked as potentially dangerous for single-file scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of the discussion above, I was about to change the look up mechanism to:
• The Host doesn’t include bundled assemblies in TRUSTED_PLATFORM_ASSEMBLIES
• The runtime probes TRUSTED_PLATFORM_ASSEMBLIES first (so that servicing locations get priority), and then probes the bundle.

However, for satellite assemblies (assuming that we want to leave them servicible), the PLATFORM_RESOURCE_ROOTS are a set of directory paths. So, if we probe PLATFORM_RESOURCE_ROOTS first, some files in those paths may unintentionally override the ones found in the bundle.

So, I've written the logic as:
• If a bundled assembly is overridden by a servicing location, exclude it from the bundle_probe listing
• Probe in the bundle first, followed by TRUSTED_PLATFORM_ASSEMBLIES / PLATFORM_RESOURCE_ROOTS as applicable.

| ------------------------------------ | ------------------------------------------------------------ |
| `IncludeNativeLibrariesInSingleFile` | Bundle published native binaries into the single-file app. |
| `IncludeSymbolsInSingleFile` | Bundle symbol files (IL `.pdb` file, and the native `.ni.pdb` / `app.guid.map` files generated by ready-to-run compiler) into the single file app. |
| `IncludeAllContentInSingleFile` | Bundle all published files (except symbol files) into single-file app. This option is proposed to replicate the .NET Core 3.x version of single-file apps. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `IncludeAllContentInSingleFile` | Bundle all published files (except symbol files) into single-file app. This option is proposed to replicate the .NET Core 3.x version of single-file apps. |
| `IncludeAllContentInSingleFile` | Bundle all published files (except symbol files) into single-file app. This option provides backward compatibility with the .NET Core 3.x version of single-file apps. |

* The directory where embedded files are extracted out: This option enables easy access to content files spilled to disk. However, if no files need to be extracted to disk for an application bundle, there's no extraction directory.
* The extraction-directory if files are extracted, the AppHost directory otherwise. The limitation to this approach is that the value of `AppContext.BaseDirectory` changes with the way applications are packaged/executed.
* The directory where `AppHost` binary resides (always).
However, when single file apps are published with `IncludeAllContentInSingleFile` property set (which emulates .NET Core 3.x bundling behavior), `AppContext.BaseDirectory` returns the extraction directory, following [.NET Core 3.x semantics](design_3_0.md#API-Impact).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
However, when single file apps are published with `IncludeAllContentInSingleFile` property set (which emulates .NET Core 3.x bundling behavior), `AppContext.BaseDirectory` returns the extraction directory, following [.NET Core 3.x semantics](design_3_0.md#API-Impact).
However, when single file apps are published with `IncludeAllContentInSingleFile` property set (which provides backward compatibility with .NET Core 3.x bundling behavior), `AppContext.BaseDirectory` returns the extraction directory, following [.NET Core 3.x semantics](design_3_0.md#API-Impact).

Instead of "replicate" or "emulate", I think it is better say directly that this option is for .NET Core 3.x compat.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! You have put a lot of work into this.

Co-Authored-By: Jan Kotas <jkotas@microsoft.com>
<Content Update="*.xml">
<CopyToPublishDirectory>PreserveNewest</CopyToPublishDirectory>
<ExcludeFromSingleFile>true</ExcludeFromSingleFile>
</Content>
Copy link

@Nirmal4G Nirmal4G Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swaroop-sridhar

Can I also set it like this?

<Content Remove="External\*.xml" Condition="'$(ExcludeFromSingleFile)' == 'true'"/>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.