[RFC] Runtime Identifiers are not scalable #5862

Open
ericstj opened this issue Sep 7, 2017 · 38 comments

@ericstj

ericstj commented Sep 7, 2017

I'm opening this issue here rather than https://github.com/dotnet/core-setup (host, platform abstractions), https://github.com/dotnet/corefx (RID-graph), or https://github.com/dotnet/sdk (runtime deps generation). The issue impacts all of these, including NuGet which was the source of the RID behavior as an opaque string.

The problems are as follows:

  1. An existing supported OS releases a new version, which results in a new calculated RID. Until we update the RID-graph, that RID cannot be used for restore or runtime selection because its compatibility mappings are missing.
  2. A new Linux distro comes along and should be treated as linux-arch (for example, linux-x64), but is not.

This stems from the definition of RIDs being opaque strings and the only source of relationships being runtime.json / nuget.

Definitions:

  • OS-name: specific name of an operating system or distro excluding version, eg: osx, ubuntu, win
  • OS-version: specific version of an operating system or distro, eg: 10, 10.13, 7.4
  • OS: a combination of OS-name and OS-version to represent a specific release of an operating system or distro.
  • OS-family: a group of OS-name for the purpose of representing compatible behavior. eg: linux, unix

See also: https://github.com/dotnet/corefx/blob/master/pkg/Microsoft.NETCore.Platforms/readme.md

The behavior I'd like to see is as follows:

  1. For an OS that's already defined an OS-name relationship in the RID-graph, that relationship should hold for future OS-versions that aren't present in the RID graph. EG: osx.10.20-x64 should be considered compatible with osx-x64 (and all of its mappings in turn). If a RID is present in the graph explicitly this inference would not need to happen.
  2. For an OS that's already defined an OS-version relationship in the RID-graph, that relationship should hold for future versions that aren't present in the RID graph. EG: osx.10.20-x64 should be considered compatible with osx.10.11-x64 (and all of its mappings in turn). If a RID is present in the graph explicitly this inference would not need to happen.
  3. For an OS-name that does not appear in the RID-graph that OS-name could be treated as part of some default OS-family (potentially defined by the project).
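
The three behaviors above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `KNOWN_GRAPH` contents, the `<os-name>[.<os-version>]-<arch>` parsing rule, and the `infer_fallback` helper are hypothetical and are not how NuGet or the host resolve RIDs today. Multi-word OS names such as linux-musl are not handled by this simple pattern.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, heavily trimmed RID graph: each RID maps to the RIDs it imports.
KNOWN_GRAPH: Dict[str, List[str]] = {
    "unix-x64": [],
    "linux-x64": ["unix-x64"],
    "osx-x64": ["unix-x64"],
    "osx.10.10-x64": ["osx-x64"],
    "osx.10.11-x64": ["osx.10.10-x64"],
}

# Assumed RID shape for this sketch: <os-name>[.<os-version>]-<arch>
RID_PATTERN = re.compile(
    r"^(?P<name>[a-z]+)(?:\.(?P<version>\d+(?:\.\d+)*))?-(?P<arch>\w+)$"
)


def parse_rid(rid: str) -> Tuple[str, Tuple[int, ...], str]:
    """Split 'osx.10.20-x64' into ('osx', (10, 20), 'x64')."""
    match = RID_PATTERN.match(rid)
    if match is None:
        raise ValueError(f"unsupported RID shape: {rid}")
    version_text = match.group("version")
    version = tuple(int(p) for p in version_text.split(".")) if version_text else ()
    return match.group("name"), version, match.group("arch")


def infer_fallback(rid: str, graph: Dict[str, List[str]],
                   default_family: str = "linux") -> str:
    """Return the graph entry an unknown RID should be treated as."""
    if rid in graph:
        return rid  # explicitly present: no inference needed
    name, version, arch = parse_rid(rid)

    # Behavior 2: newest known version of the same OS-name that is not newer.
    candidates = []
    for known in graph:
        try:
            k_name, k_version, k_arch = parse_rid(known)
        except ValueError:
            continue
        if k_name == name and k_arch == arch and k_version and k_version <= version:
            candidates.append((k_version, known))
    if candidates:
        return max(candidates)[1]

    # Behavior 1: fall back to the version-less OS-name entry.
    if f"{name}-{arch}" in graph:
        return f"{name}-{arch}"

    # Behavior 3: unknown OS-name, so use a default OS-family.
    return f"{default_family}-{arch}"
```

With this sketch, `infer_fallback("osx.10.20-x64", KNOWN_GRAPH)` resolves to the newest known osx entry, and an OS-name that is absent from the graph resolves to the default family for its architecture.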

I can imagine solutions to this that have changes in just the host and tooling, as well as changes that may require a NuGet update as well. I wanted to get a discussion going to understand if it's something we can fix.

/cc @Petermarcu @davidfowl @anurse @emgarten @eerhardt @steveharter

@eerhardt

What's the plan with this issue? Is this something that can be addressed in the .NET Core 2.1 time frame?

@tmds

tmds commented Dec 11, 2017

I think source-build is responsible for updating the runtime.json so it includes the target rid (and possible parents).
If the rid isn't in the corefx runtime.json, other dotnet sdks won't be able to target it, but it should be fully functional on its own.

I know source-build has had a task for updating the runtime.json file, but I'm not sure what the current behavior is. When trying to build 2.0 for fedora.27, the build failed because the rid was unknown.
This relates to: dotnet/source-build#297.

@eerhardt

eerhardt commented Jan 4, 2019

We hit this issue again with alpine 3.8, since it isn't in the RID graph, but it is the distro that is used in our 2.2 docker images. See https://github.com/dotnet/corefx/issues/34316.

@rrelyea
Contributor

rrelyea commented Jan 9, 2019

@eerhardt - talking with @dsplaisted, @nguerrera and @livarcocc about RIDs recently.
The goal is to not need runtime.json in netcore 3 scenarios.

I'm unsure how that recent discussion intersects with the inadequacy in the RID system that @ericstj describes in this issue.

@bording

bording commented Jan 9, 2019

@rrelyea Is there any place I can find more information about the plans for .NET Core 3 not needing runtime.json?

I'm curious about this because it seems like even if .NET Core 3 itself isn't relying on it, there still could be a problem for packages that need to include RID-specific assets if NuGet still needs the RID graph to determine which assets to use.

@ericstj
Author

ericstj commented Jan 9, 2019

@rrelyea that's not entirely true.

We don't need runtime.json to define RID-specific package associations for the framework package, since that is going to be lifted into the toolchain with the runtime-packs feature: https://github.com/dotnet/designs-microsoft/pull/38.

We still need runtime.json to define the RID graph for RID-specific asset selection from a single package. In a shared framework app this would be important for defining the RID fallback chain that gets used by a host (granted the host could change how it determines this fallback chain). For a self-contained app NuGet still uses this for selecting the "best" runtime-specific assets from a package.

@nguerrera

Public link for .net core 3 targeting packs and runtime packs: dotnet/designs#50.

Yes, we still need a rid graph. We have been exploring two things:

  1. Can the graph be supplied to NuGet without coming from a package: Mechanism for supplying runtime.json outside of a package #7351
  2. Can we defer the selection of rid specific assets from packages to after restore: https://github.com/dotnet/cli/issues/10528

@dsplaisted

We will still need a runtime graph (currently defined via runtime.json) for .NET Core 3. However, instead of coming in via a package reference, we plan to bundle it in the SDK. We are also considering handling RID selection in the SDK instead of NuGet.

These changes for .NET Core 3 won't directly help address the RID graph complexity, but they may help a bit with supporting new RIDs (as they can be delivered in an updated SDK).

@tmds

tmds commented Jan 10, 2019

Question: the runtime itself is also aware of the rid graph, right? If not, how does a framework-dependent application pick managed&native libraries which are rid specific?

@dsplaisted

@tmds I believe the runtime has a "RID linked list", ie the subset of the RID graph that is compatible with that runtime, which allows it to pick the best compatible RID from available assets for framework-dependent apps.

@tmds

tmds commented Jan 10, 2019

Do you know where the list is stored? And what code is using that list to pick the libraries?
Does mono support this too?

@eerhardt

eerhardt commented Jan 10, 2019

> Do you know where the list is stored?

It is stored in the shared framework's .deps.json file. dotnet\shared\Microsoft.NETCore.App\3.0.0-preview-27122-01\Microsoft.NETCore.App.deps.json. See the "runtimes" section at the bottom.

> And what code is using that list to pick the libraries?

It is contained in the hostpolicy.dll|so|dylib. https://github.com/dotnet/core-setup/search?utf8=%E2%9C%93&q=rid_fallback_graph_t&type=

NOTE: It is also exposed through the DependencyModel library. You just need to ensure you are parsing the shared framework's .deps.json file. Here's an example of the CLI reading this info in managed code
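
As a rough illustration of what such a lookup involves, here is a minimal sketch that reads a "runtimes" section and picks the best available asset RID. The JSON shape (each RID mapped to an ordered list of fallbacks) and the sample entry are assumptions made for this example; check a real Microsoft.NETCore.App.deps.json before relying on the layout. The helper names are hypothetical, and this is neither the hostpolicy nor the DependencyModel implementation.

```python
import json
from typing import Dict, List, Optional, Set

# Assumed shape of the "runtimes" section: RID -> ordered list of fallback RIDs.
SAMPLE_DEPS_JSON = """
{
  "runtimes": {
    "fedora.29-x64": ["fedora.29", "fedora-x64", "fedora",
                      "linux-x64", "linux", "unix-x64", "unix", "any", "base"]
  }
}
"""


def load_fallbacks(deps_json_text: str) -> Dict[str, List[str]]:
    """Return the RID -> fallback list mapping, or an empty dict if absent."""
    return json.loads(deps_json_text).get("runtimes", {})


def pick_asset_rid(current_rid: str, fallbacks: Dict[str, List[str]],
                   available: Set[str]) -> Optional[str]:
    """Walk the current RID and then its fallbacks; return the first RID with assets."""
    for candidate in [current_rid] + fallbacks.get(current_rid, []):
        if candidate in available:
            return candidate
    return None
```

In this sketch a RID that is missing from the mapping has no fallbacks at all, so only an exact match can succeed. How the real host treats an unknown RID is not modeled here.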

> Does mono support this too?

I don't believe so. But I could be wrong.

@tmds

tmds commented Jan 15, 2019

Thanks for the explanation @eerhardt

@nguerrera

@vitek-karas

@eerhardt

I wonder if we could somehow add a "hybrid" approach here to make progress where it really counts - our linux RIDs. (Windows and macOS RIDs don't really have this problem in the current world.)

Looking at the proposal for the next manylinux spec in Python (https://discuss.python.org/t/the-next-manylinux-specification/1043), the "Option 2: A ‘perennial’ manylinux" section seems to have a pretty decent structure that we could follow.

manylinux_${libc provider}_${libc version}_${arch}

Examples:

  • manylinux_glibc_2_12_x86_64
  • manylinux_musl_1_1_x86_64

We could have a hybrid RID approach where we use the current RID fallback mechanisms in place today. And if no assets are found, then fall back into the manylinux_ RID checks where we check for assets for the current libc_provider, libc_version, arch. The important part here is that the manylinux_ set of RIDs is explicitly NOT an opaque string. People can format and parse this RID using the above structure. And even more important is the libc_version is a "greater than or equal" clause, meaning "this asset will work on all versions greater than or equal to this version of this libc provider."
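
To make the "greater than or equal" idea concrete, here is a minimal sketch of parsing and matching such structured RIDs. It assumes the `manylinux_${libc provider}_${libc version}_${arch}` layout quoted above. The `ManyLinuxRid` type and the helper functions are hypothetical names for illustration and do not exist in .NET or NuGet.

```python
from typing import List, NamedTuple, Optional, Tuple


class ManyLinuxRid(NamedTuple):
    libc_provider: str
    libc_version: Tuple[int, ...]
    arch: str


def parse_manylinux(rid: str) -> ManyLinuxRid:
    """Parse e.g. 'manylinux_glibc_2_12_x86_64' into its three parts."""
    prefix = "manylinux_"
    if not rid.startswith(prefix) or "_" not in rid[len(prefix):]:
        raise ValueError(f"not a manylinux RID: {rid}")
    provider, rest = rid[len(prefix):].split("_", 1)
    parts = rest.split("_")
    # The arch may itself contain '_' (x86_64), so the version is taken to be
    # the leading run of purely numeric segments.
    version: List[int] = []
    while parts and parts[0].isdigit():
        version.append(int(parts.pop(0)))
    if not version or not parts:
        raise ValueError(f"malformed manylinux RID: {rid}")
    return ManyLinuxRid(provider, tuple(version), "_".join(parts))


def is_compatible(asset: ManyLinuxRid, system: ManyLinuxRid) -> bool:
    """An asset works when provider and arch match and the system libc is >= the asset's."""
    return (asset.libc_provider == system.libc_provider
            and asset.arch == system.arch
            and asset.libc_version <= system.libc_version)


def best_asset(asset_rids: List[str], system: ManyLinuxRid) -> Optional[str]:
    """Pick the compatible asset built against the newest libc version."""
    compatible = [(parse_manylinux(r).libc_version, r) for r in asset_rids
                  if is_compatible(parse_manylinux(r), system)]
    return max(compatible)[1] if compatible else None
```

Because the version is compared numerically rather than looked up in a graph, a newer libc release needs no new graph entry in this scheme.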

If, in the future, we needed to support other many{OS Family}_ RIDs, we could add them. But today we could solve this with linux, which would solve it for the majority of the places we currently have issues.

@tmds

tmds commented Jun 20, 2019

We have hit this issue here: dotnet/source-build#1083
Fedora 30 isn't in the graph, so it doesn't derive from fedora-x64, but falls back to the generic linux-x64.

This is similar to the issue here: https://github.com/dotnet/corefx/issues/34316. Alpine 3.8 is not in the graph, so it doesn't derive from linux-musl-x64, but falls back to the generic linux-x64.

It would be nice if we can improve the situation for known distributions that have a newer version.

CC @omajid

@rrelyea
Contributor

rrelyea commented Jul 10, 2019

@nguerrera - the recent work for netcore 3 to provide the ability to have it provide the runtime.json -- does that help things?

@nguerrera

@rrelyea Not really, we still have to update the graph too much. I think where the json file comes from is orthogonal to not having to spell out all the versions, etc.

@tmds

tmds commented Sep 13, 2019

I had a look at what happens to runtime.json for Linux platforms if you leave out the implicit relationship that comes from the rid name: e.g. ubuntu.18.04[-x64], imports ubuntu.18[-x64], ubuntu[-x64].

Derive from linux/linux-musl:

debian: linux
fedora: linux
gentoo: linux
opensuse: linux
rhel: linux
sles: linux
alpine: linux-musl

Mapping between distros:

ol: rhel
ol.7: rhel.7
ol.8: rhel.8
linuxmint.17: ubuntu.14.04
linuxmint.18: ubuntu.16.04
linuxmint.19: ubuntu.18.04
ubuntu: debian

Express compatibility between versions:

alpine.3.10: alpine.3.9
alpine.3.11: alpine.3.10
alpine.3.7: alpine.3.6
alpine.3.8: alpine.3.7
alpine.3.9: alpine.3.8

rhel.7.1: rhel.7.0
rhel.7.2: rhel.7.1
rhel.7.3: rhel.7.2
rhel.7.4: rhel.7.3
rhel.7.5: rhel.7.4
rhel.7.6: rhel.7.5
rhel.8.1: rhel.8.0

ol.7.1: ol.7.0, rhel.7.1
ol.7.2: ol.7.1, rhel.7.2
ol.7.3: ol.7.2, rhel.7.3
ol.7.4: ol.7.3, rhel.7.4
ol.7.5: ol.7.4, rhel.7.5
ol.7.6: ol.7.5, rhel.7.6

sles.12.2: sles.12.1
sles.12.3: sles.12.2
sles.12.4: sles.12.3
sles.15: sles.12.4

linuxmint.17.2: linuxmint.17.1
linuxmint.17.3: linuxmint.17.2

linuxmint.18.2: linuxmint.18.1
linuxmint.18.3: linuxmint.18.2

linuxmint.19.2: linuxmint.19.1

This list contains rids that shouldn't be used. If a distro keeps binary compatibility across the minor releases of a major version because it's meant for in-place upgrades, the base rid should be used (rhel.7 instead of rhel.7.1).
I think this is the case for all these distros except alpine.

For Alpine, you need the minors (e.g. alpine.3.11) to target a version where binary compatibility changed, and this mapping is needed to assume forwards compatibility.

@ericstj
Author

ericstj commented Sep 13, 2019

@tmds if I understand you correctly, you're suggesting that the minor version RIDs for most distros aren't all that useful (assuming folks don't need to target those versions specifically).

Even if minor version were omitted from the RID, we'd still need to define all major versions and their relationships. Today the system requires literal entries in the graph for every possible option. We'd really like more flexibility here to avoid this.

@tmds

tmds commented Sep 16, 2019

I guess all code using the rid graph ends up doing these steps:

  1. determine current platform rid, or get a value from user (e.g. -r argument).
  2. use rid graph to turn it into an ordered list (e.g. fedora.30-x64 is preferred over linux-x64).
  3. use from available assets the ones that are highest in the ordered list.
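
These three steps can be sketched as below. The graph contents are a small hypothetical excerpt, and the breadth-first expansion order is an assumption made for illustration; the ordering that NuGet or the host actually produce may differ.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical excerpt of a RID graph: RID -> RIDs it imports, most specific first.
GRAPH: Dict[str, List[str]] = {
    "fedora.30-x64": ["fedora.30", "fedora-x64"],
    "fedora.30": ["fedora"],
    "fedora-x64": ["fedora", "linux-x64"],
    "fedora": ["linux"],
    "linux-x64": ["linux", "unix-x64"],
    "linux": ["unix"],
    "unix-x64": ["unix"],
    "unix": ["any"],
    "any": [],
}


def expand(rid: str, graph: Dict[str, List[str]]) -> List[str]:
    """Step 2: turn a RID into an ordered list, most specific first (breadth-first)."""
    ordered: List[str] = []
    seen: Set[str] = {rid}
    queue = deque([rid])
    while queue:
        current = queue.popleft()
        ordered.append(current)
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return ordered


def select_asset(ordered: List[str], available: Set[str]) -> Optional[str]:
    """Step 3: use the available asset that ranks highest in the ordered list."""
    for rid in ordered:
        if rid in available:
            return rid
    return None


# Step 1 would normally detect the platform RID or read it from a -r argument.
current_rid = "fedora.30-x64"
print(select_asset(expand(current_rid, GRAPH), {"linux-x64", "fedora-x64"}))
```

A RID that has no entry in `GRAPH` expands to a list containing only itself, which is the failure mode this issue describes.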

The suggestion is that we can reduce the size of the rid graph if we take some information from the rid value from step 1: e.g. ubuntu.18.04[-x64], imports ubuntu.18[-x64], imports ubuntu[-x64].

What then remains in the rid graph is:

  • Derive from linux/linux-musl
  • Mapping between distros
  • Express forward compatibility between minors

The graph is much smaller, as we don't have to list out every distro version.
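
A minimal sketch of that reduction, assuming version segments can be stripped from the rid name one at a time. The `REDUCED_GRAPH` entries are taken from the lists in the earlier comment; `KNOWN_ARCHS` and the helper names are hypothetical. For brevity the chain only contains arch-qualified rids.

```python
from typing import Dict, List, Optional, Tuple

# What remains in the graph once version relationships are implied by the name.
REDUCED_GRAPH: Dict[str, List[str]] = {
    "ubuntu": ["debian"],
    "debian": ["linux"],
    "fedora": ["linux"],
    "alpine": ["linux-musl"],
    "linux-musl": ["linux"],
    "linuxmint.19": ["ubuntu.18.04"],
}

KNOWN_ARCHS = {"x64", "x86", "arm", "arm64"}  # assumption for this sketch


def split_arch(rid: str) -> Tuple[str, str]:
    """Split 'ubuntu.18.04-x64' into ('ubuntu.18.04', '-x64'); the arch is optional."""
    base, sep, last = rid.rpartition("-")
    if sep and last in KNOWN_ARCHS:
        return base, "-" + last
    return rid, ""


def implicit_parent(base: str) -> Optional[str]:
    """Drop the last version segment: 'ubuntu.18.04' -> 'ubuntu.18' -> 'ubuntu' -> None."""
    head, sep, _ = base.rpartition(".")
    return head if sep else None


def fallback_chain(rid: str, graph: Dict[str, List[str]]) -> List[str]:
    """Combine implicit (name-derived) parents with the explicit reduced graph."""
    base, arch = split_arch(rid)
    chain: List[str] = []
    seen = set()
    queue = [base]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        chain.append(current + arch)
        parent = implicit_parent(current)
        if parent is not None:
            queue.append(parent)
        queue.extend(graph.get(current, []))
    return chain
```

With these inputs, a rid for a distro version that was never listed still resolves through its OS-name, because the name itself supplies the missing relationships.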

Next, it would also make sense to generally assume forward compatibility. This requires that step 3 is capable of extracting version info from the rids to fall back. For example, an asset is available for fedora.31, and the current system is fedora.32.
Only alpine has such forward compatibility declared in the rid graph. For other distros, this scenario fails.

@tmds

tmds commented Oct 23, 2019

@ericstj , all, do you have some thoughts on the suggestions from my previous comment?

@ericstj
Author

ericstj commented Oct 23, 2019

@tmds I don't disagree. We try to represent the basic concepts of the RID graph with the code-generation steps we did in CoreFx around it. I don't think this issue is at the "suggest a workable implementation" phase yet; it's at the "acknowledge the problem and agree to fix it" phase. We need @rrelyea to agree to invest in the NuGet side and @jeffschwMSFT to agree to do the same on the host side. Once we have that we can move on to designing a solution together.

@tmds

tmds commented Oct 29, 2019

> at the "acknowledge the problem

This is how it manifests for us:

  • New versions of Fedora don't work because they don't map to the appropriate parent rids. To avoid this, we try to patch the runtime.json up-front for versions that don't exist yet (but are expected in the near future).
  • Since Fedora rids don't map to their previous version, it's not possible to express 'fedora.30 and higher'.
  • Patching runtime.json is a manual, ad-hoc operation.

@tmds

tmds commented Sep 14, 2020

@ericstj @eerhardt are there some plans in .NET 5 to improve this?

I hope this issue gets some love in .NET 6.
There shouldn't be a need to continue adding new releases of a distro to keep it working with .NET Core.

@omajid

omajid commented Feb 17, 2021

@crummel mentioned that he is reaching out to @ericstj to see if we can get this moving again.

As it is, things are not scalable and we run into surprising build issues all the time.

For example, Fedora 35 just got forked from Fedora 34. Both are still in development, and they are identical right now (aside from the different RIDs). Because .NET doesn't know about Fedora 35's RID, a source-build build fails, only because of that new RID.

@ericstj
Author

ericstj commented Feb 17, 2021

I think this is still interesting and important. I'll advocate for this, but I can't wholly sign up to move it forward. I need to get buy-in from some others.

The problem has changed slightly since 2017 when I first wrote this up. For one, we no longer rely on runtime.json for describing runtime specific packages. That opens up a very interesting opportunity for solving this problem.

NuGet understands RIDs today and does a cross-product of the TargetFrameworks + RIDs to produce a giant assets file, restoring for each RID. Because it does this, it needs to understand RID compatibility mappings so that it can make the "best asset" selection for each RID-specific target tuple. The separate pass for RIDs was needed specifically to handle RID specific Package->Package dependencies defined in runtime.json. This feature was never documented nor officially supported but was the mechanism used by the runtime in 1.0-2.1 to bring down the platform specific runtime package(s).

The SDK understands RIDs today, since it needs to select the best runtime pack for a given RID for self-contained apps. This wasn't the case in 2017 when I wrote this.

The host understands RIDs today since it needs to calculate the current RID, then it probes a sequence of compatible RIDs as listed in the shared framework deps file, in order to determine the best asset from those listed in the deps file for a framework dependent application.

The interesting thing here is that starting in netcoreapp3.0 we removed NuGet from needing to understand RIDs for package dependencies. With .NET Core 2.1 going out of support in August, we'll no longer have a need for NuGet to understand runtime.json or do RID-specific passes on the package graph. We still need RID-specific applicability evaluation of packages, but that could be done by the SDK in a similar way to how it does this for runtime-packs: NuGet already lists all the runtimeTarget assets, so we could move the selection of these to a step in publish. We could even go so far as to completely deprecate the current runtime.json format (which NuGet knows about) and define some new format which only needs to be known by the SDK (and potentially the host).

Here's who I think needs to be involved:

  1. NuGet team for changes to the RuntimeGraph type and potentially the format of runtime.json in order to express the new flexible matching. NuGet needs this so self-contained apps get the right result during publish.
  2. Host team for changes to RID calculation algorithm and potentially deps file format changes: RID fallbacks are listed discretely in shared framework deps files.
  3. SDK team for changes to the deps file. If we decide to eliminate 2-pass restore then also changes to build targets to do asset-selection from runtimeTargets when publishing for a RID.
  4. (potentially) Libraries team for defining a new type / set of types to represent these associations and a data format for describing them.

@dsplaisted

Related to how the SDK and NuGet handle runtime identifiers, here's a proposal from the .NET Core 3 development timeframe that we never implemented but could still do, and that matches some of what you're talking about: dotnet/sdk#10025

Basically NuGet wouldn't do the cross product anymore, the assets file would just have one target for each TargetFramework, which would list the RID-specific assets for that target in runtimeTargets sections.
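
To show the difference in scale, here is a small sketch that counts restore targets under both models. The `tfm/rid` target naming and the simplified `runtimeTargets` shape are assumptions made for illustration, and the function names are hypothetical; the real assets file layout is richer than this.

```python
from itertools import product
from typing import Dict, List


def cross_product_targets(tfms: List[str], rids: List[str]) -> List[str]:
    """Current model: one RID-less target per framework plus one per (framework, RID)."""
    rid_less = list(tfms)
    rid_specific = [f"{tfm}/{rid}" for tfm, rid in product(tfms, rids)]
    return rid_less + rid_specific


def per_framework_targets(
    tfms: List[str], assets_by_rid: Dict[str, List[str]]
) -> Dict[str, Dict[str, Dict[str, List[str]]]]:
    """Proposed model: one target per framework, RID-specific assets listed inside it."""
    return {
        tfm: {"runtimeTargets": {rid: list(paths) for rid, paths in assets_by_rid.items()}}
        for tfm in tfms
    }


tfms = ["netcoreapp3.1", "net5.0"]
rids = ["linux-x64", "win-x64", "osx-x64"]
print(len(cross_product_targets(tfms, rids)))  # grows with len(tfms) * len(rids)
print(len(per_framework_targets(tfms, {rid: [] for rid in rids})))  # grows with len(tfms)
```

In the second model, choosing among the listed RID-specific assets would be left to a later step, such as publish.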

Even though essentially no one is using the current solution, RID-specific package dependencies are still a valid scenario. It should be possible to redesign them so that they are simpler to author and don't require multiple restore passes. NuGet would still need to have some knowledge of runtime identifiers. We probably don't need to tie this to the rest of the changes proposed here though.

Note that, just because the .NET Core 2.1 runtime is going out of support, doesn't mean we intend to break building projects that target it in the .NET SDK. It's something we could consider, but there's a difference between saying "there are no more security updates for .NET Core 2.1" and "you can't even build projects targeting .NET Core 2.1 with the latest tools".

@tmds

tmds commented Feb 18, 2021

We care specifically about the use-case of building .NET using source-build on a system where the rid is not known to .NET.

We want this build to succeed, and the rid to be known to .NET.

For the build to succeed, we need to build using another rid which is known and compatible. I think this may be possible already (to some extent) by using the DOTNET_RUNTIME_ID envvar.

For the rid to be known to .NET, it needs to know its place in the graph.

Example pseudo build commands:

Build on Fedora 35 (rid derived from os-release is fedora.35):

./build.sh --build-rid linux --parent-rids fedora,linux

Build on RHEL 8 (rid derived from os-release is rhel.8):

./build.sh --build-rid linux --parent-rids linux

Build on CentOS 8 (rid derived from os-release is centos.8):

./build.sh --build-rid linux --parent-rids rhel.8,linux
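
What the build would need to do with such arguments could look roughly like this: add the new rid to a runtime.json-style graph with the given parents. The `add_rid` helper is hypothetical and the `--parent-rids` flag above is pseudo syntax; only the `runtimes` / `#import` layout follows the existing runtime.json format.

```python
import json
from typing import Dict, List


def add_rid(runtime_json: Dict, rid: str, parents: List[str]) -> Dict:
    """Add `rid` importing `parents`, leaving an existing entry untouched."""
    runtimes = runtime_json.setdefault("runtimes", {})
    missing = [p for p in parents if p not in runtimes]
    if missing:
        # A parent that is itself unknown would leave the new rid disconnected.
        raise ValueError(f"unknown parent rid(s): {', '.join(missing)}")
    runtimes.setdefault(rid, {"#import": list(parents)})
    return runtime_json


# Trimmed, hypothetical graph standing in for the real runtime.json.
graph = {
    "runtimes": {
        "any": {"#import": []},
        "unix": {"#import": ["any"]},
        "linux": {"#import": ["unix"]},
        "fedora": {"#import": ["linux"]},
    }
}

# Equivalent of the Fedora 35 example: parents are fedora and linux.
add_rid(graph, "fedora.35", ["fedora", "linux"])
print(json.dumps(graph["runtimes"]["fedora.35"]))
```

Rejecting unknown parents is a design choice of this sketch; a build could instead add the missing parents recursively.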

@ericstj
Author

ericstj commented Feb 18, 2021

> Even though essentially no one is using the current solution, RID-specific package dependencies are still a valid scenario. It should be possible to redesign them so that they are simpler to author and don't require multiple restore passes. NuGet would still need to have some knowledge of runtime identifiers. We probably don't need to tie this to the rest of the changes proposed here though.

RID specific dependencies as a download optimization is still interesting. Now that we have platform-specific TFMs I don't see it as important for this to be something that impacts compile (it doesn't today, and I don't think we should go down that path).

I suspect RID specific dependencies will likely always require multiple restore passes since a RID specific dependency may change the result of a non-RID specific dependency in a parallel subgraph. I suppose NuGet could treat this as a "partial" pass but I suspect from the complexity and cost it's still on par with multiple passes. I agree it should remain a separate issue.

> Note that, just because the .NET Core 2.1 runtime is going out of support, doesn't mean we intend to break building projects that target it in the .NET SDK. It's something we could consider, but there's a difference between saying "there are no more security updates for .NET Core 2.1" and "you can't even build projects targeting .NET Core 2.1 with the latest tools".

Can't we use a deprecation path here? SDK still supports side-by-side installation and folks can go and add a global.json to their repos that want to build on out-of-support frameworks? Not sure if that'd work in VS since I don't think NuGet is side-by-side there.

> We care specifically about the use-case of building .NET using source-build on a system where the rid is not known to .NET.
> We want this build to succeed, and the rid to be known to .NET

@tmds if all you want to do is make source build add a RID on the fly I think you can do that by adding some additional properties as you suggest, and making a change to permit the RID graph generation run during the build (instead of by the developer). Please file a new issue in dotnet/runtime with the scenario and we can make that work.

@tmds

tmds commented Feb 19, 2021

> @tmds if all you want to do is make source build add a RID on the fly I think you can do that by adding some additional properties as you suggest, and making a change to permit the RID graph generation run during the build (instead of by the developer). Please file a new issue in dotnet/runtime with the scenario and we can make that work.

Yes, I think this is all we want to do. I've created dotnet/runtime#48507.

@ViktorHofer

> The problem has changed slightly since 2017 when I first wrote this up. For one, we no longer rely on runtime.json for describing runtime specific packages. That opens up a very interesting opportunity for solving this problem.

What about the host packages and potentially others that still use a runtime.json file? I.e. https://www.nuget.org/packages/Microsoft.NETCore.DotNetHostPolicy/6.0.0-preview.1.21102.12.

@ericstj
Author

ericstj commented Feb 22, 2021

They shouldn't use it. In general for parts of the product that rely on RID-specific packages they should be plumbed through the SDK's runtime-pack resolution. I suspect this is already the case for the host packages and they never cleaned up the old package.

@ViktorHofer

Filed dotnet/runtime#49137 to get the host packages off runtime.json.

> I suspect this is already the case for the host packages and they never cleaned up the old package.

There are still places in dotnet/sdk that rely on the runtime.json host meta package. That would also need to be changed.

@crummel

crummel commented Mar 19, 2021

I'm working on a proposal for the wider concept of how we're dealing with RIDs. @ericstj , @ViktorHofer , @omajid, @tmds , if there's anyone else who is interested or has an idea of how this should go please contact me.

@dsplaisted

@crummel How we deal with RIDs would also impact the .NET SDK side, so keep me in the loop too. Thanks.

@vitek-karas

Depending on the design this might have impact on the host (original proposal needed the host to start parsing RIDs for example) - in that case please include me. Thanks.

kzu added a commit to devlooped/chromium that referenced this issue Jun 3, 2022
It is unfortunately not possible to conditionally reference native assets depending on the installed tool's runtime platform. This means we need to assume the dependency will be restored at the project level and look it up via the project.assets.json, rather than the runtime deps.

See
- How a tool project is installed/restored: https://github.com/dotnet/sdk/blob/main/src/Cli/dotnet/ToolPackage/ToolPackageInstaller.cs
- Native files not being automatically added to the runtime deps.json: dotnet/sdk#11373
- How to do that via custom targets, but this wouldn't be possible on the target machine when restoring the temp project created for global tools: https://github.com/ericstj/sample-code/blob/nativeLibSample/addNative/addNative.csproj
- How runtime.json might work but it's going away: dotnet/runtime#11404
- Really going away: dotnet/runtime#49137
- How it's still not solved for .net6: NuGet/Home#5862