Tracking platform dependencies #184
```console
dotnet-deps platform remove debian.10
```
The scenarios that this design is trying to improve are not unique to .NET. How are other platforms similar to .NET solving this? Is there something we can learn from solutions used by others?
I've looked at Node.js and Python so far and have not come across anything formalized like this.
@omajid - Have you or anyone else at RedHat come across any other platforms that do anything similar to this?
@tianon - In your role maintaining Docker Hub's official images, have there been any platforms out there that have created a standalone, machine-readable description of the platform's package dependencies such that it can be used to maintain the packages installed by the Dockerfile? Or has it always just been the Dockerfile being the source of truth? To give you a very brief summary of what is being proposed here for .NET, there would be a JSON file which formalizes .NET's platform dependencies, including Linux packages. That JSON file could be used to do transformations into or validation of Dockerfiles, as an example. The intention is to have a standalone, independent source of truth from which other assets could be derived.
This is really ambitious -- I'm not aware of anyone trying to standardize this sort of metadata.
If I'm understanding correctly, you're trying to come up with a standard identifier for things like `libargon2-dev` in Debian/Ubuntu vs `argon2-dev` in Alpine etc?
I think a really big challenge you'll likely run into is that these things aren't always one-to-one mapped -- there are many cases where distributions provide multiple variants (see the `-dev` packages in https://packages.debian.org/source/sid/curl for example) or even split different binaries across packages differently. In addition, these things tend to move between packages over time too (like how Debian's `btrfs-progs` used to also contain the development headers until the dedicated `libbtrfs-dev` package was introduced). Even the "upstream project" often gets split differently in different distros (where some have a single source package and others will end up with multiple source packages representing the same thing).
A different approach to trying to identify/recognize all the explicit packages might be to identify commands and specific header/`pkg-config` files necessary, but even that's going to be a bit of guesswork (and if it isn't automated to some extent, will struggle with bitrot), and that list isn't always straightforward to come up with, even for a human (like how difficult it would be to determine in an automated way that a program ends up invoking `git` at runtime, or that it uses `dlopen` to load a shared library dynamically).
There's also going to be different classes of dependencies -- obvious ones are build vs runtime, but even at runtime there are some dependencies that are more "required" than others (see https://www.debian.org/doc/debian-policy/ch-relationships.html for an example of how Debian handles this with `Depends` vs `Recommends` vs `Suggests` etc).
> Have you or anyone else at RedHat come across any other platforms that do anything similar to this?
I can try asking around, but I have not seen anything similar to this. Most upstreams are less disciplined than this, I think 😄
Thanks, @tianon
> If I'm understanding correctly, you're trying to come up with a standard identifier for things like `libargon2-dev` in Debian/Ubuntu vs `argon2-dev` in Alpine etc?
Not really a standard identifier. There's no attempt to map 1:1 between distros or anything like that. Package dependencies can be defined for each distro independently. It's really no different than the dependencies that are described in documentation form, it's just that they're defined in a machine-readable form using a common schema.
> There's also going to be different classes of dependencies -- obvious ones are build vs runtime
Yes, that's accounted for here by having separate models between them.
> but even at runtime there are some dependencies that are more "required" than others (see https://www.debian.org/doc/debian-policy/ch-relationships.html for an example of how Debian handles this with `Depends` vs `Recommends` vs `Suggests` etc).
This is sort of addressed by the dependency usage concept. It's less prescriptive than the depends/recommends/suggests model, allowing for the dependency to be associated with specific dev or app scenarios.
* default: Indicates that the dependency applies to a canonical app scenario (i.e. Hello World). Note that this is specifically about a canonical app and not intended to be a description of required dependencies in the absolute sense. An example of this is libicu. While libicu is not required to run an app if it has been set to use invariant globalization, the default/canonical setting of a .NET app is that invariant globalization is set to false in which case libicu is necessary. This is why the term "default" is used rather than something like "required".
* diagnostics: Indicates the dependency should be used in scenarios where diagnostic tools are being used such as with LTTng-UST.
* httpsys: Indicates the dependency should be used for ASP.NET Core apps that are configured to use the HTTP.sys web server.
* localization: Indicates the dependency should be used for localization/globalization scenarios such as with tzdata.
I don't think this will scale and at the same time, it's not granular enough. I can imagine a model where each NuGet can describe its dependencies as more maintainable and scalable.
I've made an update so that these usages are self-described within the model: 8c52259. This adds a mapping at the root of the model to define all the available dependency usages so that there can still be model validation and a way to provide a description of the value. This will allow teams to add more usages as they need and confine the updates to just the model file.
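As a rough illustration of the model validation this enables, the sketch below checks each dependency's usage against a mapping declared at the root of the model. The `dependencyUsages` key name and the sample model are assumptions made for illustration only; the actual schema is the one defined in the commit referenced above.

```python
def find_unknown_usages(model):
    """Return (component, dependency, usage) triples whose usage is not declared at the root."""
    # "dependencyUsages" is a hypothetical name for the root-level usage mapping.
    declared = set(model.get("dependencyUsages", {}))
    unknown = []
    for component in model.get("components", []):
        for dep in component.get("platformDependencies", []):
            usage = dep.get("usage")
            if usage not in declared:
                unknown.append((component["name"], dep["name"], usage))
    return unknown


# Hypothetical sample model: "localization" is used but never declared.
sample_model = {
    "dependencyUsages": {
        "default": "Applies to a canonical app scenario",
        "diagnostics": "Applies when diagnostic tools are used",
    },
    "components": [
        {
            "name": "Microsoft.NETCore.App",
            "platformDependencies": [
                {"name": "libicu", "usage": "default"},
                {"name": "tzdata", "usage": "localization"},
            ],
        }
    ],
}

print(find_unknown_usages(sample_model))
```

With a check like this, adding a new usage would only require a new entry in the root mapping, with no change to the validation code.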
### Goals

* Common schema capable of describing both runtime and toolchain dependencies.
What do runtime and toolchain mean in this context?
Added a definitions section which clarifies these terms: e388195
* Common schema capable of describing both runtime and toolchain dependencies.
* Model that limits repetition to make maintenance easier.
* File format that is machine-readable to allow for automated transformation into other output formats.
* Ability to describe:
Will this apply only to shared framework model or also bundled/self-contained model?
Are you referring to the framework-dependent/self-contained deployment models? If so, this applies to both because in either model the platform dependencies are still required. Self-contained deployment does not statically link these dependencies.
> Self-contained deployment does not statically link these dependencies.
That's not correct, some of them do.
Can you provide more details? I'd like to know more about that.
For example, Blazor is a self-contained deployment setup with only statically linked dependencies.
Ah, thanks for clarifying. In that case, the `browser` or `browser-wasm` RIDs would be just another platform described in this model and not contain these dependencies or whatever would be appropriate.
### Non-Goals

* Any dependency that is included in the deployment of the application is outside the scope of this design. NuGet packages are an example of this. A .NET application's dependency on a .NET package, and any assets contained in those packages (managed, native, or otherwise), is explicitly included in the deployment of the application itself. Therefore, the operating environment is not required to be pre-configured with those specific assets; it'll get them naturally through the deployment of the application. However, a NuGet package may have its own platform dependency (e.g. a Linux package) that is not physically contained in the NuGet package; such a platform dependency would be in scope with this design. The concern addressed here is solely focused on what the operating environment must be pre-configured to contain in order to operate on .NET scenarios.
I think this needs a bit of word tweaking. You are not excluding NuGet package dependencies in the design, as they are the core building block of any app, and the runtime itself ships many native libraries with native dependencies as OOB NuGets.
I've reworded this to be more accurate: 1a92592
IMO one of the reasons for having this information is to ensure that we are meeting all the constraints of our system. For example, changes to minimal cmake versions need source build considerations. Where in this model do we document why the min versions of all our dependencies are the way they are? Having this will allow maintainers to more effectively understand how to change them and why they are the way they are.
##### Change Detection

In order to avoid the reliance upon contributors to recognize when they've changed a dependency, a more automated solution would be preferable. This can be done by defining a GitHub bot that checks for files in PRs containing `NativeMethods` or `Interop` in their name. If such a file is detected, a label is added to the PR alerting the submitter that they should evaluate their changes for changes to the dependencies. While not a fool-proof solution, it should provide coverage for the vast majority of dependencies. Work is still required by the submitter to make the appropriate changes to the platform dependency model but the GitHub bot helps alert them when there is potentially action that is needed.
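The file-name check described above could be sketched roughly as follows. This is only an illustration of the matching logic, not an actual bot: the label name is hypothetical, and the list of changed paths is assumed to have been fetched from the PR by some other means.

```python
import posixpath

KEYWORDS = ("NativeMethods", "Interop")  # name fragments named in the proposal
LABEL = "platform-dependency-review"     # hypothetical label name


def files_needing_review(changed_paths):
    """Return the changed files whose file name contains one of the keywords."""
    return [
        path
        for path in changed_paths
        if any(keyword in posixpath.basename(path) for keyword in KEYWORDS)
    ]


def labels_for_pr(changed_paths):
    """Return the labels the bot would add for this set of changed files."""
    return [LABEL] if files_needing_review(changed_paths) else []


# Example paths are made up for illustration.
print(labels_for_pr(["src/libs/Interop.Libc.cs", "README.md"]))
```

Only the file name is matched here, not the directory, which follows the "in their name" wording; a real bot might reasonably match on the full path instead.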
I don't think that this is sufficient. To help keep dependency tracking live we will need to invest more deeply in dependency identification. Also how would we ensure that toolchain dependencies are being identified automatically?
I could imagine something like - run all the tests under `ltrace` or similar, and compare all the dlopen's seen to a baseline list. When a new one is seen, it's an indication that someone should check whether a json update is needed, then update the baseline.
It seems to me that isn't part of 'crawl' but maybe 'run'
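A minimal sketch of that baseline comparison, assuming the trace has already been captured as text lines in which calls appear as `dlopen("libname", ...)`. Both that line format and the sample data are assumptions for illustration; real tracer output would need to be checked before relying on the pattern.

```python
import re

# Assumed trace format: dlopen("<library>", <flags>) = <handle>
DLOPEN_CALL = re.compile(r'dlopen\("([^"]+)"')


def observed_libraries(trace_lines):
    """Collect the library names passed to dlopen in the captured trace."""
    return {m.group(1) for line in trace_lines for m in DLOPEN_CALL.finditer(line)}


def new_libraries(trace_lines, baseline):
    """Return libraries seen in the trace that are missing from the baseline list."""
    return sorted(observed_libraries(trace_lines) - set(baseline))


# Made-up trace lines and baseline.
trace = [
    'dlopen("libssl.so.1.1", 258) = 0x55d0',
    'getenv("HOME") = "/root"',
    'dlopen("liblttng-ust.so.0", 1) = 0x55e0',
]
print(new_libraries(trace, baseline=["libssl.so.1.1"]))
```

A non-empty result would be the signal to check whether the model needs an update and then to refresh the baseline.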
This is probably the aspect of this proposal that will require the most iteration and refinement over time. I don't want to over-invest in this, however. I fully admit that the proposed change detection isn't foolproof. But it's a fairly cheap and non-disruptive solution to get started with. I think as we learn of other areas that require detection, we can work to address those as needed.
I do like the suggestion of @danmoseley for an advanced implementation of detection logic. This could apply to both runtime and toolchain dependencies.
@mthalman do you believe it's important for this dependency record to be "complete" (in some sense) or is it valuable even if it's not? E.g., if you're using it as a recipe for setting up a container to publish, it needs to be complete. But if you're using it to inform a checker that might help you discover a missing dependency, maybe it does not.
From the discussion, this seems like a hard problem in general, and achieving and maintaining completeness especially through transitive dependencies may not be feasible.
It needs to be complete to the extent that we're aware of the dependencies. The model is intended to be used as input to other dependent artifacts like Dockerfiles, as you suggest (see phases 2 & 4 in dotnet/core#5651 for others). I don't see the ability to know and describe the dependencies as being unfeasible. Indeed, it better not be; otherwise, what are we documenting for customers? The change detection aspect is certainly a challenge and I feel should be scoped, at least to begin with, to be able to detect most dependency changes rather than all.
My view is that we treat the dependency model as a work in progress for the lifetime of the .NET version it's associated with. We make our best effort to define the model, and if we discover something several releases later, we can just go back and edit it to accurately reflect what is known. The benefit of having downstream assets consuming this model data is that it acts as a form of validation. If the Dockerfiles are synced with the dependency model and something doesn't work in the container because it's missing a dependency, then we know the model isn't correct. I think we just keep iterating on it until it's right.
Ah, somehow I had thought you were trying to describe the closure, ie., including transitive dependencies. Yes, just describing direct dependencies is surely doable.
+1 Raising minimum tool versions often requires weighing trade-offs between pain caused by staying with the old version and pain caused by requiring a new minimum version. For example, we have different minimum cmake versions for different build configurations today because we found that aggressively raising the minimum version across the board is going to incur too much pain in aggregate. We should be able to learn how situations like this are handled in other ecosystems. For example, here is a recent discussion on raising minimum cmake version in LLVM: https://lists.llvm.org/pipermail/llvm-dev/2020-April/subject.html#140578
I'm not aware of any platform dependencies that ASP.NET Core has here, with the possible exception of the Http.sys/IIS components in Windows.
Is this proposal related to workload manifests (#120)? For instance, how to track platform dependencies for a mobile workload?
* Model that limits repetition to make maintenance easier.
* File format that is machine-readable to allow for automated transformation into other output formats.
* Ability to describe:
  * Dependencies for multiple platform types (Linux, Windows, MacOS)
What about more specifics in that platform?
For example, .NET Core 2.0 only supported using OpenSSL 1.0.x. .NET Core 2.1 got additional fixes that let it also use OpenSSL 1.1.y. It would be great if we could express things like that in this manifest: `needs-one-of (OpenSSL 1.0, OpenSSL 1.1)`.
I am also wondering what happens across builds. A portable build of .NET 5 SDK that bundles dependencies will have fewer build and runtime requirements than a non-portable build of .NET 5 SDK. The non-portable build will, for example, need the matching version of OpenSSL.
> For example, .NET Core 2.0 only supported using OpenSSL 1.0.x. .NET Core 2.1 got additional fixes that let it also use OpenSSL 1.1.y. It would be great if we could express things like that in this manifest: `needs-one-of (OpenSSL 1.0, OpenSSL 1.1)`.
I've redesigned the schema to support this. Dependency names can now be described as an expression with logical OR operators. Take a look: b7833d0
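To make the idea concrete, here is a small sketch of how a consumer might evaluate such a name expression. The `||` separator and the package names are assumptions chosen for illustration; the actual expression syntax is whatever the referenced commit defines.

```python
OR_OPERATOR = "||"  # hypothetical spelling of the logical OR operator


def alternatives(expression):
    """Split a dependency name expression into its alternative names."""
    return [name.strip() for name in expression.split(OR_OPERATOR) if name.strip()]


def is_satisfied(expression, installed):
    """True if at least one alternative in the expression is installed."""
    return any(name in installed for name in alternatives(expression))


# Made-up package names standing in for the two OpenSSL lines.
expression = "libssl1.0.0 || libssl1.1"
print(alternatives(expression))
print(is_satisfied(expression, {"libssl1.1", "zlib1g"}))
```

A plain name with no operator is treated as an expression with a single alternative, so existing entries would not need special handling.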
> I am also wondering what happens across builds. A portable build of .NET 5 SDK that bundles dependencies will have fewer build and runtime requirements than a non-portable build of .NET 5 SDK. The non-portable build will, for example, need the matching version of OpenSSL.
Since that would have been done by a third party and customized to suit their desired configuration, it's not really possible to describe that here, nor is it really relevant. The intent is to describe the dependencies of the assets distributed by Microsoft. And as I mention here, the model description has no impact on the functionality of .NET that has been built/distributed external of Microsoft.
// A shared framework (e.g. Microsoft.NETCore.App, Microsoft.AspNetCore.App)
SharedFramework,

// A NuGet package (e.g. System.Drawing.Common)
Since you are calling out System.Drawing.Common here, I can't help but wonder what that will look like. AFAIAA, `System.Drawing.Common` needs `libgdiplus`, but `libgdiplus` is neither a build-time nor a runtime dependency, unless you are explicitly using `System.Drawing.Common` at runtime. Does it fit in with the `default` type or does this need something else?
> unless you are explicitly using `System.Drawing.Common` at runtime
That's the key. The dependency would specifically be tied to the `System.Drawing.Common` NuGet package. So if you're referencing `System.Drawing.Common` at runtime, then you'll require `libgdiplus` for the canonical scenario. In that case, the type would be `default`.
Here's a snippet of what that would look like:
{
  "platforms":
  {
    "rid": "debian",
    "components": [
      {
        "name": "System.Drawing.Common",
        "type": "NuGetPackage",
        "platformDependencies": [
          {
            "name": "libgdiplus",
            "dependencyType": "LinuxPackage",
            "usage": "default"
          }
        ]
      }
    ]
  }
}
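As a hedged illustration of how a snippet like the one above could feed a downstream asset such as a Dockerfile, the sketch below collects the Linux packages for a given usage and renders an install line. It relies only on the field names shown in the snippet; the choice of `apt-get` for the `debian` RID and the output shape are assumptions, not part of the proposal.

```python
import json

# The snippet from the comment above, embedded so the example is self-contained.
MODEL_JSON = """
{
  "platforms": {
    "rid": "debian",
    "components": [
      {
        "name": "System.Drawing.Common",
        "type": "NuGetPackage",
        "platformDependencies": [
          {"name": "libgdiplus", "dependencyType": "LinuxPackage", "usage": "default"}
        ]
      }
    ]
  }
}
"""


def linux_packages(model, usage="default"):
    """Collect Linux package names for the given usage across all components."""
    packages = set()
    for component in model["platforms"]["components"]:
        for dep in component["platformDependencies"]:
            if dep["dependencyType"] == "LinuxPackage" and dep["usage"] == usage:
                packages.add(dep["name"])
    return sorted(packages)


def apt_install_line(packages):
    """Render a Dockerfile RUN instruction (assumes an apt-based image)."""
    return "RUN apt-get update && apt-get install -y --no-install-recommends " + " ".join(packages)


model = json.loads(MODEL_JSON)
print(apt_install_line(linux_packages(model)))
```

The same traversal could be used in the other direction, to validate that an existing Dockerfile installs every package the model lists.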
#### New Platform Support

When support for a new platform is added to the product, the platform dependency model of future releases needs to be updated to include this platform and all its supported versions.
Is there any impact on unsupported platforms? For example, we are trying to get source-build to work with Arch Linux. If those folks get .NET building on Arch, will they be affected if we start tracking dependencies here? Will anything suddenly start breaking for them? And will it be okay if we don't want to track Arch Linux here if we don't want to "officially support" it?
No, nothing would break. This is purely a representation of what dependencies exist; it doesn't dictate what those dependencies must be. The flow of information would always be that the maintainers of the .NET source define what dependencies they want to have, and then the dependency model gets updated to reflect that. There would be no use of the dependency model at runtime or build time (i.e. it has no impact on the running of a .NET application or of the building of .NET source). Its purpose is to add rigor and correctness to the maintenance of other assets that describe what the dependencies are (such as documentation).
Each component describes the dependencies it has for the platform it is contained within. A dependency is identified by its name and type (Linux distro package, DLL).

A key piece of metadata that gives context to the dependency is the "usage" field. This field is set to one of the well-known values that describes the scenario in which this dependency applies. Here are some examples of usage values:
What about build vs runtime dependencies?
I made an update to address this, including an example: 8b0ced7. Hopefully that makes sense.
I identified toolchain dependencies as a problem to be solved but didn't actually define the model to account for such dependencies. These updates include a git repository as a component type. This allows toolchain dependencies to be described for each .NET git repo. An example model is included. I also included a paragraph on where toolchain dependency models would be stored. I think it makes more sense to store them within each repo rather than combining everything in the dotnet/core repo like the runtime models.
@jkotas, @jeffschwMSFT - I'm going to throw out some options here. Let me know what you think is reasonable or feel free to suggest alternatives. My assumption here is that any sort of description will need to be free-form text, not fitting to any schema.
1. Include the description within the model JSON file as general purpose "dependency notes"
   Pros:
   Cons:
2. Define description in a GitHub issue with the URL referenced by the dependency in the model.
   Pros:
   Cons:
3. Define description in a Markdown file in the relevant GitHub repo with the commit URL referenced by the dependency in the model.
   Pros:
   Cons:
My preference would be a combination of 1 and 2. I think it is useful to have space for a short free-form comment in the model file. If there is too much to say, this comment can include links to GitHub issues, documentation, etc.
@knuxbbs -
It's unrelated to #120. Workloads are a set of references to SDK packs and workload manifests are descriptions of those workloads. Workloads are not concerned with, nor do they provide, the set of platform dependencies necessary to run in the target environment.
I'll rephrase this as "how to track platform dependencies for a mobile platform". The schema supports this by the use of a RID to describe the platform. In the case of mobile platforms, there are RIDs for Android and iOS, for example. So it would just be another platform described in the model.
The primary motivation for this is to have a file format that supports comments. This satisfies the need to be able to include notes about the dependencies such as reasons for the minimum version. See dotnet#184 (comment)
This reverts commit f55c61e. I decided to revert this back to JSON for the simple reason of consistency with the other file formats being used throughout .NET engineering (e.g. releases.json, runtime.json for RIDs, etc). The main motivation for YAML was to have native support for comments. JSON files can still contain comments, but comments are technically unsupported by the standard. This limits the general accessibility of the file, but I would say that the vast majority of consumers would use the .NET library for interacting with the model due to the logic needed to interpret it.
All current feedback has been addressed. This is ready for further review.
Please provide any remaining feedback by this Friday. I'd like to have this merged next week.
This is the proposed design for dotnet/core#5646 within the dotnet/core#5651 epic.
This is a cross-cutting proposal that impacts all product teams. I've tried to include representatives across the board as reviewers but feel free to include others you feel are missing.