externalReferences type for "source" packages #98
Distribution is intentionally not specific to binary, source, hybrid, or other. Multiple distributions can be specified for a component. Take Maven, for example: a single component may have multiple artifacts that are part of the distribution. In this case, there are artifacts for the:
It's not the intent to describe every possible artifact type for every ecosystem. I think if we start separating out the types of distributions, we'll create confusion, as not all ecosystems are black and white (source and binary). For ecosystems where the component is the source (e.g. Perl), there would be confusion about which type to use, as both would apply.

In the Python example provided, it's easy enough to identify which distribution is the wheel and which one is not. In the Maven example, Maven has naming conventions, so simple pattern matching against the distributions will tell you what they are. Other ecosystems may not be as predictable. @coderpatros, @DarthHater, what are your thoughts?
Ah, I see, so for my example above I should just use this today:

```json
{
  "name": "chardet",
  "version": "4.0.0",
  "externalReferences": [
    {
      "type": "distribution",
      "url": "https://files.pythonhosted.org/packages/19/c7/fa589626997dd07bd87d9269342ccb74b1720384a4d739a1872bd84fbe68/chardet-4.0.0-py2.py3-none-any.whl",
      "comment": "PyPI wheel file"
    },
    {
      "type": "distribution",
      "url": "https://files.pythonhosted.org/packages/ee/2d/9cdc2b527e127b4c9db64b86647d567985940ac3698eeabc7ffaccb4ea61/chardet-4.0.0.tar.gz",
      "comment": "PyPI source archive"
    },
    {
      "type": "vcs",
      "url": "https://github.com/chardet/chardet",
      "comment": "upstream repository"
    }
  ]
}
```

And it would be the task of the application to either do pattern matching in the URL to differentiate between package types or use other means like application-specific
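To illustrate what "pattern matching in the URL" could look like for the chardet entries above, here is a minimal Python sketch. The function `classify_distribution` and its suffix rules are hypothetical heuristics, not part of CycloneDX; they only cover a few PyPI and Maven naming conventions, and ecosystems without such conventions would defeat them, which is the concern raised earlier in this thread.

```python
import posixpath
from urllib.parse import urlparse


def classify_distribution(url: str) -> str:
    """Guess what kind of artifact a distribution URL points to.

    Hypothetical heuristic: it only inspects the file name's suffix and
    knows a handful of PyPI and Maven naming conventions.
    """
    filename = posixpath.basename(urlparse(url).path)
    if filename.endswith(".whl"):
        return "wheel"          # PyPI built distribution
    if filename.endswith("-sources.jar"):
        return "source"         # Maven "sources" classifier
    if filename.endswith("-javadoc.jar"):
        return "documentation"  # Maven "javadoc" classifier
    if filename.endswith(".jar"):
        return "binary"         # plain Maven artifact
    if filename.endswith((".tar.gz", ".zip")):
        return "source"         # PyPI sdist or generic source archive
    return "unknown"            # e.g. a repository URL without a file name


references = [
    {"type": "distribution", "url": "https://files.pythonhosted.org/packages/19/c7/fa589626997dd07bd87d9269342ccb74b1720384a4d739a1872bd84fbe68/chardet-4.0.0-py2.py3-none-any.whl"},
    {"type": "distribution", "url": "https://files.pythonhosted.org/packages/ee/2d/9cdc2b527e127b4c9db64b86647d567985940ac3698eeabc7ffaccb4ea61/chardet-4.0.0.tar.gz"},
    {"type": "vcs", "url": "https://github.com/chardet/chardet"},
]

# Only "distribution" entries are ambiguous; classify those by file name.
for ref in references:
    if ref["type"] == "distribution":
        print(classify_distribution(ref["url"]), ref["url"])
```

Note that a `.zip` archive is treated as source here, which is a guess that will be wrong for some ecosystems.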
@gernot-h is this still an open issue?
Thanks for asking! Yes, definitely. Within Siemens AG, we created a kind of downstream specification that extends and narrows down CycloneDX (parts of it are public in https://github.com/siemens/cyclonedx-property-taxonomy). As a workaround, we specify defined comment fields:

We would highly appreciate an interoperable upstream solution for this, so that BOM scanners can be extended to provide this information over time. By the way, we also discussed whether a second purl entry for stating source references might be needed, as source URLs are never unambiguous, but for now, we don't think it's a good idea.
That

Why would it be necessary to document the source of a component if it was not distributed from source in the first place? I still do not understand.
A
Determining this deep link to the correct sources can require specific knowledge of the source ecosystem. For example, it may be necessary to understand how Maven Central handles source archives, or what a Golang Proxy is. Currently, it can do so in an `externalReferences` entry:

```json
{
  "externalReferences": [
    {
      "type": "distribution",
      "url": "https://github.com/apache/commons-lang/archive/refs/tags/rel/commons-lang-3.12.0.zip",
      "comment": "source archive (download location)"
    }
  ]
}
```

While such an entry is correct, it is very difficult to consume. There can easily be multiple A
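As an example of the ecosystem-specific knowledge involved, consider Maven: under the standard Maven repository layout, a component's sources JAR sits at a predictable path derived from its coordinates. The helper `maven_central_sources_url` below is a hypothetical sketch, assuming the publisher uploaded a `-sources.jar` and that Maven Central is the repository in use; it does not check whether the file exists, and the resulting archive is not necessarily identical to the GitHub tag archive shown above.

```python
def maven_central_sources_url(group_id: str, artifact_id: str, version: str,
                              repo: str = "https://repo1.maven.org/maven2") -> str:
    """Build the conventional URL of a Maven artifact's "-sources.jar".

    Follows the Maven 2 repository layout:
    <repo>/<groupId as path>/<artifactId>/<version>/<artifactId>-<version>-sources.jar
    Assumption: the publisher deployed a sources JAR; existence is not checked.
    """
    group_path = group_id.replace(".", "/")  # org.apache.commons -> org/apache/commons
    filename = f"{artifact_id}-{version}-sources.jar"
    return f"{repo}/{group_path}/{artifact_id}/{version}/{filename}"


print(maven_central_sources_url("org.apache.commons", "commons-lang3", "3.12.0"))
```

A consumer would need a different rule of this kind for every ecosystem, which is why a dedicated reference type is easier to consume than URL conventions.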
Looks like this topic was already picked up as a proposed enhancement, but let me still try to answer the question.

For our team, this is a compliance as well as a maintenance topic. Think about providing a Linux firmware image with several hundred packages based on a certain Linux distribution. Or think about providing a vendored npm/Ruby/... bundle as part of an application download or product. Now you not only need to provide a "binary" SBOM for your customer, but you also need to check the licenses of all the contained components internally. And you might also want to mirror a snapshot of the used source packages internally in case you need to patch your product/app five years from now.

For all these topics, we need our BOMs to describe the sources that were used by a third party to provide the binary packages we used. (For well-designed ecosystems like Python or Debian, the third party provides this information, but each in a different way, and you want to import it in a common format into a central place.) And we don't want to generate several hundred derived BOMs to describe how each of the integrated components was built.

I'm no security guy, but according to anchore/syft#1700 (comment), having the source information for a given "binary image BOM" is also valuable in vulnerability matching. That's why they invented their own proprietary extension to include this information by adding custom purl qualifiers, much like we did by specifying Siemens-wide CycloneDX comment strings for source links. We think this is relevant for many distribution use cases, and we should have a common solution to express this information.
Thank you very much for your insights. Distributions not only have a URL, but other attributes, too:
There might be many attributes related to a distribution that would be useful to document. Just some examples:
Don't overthink it, though. I would only need one extra item in the list of possible types. That list was already extended from 16 values in 1.4 to 39 values in 1.5. Let's make it 40 values in 1.6 by adding:
I don't need to know any additional details. (Of course, then I won't be able to actually build the component given only the SBOM, but frankly, that will be a problem no matter how much metadata you encode into the SBOM.)
I'm with @tsjensen on this. The latest spec revision already gives people plenty of options to choose from for specialized types of references. But the one that we are still missing for our needs is the reference to source code. For us it is critical to not only have the information which specific distribution of a component is in use in an application, but also to reference the source it was generated from. This provenance information allows us to conduct additional analysis. For the scope of this analysis we do not need to have all the information to reproducibly build an artifact from source; a reference to the source itself is sufficient. To provide a simple example:
We discussed this topic in our last core working group meeting.
Fixed via #269.
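With a dedicated reference type, a consumer such as an internal source mirror can pick source archives out of a BOM without guessing from URLs. The following minimal Python sketch assumes the `source-distribution` external reference type that CycloneDX 1.6 lists for this request (verify against the 1.6 schema). The helper names and the example BOM are hypothetical, and `metadata.component` is ignored for brevity.

```python
from typing import Dict, Iterator, List


def iter_components(components: List[dict]) -> Iterator[dict]:
    """Yield every component, including nested sub-components."""
    for component in components:
        yield component
        yield from iter_components(component.get("components", []))


def source_distribution_urls(bom: dict) -> Dict[str, List[str]]:
    """Map "name@version" to that component's source-distribution URLs.

    Assumes the CycloneDX 1.6 external reference type "source-distribution".
    """
    result: Dict[str, List[str]] = {}
    for component in iter_components(bom.get("components", [])):
        urls = [
            ref["url"]
            for ref in component.get("externalReferences", [])
            if ref.get("type") == "source-distribution"
        ]
        if urls:
            result[f"{component.get('name')}@{component.get('version')}"] = urls
    return result


# Hypothetical CycloneDX 1.6 BOM reusing the chardet URLs from this thread.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "components": [
        {
            "type": "library",
            "name": "chardet",
            "version": "4.0.0",
            "externalReferences": [
                {"type": "distribution", "url": "https://files.pythonhosted.org/packages/19/c7/fa589626997dd07bd87d9269342ccb74b1720384a4d739a1872bd84fbe68/chardet-4.0.0-py2.py3-none-any.whl"},
                {"type": "source-distribution", "url": "https://files.pythonhosted.org/packages/ee/2d/9cdc2b527e127b4c9db64b86647d567985940ac3698eeabc7ffaccb4ea61/chardet-4.0.0.tar.gz"},
                {"type": "vcs", "url": "https://github.com/chardet/chardet"},
            ],
        }
    ],
}

for component, urls in source_distribution_urls(bom).items():
    print(component, urls)
```

The `distribution` and `vcs` entries are left alone, so the wheel and the upstream repository are never mistaken for the source archive.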
## Added
* Core enhancement: Attestation ([#192](#192) via [#348](#348))
* Core enhancement: Cryptography Bill of Materials (CBOM) ([#171](#171), [#291](#291) via [#347](#347))
* Feature to express the URL to source distribution ([#98](#98) via [#269](#269))
* Feature to express the URL to RFC 9116 compliant documents ([#380](#380) via [#381](#381))
* Feature to express tags/keywords for services and components (via [#383](#383))
* Feature to express details for component authors ([#335](#335) via [#379](#379))
* Feature to express details for component and BOM manufacturer ([#346](#346) via [#379](#379))
* Feature to communicate concluded values from observed evidences ([#411](#411) via [#412](#412))
* Features to express license acknowledgement ([#407](#407) via [#408](#408))
* Feature to express environmental consideration information for model cards ([#396](#396) via [#395](#395))
* Feature to express the address of organizational entities (via [#395](#395))
* Feature to express additional component identifiers: Universal Bill Of Receipts Identifier and Software Heritage persistent IDs ([#413](#413) via [#414](#414))

## Fixed
* Allow multiple evidence identities by XML/JSON schema ([#272](#272) via [#359](#359))
  This was already correct via ProtoBuf schema.
* Prevent empty `license` entities by XML schema ([#288](#288) via [#292](#292))
  This was already correct in JSON/ProtoBuf schema.
* Prevent empty or malformed `property` entities by JSON schema ([#371](#371) via [#375](#375))
  This was already correct in XML/ProtoBuf schema.
* Allow multiple `licenses` in `Metadata` by ProtoBuf schema ([#264](#264) via [#401](#401))
  This was already correct in XML/JSON schema.

## Changed
* Allow arbitrary `$schema` values by JSON schema ([#402](#402) via [#403](#403))
* Increased max length of `versionRange` (via [`3e01ce6`](3e01ce6))
* Harmonized length of `version` (via [#417](#417))

## Deprecated
* Data model "Component"'s field `author` was deprecated. (via [#379](#379))
  Use field `authors` or field `manufacturer` instead.
* Data model "Metadata"'s field `manufacture` was deprecated. ([#346](#346) via [#379](#379))
  Use "Metadata"'s field `component`'s field `manufacturer` instead.
  - for XML: `/bom/metadata/component/manufacturer`
  - for JSON: `$.metadata.component.manufacturer`
  - for ProtoBuf: `Bom:metadata.component.manufacturer`

## Documentation
* Centralize version and version-range (via [#322](#322))
* Streamlined SPDX expression related descriptions (via [#327](#327))
* Enhanced descriptions of `bom-ref`/`refType` ([#336](#336) via [#344](#344))
* Enhanced readability of enum documentation in JSON schema ([#361](#361) via [#362](#362))
* Fixed typo "compliment" -> "complement" (via [#369](#369))
* Added documentation for enum "ComponentScope"'s values in JSON schema ([#293](#293) via [`d92e58e`](d92e58e))
  Texts were taken from the existing ones in XML/ProtoBuf schema.
* Added documentation for enum "TaskType"'s values ([#245](#245) via [#377](#377))
* Improve documentation for data model "Metadata"'s field `licenses` ([#273](#273) via [#378](#378))
* Added documentation for enum "MachineLearningApproachType"'s values ([#351](#351) via [#416](#416))
* Rephrased some texts here and there.

## Test data
* Added test data for newly added use cases
* Added quality assurance for our ProtoBuf schemas ([#384](#384) via [#385](#385))
Sorry if I overlooked something obvious, but I'm missing a way to specify a `source` archive URL for a component, as the logical counterpart to the `distribution` type. Many ecosystems have the concept of a source package and a package derived from it. In Python's PyPI you have a "wheel" and a "source" package (check https://pypi.org/project/chardet/#files), for Linux distributions there are binary and corresponding source packages (check https://packages.debian.org/buster/libgcc1), etc.

Deriving the correct "source" package for a component isn't always straightforward, but it is important for many use cases (for example, license clearing or mapping source-level security advisories to binary components). So it would be very helpful to store it in a CycloneDX BOM in a canonical way. Therefore, I suggest adding a `source` type for `externalReferences`.

Note that in most cases this is not the same as the `vcs` type (which often points to some kind of upstream project), because many package repositories provide their own source archive that exactly reflects what was used when building their "binary" packages. Example: