Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use package id from cyclonedx when provided #1872

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jneate
Copy link
Contributor

@jneate jneate commented Jun 10, 2023

This is further progress on anchore/grype#1265

Currently when cyclone-dx has a package-id specified it's ignored and syft generates a new ID, this change ensures the cyclone-dx ID is re-used.

The fix was mainly for grype but it also fixes syft convert when switching between cyclonedx -> syft-json, the package-id is re-used in the new output.

Left the TODO in place as it was there when I started.

Signed-off-by: James Neate <jamesmneate@gmail.com>
@jneate
Copy link
Contributor Author

jneate commented Jun 10, 2023

The alternate is that the entire BOM-Ref field becomes the ID instead of the package-id suffix? Happy to make said change if needed.

Comment on lines 92 to -97
syftID := extractSyftPacakgeID(component.BOMRef)
if syftID != "" {
idMap[syftID] = p
p.OverrideID(artifact.ID(syftID))
} else {
// TODO there must be a better way than needing to call this manually:
p.SetID()
}
// TODO there must be a better way than needing to call this manually:
p.SetID()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way this p.OverrideID gets called is if the component.BOMRef is a PURL that contains the package-id qualifier (which is probably only set when Syft generated the SBOM in the first place, using a PURL as the BOMRef). Additionally, this syftID might not even match the ID generated by the current version of Syft, so the way it's being used is probably wrong.

It was originally a conscious decision to re-generate the IDs for components (by calling the p.SetID() instead of reusing whatever ID is provided by the format in order to get reasonably consistent output. For example: if we used the BOMRef as the Syft package ID, then re-encoded CycloneDX, we'd have something like: pkg:thing/name@version?package-id=pkg:thing/name@version?package-id=2a7443b435345b2 (ignore the lack of URL encoding). Obviously we can handle this during encoding, for example, by comparing the ID to what SetID would set.

A few things here:

  • we should probably get rid of extractSyftPacakgeID. I don't think this is providing any value -- at best it's extracting the same ID that SetID() would generate, but more likely it is setting some different ID that is not useful in any way
  • we should adjust output formats to be aware when an ID is explicitly set vs generated based on the data (this would probably look like 1 or 2 utility methods on the package -- maybe a GenerateID that could be used for comparison and/or an IsIDGenerated())
    • We would also need to update the CycloneDX encoder and possibly other encoders to handle cases with explicitly set IDs

If we want to support reading IDs from formats instead of always generating them based on the data (I'm not sure we always do -- this seems like a configurable option), I think your observation that we should be using the BOMRef is accurate, maybe something like this:

		if component.BOMRef != "" {
			p.OverrideID(artifact.ID(component.BOMRef))
		} else {
			p.SetID()
		}
		idMap[string(p.ID())] = p

PackageURL: "pkg:maven/org.springframework.boot/spring-boot-starter-test",
}

packageID := extractSyftPacakgeID(packageWithId.BOMRef)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a typo here extractSyftPacakgeID

packageID := extractSyftPacakgeID(packageWithId.BOMRef)

packageWithoutId := cyclonedx.Component{
BOMRef: "pkg:maven/org.springframework.boot/spring-boot-starter-webflux",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why don't we reuse the BomRef value ? I assume sboms use the BomRef as an unique reference, therefore I am wondering why this initial unique ID for this SBOM needs to be adapted as done on sbom.Artifacts.Packages.PackagesByName(packageWithoutId.Name) or sbom.Artifacts.Packages.Package(artifact.ID(packageID))).

@kzantow
Copy link
Contributor

kzantow commented Jun 15, 2023

Hi @jneate -- I had a discussion with the team about this and there are some definite issues with directly using IDs from SBOMs as Syft internal object IDs. Although this PR is specific to CycloneDX, if we were to do this with SPDX, without more intelligent behavior, we would output IDs that increasingly grow in size.

A directly related thing we are interested in is maintaining additional information from SBOM formats that Syft doesn't support -- for example VEX information in CycloneDX currently will just be dropped if we simply converted CycloneDX -> CycloneDX.

I think we'd like to try to do a couple things on this front:
a) include this unknown information as some sort of metadata that can be output at least by the same format
b) track IDs in this metadata somewhere in a specific data location such that we could reuse them when outputting the same format

Unfortunately, for this PR, we probably don't want to accept as-is where we're just setting the Syft ID directly, but instead with some sort of additional field like FormatID. If it would help move things forward, we could chat about this more synchronously on our community Slack or even jump on a Zoom to discuss (or, of course one of our community meetings).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants