Clarify group ID and artifact ID from maven central when pom is missing #3127

Open
tetzla opened this issue Aug 15, 2024 · 5 comments
Labels
enhancement New feature or request online Requires access to online data

Comments

@tetzla

tetzla commented Aug 15, 2024

What happened:
The dom4j component was relocated in version 2.0.0 from group ID dom4j to group ID org.dom4j:

Syft generates SBOMs with swapped group ids.

SBOM

Subsequent tools processing the SBOM have problems identifying the components correctly.

What you expected to happen:
The group IDs should be corrected.

Steps to reproduce the issue:
Create an SBOM for a bundle including dom4j 1.6.1 and dom4j 2.1.3.

Anything else we need to know?:
-

Environment:

  • Output of syft version: 1.10.0
  • OS (e.g: cat /etc/os-release or similar): Debian GNU/Linux 12 (bookworm)
@tetzla tetzla added the bug Something isn't working label Aug 15, 2024
@tetzla tetzla changed the title Handling of Relocated Components Incorrect Goup IDs for dom4j Aug 15, 2024
@tetzla tetzla changed the title Incorrect Goup IDs for dom4j dom4j. Incorrect Goup IDs (Relocation) Aug 15, 2024
@tetzla tetzla changed the title dom4j. Incorrect Goup IDs (Relocation) dom4j: Incorrect Goup IDs (Relocation) Aug 15, 2024
@douglasclarke

I was looking at an issue with dom4j as well this week.

I believe the issue here is that there is no identifying metadata in MANIFEST.MF and there is no packaged pom metadata either. I believe the cataloger in this case just uses the base jar name for both the group and the artifact.

Without very specific dom4j rules in the cataloger, the only reliable way to identify the specific jar might be with checksums, and I am unsure how that would fit into the syft cataloger approach.
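To make the checksum idea concrete, here is a minimal sketch of identifying a jar by its SHA-1 digest. The `KNOWN_JARS` table and both function names are hypothetical and are not part of syft. The two digests are the ones reported for the dom4j jars later in this thread, and the coordinates assigned to them are an assumption based on the relocation described in the issue.

```python
import hashlib

# Hypothetical curated table: SHA-1 digest -> (group ID, artifact ID, version).
# Digests come from the syft output in this thread; the coordinates are an
# assumption derived from the relocation described in the issue.
KNOWN_JARS = {
    "5d3ccc056b6f056dbf0dddfdf43894b9065a8f94": ("dom4j", "dom4j", "1.6.1"),
    "a75914155a9f5808963170ec20653668a2ffd2fd": ("org.dom4j", "dom4j", "2.1.3"),
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_by_checksum(path, table=KNOWN_JARS):
    """Return (group, artifact, version) for a known digest, else None."""
    return table.get(sha1_of_file(path))
```

A lookup like this only works for exact, unmodified artifacts: a repackaged or shaded jar would have a different digest and fall through to `None`.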

@kzantow
Contributor

kzantow commented Sep 4, 2024

I wonder if there is a way to identify these by classpaths or some sort of classpath signature.
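One possible reading of a "classpath signature", sketched under assumptions: hash the sorted names of the `.class` entries in the archive so that the result does not depend on entry order or on non-class files. Both function names are hypothetical, and nothing here reflects an existing syft feature.

```python
import hashlib
import zipfile


def class_packages(jar):
    """Return the sorted Java package names that contain .class files."""
    packages = set()
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if name.endswith(".class") and "/" in name:
                packages.add(name.rsplit("/", 1)[0].replace("/", "."))
    return sorted(packages)


def classpath_signature(jar):
    """Hash the sorted .class entry names (names only, not class bytes)."""
    with zipfile.ZipFile(jar) as zf:
        names = sorted(n for n in zf.namelist() if n.endswith(".class"))
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()
```

Package names alone may not be enough to tell releases apart when the Java package stays the same across a Maven relocation, so a signature over the full class list (or over class contents) would probably be needed, and it would still need a curated data set to compare against.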

We have also discussed having some sort of data feed for Syft in the past. I think it's a great idea -- we have a number of capabilities in Syft now that use the network to resolve information, like Maven poms. It seems like a logical leap to be able to provide Syft with some curated data, such as file hashes for identification of certain artifacts when there is little else to go by.

Having recently gone through a bit of instability with the Grype database downloads, there are some aspects we would need to give some thought to: how do we provide this information reliably to end users if it ends up being something we implement? Could we have a data set small enough to download a database every day? Probably not. Some experiments seem to indicate that Maven Central alone would end up well over 1 GB compressed, and we would need a lot more for things like binary executables. (I'd be interested to run Syft on every JAR in Maven Central and only include entries for the ones Syft misidentifies; I'm not sure anyone has run this experiment.)

We have managed to keep the Grype database reasonably small, under 200 MB, but I don't think we would be able to have a similarly small data set for Syft, which means we would have to provide some "API". I think we could probably just have some static files using a well-known URL scheme that are very small and easily cached by a CDN, which would probably keep data transfer to a minimum. I guess the point here is that, given the popularity of our tools, we can't just drop some files somewhere and expect them to solve everything without some planning at this point.
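As a rough illustration of what a "well-known URL scheme" for static files could look like, the sketch below maps a SHA-1 digest to a sharded path. The base URL, the path layout, and the function name are all invented for illustration; no such endpoint exists.

```python
# Placeholder base URL: purely hypothetical, not a real Syft or Anchore endpoint.
BASE_URL = "https://example.invalid/syft-data/v1"


def lookup_url(sha1_hex, base_url=BASE_URL):
    """Map a SHA-1 digest to a static, cacheable file URL.

    The first two byte pairs are used as directory shards so that no single
    directory holds every entry (an assumed layout, chosen for illustration).
    """
    digest = sha1_hex.strip().lower()
    if len(digest) != 40 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError(f"not a hex SHA-1 digest: {sha1_hex!r}")
    return f"{base_url}/sha1/{digest[:2]}/{digest[2:4]}/{digest}.json"
```

Because each digest maps to exactly one immutable URL, responses could be cached aggressively and a client would only fetch entries for the artifacts it cannot identify offline.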

@kzantow
Contributor

kzantow commented Oct 14, 2024

Does anyone know if there are some specific public images we could use to reproduce this behavior?

@wagoodman wagoodman added the online Requires access to online data label Oct 16, 2024
@wagoodman
Contributor

wagoodman commented Oct 16, 2024

Given both jars:

.
├── dom4j-1.6.1.jar
└── dom4j-2.1.3.jar

running syft against this directory yields:

{
  "id": "57073a041ff5db91",
  "name": "dom4j",
  "version": "1.6.1",
  "type": "java-archive",
  "foundBy": "java-archive-cataloger",
  "locations": [
    {
      "path": "/dom4j-1.6.1.jar",
      "accessPath": "/dom4j-1.6.1.jar",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [],
  "language": "java",
  "cpes": [
    {
      "cpe": "cpe:2.3:a:metastuff-ltd-:dom4j:1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    },
    {
      "cpe": "cpe:2.3:a:metastuff_ltd_:dom4j:1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    },
    {
      "cpe": "cpe:2.3:a:org.dom4j:dom4j:1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    },
    {
      "cpe": "cpe:2.3:a:dom4j:dom4j:1.6.1:*:*:*:*:*:*:*",
      "source": "syft-generated"
    }
  ],
  "purl": "pkg:maven/org.dom4j/dom4j@1.6.1",
  "metadataType": "java-archive",
  "metadata": {
    "virtualPath": "/dom4j-1.6.1.jar",
    "manifest": {
      "main": [
        {
          "key": "Manifest-Version",
          "value": "1.0"
        },
        {
          "key": "Ant-Version",
          "value": "Apache Ant 1.5.3"
        },
        {
          "key": "Created-By",
          "value": "Apache Maven"
        },
        {
          "key": "Built-By",
          "value": "Maarten"
        },
        {
          "key": "Package",
          "value": "org.dom4j"
        },
        {
          "key": "Build-Jdk",
          "value": "1.4.2_02"
        },
        {
          "key": "Extension-Name",
          "value": "dom4j"
        },
        {
          "key": "Specification-Title",
          "value": "dom4j : XML framework for Java"
        },
        {
          "key": "Specification-Vendor",
          "value": "MetaStuff Ltd."
        },
        {
          "key": "Implementation-Title",
          "value": "org.dom4j"
        },
        {
          "key": "Implementation-Vendor",
          "value": "MetaStuff Ltd."
        },
        {
          "key": "Implementation-Version",
          "value": "1.6.1"
        }
      ]
    },
    "digest": [
      {
        "algorithm": "sha1",
        "value": "5d3ccc056b6f056dbf0dddfdf43894b9065a8f94"
      }
    ]
  }
}
{
  "id": "c8663631d4b90329",
  "name": "dom4j",
  "version": "2.1.3",
  "type": "java-archive",
  "foundBy": "java-archive-cataloger",
  "locations": [
    {
      "path": "/dom4j-2.1.3.jar",
      "accessPath": "/dom4j-2.1.3.jar",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [],
  "language": "java",
  "cpes": [
    {
      "cpe": "cpe:2.3:a:dom4j:dom4j:2.1.3:*:*:*:*:*:*:*",
      "source": "syft-generated"
    }
  ],
  "purl": "pkg:maven/dom4j/dom4j@2.1.3",
  "metadataType": "java-archive",
  "metadata": {
    "virtualPath": "/dom4j-2.1.3.jar",
    "manifest": {
      "main": [
        {
          "key": "Manifest-Version",
          "value": "1.0"
        }
      ]
    },
    "digest": [
      {
        "algorithm": "sha1",
        "value": "a75914155a9f5808963170ec20653668a2ffd2fd"
      }
    ]
  }
}

What it looks like is happening:

  • dom4j-1.6.1.jar is missing the pom.xml, and the MANIFEST.MF lists no explicit group ID, so we infer org.dom4j from the Implementation-Title field.
  • dom4j-2.1.3.jar is also missing the pom.xml, and the MANIFEST.MF lists no explicit group ID and no additional fields from which to infer a good group ID (thus we extract it from the filename).

From syft's perspective, which is to gather this information without an online lookup, this information is as accurate as it could be.
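The offline fallback described above can be sketched roughly as follows. This is a simplified illustration, not syft's actual cataloger logic: the function names and the "looks like a group ID" heuristic are assumptions made for the example.

```python
import re


def looks_like_group_id(value):
    """True for dotted Java-style identifiers such as 'org.dom4j'."""
    pattern = r"[A-Za-z_][\w-]*(\.[A-Za-z_][\w-]*)+"
    return re.fullmatch(pattern, value or "") is not None


def name_from_filename(filename):
    """Strip directory, '.jar' suffix and trailing version: 'dom4j-2.1.3.jar' -> 'dom4j'."""
    stem = filename.rsplit("/", 1)[-1]
    if stem.endswith(".jar"):
        stem = stem[:-4]
    return re.sub(r"-\d[\w.\-]*$", "", stem)


def infer_group_id(manifest, filename):
    """Prefer a group-like Implementation-Title, else fall back to the filename."""
    title = manifest.get("Implementation-Title", "")
    if looks_like_group_id(title):
        return title
    return name_from_filename(filename)
```

With the manifests shown above, this yields org.dom4j for dom4j-1.6.1.jar (from Implementation-Title) and dom4j for dom4j-2.1.3.jar (from the filename), which matches the purls in the output and is the reverse of the actual Maven coordinates.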

That being said, we've recently added online enrichment capabilities, and a search of the SHA-1 hash against Maven Central could be one that we add (and is in fact one of the examples in #1115).

@wagoodman
Contributor

I'm going to repurpose this issue to really be about enhancing syft to be able to reach out to maven central with a sha1 digest to clarify any missing group ID and artifact ID.
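For reference, a digest lookup of this kind might look like the sketch below. It assumes the Maven Central search REST endpoint at `search.maven.org/solrsearch/select` accepts a `1:"<sha1>"` query and returns documents with `g`, `a`, and `v` fields; treat the endpoint, query syntax, and field names as assumptions to verify. The function names are hypothetical, and this is not how syft implements (or will implement) the feature.

```python
import json
import urllib.parse
import urllib.request

# Assumed Maven Central search endpoint; verify before relying on it.
SEARCH_URL = "https://search.maven.org/solrsearch/select"


def build_query_url(sha1_hex):
    """Build a checksum search URL of the assumed form q=1:"<sha1>"."""
    params = {"q": f'1:"{sha1_hex}"', "rows": "20", "wt": "json"}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_coordinates(payload):
    """Extract (group ID, artifact ID, version) tuples from a search response."""
    docs = payload.get("response", {}).get("docs", [])
    return [
        (d["g"], d["a"], d["v"])
        for d in docs
        if all(key in d for key in ("g", "a", "v"))
    ]


def lookup_by_sha1(sha1_hex, timeout=10):
    """Query the search endpoint (network access required) and parse the result."""
    with urllib.request.urlopen(build_query_url(sha1_hex), timeout=timeout) as resp:
        return parse_coordinates(json.load(resp))
```

An empty result would mean the digest is unknown to the index, in which case the offline guess would have to stand.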

@wagoodman wagoodman added enhancement New feature or request and removed bug Something isn't working labels Oct 16, 2024
@wagoodman wagoodman changed the title dom4j: Incorrect Goup IDs (Relocation) Clarify group ID and artifact ID from maven central when pom is missing Oct 16, 2024
@wagoodman wagoodman moved this to Ready in OSS Oct 16, 2024
4 participants