fix: improved Python binary detection #1648

Merged

@kzantow kzantow commented Mar 3, 2023

Originally, when Python binary detection was added, it checked 3 sources:

  • python* binaries directly
  • libpython* shared libraries
  • patchlevel.h source file

The reason for the latter two, especially the last one, is that some Python binaries do not contain version information themselves but instead load a libpython* shared library, which does.

This caused some confusion for users and did not always prove to be the most accurate approach. This PR changes the behavior so that patchlevel.h is never looked for, as there is no guarantee it applies to the particular Python binary found. Additionally, when inspecting Python binaries, the behavior is now:

  1. Look for version information in the binary itself and use it if found.
  2. Otherwise, look for the corresponding libpython* shared library in the binary's dynamic library list, and extract version information from that file.
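
The two steps above can be sketched roughly as follows. This is a minimal illustration, not Syft's actual implementation: `versionPattern`, `findVersion`, `detectPythonVersion`, and the `readLib` callback are hypothetical names, and the version pattern is a deliberately simple assumption. For a real ELF binary, the dynamic library list could be read with the Go standard library's `debug/elf` (`File.ImportedLibraries`); here it is passed in as a plain slice so the sketch stays self-contained.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionPattern is an illustrative x.y.z matcher (an assumption for this
// sketch, not the pattern Syft uses).
var versionPattern = regexp.MustCompile(`[0-9]+\.[0-9]+\.[0-9]+`)

// findVersion returns the first x.y.z string found in contents, or "".
func findVersion(contents []byte) string {
	return string(versionPattern.Find(contents))
}

// detectPythonVersion applies the two steps in order:
//  1. use version information embedded in the binary, if any;
//  2. otherwise, check each libpython* entry in the binary's dynamic
//     library list and take the version found in that library.
//
// readLib is a hypothetical callback that returns a library's contents.
// The returned source is "binary" or the name of the library that matched.
func detectPythonVersion(binary []byte, importedLibs []string,
	readLib func(name string) ([]byte, bool)) (version, source string) {
	if v := findVersion(binary); v != "" {
		return v, "binary"
	}
	for _, lib := range importedLibs {
		if !strings.HasPrefix(lib, "libpython") {
			continue // only libpython* libraries are of interest
		}
		contents, ok := readLib(lib)
		if !ok {
			continue
		}
		if v := findVersion(contents); v != "" {
			return v, lib
		}
	}
	return "", ""
}

func main() {
	// Made-up library contents standing in for a file on disk.
	libs := map[string][]byte{"libpython3.5m.so.1.0": []byte("\x003.5.3\x00")}
	readLib := func(name string) ([]byte, bool) {
		c, ok := libs[name]
		return c, ok
	}
	v, src := detectPythonVersion([]byte("no version here"),
		[]string{"libc.so.6", "libpython3.5m.so.1.0"}, readLib)
	fmt.Println(v, src)
}
```

Note that the fallback only considers libraries the binary actually references, which is the reason the PR gives for dropping the unrelated patchlevel.h lookup.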

With this change, Syft still looks for libpython* files independently, and these may or may not show up as the primary location for a package.

The result is a few different cases:

  • A Python binary with version information is found, and a libpython shared library is also found. If the versions match, both locations are merged into the same package. If they differ, there will be two distinct packages, each with its corresponding location information.

  • A Python binary without version information is found, and a libpython shared library is also found. If the Python binary references the libpython library, the primary location is the Python binary, and the secondary location information includes the libpython library.

  • No Python binary is found, but a libpython shared library is. Syft will continue to surface Python findings, with the libpython library as the primary location.

The reason for the last case is that a container may have libpython without a Python binary, for example an Apache HTTP Server image with a Python module. In this case, it is important to surface that Python code may be executed, along with the library and version found.
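
The three cases can be illustrated with a small sketch. The `Finding` and `Package` types and the `mergeFindings` function are hypothetical names introduced for illustration only, and keying packages by version alone is a simplifying assumption, not how Syft models packages.

```go
package main

import "fmt"

// Finding is a hypothetical record for one file that was scanned.
type Finding struct {
	Path    string
	Version string   // empty when the file contains no version information
	Imports []string // libpython paths referenced by a binary (binaries only)
}

// Package is a hypothetical result; Locations[0] is the primary location.
type Package struct {
	Version   string
	Locations []string
}

// mergeFindings combines Python binary and libpython findings into packages.
func mergeFindings(binaries, libs []Finding) []Package {
	libByPath := make(map[string]Finding, len(libs))
	for _, l := range libs {
		libByPath[l.Path] = l
	}

	var pkgs []Package
	index := map[string]int{} // version -> position in pkgs

	// add records a location under a version, creating the package if needed
	// and skipping locations that were already recorded.
	add := func(version, location string) {
		if i, ok := index[version]; ok {
			for _, existing := range pkgs[i].Locations {
				if existing == location {
					return
				}
			}
			pkgs[i].Locations = append(pkgs[i].Locations, location)
			return
		}
		index[version] = len(pkgs)
		pkgs = append(pkgs, Package{Version: version, Locations: []string{location}})
	}

	for _, b := range binaries {
		if b.Version != "" {
			add(b.Version, b.Path) // case 1: the binary carries its own version
			continue
		}
		for _, p := range b.Imports {
			if l, ok := libByPath[p]; ok && l.Version != "" {
				add(l.Version, b.Path) // case 2: binary is the primary location
				add(l.Version, l.Path) // libpython is a secondary location
				break
			}
		}
	}
	for _, l := range libs {
		if l.Version != "" {
			add(l.Version, l.Path) // merges on equal version; case 3 otherwise
		}
	}
	return pkgs
}

func main() {
	// Paths and versions below are made-up inputs for demonstration.
	bins := []Finding{{Path: "/usr/bin/python3.4", Version: "3.4.10"}}
	libs := []Finding{{Path: "/usr/lib/libpython3.5m.so.1.0", Version: "3.5.3"}}
	for _, p := range mergeFindings(bins, libs) {
		fmt.Println(p.Version, p.Locations)
	}
}
```

Because binaries are processed before libraries, a binary always ends up as the primary location when one exists for a given version, and a lone libpython becomes primary only when no binary claims that version.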

Fixes: #1643
Also fixes: #1646

Signed-off-by: Keith Zantow <kzantow@gmail.com>
@kzantow kzantow marked this pull request as draft March 3, 2023 19:38
…chers

Signed-off-by: Keith Zantow <kzantow@gmail.com>
@kzantow kzantow marked this pull request as ready for review March 3, 2023 20:23
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
@kzantow kzantow requested a review from a team March 7, 2023 15:14

@wagoodman wagoodman left a comment

My only blocking comment is the note about mode as an indicator for parsing, other than that 💯 !

Signed-off-by: Keith Zantow <kzantow@gmail.com>

noqcks commented Mar 11, 2023

we lost some information on python binaries after this PR:

syft v0.74.0

➜  ~ syft -q python:3.4 | grep binary
python                        2.7.13                          binary
python                        3.4.10                          binary
python                        3.5.3                           binary

syft v0.74.1

➜  ~ syft -q python:3.4 | grep binary
python                        2.7.13                          binary
python                        35                              binary


kzantow commented Mar 11, 2023

@noqcks could you give the json output for those packages between the versions?


noqcks commented Mar 11, 2023


kzantow commented Mar 13, 2023

Thanks for this report @noqcks!

This has led me to uncover a couple issues with the python matching and I've fixed them here: #1667

spiffcs added a commit to deitch/syft that referenced this pull request Mar 21, 2023
* main: (47 commits)
  Deprecate config.yaml as valid config source; Add unit regression for correct config paths (anchore#1640)
  chore: Update syft bootstrap tools to latest versions. (anchore#1682)
  Update documentation: (anchore#1680)
  chore: Update Stereoscope to 7928713c391e20abaede6a029f4ce37b628a4c8b (anchore#1681)
  fix: reduce logging for bad dpkg lines (anchore#1675)
  fix ruby classifier (anchore#1678)
  feat: add shared dir for easier cleanup (anchore#1676)
  chore(deps): bump github.com/google/go-containerregistry (anchore#1672)
  chore(deps): bump actions/setup-go from 3 to 4 (anchore#1671)
  fix: move defer after error to protect panic case (anchore#1670)
  feat: add argocd, helm, kustomize and kubectl binary classifiers (anchore#1663)
  defer closing file (anchore#1668)
  fix: remove author contributing to javascript CPEs (anchore#1669)
  fix: more python matching support (anchore#1667)
  Update syft bootstrap tools to latest versions. (anchore#1666)
  feat: add ruby classifier (anchore#1665)
  Update syft bootstrap tools to latest versions. (anchore#1658)
  fix: improved Python binary detection (anchore#1648)
  fix: suppress some known incorrect vendor candidates for npm CPEs (anchore#1659)
  fix: sanitize SPDX LicenseRefs (anchore#1657)
  ...

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
Successfully merging this pull request may close these issues.

  • Update haproxy binary matcher
  • Improve Python binary scanning
3 participants