Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow scanning unpacked container filesystems #1485

Merged
merged 3 commits into from
Jan 30, 2023

Conversation

jedevc
Copy link
Contributor

@jedevc jedevc commented Jan 19, 2023

🐛 Fixes #1359 🎉

This PR is an attempt to explore how to modify the existing directory resolver to allow for resolving symlinks in an unpacked container filesystem. To do this, we add a new optional base parameter for the directory resolver that resolves all symlinks relative to this base. There are two intended use cases:

  • base = "/". The previous behavior, symlinks are resolved relative to the root filesystem.
  • base = path. Symlinks are resolved relative to the target filesystem, allowing correct behavior when scanning unpacked container filesystems on disk.

Currently, this only modifies the API surface for the source, which would probably be sufficient for my use case in buildkit-syft-scanner.

I've marked this as in-draft since I've only done some preliminary testing, but the following things probably need working out before this could be ready:

  • Extend this to the CLI with a root-dir input type, or similar? Not quite sure on naming, I don't have any particular preference here.
    • Update: planned for follow-up
  • Add tests for the new directory resolver behavior. The logic seems sound to me, but I may have missed something - thankfully, the symlink resolution seems isolated to a couple of places.
    • Update: tests added
  • I've temporarily replaced filepath.EvalSymlinks with cfs.RootPath from containerd/continuity. The logic of these functions is fairly similar, however, the containerd equivalent allows bounding the result to a specified directory.
    This logic is only needed for 1 of the symlink resolution components - the other is much simpler, since syft does the heavy lifting. I'm not sure how the maintainers feel about adding this as a new dependency? Maybe there's a simpler solution I'm not seeing 👀
    Update: I can remove this dependency, since we can avoid this second instance of doing symlink resolution - we can just use the file tree that we've already indexed over, which should be more efficient anyways. I think that this is a valid use? It seems to work in the cases I tried it on.

cc @wagoodman @kzantow

@kzantow
Copy link
Contributor

kzantow commented Jan 19, 2023

@jedevc it looks like you might just need to update a test (the ID is calculated based on the fields in the struct and one was added): https://github.com/anchore/syft/actions/runs/3958281654/jobs/6779661340#step:7:165

We can use the already existing file tree to peform symlink resolution
for FilesByPath, instead of traversing the symlinks again.

This moves all of the symlink logic into the indexing code, and then we
can rely on syft's resolution algorithm over the index in this part of
the codebase.

Signed-off-by: Justin Chadwell <me@jedevc.com>
The new base parameter is an optional parameter for the directory
resolver that resolves all symlinks relative to this root. There are two
intended use cases:

- base = "/". The previous behavior, symlinks are resolved relative to
the root filesystem.
- base = path. Symlinks are resolved relative to the target filesystem,
allowing correct behavior when scanning unpacked container filesystems
on disk.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc jedevc marked this pull request as ready for review January 20, 2023 14:58
Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice improvement 👍 this additionally further leverages the already built indexes which helps some upcoming PRs regarding performance boosts

@wagoodman wagoodman merged commit b81c980 into anchore:main Jan 30, 2023
@cburgess
Copy link

cburgess commented Mar 2, 2023

Would love to the see this feature make it to the CLI. We have a use case for wanting to run syft on a root filesystem and have it work just like syft does against an existing docker image today.

@jedevc
Copy link
Contributor Author

jedevc commented Mar 13, 2023

I don't currently have the bandwidth to go and add the feature, but it should be a fairly simple plumbing exercise now that the underlying code is there, so happy for someone else to pick that up 🎉

GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
* source: avoid second-step of symlink resolution in directory resolver

We can use the already existing file tree to peform symlink resolution
for FilesByPath, instead of traversing the symlinks again.

This moves all of the symlink logic into the indexing code, and then we
can rely on syft's resolution algorithm over the index in this part of
the codebase.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* source: add base parameter to directory resolver

The new base parameter is an optional parameter for the directory
resolver that resolves all symlinks relative to this root. There are two
intended use cases:

- base = "/". The previous behavior, symlinks are resolved relative to
the root filesystem.
- base = path. Symlinks are resolved relative to the target filesystem,
allowing correct behavior when scanning unpacked container filesystems
on disk.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* source: add tests for new base parameter

Signed-off-by: Justin Chadwell <me@jedevc.com>

---------

Signed-off-by: Justin Chadwell <me@jedevc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symlink traversal confusion when scanning unpacked filesystems
4 participants