Pass additional Chunk information to detectors #1517

Open
rgmz opened this issue Jul 20, 2023 · 0 comments · May be fixed by #1741
Labels
enhancement pkg/engine PRs and Issues related to the `engine` package pkg/sources PRs and Issues related to the `sources` package

rgmz commented Jul 20, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

Presently, detectors have no knowledge of the source (e.g., "Git") or metadata (e.g., "file: package-lock.json"), and only receive a stream of bytes.

// FromData will scan bytes for results, and optionally verify them.
FromData(ctx context.Context, verify bool, data []byte) ([]Result, error)

While this design makes sense given TruffleHog's goal of scanning a multitude of sources (e.g., Git, Confluence, Slack), the lack of contextual information limits the power/usefulness of the detectors. For example, you cannot skip known bad filetypes like yarn.lock (#1460) [1], nor can you write filetype/language-specific rules like checking for JDBC credentials in .java/JVM code [2].

Problem to be Addressed

Provide more context to detectors so that it's possible to ignore known bad files/filetypes and write file/filetype-specific rules.

Description of the Preferred Solution

A few potential solutions come to mind:

  1. Replace the FromData(ctx context.Context, verify bool, data []byte) ([]Result, error) function with FromChunk(ctx context.Context, chunk Chunk) ([]Result, error)
    https://github.com/trufflesecurity/trufflehog/blob/20b77938285b82bc80531ba176989b7f8bae8c4b/pkg/sources/sources.go#L14C1-L29
  2. Alter the signature of FromData to include SourceType as well as SourceMetadata (presumably you'd want SourceType to make pulling relevant metadata easier).
  3. Add a "preflight" check for each detector, separate from FromData, to determine whether or not it should run.
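To make options 1 and 3 concrete, here is a rough sketch of how they might fit together. It is not TruffleHog's actual API: the Chunk and Result types are simplified stand-ins (the real Chunk carries typed SourceType and SourceMetadata fields), and the names ChunkDetector, Preflighter, ShouldScan, jdbcDetector, and scanAll are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
	"strings"
)

// Chunk is a simplified, hypothetical stand-in for sources.Chunk.
// File flattens what would really live inside SourceMetadata.
type Chunk struct {
	SourceType string // e.g. "Git"
	File       string // e.g. "src/Db.java"
	Data       []byte
	Verify     bool
}

// Result is a hypothetical, minimal detector result.
type Result struct {
	DetectorName string
	Raw          string
}

// ChunkDetector sketches option 1: detectors receive the whole chunk.
type ChunkDetector interface {
	FromChunk(ctx context.Context, chunk Chunk) ([]Result, error)
}

// Preflighter sketches option 3: an optional check that runs before scanning.
type Preflighter interface {
	ShouldScan(chunk Chunk) bool
}

// jdbcDetector is a hypothetical detector that only cares about JVM sources.
type jdbcDetector struct{}

// ShouldScan reports whether the chunk's file extension is a JVM source type.
func (jdbcDetector) ShouldScan(chunk Chunk) bool {
	switch strings.ToLower(filepath.Ext(chunk.File)) {
	case ".java", ".kt", ".scala":
		return true
	}
	return false
}

// FromChunk returns every whitespace-separated token that starts with "jdbc:".
// It ignores chunk.Verify; verification is out of scope for this sketch.
func (jdbcDetector) FromChunk(ctx context.Context, chunk Chunk) ([]Result, error) {
	var results []Result
	for _, field := range strings.Fields(string(chunk.Data)) {
		token := strings.Trim(field, "\"';,()")
		if strings.HasPrefix(token, "jdbc:") {
			results = append(results, Result{DetectorName: "JDBC", Raw: token})
		}
	}
	return results, nil
}

// scanAll shows how an engine might honour the preflight check:
// detectors that implement Preflighter can opt out of a chunk cheaply.
func scanAll(d ChunkDetector, chunks []Chunk) ([]Result, error) {
	ctx := context.Background()
	var all []Result
	for _, c := range chunks {
		if p, ok := d.(Preflighter); ok && !p.ShouldScan(c) {
			continue
		}
		res, err := d.FromChunk(ctx, c)
		if err != nil {
			return nil, err
		}
		all = append(all, res...)
	}
	return all, nil
}

func main() {
	line := []byte(`String url = "jdbc:postgresql://user:pass@db:5432/app";`)
	chunks := []Chunk{
		{SourceType: "Git", File: "yarn.lock", Data: line},
		{SourceType: "Git", File: "src/Db.java", Data: line},
	}
	results, err := scanAll(jdbcDetector{}, chunks)
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Println(r.DetectorName, r.Raw)
	}
}
```

Making the preflight an optional interface would let existing detectors keep working unchanged, while filetype-specific detectors skip irrelevant chunks before doing any matching.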

Additional Context

N/A

References

Footnotes

  1. As far as I can tell

  2. You can write that rule; however, it seems like it would run on every chunk, which could adversely affect performance.

@zricethezav zricethezav added pkg/engine PRs and Issues related to the `engine` package pkg/sources PRs and Issues related to the `sources` package labels Aug 1, 2023
@rgmz rgmz linked a pull request Sep 1, 2023 that will close this issue