noseyparker 0.20.0 #192836

Merged
merged 2 commits into from
Oct 4, 2024

BrewTestBot
Member

Created by brew bump


Created with brew bump-formula-pr.

release notes
Overview

The most significant feature addition to this release is a new "extensible enumerator" mechanism, which makes it possible to scan content from arbitrary sources with Nosey Parker without having to write it to the filesystem.

This release also includes several changes that speed up and slim down the scanning process. A 10-30% reduction in wall clock time and a 10-50% reduction in memory use are typical, but in some unusual cases, wall clock time and memory use are reduced by 10-20x.

Happy secret hunting!

Additions

  • An experimental "extensible enumerator mechanism" has been added to the scan command (#220). This allows Nosey Parker to scan inputs produced by any program that can emit JSON objects to stdout, without having to first write the inputs to the filesystem. It is invoked with the new --enumerator=FILE option, where FILE is a JSON Lines file. Each line of the enumerator file should be a JSON object with one of the following forms:

    { "content_base64": "base64-encoded bytestring to scan", "provenance": <arbitrary object> }
    { "content": "utf8 string to scan", "provenance": <arbitrary object> }
    

    Shell process substitution can make streaming invocation ergonomic, e.g., scan --enumerator=<(my-enumerator-program).
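
    As an illustration of the record format described above, here is a minimal sketch of an enumerator program in Python. It is not part of Nosey Parker; the function names, the sample inputs, and the shape of the provenance object (a "source" key) are assumptions made for this example. It emits the "content" form when the bytes are valid UTF-8 and falls back to "content_base64" otherwise.

```python
import base64
import json


def make_record(data: bytes, provenance: dict) -> str:
    """Return one JSON Lines record for a blob of input (hypothetical helper)."""
    try:
        # Bytes that decode as UTF-8 can use the plain "content" form.
        record = {"content": data.decode("utf-8"), "provenance": provenance}
    except UnicodeDecodeError:
        # Anything else is sent base64-encoded via "content_base64".
        record = {
            "content_base64": base64.b64encode(data).decode("ascii"),
            "provenance": provenance,
        }
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return json.dumps(record)


def enumerate_inputs(items):
    """Yield one record per (name, bytes) pair; the provenance shape is an assumption."""
    for name, data in items:
        yield make_record(data, {"source": name})


if __name__ == "__main__":
    # Sample inputs invented for this sketch.
    samples = [
        ("example.txt", b"token = not-a-real-secret"),
        ("blob.bin", b"\xff\xfe\x00\x01"),
    ]
    for line in enumerate_inputs(samples):
        print(line)
```

    Saved under a hypothetical name such as my_enumerator.py, a script like this could be streamed into a scan in the manner the release notes describe, e.g., scan --enumerator=<(python3 my_enumerator.py).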

Changes

  • Inputs are now enumerated incrementally as scanning proceeds rather than in an initial batch step (#216). This reduces peak memory use and wall clock time by 10-20%, particularly in environments with slow I/O. A consequence of this change is that the total amount of data to scan is not known until it has actually been scanned, so the scanning progress bar no longer shows a completion percentage.

  • When cloning Git repositories while scanning, the progress bar now includes the current repository URL (#212).

  • When scanning, automatically cloned Git repositories are now recorded with the path given on the command line instead of the canonicalized path (#212). This makes datastores slightly more portable across different environments, such as within a Docker container and on the host machine, as relative paths can now be recorded.

  • The deprecated --rules=PATH alias for --rules-path=PATH has been removed from the scan and rules commands.

  • The built-in support for enumerating and interacting with GitHub is now a compile time-selectable feature that is enabled by default (#213). This makes it possible to build a slimmer release for environments where GitHub functionality is unused.

  • A new rule has been added:

    • Bitbucket App Password (#219 from @gemesa)

  • The default number of parallel scanner jobs is now higher on many systems (#222). This value is determined in part by the amount of system RAM; due to several memory use improvements, the required minimum RAM per job has been reduced, allowing for more parallelism.

Fixes

  • The Google OAuth Credentials rule has been revised to avoid runtime errors about an empty capture group.

  • The AWS Secret Access Key rule has been revised to avoid runtime "Regex failed to match" errors.

  • The code that determines first-commit provenance information for blobs from Git repositories has been reworked to improve memory use (#222). In typical cases of scanning Git repositories, this reduces both peak memory use and wall clock time by 20-50%. In certain pathological cases, such as Homebrew or nixpkgs, the new implementation uses up to 20x less peak memory and up to 5x less wall clock time.

  • When determining blob provenance information from Git repositories, blobs that first appear multiple times within a single commit will now be reported with all names they appear with (#222). Previously, one of the pathnames would be arbitrarily selected.

@github-actions github-actions bot added the rust, bump-formula-pr, and boost labels Oct 4, 2024
Contributor

github-actions bot commented Oct 4, 2024

🤖 An automated task has requested bottles to be published to this PR.

@github-actions github-actions bot added the CI-published-bottle-commits The commits for the built bottles have been pushed to the PR branch. label Oct 4, 2024
@BrewTestBot BrewTestBot added this pull request to the merge queue Oct 4, 2024
Merged via the queue into master with commit d450526 Oct 4, 2024
15 checks passed
@BrewTestBot BrewTestBot deleted the bump-noseyparker-0.20.0 branch October 4, 2024 22:21