Rework input enumeration #216

Merged 5 commits on Aug 22, 2024

@bradlarsen bradlarsen commented Aug 20, 2024

Inputs are now enumerated incrementally as scanning proceeds rather than in an initial batch step. This reduces peak memory use and CPU time by 10-20%, particularly in environments with slow I/O. As a consequence, the total amount of data to scan is not known until it has actually been scanned, so the scanning progress bar no longer shows a completion percentage.
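
The incremental approach described above can be illustrated with a minimal sketch. It is not the Nosey Parker implementation: `Input`, `scan`, and `scan_incrementally` are hypothetical names, and the bounded standard-library channel is an assumed mechanism chosen only to show why memory stays low and why the total size is unknown until enumeration ends.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for one scanner input (for example, a file's contents).
struct Input {
    bytes: Vec<u8>,
}

/// Hypothetical scan step: counts occurrences of the byte `x`.
fn scan(input: &Input) -> usize {
    input.bytes.iter().filter(|&&b| b == b'x').count()
}

/// Enumerates inputs on a producer thread and scans them as they arrive.
/// Returns (inputs scanned, bytes scanned, matches found).
fn scan_incrementally<I>(inputs: I) -> (usize, usize, usize)
where
    I: Iterator<Item = Input> + Send + 'static,
{
    // Bounded channel (capacity 4 is an arbitrary choice): only a few
    // enumerated inputs are buffered at once, rather than the whole set.
    let (tx, rx) = sync_channel::<Input>(4);
    let producer = thread::spawn(move || {
        for input in inputs {
            // Stop enumerating if the scanning side has gone away.
            if tx.send(input).is_err() {
                break;
            }
        }
    });

    let (mut count, mut bytes, mut matches) = (0, 0, 0);
    for input in rx {
        count += 1;
        bytes += input.bytes.len();
        matches += scan(&input);
        // The total is not known here, so progress can only be reported
        // as "so far" figures, not as a completion percentage.
    }
    producer.join().expect("enumerator thread panicked");
    (count, bytes, matches)
}

fn main() {
    let inputs = (0..10).map(|i| Input { bytes: vec![b'x'; i] });
    let (count, bytes, matches) = scan_incrementally(inputs);
    println!("scanned {count} inputs, {bytes} bytes, {matches} matches");
}
```

Because the consumer sees inputs one at a time, a progress display in this sketch could show running counts of inputs and bytes but has no denominator for a percentage, which matches the trade-off noted in the description.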

@bradlarsen bradlarsen added labels Aug 20, 2024:
- enhancement: New feature or request
- performance: Related to runtime performance
- content discovery: Related to enumerating or specifying content to scan
- ux: Related to the user experience, invocation, or CLI
@bradlarsen bradlarsen marked this pull request as ready for review August 21, 2024 14:52
The newer vectorscan-rs release implements several additional traits for
the types it exposes, including Clone.

In Nosey Parker scanning, each time a scanning thread is spawned, a
matching context needs to be created. With the changes here, a single
initial context is cloned rather than a new one being initialized from
scratch. This saves several CPU seconds on some larger scanner inputs.

(What would be even better is if each scanner thread were long-lived and
initialized exactly once. But this is difficult to control using Rayon.)
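
As a rough illustration of the clone-per-thread idea in this commit message, the sketch below builds one context and clones it for each spawned thread. It uses only the standard library: `MatchContext`, its lookup table, and `scan_all` are hypothetical stand-ins, not the vectorscan-rs types or the Nosey Parker code, and plain scoped threads are assumed in place of Rayon.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how many times the (hypothetically expensive) setup runs.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical matching context. Deriving `Clone` lets a thread copy an
/// already-built context instead of rebuilding it.
#[derive(Clone)]
struct MatchContext {
    table: Vec<u8>,
}

impl MatchContext {
    /// Stand-in for costly initialization: builds a 256-entry lookup table
    /// that marks ASCII digits.
    fn new() -> Self {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        let table = (0..=255u8).map(|b| b.is_ascii_digit() as u8).collect();
        MatchContext { table }
    }

    /// Counts the bytes in `input` that the table marks as matches.
    fn count_matches(&self, input: &[u8]) -> usize {
        input.iter().filter(|&&b| self.table[b as usize] == 1).count()
    }
}

/// Scans each chunk on its own thread, cloning one prototype context per thread.
fn scan_all(chunks: &[&[u8]]) -> usize {
    let prototype = MatchContext::new(); // initialized exactly once
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| {
                let ctx = prototype.clone(); // cheap copy, no re-initialization
                s.spawn(move || ctx.count_matches(chunk))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("scanner thread panicked"))
            .sum::<usize>()
    })
}

fn main() {
    let chunks: [&[u8]; 3] = [b"abc123", b"no digits", b"4 and 5"];
    let total = scan_all(&chunks);
    let inits = INIT_COUNT.load(Ordering::SeqCst);
    println!("{total} matches with {inits} context initialization(s)");
}
```

In this sketch the clone still happens once per spawned thread, so it does not address the long-lived-thread idea raised in the parenthetical above; it only replaces repeated initialization with repeated copying.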
@bradlarsen bradlarsen merged commit 3e92364 into main Aug 22, 2024
15 checks passed
@bradlarsen bradlarsen deleted the rework-input-enumeration branch August 22, 2024 19:19