FEATURE : Provide overridable ShouldComputeHashes predicate method to prevent files from hashing. #2601

Merged 2 commits on Dec 30, 2022
3 changes: 2 additions & 1 deletion src/ReleaseHistory.md
@@ -16,10 +16,11 @@
* BUGFIX : Eliminate `IOException` and `DirectoryNotFoundException` exceptions thrown by `merge` command when splitting by rule (due to invalid file characters in rule ids). [#2513](https://github.com/microsoft/sarif-sdk/pull/2513)
* BUGFIX : Fix classes inside NotYetAutoGenerated folder missing `virtual` keyword for public methods and properties, by regenerate and manually sync the changes. [#2537](https://github.com/microsoft/sarif-sdk/pull/2537)
* BUGFIX : MSBuild Converter now accepts case insensitive keywords and supports PackageValidator msbuild log output. [#2579](https://github.com/microsoft/sarif-sdk/pull/2579)
-* BUGFIX: Eliminate `NullReferenceException` when file hashing fails (due to file locked or other errors reading the file). [#2596](https://github.com/microsoft/sarif-sdk/pull/2596)
+* BUGFIX : Eliminate `NullReferenceException` when file hashing fails (due to file locked or other errors reading the file). [#2596](https://github.com/microsoft/sarif-sdk/pull/2596)
* FEATURE : Provide `PluginDriver` property (`AdditionalOptionsProvider`) that allows additional options to be exported (typically for command-line arguments). [#2599](https://github.com/microsoft/sarif-sdk/pull/2599)
* FEATURE : Provide `LogFileSkippedDueToSize` that fires a warning notification if any file is skipped due to exceeding size threshold. [#2599](https://github.com/microsoft/sarif-sdk/pull/2599)
* FEATURE : Provide overridable `ShouldEnqueue` predicate method to filter files from driver processing. [#2599](https://github.com/microsoft/sarif-sdk/pull/2599)
+* FEATURE : Provide overridable `ShouldComputeHashes` predicate method to prevent files from hashing. [#2601](https://github.com/microsoft/sarif-sdk/pull/2601)
* FEATURE : Allow external set of `MaxFileSizeInKilobytes`, which will allow SDK users to change the value. (Default value is 1024) [#2578](https://github.com/microsoft/sarif-sdk/pull/2578)
* FEATURE : Add a Github validation rule `GH1007`, which requires flattened result message so GHAS code scanning can ingest the log. [#2580](https://github.com/microsoft/sarif-sdk/issues/2580)
* FEATURE : Provide mechanism to populate `SarifLogger` with a `FileRegionsCache` instance.
9 changes: 8 additions & 1 deletion src/Sarif.Driver/Sdk/MultithreadedAnalyzeCommandBase.cs
@@ -378,6 +378,11 @@ protected virtual bool ShouldEnqueue(string file, TContext context)
    return shouldEnqueue;
}

+protected virtual bool ShouldComputeHashes(string file, TContext context)
@shaopeng-gh (Collaborator, Author) commented on Dec 30, 2022:

ShouldComputeHashes

This change lets callers override ShouldComputeHashes instead of ShouldEnqueue.

Speed test on my local files folder with about 7k files (approximate timings):

  1. Previous PR, using ShouldEnqueue: 00:03:47
  2. Similar to the state before any change (directly returning true): 00:02:48
  3. This PR, overriding with the check we want (whether the file is a binary): 00:02:13

Once this is merged, I will send a follow-up PR for BinSkim that uses it.

*I also tested filtering on .exe: no benefit at all, both around 00:02:32.

+{
+    return true;
+}
+
private async Task<bool> EnumerateFilesOnDiskAsync(TOptions options)
{
    try
@@ -533,7 +538,9 @@ private async Task<bool> HashFilesAndPutInAnalysisQueueAsnc()
TContext context = _fileContexts[index];
string localPath = context.TargetUri.LocalPath;

-HashData hashData = HashUtilities.ComputeHashes(localPath, FileSystem);
+HashData hashData = ShouldComputeHashes(localPath, _rootContext)
+    ? HashUtilities.ComputeHashes(localPath, FileSystem)
+    : null;

context.Hashes = hashData;
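
For illustration only, here is a minimal sketch of how a derived command might use the new hook. It does not reference the SDK: `AnalyzeCommandSketch`, `BinaryOnlyCommandSketch`, and `HashOrNull` are hypothetical stand-ins, the `TContext` parameter is omitted for brevity, and the extension-based "is binary" check is an assumption rather than the check the author has in mind for BinSkim.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the SDK's analyze command base class; only the
// hook relevant to this PR is modeled.
public class AnalyzeCommandSketch
{
    // Mirrors the default added by this PR: hash every file.
    protected virtual bool ShouldComputeHashes(string file)
    {
        return true;
    }

    // Stand-in for the hashing step: returns the computed hash, or null
    // when the predicate declines to hash the file.
    public string HashOrNull(string file, Func<string, string> computeHash)
    {
        return ShouldComputeHashes(file) ? computeHash(file) : null;
    }
}

// Hypothetical derived command that hashes only files whose extension
// suggests a binary (an assumed, simplified check).
public class BinaryOnlyCommandSketch : AnalyzeCommandSketch
{
    private static readonly HashSet<string> BinaryExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".exe", ".dll" };

    protected override bool ShouldComputeHashes(string file)
    {
        return BinaryExtensions.Contains(Path.GetExtension(file));
    }
}

public static class Program
{
    public static void Main()
    {
        // Fake hash function so the sketch needs no files on disk.
        Func<string, string> fakeHash = path => "hash-of-" + path;
        var command = new BinaryOnlyCommandSketch();

        Console.WriteLine(command.HashOrNull("tool.exe", fakeHash) ?? "(skipped)");
        Console.WriteLine(command.HashOrNull("notes.txt", fakeHash) ?? "(skipped)");
    }
}
```

As in the diff above, a file that the predicate rejects ends up with a null hash, so any downstream code that reads the hashes would need to tolerate null.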
