Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enqueue files filter #2599

Merged
merged 16 commits into from
Dec 29, 2022
Merged

Enqueue files filter #2599

merged 16 commits into from
Dec 29, 2022

Conversation

michaelcfanning
Copy link
Member

@michaelcfanning michaelcfanning commented Dec 28, 2022

This is a significant change that touches on some core driver concepts, worth a careful review.

  • Updates max file size in kb type from int to long.
  • Marks AnalyzeCommandBase as obsolete, we should delete it soon.
  • We now emit a warning anytime a file is skipped due to exceeding size threshold.
  • Updated the file size option to allow setting it to any value 0 or great. The rationale here is that setting the threshold to 0 would allow a tool to collect a log file that documents what files would have been scanned but were not.
  • Found and fixed at least one bug accounting for the hangs observed in Linux analysis (due to exceptions enumerating files).
  • Provide an overridable mechanism for updating file filtering logic (being file size). This will allow tools like BinSkim to examine and discard files before the hashing phase (because they aren't Windows PEs or other scan targets). This will resolve the horrible perf slowdowns observed in pipelines that naively apply wildcard specifiers at analysis time.
  • I've converted the max file size in kb argument to an actual context property. This means it can be configured using exported XML. We should convert all command-line arguments in this way. i.e., you should be able to pass an XML file only to drive analysis.

@shaopeng-gh @LingZhou-gh @EasyRhin0 @suvamM @yongyan-gh

@rtaket @cfaucon, heads up: this SDK will emit skipped file warnings, as requested.

src/Sarif/AnalyzerContextBase.cs Fixed Show fixed Hide fixed
Comment on lines +474 to +478
catch (Exception e)
{
Errors.LogUnhandledEngineException(_rootContext, e);
ThrowExitApplicationException(_rootContext, ExitReason.UnhandledExceptionInEngine);
}

Check notice

Code scanning / CodeQL

Generic catch clause

Generic catch clause.
@@ -46,8 +46,10 @@ public static void LogExceptionLoadingTarget(IAnalysisContext context)
throw new ArgumentNullException(nameof(context));
}

context.RuntimeErrors |= RuntimeConditions.ExceptionLoadingTargetFile;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context

We now set RuntimeErrors early in every method, so that this data is persisted in the event that an unhandled exception occurs in the remainder of the method (as happened during my current dev/test loop).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what exception you got, guess its thrown by logger? Should we apply the same change to Warnings?

}

// '{0}' was skipped as its size ({1} kilobytes) exceeds the currently configured threshold ({2} kilobytes).
context.Logger.LogConfigurationNotification(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context

Produces a console warning at runtime like so. Note that we include the full path name of the skipped file in the output. I produced this message by configuring the max file in kb argument via a config file, demonstrating that this is functional using actual client tool.

BINSKIM : warning WRN997.FileSkippedDueToSize : 'D:\repros\NetperfServiceReleaseFolder\VMAgents\Linux\NetPerfCanaryAppOrchestrator_artifact\NetPerfOrchestrator\resources\scripts\SecondaryNic.sh' was not analyzed as its size (4 kilobytes) exceeds the currently configured threshold (1 kilobytes).

@@ -12,6 +12,7 @@

namespace Microsoft.CodeAnalysis.Sarif.Driver
{
[Obsolete("AnalyzeCommandBase will be deprecated entirely soon. Use MultithreadedAnalyzeCommandBase instead.")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyzeCommandBase

It grows wearisome to maintain this obsolete single-threaded analysis, let's zap it next release.

@@ -94,15 +95,16 @@
ExecutionException = ex;
return FAILURE;
}
catch (Exception ex)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogUnhandledEngineException

This helper previously returned an error condition to OR into RuntimeErrors. I've updated the helper to be consistent with other utility methods. We now retrieve the RuntimeErrors in the finally block.

@@ -214,7 +213,7 @@

// 1: First we initiate an asynchronous operation to locate disk files for
// analysis, as specified in analysis configuration (file names, wildcards).
Task<bool> enumerateFilesOnDisk = EnumerateFilesOnDiskAsync(options, rootContext);
Task<bool> enumerateFilesOnDisk = EnumerateFilesOnDiskAsync(options);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnumerateFilesOnDiskAsync

This method already access instance data, so we'll just utilize _rootContext.

@@ -363,107 +367,129 @@
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShouldEnqueue

This diff is all screwed up here (in codeflow). This is the core new feature, we provide an overridable mechanism for filtering files at file enumeration time. This will allow binskim to check to see whether a file can be scanned. Currently, we let all files in and then they all get hashed (and then discarded), resulting in egregious slowdowns.


readyToHashChannel.Writer.Complete();
catch (Exception e)
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so here's the source of some of the hangs that have been reported, if file enumeration raised an unhandled exception, we never closed the write channel. Oops.

@@ -115,7 +115,6 @@ protected override string ConstructTestOutputFromInputResource(string inputResou
Quiet = true,
PrettyPrint = true,
Optimize = true,
Kind = new List<ResultKind> { ResultKind.Fail },
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind

This is now a default in the options ctor.

@@ -9,12 +9,6 @@ namespace Microsoft.CodeAnalysis.Sarif
{
public class TestAnalyzeOptions : AnalyzeOptionsBase
{
public TestAnalyzeOptions()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestAnalyzeOptions

set now in the base ctor.

@michaelcfanning michaelcfanning marked this pull request as ready for review December 28, 2022 22:21
{
// TODO: these defaults need to be converted to the configuration
// property pattern as followed by MaxFileSizeInKilobytes.
Traces = new string[] { };

Check warning

Code scanning / CodeQL

Virtual call in constructor or destructor

Avoid virtual calls in a constructor or destructor.
@@ -89,13 +89,14 @@
if (!(ex is ExitApplicationException<ExitReason>))
{
// These exceptions escaped our net and must be logged here
RuntimeErrors |= Errors.LogUnhandledEngineException(_rootContext, ex);
Errors.LogUnhandledEngineException(_rootContext, ex);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors.LogUnhandledEngineException

so this call may throw exception, should we add a 'catch' block in the first 'try' block to catch it?

if (!IsTargetWithinFileSizeLimit(file, _rootContext.MaxFileSizeInKilobytes, out long fileSizeInKb))
{
_ignoredFilesCount++;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question,
we use the ShouldEnqueue() to see if to add to sortedFiles list.
we use the IsTargetWithinFileSizeLimit to see if to add to _ignoredFilesCount.

Should we use the same logic, by just using the else

We can add more logics to ShouldEnqueue in the future.

@@ -118,7 +129,7 @@
[Option(
"max-file-size-in-kb",
HelpText = "The maximum file size (in kilobytes) that will be analyzed.",
Default = 1024)]
public int MaxFileSizeInKilobytes { get; set; } = 1024;
Default = -1)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1

the benefit of have 1024 in the CommonOption here is it will be shown to user. with changed to -1, it is actually 1024 as we coded inside c# but just is not as clear to user.

Copy link
Collaborator

@shaopeng-gh shaopeng-gh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@michaelcfanning michaelcfanning merged commit 859c2fe into main Dec 29, 2022
@michaelcfanning michaelcfanning deleted the enqueue-files-filter branch December 29, 2022 00:25
This was referenced Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants