Implement multithreaded driver analyze command w/deterministic output #2146

Merged: 15 commits, Nov 10, 2020


michaelcfanning
Member

This change introduces a new MultithreadedAnalyzeCommand that parallelizes analysis while guaranteeing that all results are output in a deterministic order from run to run. The mechanism works as follows:

  1. A single-threaded enumerator/channel writer enqueues an alphabetized list of directories (based on command-line arguments), then dequeues each directory, enumerates the files within it, alphabetizes them, and creates an analysis context object for each file. Each context is stored in a list, and the index of each created context is written to the channel.
  2. A multithreaded channel reader receives the index of a file to scan, retrieves the corresponding context object from the global array, and analyzes it. Any results are written to that per-target context object. On completing analysis, the analysis thread writes the current context index to the results logging channel.
  3. A single-threaded reader receives the index of a context object that represents a scanned target. If the reader has already processed all scan targets leading up to that index, it retrieves the results from the current context and replays them into the global, persisted log (as well as the console). This log writer also 'runs ahead' on persisting results if it observes that other threads have completed analysis work that hasn't yet resulted in a channel callback. It nulls each context object in the array as work is completed.
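
The three stages above can be sketched roughly as follows. This is a minimal illustration assuming System.Threading.Channels, not the code in this change: ScanContext, OrderedPipelineSketch, and the placeholder "analyzed ..." result are hypothetical names, and the real command works with directories, skimmers, and a SARIF logger rather than strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the per-file analysis context.
public sealed class ScanContext
{
    public string Path;
    public List<string> Results = new List<string>();
    public volatile bool Done; // set by the scanner after Results is populated
}

public static class OrderedPipelineSketch
{
    public static async Task<List<string>> RunAsync(IEnumerable<string> paths, int workers)
    {
        // Sorting up front makes the index order deterministic.
        ScanContext[] contexts = paths
            .OrderBy(p => p, StringComparer.Ordinal)
            .Select(p => new ScanContext { Path = p })
            .ToArray();

        Channel<int> toScan = Channel.CreateUnbounded<int>();
        Channel<int> toLog = Channel.CreateUnbounded<int>();

        // Stage 1: a single writer publishes the index of each context.
        Task enumerate = Task.Run(async () =>
        {
            for (int i = 0; i < contexts.Length; i++)
            {
                await toScan.Writer.WriteAsync(i);
            }
            toScan.Writer.Complete();
        });

        // Stage 2: several readers analyze targets and store results per context.
        Task[] scanners = Enumerable.Range(0, workers).Select(n => Task.Run(async () =>
        {
            while (await toScan.Reader.WaitToReadAsync())
            {
                while (toScan.Reader.TryRead(out int i))
                {
                    ScanContext context = contexts[i];
                    context.Results.Add("analyzed " + context.Path); // placeholder analysis
                    context.Done = true;
                    await toLog.Writer.WriteAsync(i);
                }
            }
        })).ToArray();

        // Stage 3: a single reader persists results strictly in index order.
        var log = new List<string>();
        Task logger = Task.Run(async () =>
        {
            int next = 0;
            while (await toLog.Reader.WaitToReadAsync())
            {
                while (toLog.Reader.TryRead(out int completed))
                {
                    // 'Run ahead' over every consecutive context that has finished.
                    while (next < contexts.Length && contexts[next].Done)
                    {
                        log.AddRange(contexts[next].Results);
                        contexts[next] = null; // release the context once persisted
                        next++;
                    }
                }
            }
        });

        await enumerate;
        await Task.WhenAll(scanners);
        toLog.Writer.Complete();
        await logger;
        return log;
    }
}
```

In this sketch the logger ignores which index arrived and simply drains every finished context from its current position, which is one way to get the 'run ahead' behavior. Output order depends only on the sorted index, not on which worker finishes first.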

To test the above, I've added an 'analyze-test' command to the multitool, which can be a temporary addition until we harden the new code and drop the single-threaded analysis entirely (this analysis command can eventually be replaced by the multithreaded analysis configured to run on a single thread). Because we've hit our limit for verbs (as supported by CommandLineLibrary), I had to drop a result matching verb previously used mostly for testing.

Other work and open items:

  1. Most, but not all, unit tests go through both the single-threaded and multithreaded code paths.
  2. This change includes an optimization to the SarifLogger to stop emitting non-useful artifacts table data when hashing is not enabled.
  3. I discovered an inefficiency in the previous analysis where file hashing is duplicated (i.e., it is performed once when processing scan targets and again the first time any file actually produces a result).

@eddynaka, this is still a draft.


lgtm-com bot commented Nov 5, 2020

This pull request introduces 3 alerts when merging f40a308 into 173eaa0 - view on LGTM.com

new alerts:

  • 3 for Useless assignment to local variable #Resolved

@microsoft microsoft deleted a comment from lgtm-com bot Nov 6, 2020
Comment on lines 38 to 39
<Reference Condition="'$(TargetFramework)' == 'net45' Or '$(TargetFramework)' == 'net461' " Include="System.Web" />
<Reference Condition="'$(TargetFramework)' == 'net45' " Include="System.Runtime" />
Collaborator

@eddynaka eddynaka Nov 10, 2020

Let's remove this, since we don't target net45 #Resolved

Collaborator

@eddynaka eddynaka Nov 10, 2020

And, for line 38, we should update and add net452 and probably net472. #Resolved


lgtm-com bot commented Nov 10, 2020

This pull request introduces 3 alerts when merging 6ef9b22 into 3f4d953 - view on LGTM.com

new alerts:

  • 3 for Useless assignment to local variable #Resolved

fileToHashDataMap[filePath] = HashUtilities.ComputeHashes(filePath);
HashData hashData = HashUtilities.ComputeHashes(filePath);

lock (fileToHashDataMap)
Collaborator

@eddynaka eddynaka Nov 10, 2020

Since we are using a ConcurrentDictionary, we probably don't need the lock. #Resolved
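
For illustration, a lock-free cache lookup could look something like the sketch below. It assumes the map really is a ConcurrentDictionary (only a fragment of the diff is visible here). HashCacheSketch and ComputeSha256 are hypothetical; ComputeSha256 stands in for HashUtilities.ComputeHashes, whose return type differs.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Security.Cryptography;

public static class HashCacheSketch
{
    private static readonly ConcurrentDictionary<string, string> s_cache =
        new ConcurrentDictionary<string, string>(StringComparer.Ordinal);

    // Hypothetical stand-in for HashUtilities.ComputeHashes.
    private static string ComputeSha256(string filePath)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(filePath))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }

    public static string GetHash(string filePath)
    {
        // GetOrAdd is thread-safe without an explicit lock. Under contention the
        // factory may run more than once, but only one value is ever stored.
        return s_cache.GetOrAdd(filePath, ComputeSha256);
    }
}
```

One trade-off: without a lock, two threads that miss the cache at the same moment may both hash the same file. If hashing each file at most once matters more than avoiding the lock, storing a Lazy value in the dictionary is a common alternative.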


namespace Microsoft.CodeAnalysis.Sarif.Multitool
{
public class AnalyzeTestSkimmer : Skimmer<AnalyzeTestContext>
Collaborator

@eddynaka eddynaka Nov 10, 2020

Since this is a test, should we mark it as internal or add an #if DEBUG condition? #Resolved

namespace Microsoft.CodeAnalysis.Sarif.Multitool
{
[Verb("analyze-test", HelpText = "Test the analysis driver framework.")]
public class AnalyzeTestOptions : MultithreadedAnalyzeOptionsBase
Collaborator

@eddynaka eddynaka Nov 10, 2020

Since this is a test, should we mark it as internal or add an #if DEBUG condition? #Resolved

{
public Exception TargetLoadException { get; set; }

public bool IsValidAnalysisTarget { get; set; }
public bool IsValidAnalysisTarget
Collaborator

@eddynaka eddynaka Nov 10, 2020

This could be replaced with: public bool IsValidAnalysisTarget => true; #Resolved
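
A small sketch of what that suggestion amounts to. SketchContext is a hypothetical type, since the body of the new property is not visible in this fragment.

```csharp
// Hypothetical context type; only the property shape comes from the comment above.
public class SketchContext
{
    // Expression-bodied, read-only property: it always returns true and has no
    // setter, so callers that previously assigned it would no longer compile.
    public bool IsValidAnalysisTarget => true;
}
```

The behavioral difference from the original auto-property is the missing setter, so this only fits if nothing assigns the property.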

// Licensed under the MIT license. See LICENSE file in the project root for full license information.

using System;
using System.Collections.Generic;
Collaborator

@eddynaka eddynaka Nov 10, 2020

Should we remove the using directives that we are not using? #Resolved


namespace Microsoft.CodeAnalysis.Sarif.Multitool
{
public class AnalyzeTestCommand : MultithreadedAnalyzeCommandBase<AnalyzeTestContext, AnalyzeTestOptions>
Collaborator

@eddynaka eddynaka Nov 10, 2020

Should we mark this as internal or add an #if DEBUG condition, since it's a test? #Resolved


lgtm-com bot commented Nov 10, 2020

This pull request introduces 3 alerts when merging 2de08a1 into 6443453 - view on LGTM.com

new alerts:

  • 3 for Useless assignment to local variable

@michaelcfanning michaelcfanning marked this pull request as ready for review November 10, 2020 19:42

lgtm-com bot commented Nov 10, 2020

This pull request introduces 2 alerts when merging be28b85 into 6443453 - view on LGTM.com

new alerts:

  • 2 for Useless assignment to local variable


lgtm-com bot commented Nov 10, 2020

This pull request introduces 2 alerts when merging c7f4fb7 into 6443453 - view on LGTM.com

new alerts:

  • 2 for Useless assignment to local variable

2 participants