Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing AnalyzeCommandBase and MultithreadedAnalyzeCommandBase artifacts generation #2433

Merged
merged 16 commits into from
Jan 31, 2022

Conversation

eddynaka
Copy link
Collaborator

@eddynaka eddynaka commented Jan 28, 2022

Description

The single/multi-threaded command base was generating all the artifacts even if we were not producing results.

Test

Testing multiple files with and without the hashes flag and checking if artifacts were produced.

This PR has a dependency on this: #2434, which will reduce the number of files to be analyzed once we merge it. That PR is just rebaselining our test files, preventing new changes from receiving all the changes we are seeing in this.

_run?.GetFileIndex(new ArtifactLocation { Uri = context.TargetUri },
addToFilesTableIfNotPresent: _persistArtifacts,
dataToInsert: _dataToInsert,
hashData: context.Hashes);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should only persist if:

  1. we have results
  2. if we have hashes/textfiles/binaryfiles flag enabled

@@ -485,10 +491,6 @@ private async Task<bool> HashAsync()
_hashToFilesMap[hashData.Sha256] = paths;
}

_run?.GetFileIndex(new ArtifactLocation { Uri = context.TargetUri },
dataToInsert: _dataToInsert,
hashData: hashData);
Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this to prevent us from generating artifacts for all the files #ByDesign

@@ -1165,7 +1165,7 @@ public void AnalyzeCommandBase_AutomationDetailsTests()
}
}

[Fact(Timeout = 5000)]
[Fact(Timeout = 5000, Skip = "Artifacts will be different while we don't fix SarifLogger and AnalyzeCommandBase.")]
public void AnalyzeCommandBase_ShouldGenerateSameResultsWhenRunningSingleAndMultiThread_CoyoteTest()
Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyzeCommandBase_ShouldGenerateSameResultsWhenRunningSingleAndMultiThread_CoyoteTest

This test is comparing the output of a single thread vs multithreaded command base.

With this change, both will have different outputs which will be fixed in the future. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already fixed in this pr

@@ -1165,7 +1165,7 @@ public void AnalyzeCommandBase_AutomationDetailsTests()
}
}

[Fact(Timeout = 5000)]
[Fact(Timeout = 5000, Skip = "Artifacts will be different while we don't fix SarifLogger and AnalyzeCommandBase.")]
public void AnalyzeCommandBase_ShouldGenerateSameResultsWhenRunningSingleAndMultiThread_CoyoteTest()
Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyzeCommandBase_ShouldGenerateSameResultsWhenRunningSingleAndMultiThread_CoyoteTest

SingleAndMultithreaded #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@@ -170,7 +170,7 @@ private static void VerifySuppressCommand(SuppressOptions options)

if (options.ExpiryInDays > 0 && suppression.TryGetProperty("expiryUtc", out DateTime expiryUtc))
{
expiryUtc.Should().BeCloseTo(DateTime.UtcNow.AddDays(options.ExpiryInDays));
expiryUtc.Should().BeCloseTo(DateTime.UtcNow.AddDays(options.ExpiryInDays), precision: 500);
Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

500

used a shared constant #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@michaelcfanning
Copy link
Member

michaelcfanning commented Jan 28, 2022

Where is my release note?


In reply to: 1024526220

@@ -31,15 +31,14 @@ internal static IFileSystem FileSystem

public static IDictionary<string, HashData> MultithreadedComputeTargetFileHashes(IEnumerable<string> analysisTargets, bool suppressConsoleOutput = false)
{
if (analysisTargets == null) { return null; }
var fileToHashDataMap = new ConcurrentDictionary<string, HashData>();
if (analysisTargets == null) { return fileToHashDataMap; }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will prevent a NullReferenceException when we use the reerence variable from the SarifLogger

@@ -447,12 +447,16 @@ private void CaptureArtifact(ArtifactLocation fileLocation)
catch (ArgumentException) { } // Unrecognized encoding name
}

HashData hashData = null;
AnalysisTargetToHashDataMap?.TryGetValue(fileLocation.Uri.OriginalString, out hashData);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalysisTargetToHashDataMap?.TryGetValue

if we have the hash, let's re-use it.
this will prevent us from hashing the same file twice

@@ -169,7 +169,7 @@ private void Analyze(TOptions options, AggregatingLogger logger)
targets = ValidateTargetsExist(_rootContext, targets);

// 5. Initialize report file, if configured.
InitializeOutputFile(options, _rootContext, targets);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitializeOutputFile(options, _rootContext, targets);

we won't initialize the outputfile with targets because we don't want to emit all targets in the artifacts.

paths.Add(localPath);
context.Hashes = hashData;
_pathToHashDataMap?.Add(localPath, hashData);
}
Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating this will enable us to use the code from SarifLogger to create the artifacts #ByDesign

"index": 0
}
}
],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will go away, since we are not using any flag to emit artifacts.

},
{
"id": "SARIF2012",
"name": "ProvideRuleProperties",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name

the order of the properties were enforced by shaopeng in a previous PR.

@@ -2,6 +2,7 @@

## Unreleased

* BUGFIX: Fix `AnalyzeCommandBase` and `MultithreadedAnalyzeCommandBase` from outputting all artifacts to SARIF even if no results were produced when Hashes is enabled. [#2433](https://github.com/microsoft/sarif-sdk/pull/2433)
Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUGFIX

  • BREAKING: AnalyzeCommandBase previously persisted all scan target artifacts to SARIF logs rather than only persisting artifacts referenced by an analysis result, when an option to persist hashes, text file or binary information was set. MultithreadedAnalyzeCommandBase previously persisted all scan targets artifacts to SARIF logs in cases when hash insertion was eenabled rather than only persisting artifacts referenced by an analysis result. #Resolved

@@ -31,15 +31,14 @@ internal static IFileSystem FileSystem

public static IDictionary<string, HashData> MultithreadedComputeTargetFileHashes(IEnumerable<string> analysisTargets, bool suppressConsoleOutput = false)
{
if (analysisTargets == null) { return null; }
Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null

Please revert this change. You should be careful about allocating objects to indicate no results. Why not just have the caller handle null? This code succeeded previously, so you probably have only just introduced the requirement for this to be non-null. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@@ -169,7 +169,7 @@ private void Analyze(TOptions options, AggregatingLogger logger)
targets = ValidateTargetsExist(_rootContext, targets);

// 5. Initialize report file, if configured.
InitializeOutputFile(options, _rootContext, targets);
Copy link
Member

@michaelcfanning michaelcfanning Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

targets

Revert this change please it's unnecessarily breaking customers. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API isn't exposed to customers. Only tools that use it will see a difference.

@@ -80,6 +80,9 @@ public enum OptionallyEmittedData : int
// Enrich SARIF log with git blame information
GitBlameInformation = 0x1000,

// Enrich with artifacts only.
Artifacts = 0x2000,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Artifacts

To keep the same behavior as before, I introduced a new flag that will be able to emit the artifacts only

@eddynaka eddynaka changed the title Fixing multithreaded artifacts generation Fixing AnalyzeCommandBase and MultithreadedAnalyzeCommandBase artifacts generation Jan 28, 2022
@@ -120,7 +120,7 @@
"index": 0
},
"region": {
"startLine": 22,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's open an issue on it.

@eddynaka eddynaka marked this pull request as ready for review January 31, 2022 17:56
Copy link
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@michaelcfanning michaelcfanning merged commit 6263379 into main Jan 31, 2022
@michaelcfanning michaelcfanning deleted the users/ednakamu/fixing-multithreaded-artifacts branch January 31, 2022 23:35
yongyan-gh pushed a commit that referenced this pull request Feb 8, 2022
…ifacts generation (#2433)

* Fixing multithreaded artifacts generation

* Fixing tests and flags

* Loosing precision.

* Applying fix for AnalyzeCommandBase

* Enabling tests

* Updating test case and release history

* Creating const to prevent magical numbers everywhere

* Rebaselining tests

* Creating Artifacts flag to keep previous behavior

* Addressing PR feedback.

* Rollback changes

* Update SARIF2012.ProvideRuleProperties_Invalid.sarif

* updating back

* Ordering deprecated names
michaelcfanning added a commit that referenced this pull request Mar 2, 2022
)

* Add new visitor to get deterministic SARIF log by sorting results

* Fix dotnet format issue

* updating format

* remove unnecessary using

Format & minor fixes

* Add Run Comparer to support sorting logs with multiple runs.

* Add command argument unit tests

fix dotnet format

* use ContainsKey to avoid allocating variable

* Fixing `AnalyzeCommandBase` and `MultithreadedAnalyzeCommandBase` artifacts generation (#2433)

* Fixing multithreaded artifacts generation

* Fixing tests and flags

* Loosing precision.

* Applying fix for AnalyzeCommandBase

* Enabling tests

* Updating test case and release history

* Creating const to prevent magical numbers everywhere

* Rebaselining tests

* Creating Artifacts flag to keep previous behavior

* Addressing PR feedback.

* Rollback changes

* Update SARIF2012.ProvideRuleProperties_Invalid.sarif

* updating back

* Ordering deprecated names

* `SarifLogger` now emits an artifacts table entry if `artifactLocation` is not null for tool configuration and tool execution notifications. (#2437)

* Fixing artifacts generation when logging notifications

* Updating release history.

* Updating ReleaseHistory

* Fix `ArgumentException` when recurse is enabled and two file target specifiers generates the same file paths (#2438)

* Fixing ArgumentException when passing two filePaths that generates duplicated file analysis

* Fixing dotnet-format issues and updating releasehistory

* Removing comments

* Addressing PR feedback

* Addressing PR feedback

* Addressing PR review issues

Add suppression support (#2435)

* Add suppression support

* Add incompatibility check and make suppressions non-null

Co-authored-by: Eddy Nakamura <eddynaka@gmail.com>

Update releasehistory

fix couple test cases

* Fix issues in PR review

* Add xml comments

* Fix test issues

* fix dotnet format

* Addressing review feedbacks

* Fix tests

* Update extension methods names

* Change xml doc comments to normal comments

Co-authored-by: Eddy Nakamura <eddynaka@gmail.com>
Co-authored-by: Michael C. Fanning <michael.fanning@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants