Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new visitor to get deterministic SARIF log by sorting results #2422

Merged
merged 29 commits into from
Mar 2, 2022

Conversation

yongyan-gh
Copy link
Collaborator

@yongyan-gh yongyan-gh commented Jan 17, 2022

Description

Some static analysis tools generate non-deterministic SARIF results, means if scan the same targets multiple times, the same issues are detected, but in each output SARIF file the entities in collection (e.g. rules/results/locations) are arranged in random order. It causes the issues for the users want to check if 2 SARIF results present the same set of issues by comparing them using bitwise diff tools.

Fixes

  • Implement Comparer classes derived from IComparer interface for relevant SARIF entities.
  • Implement Visitor to sort the entities using the comparers and handle index remapping when traversing the SARIF log.
  • Add an option in rewrite command to sort the entities in the log and output deterministic SARIF log.

Can get sorted results by adding --sort-results to rewrite command.

This is a short term fix so that the customers can have their issue resolved asap. The comprehensive solution will be auto-generated Comparer implementation code by JSchema for all the SARIF entities instead of handwritten Comparer code.

// Licensed under the MIT license. See LICENSE file in the project root for full license information.

using System.Collections.Generic;
using System.Linq;
Copy link
Contributor

@marmegh marmegh Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see several instances where using System.Linq is greyed out in VS, can you double check where this is actually needed and remove any unnecessary? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required in this file, checked other files and removed the unnecessary using.

{
internal static readonly ReportingDescriptorSortingComparer Instance = new ReportingDescriptorSortingComparer();

public int Compare(ReportingDescriptor left, ReportingDescriptor right)
Copy link
Collaborator

@shaopeng-gh shaopeng-gh Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare

qurestion about design, do we need to compare all fields?
I see a few fields is not included like Relationships,MessageStrings

#Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution is a short term fix to help resolving issues some tools (E.g. PREfast) generate non-deterministic results for same targets run-over-run asap.

It mainly focus on relevant types cause the non-determinisms we observed in the sample/test SARIF files.

To compare all the fields we need to manually implement IComparer for all SARIF types. E.g. if compare Relationships, RelationshipComparer need to be implemented.

In planned comprehensive solution, we will update JSchema to automatically generates IComparer code for all SARIF types instead of these hand-written code, at that time we will make sure all types are used for comparing. Created issue #2424 for tracking.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the detail explain.

{
new CodeFlow
{
ThreadFlows = new []
Copy link
Collaborator

@shaopeng-gh shaopeng-gh Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new []

should do a auto format code for all the files to keep it consistent, I tried no need space is the recommended one. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which tool do you use? I ran dotnet format before commit, it didn't catch all of these additional spaces.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in VS right click file -> code clean up with the default profile.

{
internal class ArtifactContentSortingComparer : IComparer<ArtifactContent>
{
internal static readonly ArtifactContentSortingComparer Instance = new ArtifactContentSortingComparer();
Copy link
Collaborator

@shaopeng-gh shaopeng-gh Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instance

not sure, should we also start with s_ since it is static #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a public (internal) member. It should match naming convention: "All visible static member names are pascal cased."

Copy link
Collaborator

@shaopeng-gh shaopeng-gh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

// save old indexes
for (int i = 0; i < node.Artifacts.Count; i++)
{
if (!oldIndexes.TryGetValue(node.Artifacts[i], out int index))
Copy link
Collaborator

@eddynaka eddynaka Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryGetValue

replace this by a containskey.
this will prevent u from allocating the index. #Resolved

// before sort the rules, save old indexes
for (int i = 0; i < current.Rules.Count; i++)
{
if (!oldIndexes.TryGetValue(current.Rules[i], out int index))
Copy link
Collaborator

@eddynaka eddynaka Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryGetValue

same here. replace this for containskey #Resolved

public void SortingVisitor_ShuffleTest()
{
bool areEqual;
// create a test sarif log
Copy link
Collaborator

@shaopeng-gh shaopeng-gh Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// create a test sarif log

I got a note from Michael in one of my PR that,
comments should be like a sentence start with Capital and ends with a ending.

But that was on a header of a method, not sure if also applies here?
#Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this note applies. This is not a helpful comment in any case.

return compareResult;
}
}
}
Copy link
Member

@michaelcfanning michaelcfanning Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define a helper for this? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the common method in ComparerHelper

return compareResult;
}

compareResult = string.Compare(left.Hashes.ElementAt(i).Value, right.Hashes.ElementAt(i).Value);
Copy link
Member

@michaelcfanning michaelcfanning Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left.Hashes.ElementAt(i)

is this an n^2 access pattern? ouch. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. new implementation in ComparerHelper.CompareDictionary(...)

return compareResult;
}

for (int index_1 = 0; index_1 < left.ThreadFlows.Count; ++index_1)
Copy link
Member

@michaelcfanning michaelcfanning Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index_1

index_1? let's leave underscores out of this. :) #Resolved

/// </summary>
namespace Microsoft.CodeAnalysis.Sarif
{
internal class ReportingConfigurationSortingComparer : IComparer<ReportingConfiguration>
Copy link
Member

@michaelcfanning michaelcfanning Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReportingConfigurationSortingComparer

How about we call this 'ReportingConfigurationComparer' #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all comparers class name to {typename}Comparer


public SortingVisitor()
{
ruleIndexMap = new ConcurrentDictionary<int, int>();
Copy link
Member

@michaelcfanning michaelcfanning Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConcurrentDictionary

If you are proposing to make this thread-safe, write a test to prove it. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visitor traverses entire Sarif log by properties in certain order. I could not think multi threads scenario for this. If its true I will change it to normal dictionary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to Dictioanry<>

{
public class SortingVisitor : SarifRewritingVisitor
{
private readonly IDictionary<int, int> ruleIndexMap;
Copy link
Member

@michaelcfanning michaelcfanning Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDictionary

Stop using IDictionary if you want a perf micro-optimization (so you can use TryUpdate...) #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to Dictionary

@yongyan-gh
Copy link
Collaborator Author

Add code coverage data:
image
image

internal class ComparerHelper
{
/// <summary>
/// Compare 2 object refences. Return value false presents need to
Copy link
Member

@michaelcfanning michaelcfanning Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refences

'references'. #Resolved

internal class ComparerHelper
{
/// <summary>
/// Compare 2 object refences. Return value false presents need to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare

These comment needs editing, for grammatical correctness and clarity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked with Courtney got all user facing strings reviewed

/// -1 if the first object is null and the second object is not null.
/// 1 if the first object is not null and the second object is null.
/// </param>
/// <returns>Return true if can get a definite compare result, otherwise return false.</returns>
Copy link
Member

@michaelcfanning michaelcfanning Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this getting compiled? where? who can see your comment outside this source file?

otherwise convert this comment to a regular comment and make sure it's useful. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDE intellisense shows this comment for people wants to call it. I d like to keep the comment for this method since it has some behaviors cannot tell by just signatures.
Removed comments for other methods in this file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to normal comments since its internal class its no visible outside of Sarif.Sdk

/// 1 if the first object is not null and the second object is null.
/// </param>
/// <returns>Return true if can get a definite compare result, otherwise return false.</returns>
public static bool CompareReference(object left, object right, out int result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompareReference

Rename this to 'TryCompare'.

This is internal code, so you could also return a nullable int. If it's null, no result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed.
to be consistent with other compares method i d like to return int instead of nuallable int. what do you think?


namespace Microsoft.CodeAnalysis.Sarif.Comparers
{
internal class ComparerHelper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComparerHelper

Perhaps create class named ComparerExtensions.cs?

with extension methods.

public static bool TryCompare(this object left, object right, out int result)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from helper methods to extension methods.
Renamed this method to 'TryReferenceCompares', following .net "ReferenceEquals" naming


if (object.ReferenceEquals(left, right))
{
result = 0;
Copy link
Member

@michaelcfanning michaelcfanning Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary assignment here. #Resolved


private static int CompareDictionaryHelper<T>(IDictionary<string, T> left, IDictionary<string, T> right, Func<T, T, int> compareFunc)
{
if (compareFunc == null)
Copy link
Member

@michaelcfanning michaelcfanning Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compareFunc

Can you use a more descriptive name here? How about 'compareFunction'? #Resolved

{
return compareResult;
}

Copy link
Member

@michaelcfanning michaelcfanning Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (left.TryReferenceCompare(right, out result))
{
return result;
}

if ((object)left.TryReferenceCompare((object)right, out result))
{
return result;
}

    #Resolved

/// </summary>
namespace Microsoft.CodeAnalysis.Sarif.Comparers
{
internal class CodeFlowComparer : IComparer<CodeFlow>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal

All of these classes s/be public?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was following the same pattern of Sarif SDK's EqualityComparers. The EqualityComparer class is internal and have a public reference in Sarif types, e.g.
public static IEqualityComparer ValueComparer => ArtifactLocationEqualityComparer.Instance;

}

[Fact]
public void CompareList__Shuffle_Tests()
Copy link
Member

@michaelcfanning michaelcfanning Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompareList__Shuffle_Tests

two underscores '__' in this name? #Resolved

}

[Fact]
public void CompareList__Shuffle_Tests()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompareList__Shuffle_Tests

Go make this test fail. Show us the test output that makes it easy for a developer to repro the failure, using the random seed that generated the failing case.

Copy link
Collaborator Author

@yongyan-gh yongyan-gh Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random object created with RandomSarifLogGenerator.GenerateRandomAndLog logs the seed in test output. Below is an example:

[xUnit.net 00:00:25.02]     Microsoft.CodeAnalysis.Test.UnitTests.Sarif.Comparers.ComparersTests.CompareList_Shuffle_Tests [FAIL]
  Failed Microsoft.CodeAnalysis.Test.UnitTests.Sarif.Comparers.ComparersTests.CompareList_Shuffle_Tests [203 ms]
  Error Message:
   Expected result to be 0, but found -1.
  Stack Trace:
     at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message)
   at FluentAssertions.Execution.TestFrameworkProvider.Throw(String message)
   at FluentAssertions.Execution.DefaultAssertionStrategy.HandleFailure(String message)
   at FluentAssertions.Execution.AssertionScope.FailWith(Func`1 failReasonFunc)
   at FluentAssertions.Execution.AssertionScope.FailWith(Func`1 failReasonFunc)
   at FluentAssertions.Execution.AssertionScope.FailWith(String message, Object[] args)
   at FluentAssertions.Numeric.NumericAssertions`1.Be(T expected, String because, Object[] becauseArgs)
   at Microsoft.CodeAnalysis.Test.UnitTests.Sarif.Comparers.ComparersTests.CompareList_Shuffle_Tests() in C:\repo\github.com\sarif-sdk\src\Test.UnitTests.Sarif\Comparers\ComparersTests.cs:line 40
  Standard Output Messages:
 TestName: CompareList_Shuffle_Tests has seed 727824218

Also updated the RandomSarifLogGenerator.GenerateRandomAndLog to accept a specific seed so dev can easily reuse the seed in the failed case:
Random random = RandomSarifLogGenerator.GenerateRandomAndLog(this.output, seed: 727824218);

Copy link
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@yongyan-gh
Copy link
Collaborator Author

CodeQL reported 1 error and 1 warning both are false-positive.

  • Error: Dereferenced variable is always null

    IList<int> list1 = null;
    IList<int> list2 = null;
    list1.ListCompare(list2).Should().Be(0);
    list2.ListCompare(list1).Should().Be(0);

    The ListCompare() method is an extension method, it handles the case that extended type is null.

  • Warning: Useless upcast

    new {
    Title = "fail case: argument value not provided",
    Args = new string[] {
    "test.sarif",
    "--output",
    "updated.sarif",
    "--remove",
    "--sort-results",
    "--force"
    },
    Valid = false,
    ExpectedOptions = (RewriteOptions)null,
    },

    Because it instantiates an anonymous object, have to explicitly cast null to specific type for an anonymous type

@michaelcfanning michaelcfanning enabled auto-merge (squash) March 2, 2022 23:25
@michaelcfanning michaelcfanning merged commit 839537f into main Mar 2, 2022
@michaelcfanning michaelcfanning deleted the users/yongyan-gh/sortingvisitor branch March 2, 2022 23:34
@lgtm-com
Copy link

lgtm-com bot commented Mar 2, 2022

This pull request introduces 1 alert when merging f18e005 into bedc46e - view on LGTM.com

new alerts:

  • 1 for Dereferenced variable may be null

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants