Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating SARIF2004 #1995

Merged
merged 12 commits into from
Jul 20, 2020
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 14 additions & 2 deletions docs/Producing effective SARIF.md
Original file line number Diff line number Diff line change
Expand Up @@ -393,13 +393,25 @@ Similarly, most 'result' objects contain at least one 'artifactLocation' object.

#### Messages

##### `AvoidDuplicativeAnalysisTarget`: warning
Copy link
Member

@michaelcfanning michaelcfanning Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have code that auto-emits this content, rather than maintaining it in three places (resource strings, as comments in rule source, and here). Action item for later. #Pending


The 'analysisTarget' property '{1}' at '{0}' is unnecessary because it is the same as the result location. Remove the 'analysisTarget' property.
Copy link
Member

@michaelcfanning michaelcfanning Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary [](start = 48, length = 11)

Please add something like: 'The 'analysisTarget' property is used to distinguish cases when a result fires in a file (such as an included header) that is different than the file that was scanned (such as a .cpp file that included the header). This C++ example example is the canonical scenario that led to this design pattern.

Also, though it is implied, the reason we recommend removing it is to optimize file size.

The 'analysisTarget' property '{1}' at '{0}' can be removed because it is the same as the result location. This unnecessarily increases log file size. 'The 'analysisTarget' property is used to distinguish cases when a result fires in a file (such as an included header) that is different than the file that was scanned (such as a .cpp file that included the header).
#Closed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddynaka, +1 on Michael's rephrasing (the very last paragraph in his comment above).


In reply to: 455904829 [](ancestors = 455904829)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... except that instead of "when a result fires" I suggest "when a tool detects a result".


In reply to: 455907085 [](ancestors = 455907085,455904829)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't intend to take over reauthoring the words, just point you in the right direction. :) feel free to send Eddy updates after it goes through your wordsmithing process. Every string needs a pass in my estimation. I need the team to send me the logs you generated in testing so that I can see what your new output looks like in bulk. If we aren't yet doing that testing, it needs to be included in the formalized process.


In reply to: 455907934 [](ancestors = 455907934,455907085,455904829)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just updated this as well!


In reply to: 455913218 [](ancestors = 455913218,455907934,455907085,455904829)


#### `AvoidDuplicativeResultRuleInformation`: warning

The result at '{0}' uses the 'rule' property to specify the violated rule, so it is not necessary also to specify 'ruleId' or 'ruleIndex'. Remove the 'ruleId' and 'ruleIndex' properties.
Copy link
Member

@michaelcfanning michaelcfanning Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a tendency to introduce descriptive prefixes in rules that I find distracting. The '{0}' value tends to be sufficiently descriptive to illuminate what's under discussion. This is most easily seen when actually reviewing output of the tool (which I encourage you to do), particularly real world quantities of output, not just a small # of test messages. In this second example. we see that we're trying to evolve to a common output format string for duplicative information (and we should converge on a standard).

'{0}' uses the 'rule' property to specify the violated rule, so it is not necessary also to specify 'ruleId' or 'ruleIndex'. This unnecessarily increases log file size. Remove the 'ruleId' and 'ruleIndex' properties. #Closed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to this but I ask you to consider the potential value of the descriptive prefixes. The problem is that the JSON paths can be very long, and without the prefix, the first thing that meets your eye is this long string. Consider an example where it's really long:

'/runs/0/results/0/locations/0/physicalLocation/artifactLocation/uri' contains an invalid URI 'fi:le:///C:/dev/'

I think this provides the reader with an easier entry to the message:

The URI 'fi:le:///C:/dev/' at '/runs/0/results/0/locations/0/physicalLocation/artifactLocation/uri' is invalid.

... because it says "I'm about to tell you something about a URI". Without the prefix, you have to scan to the end of the path string to learn that it's about a URI.


In reply to: 455907068 [](ancestors = 455907068)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your example doesn't correspond to my feedback, which notes that describing the SARIF JSON type is generally not required because the type information tends to be encoded in the JSON path. The term 'URI' isn't typically encoded in the URI and so it is useful to identify the string literal as one.


In reply to: 455936290 [](ancestors = 455936290,455907068)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand the distinction.


In reply to: 455941928 [](ancestors = 455941928,455936290,455907068)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updated


In reply to: 455982124 [](ancestors = 455982124,455941928,455936290,455907068)


##### `EliminateLocationOnlyArtifacts`: warning

{0}: The 'artifacts' array contains no information beyond the locations of the artifacts. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'artifacts' array might be used to record the full set of artifacts that were analyzed. In such a scenario, the 'artifacts' array should be retained even if it contains only location information.
The 'artifacts' array at '{0}' contains no information beyond the locations of the artifacts. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'artifacts' array might be used to record the full set of artifacts that were analyzed. In such a scenario, the 'artifacts' array should be retained even if it contains only location information.

##### `EliminateIdOnlyRules`: warning

{0}: The 'rules' array contains no information beyond the ids of the rules. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'rules' array might be used to record the full set of rules that were evaluated. In such a scenario, the 'rules' array should be retained even if it contains only id information.
The 'rules' array at '{0}' contains no information beyond the ids of the rules. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'rules' array might be used to record the full set of rules that were evaluated. In such a scenario, the 'rules' array should be retained even if it contains only id information.
Copy link
Member

@michaelcfanning michaelcfanning Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'rules' array [](start = 0, length = 17)

I'd like to see actual instances of this rule output before approving. Can you provide some from your testing? Note that you have some language here 'Removing this array might reduce log file size' that should converge on a standard message. Also, why do we have the 'might'? This analysis s/be very clear whether our suggestion can be taken without implying information loss. If not, let's keep working on the analysis. #Resolved

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just sent you the output from running the validator on a SARIF file produced from scanning a significant repo.


In reply to: 455908031 [](ancestors = 455908031)


#### `PreferRuleId`: warning

The result at '{0}' uses the 'rule' property to specify the violated rule, but this is not necessary because the rule is defined by 'tool.driver'. Use the 'ruleId' and 'ruleIndex' instead, because they are shorter and just as clear.

---

Expand Down
24 changes: 24 additions & 0 deletions src/Sarif.Multitool/Extensions.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
using System;

namespace Microsoft.CodeAnalysis.Sarif.Multitool
{
public static class Extensions
{
public static bool RefersToDriver(this ToolComponentReference toolComponent, string driverGuid)
{
if (toolComponent.Index == -1)
{
if (toolComponent.Guid == null)
{
return true;
}
else
{
return toolComponent.Guid.Equals(driverGuid, StringComparison.OrdinalIgnoreCase);
}
}

return false;
}
}
}
31 changes: 29 additions & 2 deletions src/Sarif.Multitool/Rules/RuleResources.Designer.cs

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 18 additions & 3 deletions src/Sarif.Multitool/Rules/RuleResources.resx
Original file line number Diff line number Diff line change
Expand Up @@ -272,10 +272,16 @@ Many tools follow a conventional format for the 'reportingDescriptor.id' propert

In several parts of a SARIF log file, a subset of information about an object appears in one place, and the full information describing all such objects appears in an array elsewhere in the log file. For example, each 'result' object has a 'ruleId' property that identifies the rule that was violated. Elsewhere in the log file, the array 'run.tool.driver.rules' contains additional information about the rules. But if the elements of the 'rules' array contained no information about the rules beyond their ids, then there might be no reason to include the 'rules' array at all, and the log file could be made smaller simply by omitting it. In some scenarios (for example, when assessing compliance with policy), the 'rules' array might be used to record the full set of rules that were evaluated. In such a scenario, the 'rules' array should be retained even if it contains only id information.

Similarly, most 'result' objects contain at least one 'artifactLocation' object. Elsewhere in the log file, the array 'run.artifacts' contains additional information about the artifacts that were analyzed. But if the elements of the 'artifacts' array contained not information about the artifacts beyond their locations, then there might be no reason to include the 'artifacts' array at all, and again the log file could be made smaller by omitting it. In some scenarios (for example, when assessing compliance with policy), the 'artifacts' array might be used to record the full set of artifacts that were analyzed. In such a scenario, the 'artifacts' array should be retained even if it contains only location information.</value>
Similarly, most 'result' objects contain at least one 'artifactLocation' object. Elsewhere in the log file, the array 'run.artifacts' contains additional information about the artifacts that were analyzed. But if the elements of the 'artifacts' array contained not information about the artifacts beyond their locations, then there might be no reason to include the 'artifacts' array at all, and again the log file could be made smaller by omitting it. In some scenarios (for example, when assessing compliance with policy), the 'artifacts' array might be used to record the full set of artifacts that were analyzed. In such a scenario, the 'artifacts' array should be retained even if it contains only location information.

In addition to the avoiding unnecessary arrays, there are other ways to optimize the size of SARIF log files.

Prefer the result object properties 'ruleId' and 'ruleIndex' to the nested object-valued property 'result.rule', unless the rule comes from a tool component other than the driver (in which case only 'result.rule' can accurately point to the metadata for the rule). The 'ruleId' and 'ruleIndex' properties are shorter and just as clear.

Do not specify the result object's 'analysisTarget' property unless it differs from the result location. The canonical scenario for using 'result.analysisTarget' is a C/C++ language analyzer that is instructed to analyze example.c, and detects a result in the included file example.h. In this case, 'analysisTarget' is example.c, and the result location is in example.h.</value>
</data>
<data name="SARIF2004_OptimizeFileSize_Warning_EliminateLocationOnlyArtifacts_Text" xml:space="preserve">
<value>{0}: The 'artifacts' array contains no information beyond the locations of the artifacts. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'artifacts' array might be used to record the full set of artifacts that were analyzed. In such a scenario, the 'artifacts' array should be retained even if it contains only location information.</value>
<value>The 'artifacts' array at '{0}' contains no information beyond the locations of the artifacts. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'artifacts' array might be used to record the full set of artifacts that were analyzed. In such a scenario, the 'artifacts' array should be retained even if it contains only location information.</value>
</data>
<data name="SARIF2002_ProvideMessageArguments_FullDescription_Text" xml:space="preserve">
<value>In result messages, use the 'message.id' and 'message.arguments' properties rather than 'message.text'. This has several advantages. If 'text' is lengthy, using 'id' and 'arguments' makes the SARIF file smaller. If the rule metadata is stored externally to the SARIF log file, the message text can be improved (for example, by adding more text, clarifying the phrasing, or fixing typos), and the result messages will pick up the improvements the next time it is displayed. Finally, SARIF supports localizing messages into different languages, which is possible if the SARIF file contains 'message.id' and 'message.arguments', but not if it contains 'message.text' directly.</value>
Expand All @@ -290,7 +296,7 @@ Similarly, most 'result' objects contain at least one 'artifactLocation' object.
<value>{0}: This run does not provide 'versionControlProvenance'. As a result, it is not possible to determine which version of code was analyzed, nor to map relative paths to their locations within the repository.</value>
</data>
<data name="SARIF2004_OptimizeFileSize_Warning_EliminateIdOnlyRules_Text" xml:space="preserve">
<value>{0}: The 'rules' array contains no information beyond the ids of the rules. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'rules' array might be used to record the full set of rules that were evaluated. In such a scenario, the 'rules' array should be retained even if it contains only id information.</value>
<value>The 'rules' array at '{0}' contains no information beyond the ids of the rules. Removing this array might reduce the log file size without losing information. In some scenarios (for example, when assessing compliance with policy), the 'rules' array might be used to record the full set of rules that were evaluated. In such a scenario, the 'rules' array should be retained even if it contains only id information.</value>
</data>
<data name="SARIF2006_UrisShouldBeReachable_FullDescription_Text" xml:space="preserve">
<value>URIs that refer to locations such as rule help pages and result-related work items should be reachable via an HTTP GET request.</value>
Expand Down Expand Up @@ -360,4 +366,13 @@ This is part of a set of authoring practices that make your rule messages more r
<data name="SARIF2007_ExpressPathsRelativeToRepoRoot_Warning_ExpressResultLocationsRelativeToMappedTo_Text" xml:space="preserve">
<value>{0}: This result location does not provide any of the 'uriBaseId' values that specify repository locations: '{1}'. As a result, it will not be possible to determine the location of the file containing this result relative to the root of the repository that contains it.</value>
</data>
<data name="SARIF2004_OptimizeFileSize_Warning_AvoidDuplicativeAnalysisTarget_Text" xml:space="preserve">
<value>The 'analysisTarget' property '{1}' at '{0}' is unnecessary because it is the same as the result location. Remove the 'analysisTarget' property.</value>
</data>
<data name="SARIF2004_OptimizeFileSize_Warning_AvoidDuplicativeResultRuleInformation_Text" xml:space="preserve">
<value>The result at '{0}' uses the 'rule' property to specify the violated rule, so it is not necessary also to specify 'ruleId' or 'ruleIndex'. Remove the 'ruleId' and 'ruleIndex' properties.</value>
</data>
<data name="SARIF2004_OptimizeFileSize_Warning_PreferRuleId_Text" xml:space="preserve">
<value>The result at '{0}' uses the 'rule' property to specify the violated rule, but this is not necessary because the rule is defined by 'tool.driver'. Use the 'ruleId' and 'ruleIndex' instead, because they are shorter and just as clear.</value>
</data>
</root>
Loading