Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SARIF output format for report #4 #33

Merged
merged 13 commits into from
Feb 28, 2023

Conversation

Coruscant11
Copy link
Contributor

@Coruscant11 Coruscant11 commented Feb 25, 2023

Fixes #4

Hello 👋

Resources

https://github.com/microsoft/sarif-tutorials/

For this PR I used the crate serde-sarif.
Documentation available here : https://docs.rs/serde-sarif/latest/serde_sarif/
And I took inspiration from this code.

Here is my PR. Feel free to tell me where I may have done something wrong! It is the first time that I try to contribute to another project. 😄
I tested on several report and everything should work.
The sarif_format function in cmd_report.rs is quite long. Is it an issue? Should I dispatch in multiple functions or not?

Improvements

  • Add commits details : If in the future commits details will be added to blob match, I will add them. Linked to Report detailed Git provenance information for matches #16
  • Specify only used regex rules : One of the features of sarif is to specify used rules. In my case I simply added all default rules of noseyparker in the file. We could add only used rules in order to save space. I know that rules names are available but I didn't find a way to efficiently found their associated regex, in the report code scope.

Tools

Here is a tool allowing to try to read SARIF files : https://microsoft.github.io/sarif-web-component/
And there is also the VSCode extension Sarif Viewer : https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer

Examples

I tried in several repositories, and everything seems ok.
Here is an example for CPython :

image

SARIF Template example for NoseyParker

{
  "version": "2.1.0",
  "$schema": "https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.4.json",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "noseyparker",
          "semanticVersion": "0.11.0",
          "rules": [
            {
              "id": "RULE_NAME",
              "name": "RULE_NAME",
              "shortDescription": {
                "text": "REGEX_RULE"
              }
            }
            ...
          ]
        }
      },
      "results": [
        {
          "message": {
            "text": "Rule {} has detected a secret with {} matches.\nFirst blob id matched : {}"
          },
          "ruleId": "RULE_NAME",
          "level": "warning",
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "PATH_TO_FILE"
                },
                "region": {
                  "startLine": "STARTLINE",
                  "startColumn": "STARTCOLUMN",
                  "endLine": "ENDLINE",
                  "endColumn": "ENDCOLUMN",
                  "snippet": {
                    "text": "THE_SECRET"
                  }
                }
              }
            }
            ...
          ]
        }
        ...
      ]
    }
  ]
}

@Coruscant11
Copy link
Contributor Author

Coruscant11 commented Feb 25, 2023

https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning

It miss some property in order to be compatible with Github. I am going to add them.

@Coruscant11
Copy link
Contributor Author

Added. The only thing missing is partialFingerprints which depends on #16.

@bradlarsen
Copy link
Collaborator

Thank you @Coruscant11! I will be looking at this soon.

Specify only used regex rules : One of the features of sarif is to specify used rules. In my case I simply added all default rules of noseyparker in the file. We could add only used rules in order to save space. I know that rules names are available but I didn't find a way to efficiently found their associated regex, in the report code scope.

Yes, good observation. The current Nosey Parker implementation saves its findings in a sqlite database, but the schema it uses was chosen for expedience rather than completeness of information. It doesn't record anything but the rule name for findings at present. So the approach you've taken is very reasonable.

src/bin/noseyparker/cmd_report.rs Fixed Show resolved Hide resolved
src/bin/noseyparker/cmd_report.rs Fixed Show fixed Hide fixed
@bradlarsen
Copy link
Collaborator

The CI / Tests failure is from the tests needing to be retrained. I'll take care of it.

More detail:

cargo test will reproduce the failure locally. I've been using cargo insta for some of the testing as well. Specifically, there are some tests there than check that the command-line help has not changed. (Not that it's bad if it does, but it's something I want to know about when it happens, hence the test case.)

@bradlarsen bradlarsen self-requested a review February 27, 2023 23:55
bradlarsen
bradlarsen previously approved these changes Feb 28, 2023
Copy link
Collaborator

@bradlarsen bradlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Coruscant11. Thanks for this contribution! This will help make Nosey Parker simpler to integrate into various workflows by supporting SARIF format.

Your code overall looks good to me, and I'm going to merge it back shortly.

I made a few small changes, which you can see in the commit log if you're curious of details. The most notable ones:

  • I fixed the end column locations in the SARIF output. They were off by one character, due to a discrepancy in how Nosey Parker and the SARIF format represent ranges (closed intervals vs half-open intervals).
  • I added additional byte offset location to the SARIF output. I don't think it hurts to include it also.
  • I restructured the sarif_format function a bit to reduce nesting and simplify some error handling. (I don't really mind how big it is at present.)

Including it seems to confuse the VSCode SARIF Results Viewer at times.
There is probably something not quite right about the locations.
@bradlarsen
Copy link
Collaborator

I was just about to merge, but then noticed that VSCode's SARIF viewer extension was not working properly in some cases when Nosey Parker's SARIF output included byte-based ranges in addition to line+column-based ranges. I commented out that additional range information. Will revisit later.

@bradlarsen bradlarsen self-requested a review February 28, 2023 00:34
@bradlarsen bradlarsen merged commit 4045b9c into praetorian-inc:main Feb 28, 2023
@bradlarsen
Copy link
Collaborator

@Coruscant11 thanks again—Nice work!

@Coruscant11
Copy link
Contributor Author

@bradlarsen Thank you too for your changes! Very interesting to see all things that I forgot. Will try to keep that in mind as much as possible in the future 😄

@bradlarsen
Copy link
Collaborator

You didn't really forget anything—all good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add SARIF output format for report
2 participants