Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing originalUriBaseIds property from the SARIF specification #35043

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

marcandre-larochelle-bell
Copy link

@marcandre-larochelle-bell marcandre-larochelle-bell commented Oct 22, 2024

The SARIF format specified in the documentation is missing a field allowing resolution of the references for SARIF consumers (which is invalid):

SARIF Producer file scheme instructions (see Step 6 of when the URI is split)

SARIF producers SHALL create "file" scheme URIs by means of the following procedure or any
procedure with the same result:

1. In the case of a direct producer, preserve the file system’s casing, even if the file system is caseinsensitive. In the case of a converter (which might not know the file system’s casing), preserve
the casing specified in the analysis tool’s native output file.

2. Remove "." path segments.

3. Remove empty path segments.

4. If the path contains ".." path segments, then in the case of a direct producer, resolve the path
to a canonical absolute path, using an appropriate algorithm for the operating system on which
the tool ran.
   NOTE 1: This is necessary because, for example, the path /d1/../f naively converted
   to a URI is file:///d1/../f, which resolves to file:///f according to the URI
   standard [RFC3986]. But if /d1 is a symbolic link to the directory d2/d3, then the correct
   URI is file:///d2/f.
   
   NOTE 2: ".." path segments are dangerous because the semantics of the file system on
   which the SARIF log file was produced might not match the semantics of the file system
   on which it is consumed. For example, the presence of a symbolic link in the path might
   redirect the consumer to an unpredictable location.

5. Create a URI from the resulting path.

6. Optionally, divide the resulting URI into a base URI and a relative URI (preserving case in both
parts), and create an entry for the base URI in theRun.originalUriBaseIds (§3.14.14).
   NOTE 3: URI and path manipulation are complex topics. Many operating systems,
languages, and frameworks provide methods to perform these operations, which is
preferable to having every SARIF producer reimplement them. For example, in C#, the
operation can be performed as follows

And related to the originalUriBaseIds the property:

A run object MAY contain a property named originalUriBaseIds whose value is an object (§3.6)
each of whose property names designates a URI base id (§3.4.4) and each of whose property values is
an artifactLocation object (§3.4) that specifies (in the manner described below) the absolute URI
[RFC3986] of that URI base id on the machine where the SARIF producer ran.

If the artifactLocation object’s uri property (§3.4.3) is a relative reference, its uriBaseId property
(§3.4.4) SHALL be present. Otherwise (that is, if uri is an absolute URI, or if it is absent), uriBaseId
SHALL be absent.

Why:

As per the SARIF specification:

This property allows SARIF consumers to resolve any relative references which appear in any
artifactLocation objects elsewhere in the run, as long as the consumer runs either on the same
machine as the producer, or on a machine with an identical file system layout. This is useful for individual
developers who wish to run analysis tools and examine the results in a viewer. It is also useful for teams
which share a convention for their file system layout.

A SARIF consumer SHALL use the following procedure to resolve a URI base id from the information in
originalUriBaseIds:

NOTE 3: This procedure is part of an overall URI base id resolution procedure described
in §3.4.4.
NOTE 4: In this procedure, we refer to the resolved URI value by the variable name
resolvedUri.

1. Set resolvedUri to an empty string.

2. Fetch the artifactLocation object whose property name within originalUriBaseIds is
the value of uriBaseId. If there is no such property, the resolution procedure fails.

3. Prepend artifactLocation.uri to resolvedUri.

4. If artifactLocation.uri is an absolute URI, resolvedUri is the final resolved URI, and the
procedure succeeds.

Otherwise:

5. If uriBaseId is absent, the resolution procedure fails.
    NOTE 3: This would not occur in a valid SARIF file, but the file might not be valid.
  
6. If the value of uriBaseId has already been encountered during this resolution procedure (that
is, if there is a loop in the sequence of URI base ids), the resolution procedure fails.
    NOTE 4: This would not occur in a valid SARIF file, but the file might not be valid.
  
7. Otherwise (that is, if uriBaseId is present and its value has not previously been encountered
during this resolution), return to Step 2.

Multiple vendors are following GitHub's documentation here and omitting the originalUriBaseIds during their implementation, making it impossible to perform the above mentioned relative references resolution for other SARIF consumers. This is to correct this missing property.

What's being changed (if available, include any code snippets, screenshots, or gifs):

Added the property originalUriBaseIds with the example data from the SARIF specification.

Check off the following:

  • I have reviewed my changes in staging, available via the View deployment link in this PR's timeline (this link will be available after opening the PR).

    • For content changes, you will also see an automatically generated comment with links directly to pages you've modified. The comment won't appear if your PR only edits files in the data directory.
  • For content changes, I have completed the self-review checklist.

Fix invalid SARIF specification, missing originalUriBaseIds from the format
Copy link

welcome bot commented Oct 22, 2024

Thanks for opening this pull request! A GitHub docs team member should be by to give feedback soon. In the meantime, please check out the contributing guidelines.

@github-actions github-actions bot added the triage Do not begin working on this issue until triaged by the team label Oct 22, 2024
Copy link
Contributor

github-actions bot commented Oct 22, 2024

Automatically generated comment ℹ️

This comment is automatically generated and will be overwritten every time changes are committed to this branch.

The table contains an overview of files in the content directory that have been changed in this pull request. It's provided to make it easy to review your changes on the staging site. Please note that changes to the data directory will not show up in this table.


Content directory changes

You may find it useful to copy this table into the pull request summary. There you can edit it to share links to important articles or changes and to give a high-level overview of how the changes in your pull request support the overall goals of the pull request.

Source Preview Production What Changed
code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning.md fpt
ghec
ghes@ 3.14 3.13 3.12 3.11 3.10
fpt
ghec
ghes@ 3.14 3.13 3.12 3.11 3.10

fpt: Free, Pro, Team
ghec: GitHub Enterprise Cloud
ghes: GitHub Enterprise Server

@nguyenalex836 nguyenalex836 added content This issue or pull request belongs to the Docs Content team waiting for review Issue/PR is waiting for a writer's review code security Content related to code security and removed triage Do not begin working on this issue until triaged by the team labels Oct 22, 2024
@nguyenalex836
Copy link
Contributor

@marcandre-larochelle-bell Thanks so much for opening a PR! I'll get this triaged for review ✨

@felicitymay felicitymay added needs SME This proposal needs review from a subject matter expert and removed waiting for review Issue/PR is waiting for a writer's review labels Oct 23, 2024
Copy link
Contributor

Thanks for opening a pull request! We've triaged this issue for technical review by a subject matter expert 👀

@Kamran21202

This comment was marked as spam.

@marcandre-larochelle-bell
Copy link
Author

@nguyenalex836 I think a bot / fake account is trying to sneak-in some participation on this MR (check the empty comment above)

@nguyenalex836
Copy link
Contributor

@marcandre-larochelle-bell They've been blocked 💛

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code security Content related to code security content This issue or pull request belongs to the Docs Content team needs SME This proposal needs review from a subject matter expert
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants