-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SARIF has per-line rolling (partial) hash support #2605
Conversation
…ft/sarif-sdk into users/suvam/rolling-hash
(await) discussion with codeql dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
(await) discussion with codeql dev
…ft/sarif-sdk into users/suvam/rolling-hash
using System; | ||
using System.Collections.Generic; | ||
|
||
namespace Microsoft.CodeAnalysis.Sarif.Numeric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you delete this and just use C# long
or ulong
? The code does some funny things that look like it was ported from JavaScript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KalleOlaviNiemitalo we did this port of the Long
library to align the semantics with that of a Javascript implementation. The C# long
was causing a behavior drift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This being a port of an Apache-2.0 licensed library, do you need to add third-party notices?
Do the rolling hashes need to be compatible with some other implementation, is the algorithm documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RollingHash code seems to be a port of https://github.com/github/codeql-action/blob/v2.1.39/src/fingerprints.ts.
What is the intended use of these hashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. CodeQL has shipped a very nice rolling hash that they've used historically for result matching. GHAS understands this fingerprint natively. Our goal here is to broaden the availability of this hash mechanism to other scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelcfanning Do we need to add any third-party notices as @KalleOlaviNiemitalo points out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see section 4 Redistribution of the apache license for this module. Let's get this in and immediately follow up with this.
src/Sarif/Numeric/Long.cs
Outdated
private static readonly Dictionary<int, Long> INT_CACHE = new Dictionary<int, Long>(); | ||
private static readonly Dictionary<uint, Long> UINT_CACHE = new Dictionary<uint, Long>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These dictionaries are not thread-safe. If they are mutated from multiple threads without locking, it can cause the dictionary to be corrupted such that every thread that subsequently tries to use it falls in an infinite loop.
At minimum, you need to use locks or switch to ConcurentDictionary or just a fixed-size array. But I'd really prefer it if Long were removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KalleOlaviNiemitalo Good catch. Switched to using ConcurrentDictionary
. Unfortunately, as pointed out in a different comment, we need to use this port of Long
to align the semantics with the CodeQL implementation. So the better option is to make this class thread safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KalleOlaviNiemitalo @michaelcfanning Could you kindly sign off on this change?
} | ||
} | ||
} | ||
catch (IOException) { } | ||
catch (UnauthorizedAccessException) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds a new hash utility to compute per-line rolling (partial) hashes for a given file. The PR also provides a port of a numeric library for representing and operating on 64-bit two's complement integers.