-
Notifications
You must be signed in to change notification settings - Fork 334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-enable passing the codescanning config file to the CLI #1105
Conversation
fe71459
to
2fd87c6
Compare
@@ -0,0 +1,59 @@ | |||
name: Check Code-Scanning Config |
Check failure
Code scanning / CodeQL
Inconsistent action input
f1cd9f6
to
1c847ad
Compare
This reverts commit 43d0664.
dbcf6d0
to
62097bc
Compare
This commit adds the packs and queries from the actions input to the config file used by the CodeQL CLI. When the `+` is used, the actions input value is combined with the config value and when it is not used, the input value overrides the config value. This commit also adds a bunch of integration tests for this feature. In order to avoid adding too many new jobs, all of the tests are run sequentially in a single job (matrixed across relevant operating systems and OSes).
62097bc
to
6fabde2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't finished reviewing this yet, but partial comments below - one is significant enough that I'll want to do a full re-review afterwards anyways hence the partial review.
src/codeql.ts
Outdated
@@ -225,6 +226,7 @@ const CODEQL_VERSION_GROUP_RULES = "2.5.5"; | |||
const CODEQL_VERSION_SARIF_GROUP = "2.5.3"; | |||
export const CODEQL_VERSION_COUNTS_LINES = "2.6.2"; | |||
const CODEQL_VERSION_CUSTOM_QUERY_HELP = "2.7.1"; | |||
export const CODEQL_VERSION_CONFIG_FILES = "2.8.2"; // Versions before 2.8.2 weren't tolerant to unknown properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably wants bumping to 2.10.1, so you only get CLI versions with https://github.com/github/semmle-code/pull/42877 in them.
@@ -933,7 +941,9 @@ async function getCodeQLForCmd( | |||
if (extraSearchPath !== undefined) { | |||
codeqlArgs.push("--additional-packs", extraSearchPath); | |||
} | |||
codeqlArgs.push(querySuitePath); | |||
if (!(await util.useCodeScanningConfigInCli(this))) { | |||
codeqlArgs.push(querySuitePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is quite broken (and probably already was in my version). We call databaseRunQueries
potentially multiple times since runQueryGroup
gets called once for each different group of queries. So, if we just do this we'll end up running all the queries every time. I think we need a new code path from the top-level runQueries
that just calls databaseRunQueries
once without any arguments if we're using the CLI-side config file parsing, and skips all the calls to runQueryGroup
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand correctly, the action will run the custom, the builtin, and the packaging queries in separate calls. The logic of the CLI invocation avoids pushing the query suite path onto the args. This has the effect of passing no query specs to the CLI invocation, which means that the config-queries.qls
suite inside the database temp directory is used instead.
The config-queries.qls
suite contains all the queries to run for a given language. So, when running with the config enabled, we only want to run this suite a single time during the analysis.
Assuming I am right here, I can make the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something else to think about. We are sending status reports for various timings of the package groups. Specifically, we are sending time to evaluate builtin and custom packs also timing for interpretation of these two groups.
With this change, we are combining the groups, and we can't just add new fields to our status reports without a backend change (I think!), so I can stuff the timing into the builtin
section. However, the value when running using the config will not be comparable to the value when running in the old way.
So, we will need to expand the status report with the new fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I'll stuff it into builtin
, but this isn't correct. I will do the extra work later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: it's only query evaluation that needs an extra entry in the status report, not interpretation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming I am right here, I can make the fix.
Right, I think your understanding is correct. Regarding what to do with the status reports, I agree that we will no longer be able to distinguish the custom queries from the built in ones, so we'll need a new field that represents the time spent on the whole set of queries being run.
I'll note that while it makes our telemetry a bit less good, there's big potential performance gains from doing this: giving everything to the evaluator at once means we won't have to rely on the disk cache between one invocation and the next so should significantly reduce our IO usage (and, IO usage is probably close to 100% of the time we spend on custom queries, since we'll already have evaluated all the standard library for our built-in queries, and custom queries are unlikely to have particularly complex logic on top of that). In a multi-threaded setting, it also means we can work on the custom queries at the same time as the built-in ones. So, we're trading some telemetry for better performance, which at least from our users point of view is a win.
tools: ${{ steps.prepare-test.outputs.tools-url }} | ||
|
||
- name: Packs from input | ||
if: success() || failure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could just be always()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always includes canceled()
, which we don't need here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I wasn't aware of that distinction. Thanks for clarifying!
CHANGELOG.md
Outdated
@@ -42,6 +42,7 @@ No user facing changes. | |||
## 2.1.7 - 05 Apr 2022 | |||
|
|||
- A bug where additional queries specified in the workflow file would sometimes not be respected has been fixed. [#1018](https://github.com/github/codeql-action/pull/1018) | |||
No user facing changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a bad merge.
When the codescanning config is being used by the CLI, there is a single query suite that is generated that contains all queries to be run by the analysis. This is different from the traditional way, where there are potentially three query suites: builtin, custom, and packs. We need to ensure that when the codescanning config is being used, only a single call to run queries is used, and this call uses the single generated query suite. Also, this commit changes the cutoff version for codescanning config to 2.10.1. Earlier versions work, but there were some bugs that are only fixed in 2.10.1 and later.
Hmmm...the job is failing now because latest-nightly is still 2.10.0 and the feature is not being used. I think I need to hold off on merging until 2.10.1 is available as the latest nightly. |
6efc13c
to
01d16b1
Compare
|
This makes some syntax in tests somewhat simpler.
src/codeql.ts
Outdated
* @param config The configuration to use. | ||
* @returns the path to the generated user configuration file. | ||
*/ | ||
async function generateCodescanningConfig(codeql: CodeQL, config: Config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have a return type? I guess Promise<string | undefined>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typescript determines the return type implicitly, but I can add it to make it easier to read.
src/codeql.ts
Outdated
} | ||
const configLocation = path.resolve(config.tempDir, "user-config.yaml"); | ||
// make a copy so we can modify it | ||
const augmentedConfig = JSON.parse(JSON.stringify(config.originalUserInput)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we implement a clone
method, or something similar to copy this? It seems clunky to turn it into a string
and parse it again just to make a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Generic clone functions in js are really tricky to implement since they need to keep track of the prototype chain of values. We don't need that here. Since we are only copying raw objects, this is the simplest thing to do. I can extract it into a separate function.
src/config-utils.test.ts
Outdated
@@ -1621,6 +1623,7 @@ function parseInputAndConfigMacro( | |||
configUtils.parsePacks( | |||
packsFromConfig, | |||
packsFromInput, | |||
!!packsFromInput?.trim().startsWith("+"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea of this double negation that it handles the undefined
case? Could we add a comment to explain that to future readers of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's what's happening. It coerces the result into a boolean. It's a standard js idiom, but I understand that it is weird coming from statically typed languages.
src/util.ts
Outdated
return ( | ||
(process.env[EnvVar.CODEQL_PASS_CONFIG_TO_CLI] === "true" && | ||
(await codeQlVersionAbove(codeql, CODEQL_VERSION_CONFIG_FILES))) || | ||
false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the || false
at the end doing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm...I don't think its necessary.
0fcf0fc
to
fa2bc21
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a pretty scary change, but it does certainly look much better with all the extra tests. Thanks for adding those in! And I guess it's behind an env variable, so we won't break anything - let's merge it :)
Thanks for the review. We are one step closer to removing this technical debt. |
This PR un-reverts #1018
Additionally, it adds the fix for adding queries and packs from the actions input into the codescanning config file before it is sent to the CLI.
When the
+
is used, the actions input value is combined with theconfig value and when it is not used, the input value overrides the
config value.
This commit also adds a bunch of integration tests for this feature.
In order to avoid adding too many new jobs, all of the tests are
run sequentially in a single job (matrixed across relevant operating
systems and OSes).
Recommended to look at the commits individually. The first commit is the un-revert. The second commit is the new work.
This change is currently hidden behind an environment variable. I will probably convert this into a feature flag before getting external users to try this.
Merge / deployment checklist