Re-enable passing the codescanning config file to the CLI #1105

Merged
merged 11 commits into from
Aug 12, 2022

Contributor

@aeisenberg aeisenberg commented Jun 19, 2022

This PR un-reverts #1018.

Additionally, it fixes how queries and packs from the actions input are added to the codescanning config file before it is sent to the CLI.

When the input value starts with `+`, it is combined with the
config value. When it does not, the input value overrides the
config value.
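
The combine-versus-override rule can be sketched as follows. This is a minimal illustration, not the action's actual implementation: the helper name `mergeInputWithConfig`, the comma-separated input format, the ordering of the combined list, and the behavior when no input is given are all assumptions made for the example.

```typescript
// Hypothetical helper (not from the PR): merges an action input value with
// the list already present in the config file.
function mergeInputWithConfig(
  configValues: string[],
  input: string | undefined
): string[] {
  const trimmed = input?.trim();
  if (!trimmed) {
    // Assumption: with no input, the config value is used unchanged.
    return [...configValues];
  }
  // A leading "+" means "combine with the config" rather than "override it".
  const combine = trimmed.startsWith("+");
  const inputValues = (combine ? trimmed.slice(1) : trimmed)
    .split(",")
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
  return combine ? [...configValues, ...inputValues] : inputValues;
}

// With "+", the input is appended to the config value.
const combined = mergeInputWithConfig(["codeql/a"], "+codeql/b, codeql/c");
// Without "+", the input replaces the config value.
const overridden = mergeInputWithConfig(["codeql/a"], "codeql/b");
```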

This commit also adds a number of integration tests for this feature.
To avoid adding too many new jobs, all of the tests are
run sequentially in a single job (matrixed across the relevant operating
systems and CodeQL versions).

I recommend reviewing the commits individually. The first commit is the un-revert. The second commit is the new work.

This change is currently hidden behind an environment variable. I will probably convert this into a feature flag before getting external users to try this.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

@aeisenberg aeisenberg requested a review from a team as a code owner June 19, 2022 23:45
@aeisenberg aeisenberg removed the request for review from a team June 19, 2022 23:46
@aeisenberg aeisenberg marked this pull request as draft June 19, 2022 23:46
@aeisenberg aeisenberg force-pushed the aeisenberg/fix-config-files branch 16 times, most recently from fe71459 to 2fd87c6 Compare June 24, 2022 18:10
@@ -0,0 +1,59 @@
name: Check Code-Scanning Config

Check failure

Code scanning / CodeQL

Inconsistent action input

Action [.github/check-codescanning-config](1) and action [init](2) both declare input languages, however their definitions are not identical. This may be confusing to users.
Action [.github/check-codescanning-config](1) and action [init](2) both declare input packs, however their definitions are not identical. This may be confusing to users.
Action [.github/check-codescanning-config](1) and action [init](2) both declare input queries, however their definitions are not identical. This may be confusing to users.
Action [.github/check-codescanning-config](1) and action [init](2) both declare input tools, however their definitions are not identical. This may be confusing to users.
@aeisenberg aeisenberg force-pushed the aeisenberg/fix-config-files branch 10 times, most recently from f1cd9f6 to 1c847ad Compare June 24, 2022 23:07
@edoardopirovano edoardopirovano self-assigned this Jun 28, 2022
@aeisenberg aeisenberg force-pushed the aeisenberg/fix-config-files branch from dbcf6d0 to 62097bc Compare June 28, 2022 20:05
This commit adds the packs and queries from the actions input to the
config file used by the CodeQL CLI.

When the `+` is used, the actions input value is combined with the
config value and when it is not used, the input value overrides the
config value.

This commit also adds a number of integration tests for this feature.
To avoid adding too many new jobs, all of the tests are
run sequentially in a single job (matrixed across the relevant operating
systems and CodeQL versions).
@aeisenberg aeisenberg force-pushed the aeisenberg/fix-config-files branch from 62097bc to 6fabde2 Compare June 28, 2022 21:08
Contributor

@edoardopirovano edoardopirovano left a comment

I haven't finished reviewing this yet, but partial comments are below. One is significant enough that I'll want to do a full re-review afterwards anyway, hence the partial review.

src/codeql.ts Outdated
@@ -225,6 +226,7 @@ const CODEQL_VERSION_GROUP_RULES = "2.5.5";
const CODEQL_VERSION_SARIF_GROUP = "2.5.3";
export const CODEQL_VERSION_COUNTS_LINES = "2.6.2";
const CODEQL_VERSION_CUSTOM_QUERY_HELP = "2.7.1";
export const CODEQL_VERSION_CONFIG_FILES = "2.8.2"; // Versions before 2.8.2 weren't tolerant to unknown properties
Contributor

This should probably be bumped to 2.10.1, so that you only get CLI versions with https://github.com/github/semmle-code/pull/42877 in them.

@@ -933,7 +941,9 @@ async function getCodeQLForCmd(
   if (extraSearchPath !== undefined) {
     codeqlArgs.push("--additional-packs", extraSearchPath);
   }
-  codeqlArgs.push(querySuitePath);
+  if (!(await util.useCodeScanningConfigInCli(this))) {
+    codeqlArgs.push(querySuitePath);
Contributor

I think this is quite broken (and probably already was in my version). We potentially call `databaseRunQueries` multiple times, since `runQueryGroup` gets called once for each different group of queries. So, if we just do this, we'll end up running all the queries on every call. I think we need a new code path from the top-level `runQueries` that, when we're using the CLI-side config file parsing, calls `databaseRunQueries` just once without any arguments and skips all the calls to `runQueryGroup`.

Contributor Author

So if I understand correctly, the action will run the custom, the builtin, and the packs queries in separate calls. With this change, the CLI invocation avoids pushing the query suite path onto the args. This has the effect of passing no query specs to the CLI invocation, which means that the `config-queries.qls` suite inside the database temp directory is used instead.

The `config-queries.qls` suite contains all the queries to run for a given language. So, when running with the config enabled, we want to run this suite only once during the analysis.

Assuming I am right here, I can make the fix.
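
The difference between the two code paths can be sketched like this. It is only an illustration of the idea discussed above, not the action's code: the function name `planRunQueriesCalls` and the suite file names are hypothetical, and each array element simply stands for one `codeql database run-queries` invocation.

```typescript
// Hypothetical planning helper (not from the PR). Each element represents
// one run-queries invocation; `undefined` means no query suite argument is
// passed, so the CLI falls back to the suite generated from the config.
function planRunQueriesCalls(
  useConfigInCli: boolean,
  querySuitePaths: string[]
): Array<string | undefined> {
  if (useConfigInCli) {
    // Config-in-CLI path: a single call covering all queries.
    return [undefined];
  }
  // Traditional path: one call per query group (builtin, custom, packs).
  return [...querySuitePaths];
}

// Assumed example suite names, for illustration only.
const suites = ["builtin.qls", "custom.qls", "packs.qls"];
const traditionalCalls = planRunQueriesCalls(false, suites);
const configCalls = planRunQueriesCalls(true, suites);
```

The point of the sketch is that the config-enabled path must not iterate over the groups at all; otherwise the full generated suite would be run once per group.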

Contributor Author

Something else to think about: we are sending status reports with various timings for the query groups. Specifically, we are sending the time taken to evaluate the builtin and custom groups, as well as the time taken to interpret the results of these two groups.

With this change, we are combining the groups, and we can't just add new fields to our status reports without a backend change (I think!), so I can stuff the timing into the builtin section. However, the value when running with the config will not be comparable to the value when running in the old way.

So, we will need to expand the status report with the new fields.

Contributor Author

For now, I'll stuff it into builtin, but this isn't correct. I will do the extra work later.

Contributor Author

Correction: it's only query evaluation that needs an extra entry in the status report, not interpretation.

Contributor

> Assuming I am right here, I can make the fix.

Right, I think your understanding is correct. Regarding what to do with the status reports, I agree that we will no longer be able to distinguish the custom queries from the built-in ones, so we'll need a new field that represents the time spent on the whole set of queries being run.

I'll note that while this makes our telemetry a bit worse, there are big potential performance gains from doing it. Giving everything to the evaluator at once means we won't have to rely on the disk cache between one invocation and the next, which should significantly reduce our IO usage. (IO usage is probably close to 100% of the time we spend on custom queries, since we'll already have evaluated all of the standard library for our built-in queries, and custom queries are unlikely to have particularly complex logic on top of that.) In a multi-threaded setting, it also means we can work on the custom queries at the same time as the built-in ones. So, we're trading some telemetry for better performance, which, at least from our users' point of view, is a win.

tools: ${{ steps.prepare-test.outputs.tools-url }}

- name: Packs from input
if: success() || failure()
Contributor

I think this could just be always()?

Contributor Author

`always()` includes `cancelled()`, which we don't need here.
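
The distinction can be modeled with a small truth table. This is a simplified sketch that treats the job as having a single state when the step's `if:` condition is evaluated; the type and function names are made up for the illustration and are not part of GitHub Actions.

```typescript
// Simplified, assumed model of the job state seen by a step's `if:` condition.
type JobState = "success" | "failure" | "cancelled";

// Models `if: success() || failure()`.
function successOrFailure(state: JobState): boolean {
  return state === "success" || state === "failure";
}

// Models `if: always()`, which holds regardless of the job state.
function always(_state: JobState): boolean {
  return true;
}

// The two conditions agree except when the job has been cancelled.
const states: JobState[] = ["success", "failure", "cancelled"];
const differingStates = states.filter(
  (state) => successOrFailure(state) !== always(state)
);
```

Under this model, `differingStates` contains only `"cancelled"`, which is the case the comment above wants to exclude.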

Contributor

I see, I wasn't aware of that distinction. Thanks for clarifying!

CHANGELOG.md Outdated
@@ -42,6 +42,7 @@ No user facing changes.
## 2.1.7 - 05 Apr 2022

- A bug where additional queries specified in the workflow file would sometimes not be respected has been fixed. [#1018](https://github.com/github/codeql-action/pull/1018)
No user facing changes.
Contributor

This looks like a bad merge.

When the codescanning config is being used by the CLI, there is a
single query suite that is generated that contains all queries to be
run by the analysis. This is different from the traditional way, where
there are potentially three query suites: builtin, custom, and packs.

We need to ensure that when the codescanning config is being used,
only a single call to run queries is used, and this call uses the
single generated query suite.

Also, this commit changes the cutoff version for codescanning config to
2.10.1. Earlier versions work, but there were some bugs that are only
fixed in 2.10.1 and later.
@aeisenberg
Contributor Author

Hmmm...the job is failing now because latest-nightly is still 2.10.0 and the feature is not being used. I think I need to hold off on merging until 2.10.1 is available as the latest nightly.

@aeisenberg aeisenberg force-pushed the aeisenberg/fix-config-files branch from 6efc13c to 01d16b1 Compare July 13, 2022 21:06
@aeisenberg
Contributor Author

Code-Scanning config CLI tests / Code Scanning Configuration tests (ubuntu-latest, cached) is failing because "cached" is still 2.10.0. We need to wait for 2.10.1.

src/codeql.ts Outdated
* @param config The configuration to use.
* @returns the path to the generated user configuration file.
*/
async function generateCodescanningConfig(codeql: CodeQL, config: Config) {
Contributor

Should this have a return type? I guess Promise<string | undefined>?

Contributor Author

TypeScript infers the return type implicitly, but I can add it explicitly to make the code easier to read.

src/codeql.ts Outdated
}
const configLocation = path.resolve(config.tempDir, "user-config.yaml");
// make a copy so we can modify it
const augmentedConfig = JSON.parse(JSON.stringify(config.originalUserInput));
Contributor

Could we implement a clone method, or something similar to copy this? It seems clunky to turn it into a string and parse it again just to make a copy.

Contributor Author

Sure. Generic clone functions in JavaScript are really tricky to implement, since they need to keep track of the prototype chain of values. We don't need that here: we are only copying raw objects, so this is the simplest thing to do. I can extract it into a separate function.
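
A minimal sketch of what such an extracted helper might look like is below. The name `cloneObject` and the sample data are assumptions for illustration. The JSON round trip is only suitable for plain JSON-like data: it drops `undefined` properties and functions, and turns `Date` values into strings.

```typescript
// Hypothetical helper name. Deep-copies plain, JSON-serializable data by
// serializing it to a string and parsing it back.
function cloneObject<T>(obj: T): T {
  return JSON.parse(JSON.stringify(obj)) as T;
}

// Assumed sample shape, loosely modeled on a parsed config file.
const original = {
  packs: ["codeql/a"],
  nested: { queries: [{ uses: "first.ql" }] },
};

// Changes to the copy, even to nested arrays, leave the original untouched.
const copy = cloneObject(original);
copy.nested.queries.push({ uses: "extra.ql" });
```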

@@ -1621,6 +1623,7 @@ function parseInputAndConfigMacro(
configUtils.parsePacks(
packsFromConfig,
packsFromInput,
!!packsFromInput?.trim().startsWith("+"),
Contributor

Is the idea of this double negation that it handles the undefined case? Could we add a comment to explain that to future readers of the code?

Contributor Author

Yes, that's what's happening: it coerces the result into a boolean. It's a standard JavaScript idiom, but I understand that it looks odd when coming from statically typed languages.
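
To make the coercion concrete, here is a small sketch. The wrapper function `startsWithPlus` is hypothetical; only the expression inside it comes from the diff above.

```typescript
// Hypothetical wrapper around the expression from the diff.
function startsWithPlus(packsFromInput: string | undefined): boolean {
  // With optional chaining, the inner expression has type
  // `boolean | undefined`: it is undefined when the input is undefined.
  // The double negation maps undefined to false and leaves booleans as-is.
  return !!packsFromInput?.trim().startsWith("+");
}

const noInput = startsWithPlus(undefined); // false
const combineInput = startsWithPlus("  +codeql/a"); // true
const overrideInput = startsWithPlus("codeql/a"); // false
```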

src/util.ts Outdated
return (
(process.env[EnvVar.CODEQL_PASS_CONFIG_TO_CLI] === "true" &&
(await codeQlVersionAbove(codeql, CODEQL_VERSION_CONFIG_FILES))) ||
false
Contributor

Is the || false at the end doing anything?

Contributor Author

Hmmm...I don't think it's necessary.
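
A quick way to see why the trailing `|| false` is redundant: both operands of `&&` are already booleans, so the result is a boolean either way. The sketch below uses hypothetical function names and replaces the environment lookup and the awaited version check with plain parameters.

```typescript
// Hypothetical stand-ins: `flag` plays the role of the environment variable
// value and `versionOk` the role of the awaited version check.
function withFallback(flag: string | undefined, versionOk: boolean): boolean {
  return (flag === "true" && versionOk) || false;
}

function withoutFallback(flag: string | undefined, versionOk: boolean): boolean {
  return flag === "true" && versionOk;
}

// Compare the two forms across every combination of inputs.
const flags: Array<string | undefined> = ["true", "false", undefined];
const mismatches: string[] = [];
for (const flag of flags) {
  for (const versionOk of [true, false]) {
    if (withFallback(flag, versionOk) !== withoutFallback(flag, versionOk)) {
      mismatches.push(`${String(flag)}/${String(versionOk)}`);
    }
  }
}
```

In this sketch `mismatches` stays empty, so the two forms are interchangeable.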

@aeisenberg aeisenberg force-pushed the aeisenberg/fix-config-files branch from 0fcf0fc to fa2bc21 Compare August 11, 2022 21:57
Contributor

@edoardopirovano edoardopirovano left a comment

This is still a pretty scary change, but it does certainly look much better with all the extra tests. Thanks for adding those in! And I guess it's behind an env variable, so we won't break anything - let's merge it :)

@aeisenberg aeisenberg merged commit 680d08e into main Aug 12, 2022
@aeisenberg aeisenberg deleted the aeisenberg/fix-config-files branch August 12, 2022 18:15
@aeisenberg
Contributor Author

Thanks for the review. We are one step closer to removing this technical debt.
