Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move logs, SARIF, database bundle actions uploads to post: hooks #1159

Merged
merged 40 commits into from
Aug 11, 2022

Conversation

angelapwen
Copy link
Contributor

@angelapwen angelapwen commented Jul 29, 2022

Previously, even with debug mode on, if the init step failed we did not upload the appropriate Actions artifacts. This was because the artifacts were only uploaded in the analyze step.

This change:

  • moves the uploading of log files to a post: hook in the init step. Regardless of whether the entire workflow was successful or if any steps after init failed, we will upload whatever logfiles we have as an artifact.
  • moves the uploading of database bundles to a post: hook in the init step. If the database has not been finalized, instead of running the CLI database bundle command, the action directly zips up everything in the database directory.
  • moves the uploading of SARIF results (if they exist) to a post: hook in the analyze step.

So far the change has been manually tested so that:

  • when init fails, partial logs from init are uploaded as artifacts, and the partial database bundle is uploaded.
  • when analyze fails, logs from init and analyze are uploaded as artifacts; the partial database bundle is uploaded; and the SARIF upload succeeds if there is a SARIF file generated in the output folder of the action.
  • on workflow success, all file types are successfully uploaded.

Testing strategy:

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • [N/A] Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

@angelapwen angelapwen requested a review from a team as a code owner July 29, 2022 09:41
@angelapwen
Copy link
Contributor Author

angelapwen commented Jul 29, 2022

I'm correctly hitting the case where the database isn't finalized now, and just need to implement the zipping of the partial database files. I see documentation for the CLI command but am still not exactly sure which files I'm looking to zip. I'll take a look at the CLI source code but would appreciate any pointers!

Also, I see I have a CI failure on debug artifact which looks legitimate, as that's the code I'm changing. It only fails on ubuntu, as for some reason the step to download the artifact doesn't begin on the Mac runners. It looks like this is because the step to download the artifacts is executed before the post: hooks that upload them, so I'll have to find out how to reorder these.

I see the PR check comes from

- uses: actions/download-artifact@v3
which executes directly after the analyze step. Not entirely sure how to make sure it executes only after the post: hooks as I guess the entire point is that the post: hooks happen after everything else. 🤔

@angelapwen
Copy link
Contributor Author

I think perhaps splitting the PR check file into 2 jobs, one for running everything before checking artifact downloads, and one for just checking artifact downloads with a needs dependency on the former, might force things into the right order. Similar to the code blocks in the docs at https://docs.github.com/en/actions/learn-github-actions/essential-features-of-github-actions. I'll try to do this in a separate PR and see if it's at least a no-op without these new post: hook changes.

src/actions-util.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
let suffix = "";
const matrix = getRequiredInput("matrix");
if (matrix !== undefined && matrix !== "null") {
for (const entry of Object.entries(JSON.parse(matrix)).sort())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if matrix is not valid json? How is the error handled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm — I actually didn't write this (just moved it to another section of the codebase) but that's a good question. I had thought that it was forced to be JSON from where it was declared in inputs:

matrix:
default: ${{ toJson(matrix) }}

and

https://github.com/github/codeql-action/blob/main/analyze/action.yml#L67-L68

but it seems like that's the default input and the user could theoretically have passed non-valid JSON.

I am adding a catch block with an error message indicating that there was an error parsing the matrix input and the debug artifact will be uploaded without the matrix input in its name. It seems like the only thing this block of code does is add a suffix to the artifact name, for every value in the matrix input, so it should be able to still upload the artifact correctly (just without the appropriate suffix(es).

src/actions-util.ts Outdated Show resolved Hide resolved
src/analyze-action-cleanup.ts Outdated Show resolved Hide resolved
}
}

async function uploadLogsDebugArtifact(config: Config) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about making this module as small as possible, and passing in actionsUtil.uploadDebugArtifacts as a function parameter.

@angelapwen
Copy link
Contributor Author

I've made most of the requested changes, and will work on the refactoring for unit tests.

I also made the change to zip up the database bundle and it seems to be working as expected on a failed run.

The core.info logs state:

db-javascript is not finalized. Uploading partial database bundle at /home/runner/work/_temp/codeql_databases/db-javascript-partial.zip...

and then the zip file is uploaded as part of the artifacts. I can't attach it here due to file size limits but here is the directory structure of the partial database bundle file when unzipped:
Screen Shot 2022-08-01 at 12 26 23 PM

@angelapwen angelapwen changed the title Move logs, SARIF actions uploads to post: hooks Move logs, SARIF, database bundle actions uploads to post: hooks Aug 1, 2022
@angelapwen
Copy link
Contributor Author

I added //TODOs in locations where I am planning to unit test, with the description of what each unit test should do, to get feedback on the planned tests before/while I write them.

Copy link
Contributor

@adityasharad adityasharad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! The overall logic looks sensible -- suggestions mostly about clarity and readability.

Let me know when you've added some tests and would like another look.

import * as core from "@actions/core";
import * as toolrunner from "@actions/exec/lib/toolrunner";
import * as safeWhich from "@chrisgavin/safe-which";
import AdmZip from "adm-zip";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already use the zlib library; I think we could reuse that in this situation rather than adding a dependency. One difference is that zlib produces gzip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting. I will give this a try!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I remembered that I had some trouble with zlib when I tried it out earlier because it seems to only compress streams of data, and it makes preserving the directory structure difficult. I looked around 👀 and see another package that can create zipped directories using zlib for compression: https://github.com/archiverjs/node-archiver

But as this would also add a new dependency I'm not sure if it's worthwhile 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with what you've got for now. There is an astounding number of ways to do this in the npm ecosystem, and I don't have the expertise/familiarity to know whether any one is best, so I won't hold this up. adm-zip is widely used and has no other dependencies, which is nice. (In the VS Code extension I see we use zip-a-folder.)

src/analyze-action-cleanup.ts Outdated Show resolved Hide resolved
src/analyze-action-cleanup.ts Outdated Show resolved Hide resolved
src/analyze-action-cleanup.ts Outdated Show resolved Hide resolved
src/init-action-cleanup.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
@angelapwen
Copy link
Contributor Author

Quickly closing and re-opening to trigger new PR checks from actions/toolkit#1160

@angelapwen angelapwen closed this Aug 8, 2022
@angelapwen angelapwen reopened this Aug 8, 2022
@adityasharad
Copy link
Contributor

That looks good. The other option I considered was turning this workflow off on PRs, and only running it on push events. Let's keep this for now, and if the "failing" check confuses other contributors we can adjust it.

@angelapwen angelapwen force-pushed the angelapwen/post-init-cleanup branch from ea909e8 to ff7a29d Compare August 10, 2022 10:11
Copy link
Contributor

@adityasharad adityasharad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor suggestions while you polish up the tests.

src/analyze-action-post.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
src/debug-artifacts.ts Outdated Show resolved Hide resolved
src/actions-util.ts Outdated Show resolved Hide resolved
import * as core from "@actions/core";
import * as toolrunner from "@actions/exec/lib/toolrunner";
import * as safeWhich from "@chrisgavin/safe-which";
import AdmZip from "adm-zip";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with what you've got for now. There is an astounding number of ways to do this in the npm ecosystem, and I don't have the expertise/familiarity to know whether any one is best, so I won't hold this up. adm-zip is widely used and has no other dependencies, which is nice. (In the VS Code extension I see we use zip-a-folder.)

src/util.test.ts Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
src/actions-util.ts Show resolved Hide resolved
import * as debugArtifacts from "./debug-artifacts";
import { getActionsLogger } from "./logging";

async function run(uploadSarifDebugArtifact: Function) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For easier testing, you can move this function to a different file and test that file instead. This way, you avoid invoking runWrapper by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this, though I didn't know what to call the other file (not sure what patterns we want to follow). I called it analyze-action-post-helper but would like to hear what you think!


/**
* If a database has not been finalized, we cannot run the `codeql database bundle`
* command in the CLI because it will return an error. Instead we directly zip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's unfortunate and avoids a good use case for codeql database bundle. Do you know why it throws? I wonder if we could add a --force option (or something similar) to allow the command to succeed even if the database is malformed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a little digging into the CLI source code, but I thought it'd be too complex to somehow catch the error thrown in the action and then do something different. A --force option would be interesting though and isn't something I'd considered.

We'd like to get this change in this week but I can write up a follow-up issue on the --force option on the CLI side for our backlog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right...there's no need to get this done now.

src/util.ts Outdated Show resolved Hide resolved
src/util.ts Show resolved Hide resolved
@angelapwen
Copy link
Contributor Author

angelapwen commented Aug 11, 2022

The remaining unit tests are up, requesting final review now 😄

After this is merged, I will add the "Download and check debug artifacts after failure in analyze" PR check/job to the required PR checks for each branch. I'll also write up an issue on improving the codeql database bundle command to handle non-finalized databases.

Copy link
Contributor

@adityasharad adityasharad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!


test("post: analyze action with debug mode off", async (t) => {
return await util.withTmpDir(async (tmpDir) => {
process.env["RUNNER_TEMP"] = tmpDir;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to save the old value and restore it at the end of this test, so it doesn't affect the environment for later tests. Or turn it into a stub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm — I see it used this way without restoring in several other existing tests like

process.env["RUNNER_TEMP"] = tmpDir;

Is there a reason this one would act differently from other tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's used elsewhere in this way, it's probably fine. I would recommend in the future that we do what Aditya suggests in all the tests. The danger if we don't do this is there may be hard to spot dependencies between tests that use this environment variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we also have this helper function that is called quite a lot

export function setupActionsVars(tempDir: string, toolsDir: string) {

although we don't need the other two environment variables here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, does this line

process.env = t.context.env;
effectively reset the environment vars after each test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes...it does. I also see sinon.restore();, which makes my comment about using a sandbox irrelevant.

src/analyze-action-post-helper.ts Outdated Show resolved Hide resolved
} as unknown as configUtils.Config);

const requiredInputStub = sinon.stub(actionsUtil, "getRequiredInput");
requiredInputStub.withArgs("output").returns("fake-output-dir");
Copy link
Contributor Author

@angelapwen angelapwen Aug 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note, I see that in some places we use ${STUB}.restore() to clean up at the end of a test, but in others we don't. Is this necessary or best practice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I think this line

sinon.restore();
resets the sinon stubs after each test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually...we should do that everywhere. The vscode extension uses a different (and better) idiom, which we should move to in the action.

Instead of creating stubs directly using sinon.stub, use a sandbox:

  let sandbox: sinon.SinonSandbox;
  beforeEach(() => {
    sandbox = sinon.createSandbox();
  });
  afterEach(() => {
    sandbox.restore();
  });

And create stubs using: sandbox.stub(). This ensures all stubs are removed after every test runs. In the action, we are not doing this. So far, it hasn't caused any problems, but it might later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it looks like as long as we call

export function setupTests(test: TestFn<any>) {

in each test file it should call sinon.restore() and also restore the environment variables, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes....I missed that. So, you can ignore my comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, great! This was educational 🧑‍🎓


test("post: analyze action with debug mode off", async (t) => {
return await util.withTmpDir(async (tmpDir) => {
process.env["RUNNER_TEMP"] = tmpDir;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's used elsewhere in this way, it's probably fine. I would recommend in the future that we do what Aditya suggests in all the tests. The danger if we don't do this is there may be hard to spot dependencies between tests that use this environment variable.


/**
* If a database has not been finalized, we cannot run the `codeql database bundle`
* command in the CLI because it will return an error. Instead we directly zip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right...there's no need to get this done now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants