Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue: commit queue issues and feedback #34770

Open
7 of 14 tasks
mmarchini opened this issue Aug 14, 2020 · 53 comments
Open
7 of 14 tasks

Tracking Issue: commit queue issues and feedback #34770

mmarchini opened this issue Aug 14, 2020 · 53 comments
Labels
meta Issues and PRs related to the general management of the project.

Comments

@mmarchini
Copy link
Contributor

mmarchini commented Aug 14, 2020

commit-queue can be used to land pull requests. It is still experimental, so when using please remember to check if the Pull Request landed successfully. Please share any failed landings here so we can keep track of issues, our landing process is quite complex and full of edge cases, we'll need to tweak the commit-queue and node-core-utils before we can consider this feature stable.

I'll keep this issue updated with all known issues.

@mmarchini mmarchini added the meta Issues and PRs related to the general management of the project. label Aug 14, 2020
@mmarchini
Copy link
Contributor Author

Tried two pull requests, both failed:

@lundibundi
Copy link
Member

Simple cases of two last points may be fixed with
nodejs/node-core-utils#473

mmarchini added a commit to mmarchini/node-core-utils that referenced this issue Aug 15, 2020
skipChecks will allow skipping certain checks. Useful to skip "commit
after last approval" on commit queue.

Ref: nodejs/node#34770
mmarchini added a commit to mmarchini/node-core-utils that referenced this issue Aug 15, 2020
skipChecks will allow skipping certain checks. Useful to skip "commit
after last approval" on commit queue.

Ref: nodejs/node#34770
@mmarchini mmarchini changed the title Tracking Issue: commit queue issues Tracking Issue: commit queue issues and feedback Aug 26, 2020
@lundibundi
Copy link
Member

🤔 Anyone knows why but these PRs failed to apply patches:
#34997 (comment)
#34682 (comment)

even though locally it applied successfully with manual git node land. Is three-way-merge somehow disabled for commit-queue?

@mmarchini
Copy link
Contributor Author

mmarchini commented Sep 11, 2020

We should add a link to the Action in the bot comment: so we don't have to go chasing it every time it fails: https://github.com/nodejs/node/runs/1102351422?check_suite_focus=true

As for the failures, I'm not entirely sure, it should've worked 🤔

One thing to do when this happens is to run git node land with --yes locally, since that's how it runs on the commit queue. This will ensure we're running the same command. Maybe ncu doesn't go to 3-way-merge with --yes? It should, but there might be a bug.

@mmarchini
Copy link
Contributor Author

Should wait for GitHub Actions to finish running as well: #35160 (comment)

@aduh95
Copy link
Contributor

aduh95 commented Sep 14, 2020

Any idea why #34951 is skipped?

@mmarchini
Copy link
Contributor Author

We have quite a backlog of scheduled Actions for the past hour, so something is up with GitHub apparently:

image

@aduh95 ncu thinks the CI is still running. Since the label was added four days ago, it's unrelated to the security release lockdown, but I can't check the CI status now because of the lockdown. If I forget to look at it tomorrow after the lockdown is lifted, feel free to ping me.

@targos
Copy link
Member

targos commented Sep 16, 2020

In #33770 (comment), a PR with 3 commits, the bot commented only with the last commit hash, instead of a range.

@mmarchini
Copy link
Contributor Author

Nice catch, we added the autorebase flag but didn't update the comment generation

@mmarchini
Copy link
Contributor Author

Found another issue: if a jenkins run was deleted but that's the newest one in the PR, it'll show as "pending" by ncu, which will cause the commit queue to skip (it should fail instead). Also, having a successful CI run should take precedence over having a non-CI already-deleted Jenkins run (like a CITGM or benchmark).

Screenshot might make it easier to understand the issue:

image

@richardlau
Copy link
Member

NCU considers CITGM runs? They're hardly ever green 😞. (It's probably still worth checking one was started, particular for semver-majors.)

@mmarchini
Copy link
Contributor Author

It checks all Jenkins links it knows how to parse

@richardlau
Copy link
Member

Commit queue for #35423 failed but didn't post the reason to the PR (but did update the labels).

@targos
Copy link
Member

targos commented Sep 30, 2020

Commit queue for #35423 failed but didn't post the reason to the PR (but did update the labels).

The logs for that run are here: https://github.com/nodejs/node/runs/1187952829?check_suite_focus=true

It didn't post the reason because the output was apparently too long for jq.
It contains thousands of lines like this:

image

@devsnek
Copy link
Member

devsnek commented Sep 30, 2020

i never thought I would see the day jq would be bested

@richardlau
Copy link
Member

richardlau commented Sep 30, 2020

Commit queue for #35423 failed but didn't post the reason to the PR (but did update the labels).

The logs for that run are here: https://github.com/nodejs/node/runs/1187952829?check_suite_focus=true

It didn't post the reason because the output was apparently too long for jq.
It contains thousands of lines like this:

image

Looks like that run tried to land more than one PR -- perhaps an edge case where the first failure left things in a bad state?

@BethGriggs
Copy link
Member

From #34770 (comment):

Commit queue for #35423 failed but didn't post the reason to the PR (but did update the labels).

The same appears to have happened with #35332 (comment)

@richardlau
Copy link
Member

@richardlau
Copy link
Member

Perhaps related to nodejs/node-core-utils#486?

@mmarchini
Copy link
Contributor Author

Yeah I think 486 is definitely related

@mmarchini
Copy link
Contributor Author

Fix: #35468

@mmarchini
Copy link
Contributor Author

mmarchini commented Nov 19, 2020

Wow that's an interesting edge case. There's probably nothing we can do about CQ being unable to land PRs changing an Action, but it should add the -failed label.

@mmarchini
Copy link
Contributor Author

Actually, after reading the error again, it should be possible to push changes to .github/workflows assuming that's something we want. We'll need to request an extra scope for the PAT on nodejs/admin though.

@mmarchini
Copy link
Contributor Author

Added scope to our PAT token, so now we can land PRs with .github/workflow changes.

@aduh95
Copy link
Contributor

aduh95 commented Jan 5, 2021

The CQ is not working anymore:

fatal: unable to access 'https://github.com/nodejs/node/': The requested URL returned error: 403
Error: Process completed with exit code 128.

https://github.com/nodejs/node/actions/runs/463941833
https://github.com/nodejs/node/actions/runs/463795672
https://github.com/nodejs/node/actions/runs/463659127

Maybe related to the recent token change?

@mmarchini
Copy link
Contributor Author

That's plausible, I'll take a look.

@mmarchini
Copy link
Contributor Author

re-generated the token and added it here.

@aduh95
Copy link
Contributor

aduh95 commented Jan 6, 2021

Same error on recent run (https://github.com/nodejs/node/runs/1655255308?check_suite_focus=true), this must come from somewhere else 🤷‍♂️

@richardlau
Copy link
Member

Looks like the request ci label isn't being removed from #36190 resulting in more CI runs being started.

@targos
Copy link
Member

targos commented Jan 13, 2021

The request errors with:

+ curl -sL --request DELETE --url https://api.github.com/repos/nodejs/node/issues/36190/labels/request-ci --header authorization: *** --header content-type: application/json
{
  "message": "Resource not accessible by integration",
  "documentation_url": "https://docs.github.com/rest/reference/issues#remove-a-label-from-an-issue"
}

I temporarily disabled the workflow in https://github.com/nodejs/node/actions?query=workflow%3A%22Auto+Start+CI%22

@richardlau
Copy link
Member

I guess this needs someone from the @nodejs/tsc (@mmarchini ?) who can see/generate the tokens to reenable the auto start CI and commit queue workflows.

@mmarchini
Copy link
Contributor Author

Oh no. I'll try to take a look at it over the weekend.

@richardlau
Copy link
Member

Ping @nodejs/tsc again (will also add to the tsc-agenda) regarding the token (which requires an owner, so I guess we could also ping the commcomm) and the broken commit queue. (FTR I'm not sure what it needs/currently has.).

If we're unable to maintain the commit queue we should probably remove it 😞.

@richardlau richardlau added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label May 6, 2021
@joyeecheung
Copy link
Member

What kind of token do we need here?

@mmarchini
Copy link
Contributor Author

@joyeecheung not sure, the token that we have already should work. It needs permission to edit PRs (to add/remove labels) as well as to commit to the main branch of the repository. It might need other permissions I don't remember from the top of my head.

@mhdawson
Copy link
Member

  • Was discussed in the meeting today, nobody advocated for removing it so let’s leave
    it for a bit longer/try to fix.
  • Those in the meeting would also be ok with removing, if Mary/those involved in maintaining
    the queue decide we should remove it.

Removing agenda tag

@mhdawson mhdawson removed the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Jun 10, 2021
@tniessen
Copy link
Member

Is it possible that this is related to actions/labeler#50 and actions/first-interaction#10?

@mmarchini
Copy link
Contributor Author

I'm re-enabling the queue so I can test out the token, if it starts to generate too much spam notifications I'll disable it again.

@mmarchini
Copy link
Contributor Author

@tniessen that shouldn't be the case since we're using the @nodejs-github-bot account and a schedule trigger (instead of push/PR) to workaround those issues.

@mmarchini
Copy link
Contributor Author

Updated the token, seems like at some point we lost the repo scope. Re-enabled the commit-queue and request-ci Actions, let's hope they work this time.

@Trott
Copy link
Member

Trott commented Jun 5, 2022

I think Should skip if GitHub Actions still running and Land purple can be checked off in the list?

@aduh95
Copy link
Contributor

aduh95 commented Jun 5, 2022

  • Can't land DEPXXX

I think that's a won't fix, we've added a lint rule to validate deprecation numbers that doesn't allow DEPXXX (#41992).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta Issues and PRs related to the general management of the project.
Projects
None yet
Development

No branches or pull requests