ubuntu-latest: jobs fail with error code 143 #6680
Comments
@nicktrav Hello! My initial suspicion is that you hit the runner's resource limits, and we cannot do anything here; you would need to either reduce resource usage or switch to larger runners. Could you post a link to the failed build, please? (The link does not need to be world-readable; we do not need the contents of the job, just the link itself.)
@nicktrav Never mind, I found what I was looking for.
Is there any way to tell if we're hitting that limit? Are there graphs or metrics somewhere we can look?
@nicktrav Unfortunately we do not publish any data like this, but I found that the runner went down due to high CPU usage. Sadly, as I said above, we cannot do anything on our side. The only alternatives are self-hosted runners or larger runners. If you have questions, feel free to reach out to us again!
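For reference, moving a job off the standard hosted runner is a one-line change to `runs-on`. A minimal sketch, assuming either a registered self-hosted runner with the default `linux`/`x64` labels or an organization-level larger runner whose label (here `ubuntu-latest-8-cores`) is purely illustrative and must match whatever your organization has configured:

```yaml
jobs:
  linux-race:
    # Option 1: a larger hosted runner; the label below is hypothetical and
    # must match the runner group/label configured in your organization.
    runs-on: ubuntu-latest-8-cores

    # Option 2: a self-hosted runner, selected by its registration labels.
    # runs-on: [self-hosted, linux, x64]

    steps:
      - uses: actions/checkout@v3
      - run: echo "job body unchanged"
```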
We've been facing the same problem for a few days now. Builds are cancelled at various stages and after varying execution times. Did the behaviour change for the Ubuntu 22.04 standard runners, which have been the default since December 1, 2022? We never had these kinds of problems with Ubuntu 20.04.
@svenjacobs Hi! No, the runners are the same.
The same problem started to occur in our repo. Is there any way to check whether the runner hit its limits? Our builds end with the same error at random stages of the build.
Thanks for digging into it, @mikhailkoliada. If I might provide some feedback on the product: it would be nice to be able to tell why a runner is failing in these situations.
For me, what fixed the issue was reducing the amount of memory used by the Gradle JVM: https://docs.gradle.org/current/userguide/build_environment.html#sec:configuring_jvm_memory EDIT: It doesn't work anymore. No idea; maybe it was just a fluke that it started to pass after changing those options.
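For anyone trying the same workaround, this is roughly what the change looks like. A minimal `gradle.properties` sketch; the heap and worker values below are illustrative assumptions, not the ones from this comment:

```properties
# gradle.properties
# Cap the Gradle daemon JVM heap so the build stays within the ~7 GB of RAM
# available on standard hosted Linux runners.
org.gradle.jvmargs=-Xmx2g -XX:MaxMetaspaceSize=512m

# Optionally limit the number of worker processes, which also lowers peak
# CPU and memory usage.
org.gradle.workers.max=2
```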
This is still an issue for us, and we have no fix other than scheduling an elaborate retry loop for certain builds.
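Not the original poster's setup, but one sketch of such a retry loop: a small follow-up workflow that re-runs only the failed jobs of a completed run. The workflow name "CI", the file name, and the attempt cap are assumptions:

```yaml
# .github/workflows/retry-failed.yml (hypothetical file name)
name: Retry failed runs
on:
  workflow_run:
    workflows: ["CI"]   # assumption: the flaky workflow is called "CI"
    types: [completed]

permissions:
  actions: write        # required for `gh run rerun`

jobs:
  rerun:
    runs-on: ubuntu-latest
    # Only retry failures, and cap the attempts to avoid an endless loop.
    if: github.event.workflow_run.conclusion == 'failure' && github.event.workflow_run.run_attempt < 3
    steps:
      - run: gh run rerun ${{ github.event.workflow_run.id }} --failed
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```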
@mikhailkoliada @nicktrav How are you able to ascertain that it is a CPU issue? I can't find the cause of my failure, and I think many others are having the same problem. Do you or anyone else know if and where GitHub has published the CPU limits?
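There is no official metric, but one way to get some visibility is to log resource usage from inside the job. A sketch (the sampling interval, log path and test command are assumptions); note that if the VM itself is torn down, even the `always()` step may never run, but for in-job memory or CPU pressure the tail of the log is usually telling:

```yaml
steps:
  - uses: actions/checkout@v3
  - name: Start resource monitor
    shell: bash
    run: |
      # Sample memory, load average and disk usage every 15 seconds in the
      # background, writing to a file we can dump after a failure.
      nohup bash -c 'while true; do
        { date -u; free -m; uptime; df -h /; echo; } >> "$RUNNER_TEMP/usage.log"
        sleep 15
      done' >/dev/null 2>&1 &
  - name: Build and test
    run: go test -race ./...   # assumption: the command that gets SIGTERM'd
  - name: Dump resource log
    if: always()
    run: cat "$RUNNER_TEMP/usage.log"
```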
Is there any update on this?
This happened in a pipeline of mine that was applying a Terraform plan to AWS. The pipeline ran for 6 minutes and terminated with error 143. Then I restarted it; it took 12 minutes and finished successfully. Very weird.
Description
We recently started seeing a high rate of failure in runs of a job that runs `go race` on Linux runners (`ubuntu-latest`). We see the following in the logs, but lack the context to say why the process receives the SIGTERM (exit code 143 corresponds to 128 + 15, i.e. termination by SIGTERM):

This seems to be less of an issue with the codebase itself (the same set of tests passes under stress on dedicated Linux workstations and cloud VMs) and more with the action runner VMs. That said, the failure rate seems to have increased markedly after a recent change to the codebase.
We're speculating that we are hitting some kind of resource limit due to the recent code change, though it's hard to say definitively.
More context in cockroachdb/pebble#2159.
Platforms affected
Runner images affected
Image version and build link
Is it regression?
No - we've seen the same job passing with the same image.
Expected behavior
The job should complete without error.
Actual behavior
Job fails with exit code 143.
Repro steps
Run the `linux-race` job in the Pebble repo (e.g. via a PR). NOTE: we've since temporarily disabled that job until we resolve this particular issue.

We used cockroachdb/pebble#2158 to bisect down to the code change that increased the failure rate, though it's not clear why it fails with error code 143.
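Not part of the original report, but since the leading hypothesis is a resource limit, one way to probe it on the standard runner is to throttle the race job's parallelism. A sketch, assuming the job boils down to `go test -race ./...`; the flag and version values are illustrative:

```yaml
jobs:
  linux-race:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.19'   # assumption: whatever version the repo targets
      - name: Race tests with reduced parallelism
        env:
          GOMAXPROCS: '2'      # standard hosted Linux runners have 2 vCPUs
        run: |
          # -p limits how many packages build/test concurrently; -parallel
          # limits parallel tests within a package. Both reduce peak CPU and
          # memory, which the race detector inflates several-fold.
          go test -race -p 2 -parallel 2 ./...
```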