ubuntu-latest: jobs fail with error code 143 #6680
Comments
@nicktrav Hello! My initial suspicion is that you hit the runner's resource limits, and we cannot do anything here; you would need to either reduce resource usage or switch to larger runners. Could you post a link to the failed build, please? (The link does not need to be world-readable; we do not need the contents of the job, just the link itself.)
@nicktrav Never mind, I found what I was looking for.
Is there any way to tell if we're hitting that limit? Are there graphs or metrics somewhere we can look?
@nicktrav Unfortunately we do not publish any data like this, but I found that the runner went down due to high CPU usage. Sadly, as I said above, we cannot do anything on our side. The only alternatives are self-hosted runners or larger runners. If you have questions, feel free to reach out to us again!
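For reference, moving a job off the standard hosted runner is a one-line change to `runs-on`. A minimal sketch, assuming either a registered self-hosted runner with the default `linux`/`x64` labels or an organization-level larger runner whose label (here `ubuntu-latest-8-cores`) is purely illustrative and must match whatever your organization has configured:

```yaml
jobs:
  linux-race:
    # Option 1: a larger hosted runner; the label below is hypothetical and
    # must match the runner group/label configured in your organization.
    runs-on: ubuntu-latest-8-cores

    # Option 2: a self-hosted runner, selected by its registration labels.
    # runs-on: [self-hosted, linux, x64]

    steps:
      - uses: actions/checkout@v3
      - run: echo "job body unchanged"
```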
We've been facing the same problem for a few days now. Builds are cancelled at various stages and after varying execution times. Did the behaviour change for the Ubuntu 22.04 standard runners, which have been the default since December 1, 2022? We never had these kinds of problems with Ubuntu 20.04.
@svenjacobs Hi! No, the runners are the same.
The same problem started to occur in our repo. Is there any way to check whether the runner hit its limits? Our builds end with the same error at random stages of the build.
Thanks for digging into it, @mikhailkoliada. If I might provide some feedback on the product: it would be nice to be able to tell why a runner is failing in these situations.
For me, what fixed the issue was reducing the amount of memory used by the Gradle JVM: https://docs.gradle.org/current/userguide/build_environment.html#sec:configuring_jvm_memory EDIT: It doesn't work anymore. No idea; maybe it was just a fluke that it started to pass after changing those options.
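For anyone trying the same workaround, this is roughly what the change looks like. A minimal `gradle.properties` sketch; the heap and worker values below are illustrative assumptions, not the ones from this comment:

```properties
# gradle.properties
# Cap the Gradle daemon JVM heap so the build stays within the ~7 GB of RAM
# available on standard hosted Linux runners.
org.gradle.jvmargs=-Xmx2g -XX:MaxMetaspaceSize=512m

# Optionally limit the number of worker processes, which also lowers peak
# CPU and memory usage.
org.gradle.workers.max=2
```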
This is still an issue for us, and we have no fix other than scheduling an elaborate retry loop for certain builds.
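Not the original poster's setup, but one sketch of such a retry loop: a small follow-up workflow that re-runs only the failed jobs of a completed run. The workflow name "CI", the file name, and the attempt cap are assumptions:

```yaml
# .github/workflows/retry-failed.yml (hypothetical file name)
name: Retry failed runs
on:
  workflow_run:
    workflows: ["CI"]   # assumption: the flaky workflow is called "CI"
    types: [completed]

permissions:
  actions: write        # required for `gh run rerun`

jobs:
  rerun:
    runs-on: ubuntu-latest
    # Only retry failures, and cap the attempts to avoid an endless loop.
    if: github.event.workflow_run.conclusion == 'failure' && github.event.workflow_run.run_attempt < 3
    steps:
      - run: gh run rerun ${{ github.event.workflow_run.id }} --failed
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```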
@mikhailkoliada @nicktrav How are you able to ascertain that it is a CPU issue? I can't find the cause of my failure, and I think many others are having the same problem. Do you or anyone else know if and where GitHub has published the CPU limits?
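There is no official metric, but one way to get some visibility is to log resource usage from inside the job. A sketch (the sampling interval, log path and test command are assumptions); note that if the VM itself is torn down, even the `always()` step may never run, but for in-job memory or CPU pressure the tail of the log is usually telling:

```yaml
steps:
  - uses: actions/checkout@v3
  - name: Start resource monitor
    shell: bash
    run: |
      # Sample memory, load average and disk usage every 15 seconds in the
      # background, writing to a file we can dump after a failure.
      nohup bash -c 'while true; do
        { date -u; free -m; uptime; df -h /; echo; } >> "$RUNNER_TEMP/usage.log"
        sleep 15
      done' >/dev/null 2>&1 &
  - name: Build and test
    run: go test -race ./...   # assumption: the command that gets SIGTERM'd
  - name: Dump resource log
    if: always()
    run: cat "$RUNNER_TEMP/usage.log"
```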
Is there any update on this?
This happened in a pipeline of mine that was applying a Terraform plan to AWS. The pipeline ran for 6 minutes and terminated with error 143. Then I restarted it; it took 12 minutes and finished successfully. Very weird.
Description
We recently started seeing a high rate of failure in runs of a job that runs `go race` on Linux runners (`ubuntu-latest`). We see the following in the logs, but lack the context to say why the process receives the SIGTERM (exit code 143 corresponds to 128 + 15, i.e. termination by SIGTERM):

This seems to be less of an issue with the codebase itself (the same set of tests passes under stress on dedicated Linux workstations and cloud VMs) and more with the action runner VMs. That said, the failure rate seems to have increased markedly after a recent change to the codebase.
We're speculating that we are hitting some kind of resource limit due to the recent code change, though it's hard to say definitively.
More context in cockroachdb/pebble#2159.
Platforms affected
Runner images affected
Image version and build link
Is it regression?
No - we've seen the same job passing with the same image.
Expected behavior
The job should complete without error.
Actual behavior
Job fails with exit code 143.
Repro steps
Run the `linux-race` job in the Pebble repo (e.g. via a PR). NOTE: we've since temporarily disabled that job until we resolve this particular issue.

We used cockroachdb/pebble#2158 to bisect down to the code change that increased the failure rate, though it's not clear why it fails with error code 143.
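Not part of the original report, but since the leading hypothesis is a resource limit, one way to probe it on the standard runner is to throttle the race job's parallelism. A sketch, assuming the job boils down to `go test -race ./...`; the flag and version values are illustrative:

```yaml
jobs:
  linux-race:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.19'   # assumption: whatever version the repo targets
      - name: Race tests with reduced parallelism
        env:
          GOMAXPROCS: '2'      # standard hosted Linux runners have 2 vCPUs
        run: |
          # -p limits how many packages build/test concurrently; -parallel
          # limits parallel tests within a package. Both reduce peak CPU and
          # memory, which the race detector inflates several-fold.
          go test -race -p 2 -parallel 2 ./...
```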