fix: sets `fail-fast` to `false` for matrix workflows #995

noahpb · 2024-11-08T19:36:19Z

Description

We've noticed that on the nightly CI pipelines, one failed parallel job caused other in-progress jobs in the same workflow run to terminate non-gracefully once the cancellation timeout was reached. This led to inconsistent behavior in subsequent workflow runs involving Terraform, where pre-existing state files remained locked. Workflows kept failing until a new commit was made, which forces the workflow to generate a new state key. That circumvents the issue, but consequently leaves behind orphaned resources.

The intent of this PR is to ensure that failed jobs exit gracefully and do not affect the status of other jobs running in the same workflow. This will add time to workflow runs, but it ensures that resources are properly cleaned up and that processes terminate gracefully before the pipeline completes.
...
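
For illustration, a minimal sketch of the setting this PR describes. It is not the actual diff: the workflow name, job name, matrix values, and `run` commands are hypothetical placeholders. Only `strategy.fail-fast` and the `always()` status check are standard GitHub Actions syntax.

```yaml
# Hypothetical workflow: names, matrix values, and commands are placeholders,
# not taken from the uds-core repository.
name: nightly-example
on:
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # fail-fast defaults to true, in which case one failed matrix job
      # cancels all in-progress and queued jobs in the same matrix.
      # Setting it to false lets each matrix job run to completion.
      fail-fast: false
      matrix:
        flavor: [a, b, c]
    steps:
      - uses: actions/checkout@v4

      - name: Provision and test
        run: echo "provision and test ${{ matrix.flavor }}"

      # always() runs this step even if an earlier step failed, so the job
      # can clean up its own resources before it exits.
      - name: Tear down
        if: always()
        run: echo "tear down resources for ${{ matrix.flavor }}"
```

With `fail-fast: false`, a failure in one matrix job no longer cancels its siblings, so each job reaches its own teardown step instead of being stopped partway through.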

Related Issue

Example workflow run:
https://github.com/defenseunicorns/uds-core/actions/runs/11747339383/job/32731009925?pr=989#step:9:63

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Other (security config, docs update, etc)

Checklist before merging

Test, docs, adr added or updated as needed
Contributor Guide followed

UnicornChance

I think the time trade-off is worth the cleanup.

🤖 I have created a release *beep* *boop*

---

## [0.31.0](v0.30.0...v0.31.0) (2024-11-12)

### ⚠ BREAKING CHANGES

* Remove the generated exception block from the remoteCidr generation. This change means that a cidr containing the META_IP could be set.

### Bug Fixes

* avoids memory leak in istio sidecar termination ([#972](#972)) ([bfd415e](bfd415e))
* ensure grafana does not install plugins from the internet ([#993](#993)) ([f3def45](f3def45))
* remove remoteCidr exception block ([#987](#987)) ([264fbf6](264fbf6))
* renovate config updated to track tests ([#981](#981)) ([2494448](2494448))
* sets `fail-fast` to `false` for matrix workflows ([#995](#995)) ([3008788](3008788))
* sort auth chains when building the authservice config ([#969](#969)) ([15487fb](15487fb))

### Miscellaneous

* add prometheus, loki, and vector e2e testing ([#939](#939)) ([f271ce2](f271ce2))
* add the scorecard supply chain security workflow ([#917](#917)) ([5626f2f](5626f2f))
* **deps:** update authservice to v1.0.3 ([#893](#893)) ([5585a3c](5585a3c))
* **deps:** update grafana curl-fips image to v8.11.0 ([#994](#994)) ([dfc4c8c](dfc4c8c))
* **deps:** update grafana to 11.3.0 ([#921](#921)) ([7cdd742](7cdd742))
* **deps:** update loki to 3.2.1 ([#918](#918)) ([5fa6a24](5fa6a24))
* **deps:** update loki to v6.19.0 ([#990](#990)) ([8bbac53](8bbac53))
* **deps:** update pepr to v0.39.0 ([#932](#932)) ([27eb1bd](27eb1bd))
* **deps:** update support dependencies to v3.27.2 ([#1001](#1001)) ([8702952](8702952))
* **deps:** update support dependencies to v3.3.0 ([#985](#985)) ([4636a38](4636a38))
* **deps:** update support dependencies to v3.3.1 ([#1002](#1002)) ([8c20b49](8c20b49))
* **deps:** update support-deps ([#928](#928)) ([a9cf1f2](a9cf1f2))
* **deps:** update support-deps ([#983](#983)) ([dc3084b](dc3084b))
* **deps:** update support-deps ([#989](#989)) ([7a1c74e](7a1c74e))
* **deps:** update velero ([#956](#956)) ([7746092](7746092))
* regroup renovate support dependencies ([#979](#979)) ([6491be9](6491be9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

noahpb added 3 commits November 8, 2024 14:29

fix: sets fail-fast to false for matrix workflows

2c8f632

Merge branch 'main' into fix/fail-strategy

ea07292

lint, add fail-fast in additional matrix job

cebaf50

noahpb marked this pull request as ready for review November 8, 2024 20:56

noahpb requested a review from a team as a code owner November 8, 2024 20:56

UnicornChance approved these changes Nov 8, 2024

noahpb merged commit 3008788 into main Nov 8, 2024
14 checks passed

noahpb deleted the fix/fail-strategy branch November 8, 2024 22:46

github-actions bot mentioned this pull request Nov 8, 2024

chore(main): release 0.31.0 #971

Merged
