[ci] refactor cloudwatch metric publishing to avoid needing changes i… #2974
…n primary workflows
Description
TLDR: Fixes CloudWatch (CW) metric publishing for scheduled workflow runs
The previous implementation of publishing metrics to CloudWatch for failed builds is broken, because there is no simple way to determine the overall status of an in-progress workflow from inside that workflow. To do so, we'd need to annotate each step with an id and parse the `steps` context, or have each step output its status and parse all the outputs to determine the overall status.
Instead, we can use a separate workflow triggered by the `workflow_run` event for `completed` workflows. This triggered workflow can read the status, name, and repo of the triggering workflow and use them to publish reliable metrics to CloudWatch.
I have tested this in a personal repo here https://github.com/siddvenk/github-workflow-testing/actions/workflows/forking-workflow.yml, and it works as expected.
Another benefit is that neither existing workflows nor new ones need a step added to publish metrics. The CW publishing workflow only runs for workflows triggered by the `schedule` event on the master branch of the repo.
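For reviewers unfamiliar with `workflow_run`, here is a minimal sketch of what such a publishing workflow could look like. It is not the exact file in this PR: the watched workflow names, the `METRICS_ROLE_ARN` secret, the region, the `GitHubWorkflows` namespace, and the `FailedRun` metric name are all assumptions made for illustration. The list of watched workflows lives in this one file, so the primary workflows themselves stay untouched.

```yaml
# Hypothetical sketch; names, secret, region, namespace, and metric are assumptions.
name: publish-workflow-metrics

on:
  workflow_run:
    # workflow_run needs an explicit list of workflow names to watch.
    workflows: ["nightly", "integration"]  # hypothetical workflow names
    types: [completed]

permissions:
  id-token: write   # needed for OIDC-based AWS credentials
  contents: read

jobs:
  publish:
    # Only report runs that were started by the schedule event on master.
    if: >-
      github.event.workflow_run.event == 'schedule' &&
      github.event.workflow_run.head_branch == 'master'
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.METRICS_ROLE_ARN }}  # hypothetical secret
          aws-region: us-east-1                            # assumed region
      - name: Publish metric to CloudWatch
        env:
          # Pass event fields through env vars rather than inlining them in the script.
          WORKFLOW_NAME: ${{ github.event.workflow_run.name }}
          CONCLUSION: ${{ github.event.workflow_run.conclusion }}
          REPO: ${{ github.repository }}
        run: |
          # Emit 1 for a failed run and 0 otherwise, so an alarm can fire on any non-zero value.
          if [ "$CONCLUSION" = "failure" ]; then VALUE=1; else VALUE=0; fi
          aws cloudwatch put-metric-data \
            --namespace "GitHubWorkflows" \
            --metric-name "FailedRun" \
            --value "$VALUE" \
            --dimensions "Workflow=$WORKFLOW_NAME,Repository=$REPO"
```

Because the triggered workflow only starts after the watched run has completed, `conclusion` is already final when it is read, which is what the in-workflow approach could not guarantee.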
One thing this does not achieve is publishing fine-grained metrics indicating which step of a workflow failed. That could be nice, but I think it's not necessary since we really just need to be alerted whenever a workflow fails.