More resilient DRA packaging (#39332)
Occasionally, packaging steps in the DRA pipeline get stuck[^1].
A stuck step runs past the global pipeline timeout (currently 1hr),
which cancels the job.

This commit increases the global timeout to 90min, adds one retry per
step and limits the runtime per step to 40min (so that a single stuck
step doesn't exhaust the entire global timeout).
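
As a rough illustration of the shape each packaging step takes after this change, here is a minimal sketch. The label and command are hypothetical placeholders, not copied from the pipeline; only `timeout_in_minutes` and the `retry` block mirror the diff. Assuming a retried attempt gets its own 40min limit, two attempts of one step take at most 80min, which fits under the new 90min global timeout.

```yaml
steps:
  # Hypothetical step: label and command are placeholders for illustration.
  - label: "example packaging step"
    timeout_in_minutes: 40    # per-step cap, so one stuck step cannot exhaust the global timeout
    retry:
      automatic:
        - limit: 1            # at most one automatic retry
    commands:
      - make example-target   # placeholder command
```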

Finally, we suppress Slack notifications if the retry recovered the step.

In a future PR we will consider also adding a daily DRA build to cover
for cases where the retries didn't help and there were no subsequent
commits to trigger a new build.

[^1]: https://buildkite.com/elastic/beats-packaging-pipeline/builds/114
dliappis authored May 1, 2024
1 parent 5011ccc commit 726f6e9
Showing 2 changed files with 34 additions and 1 deletion.
32 changes: 32 additions & 0 deletions .buildkite/packaging.pipeline.yml
@@ -44,6 +44,10 @@ steps:
provider: gcp
image: "${IMAGE_UBUNTU_X86_64}"
machineType: "${GCP_DEFAULT_MACHINE_TYPE}"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
commands:
- make build/distributions/dependencies.csv
- make beats-dashboards
@@ -62,6 +66,10 @@ steps:
provider: gcp
image: "${IMAGE_UBUNTU_X86_64}"
machineType: "${GCP_DEFAULT_MACHINE_TYPE}"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
commands:
- make build/distributions/dependencies.csv
- make beats-dashboards
@@ -86,6 +94,10 @@ steps:
provider: gcp
image: "${IMAGE_UBUNTU_X86_64}"
machineType: "${GCP_DEFAULT_MACHINE_TYPE}"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
artifact_paths:
- build/distributions/**/*
matrix:
@@ -116,6 +128,10 @@ steps:
provider: "aws"
imagePrefix: "${AWS_IMAGE_UBUNTU_ARM_64}"
instanceType: "${AWS_ARM_INSTANCE_TYPE}"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
artifact_paths:
- build/distributions/**/*
matrix:
@@ -142,6 +158,10 @@ steps:
provider: gcp
image: "${IMAGE_UBUNTU_X86_64}"
machineType: "c2-standard-16"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
artifact_paths:
- build/distributions/**/*

@@ -161,6 +181,10 @@ steps:
provider: gcp
image: "${IMAGE_UBUNTU_X86_64}"
machineType: "${GCP_DEFAULT_MACHINE_TYPE}"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
artifact_paths:
- build/distributions/**/*
matrix:
@@ -191,6 +215,10 @@ steps:
provider: "aws"
imagePrefix: "${AWS_IMAGE_UBUNTU_ARM_64}"
instanceType: "${AWS_ARM_INSTANCE_TYPE}"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
artifact_paths:
- build/distributions/**/*
matrix:
@@ -217,6 +245,10 @@ steps:
provider: gcp
image: "${IMAGE_UBUNTU_X86_64}"
machineType: "c2-standard-16"
+ timeout_in_minutes: 40
+ retry:
+ automatic:
+ - limit: 1
artifact_paths:
- build/distributions/**/*

3 changes: 2 additions & 1 deletion catalog-info.yaml
@@ -1045,7 +1045,7 @@ spec:
# branch_configuration: "main 8.* 7.17"
cancel_intermediate_builds: false
skip_intermediate_builds: false
- maximum_timeout_in_minutes: 60
+ maximum_timeout_in_minutes: 90
provider_settings:
build_branches: true
build_pull_request_forks: false
@@ -1059,6 +1059,7 @@ spec:
ELASTIC_SLACK_NOTIFICATIONS_ENABLED: 'true'
SLACK_NOTIFICATIONS_CHANNEL: '#ingest-notifications'
SLACK_NOTIFICATIONS_ON_SUCCESS: 'false'
+ SLACK_NOTIFICATIONS_SKIP_FOR_RETRIES: 'true'
teams:
ingest-fp:
access_level: MANAGE_BUILD_AND_READ