limit to 150 concurrent jobs per workflow #216
Merged
Contributes to #162
Since I started paying close attention, I've observed that most CI runs here have at least 10 failures on their first attempt, with errors indicating we hit DockerHub rate limits:
This proposes using @ajschmidt8's recommendation from rapidsai/miniforge-cuda#72 (comment) to limit the number of concurrent jobs.
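For reviewers who want to play with the trade-off, here is a rough sketch of how a concurrency cap stretches wall-clock time when every job takes about the same time. The job count (270) and the function names are hypothetical, chosen only for illustration and not taken from this repo's workflows. The 11-minute job duration matches the current end-to-end time quoted in this description.

```python
import heapq
from typing import Optional


def makespan_minutes(num_jobs: int, job_minutes: float, max_parallel: Optional[int]) -> float:
    """Wall-clock time to run equal-length jobs with at most max_parallel at once.

    max_parallel=None models the current "unlimited" behavior.
    Assumes runners are always available (no queueing outside the cap).
    """
    if num_jobs == 0:
        return 0.0
    slots = num_jobs if max_parallel is None else min(max_parallel, num_jobs)
    # Min-heap of the times at which each concurrency slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)   # earliest free slot
        end = start + job_minutes
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def throughput_ratio(num_jobs: int, max_parallel: int) -> float:
    """Smoothed slowdown estimate: how many 'slots worth' of work there is."""
    return max(1.0, num_jobs / max_parallel)


# Hypothetical inputs: 270 jobs of ~11 minutes each.
unlimited = makespan_minutes(270, 11.0, None)
capped = makespan_minutes(270, 11.0, 150)
ratio = throughput_ratio(270, 150)
```

With identical job lengths the capped run finishes in whole "waves", so the simulated time is an upper bound; the smoothed ratio is closer to what happens when job durations vary and slots free up at different times.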
Notes for Reviewers
Benefits of this change
Reduces the impact of this repo on total CPU runner availability for projects using NVIDIA runners.
Reduces the likelihood that a human will have to retrigger a build here (which costs time and money, and is easy to miss on branch builds after merges).
Why set the limit to 150?
This is not an exact science haha. I'm just looking for a number that meets these constraints:
Some relevant information:
ci-conda / miniforge-cuda: nvidia/cuda, condaforge/miniforge3, mikefarah/yq, amazon/aws-cli
ci-wheel: amazon/aws-cli (... base image is from NVCR)
citestwheel: amazon/aws-cli (... base image is from NVCR)
So assuming full availability of linux-{aarch64,amd64}-cpu runners, and if all build jobs take roughly the same amount of time, changing from "unlimited" to 150 might mean roughly 1.8x the end-to-end time for a CI run here... from around 11 minutes to maybe 20 minutes.
I have no idea what the exact limit is from DockerHub. From https://docs.docker.com/docker-hub/download-rate-limit/#other-limits: