[ci] Change to cache build dependencies #3562
This commit updates the CI to cache build dependencies (LLVM, NVIDIA toolchain, and pybind11) to speed up the build process.
Triggered CI for you!
Thanks @jlebar! :D I expect to push to this one a few more times, since I need to iterate on it a bit. No need to review until it's tested and working.
~/.triton/llvm
~/.triton/nvidia
~/.triton/pybind11
key: ${{ runner.os }}-${{ runner.arch }}-llvm-${{ steps.cache-key.outputs.llvm }}-nvidia-${{ steps.cache-key.outputs.nvidia }}-pybind11-${{ steps.cache-key.outputs.pybind11 }}
Do you think it's worth doing this as three cache steps rather than one, so that when the LLVM version changes we don't have to re-download the other two?
OTOH I'm fine if you want to leave it as-is because debugging this is such a pain, and you finally found something that worked. :)
The examples I've seen use only one actions/cache step with multiple paths, so I just followed that to stay on the tested path. I don't know what would happen with multiple actions/cache steps. It may be doable; I can give it a try later.
I think we likely would only change LLVM frequently; other parts likely would stay the same for a long time. So I wouldn't worry about cache invalidation or similar due to frequent changes to the other parts. Having everything packaged in one cache also helps compression/uploading/downloading a bit, I think.
I think we likely would only change LLVM frequently; other parts likely would stay the same for a long time.
Yeah, the problem is when we change LLVM we have to re-download the others from the Web (right?). And that exposes us to flakiness.
Anyway sgtm to leave it like this for now.
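For reference, the three-step variant discussed in this thread might look roughly like the sketch below. It reuses the paths and `steps.cache-key.outputs.*` values from the single-step version above; the step names and the `actions/cache@v4` version pin are assumptions, and this layout was not tested in this PR.

```yaml
# Sketch only: one actions/cache step per dependency, so that a new LLVM
# version invalidates just the LLVM entry and the other two are still
# restored from the cache.
# Assumptions: step names and the @v4 pin are illustrative.
- name: Cache LLVM
  uses: actions/cache@v4
  with:
    path: ~/.triton/llvm
    key: ${{ runner.os }}-${{ runner.arch }}-llvm-${{ steps.cache-key.outputs.llvm }}

- name: Cache NVIDIA toolchain
  uses: actions/cache@v4
  with:
    path: ~/.triton/nvidia
    key: ${{ runner.os }}-${{ runner.arch }}-nvidia-${{ steps.cache-key.outputs.nvidia }}

- name: Cache pybind11
  uses: actions/cache@v4
  with:
    path: ~/.triton/pybind11
    key: ${{ runner.os }}-${{ runner.arch }}-pybind11-${{ steps.cache-key.outputs.pybind11 }}
```

The trade-off is the one raised above: separate entries avoid re-downloading the NVIDIA toolchain and pybind11 when only LLVM changes, at the cost of three restore/save round trips instead of one.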
with:
submodules: 'true'

- name: Set ROCM ENV
- name: Compute build dependency cache keys
I don't suppose there's a way to avoid this code duplication?
For now, probably not. It is tied to the overall structure of the AMD and NVIDIA integration tests. I think we'd need to figure out how to unify them in general if that's the path we want to be on. But they are based on different approaches (base machine vs. Docker), so even if this looks like duplication, it should be fine, because we may need to take different steps down the road.
This is awesome -- thank you so much!
Thanks!
This commit updates the CI to cache build dependencies (LLVM,
NVIDIA toolchain, and pybind11) to speed up the build process.
In order to do this, we need to expose the required versions of
the NVIDIA toolchain and pybind11, as is already done for LLVM.
With this change, we now see a much faster triton install:
Diffing https://github.com/openai/triton/actions/runs/8556465920/job/23446341337?pr=3565 and
https://github.com/openai/triton/actions/runs/8558161016/job/23452293260?pr=3562:
More importantly, we can reduce the chance of seeing
connection resets caused by frequently downloading build
dependencies from the original source.
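As a rough illustration of how the "Compute build dependency cache keys" step could expose those versions, here is a minimal sketch. The `id: cache-key` and the output names `llvm`, `nvidia`, and `pybind11` match the key expression quoted in the review; the file paths are hypothetical placeholders, and the use of `sha256sum` assumes a Linux runner.

```yaml
# Sketch only: the version file paths below are hypothetical placeholders,
# not the repository's actual layout.
- name: Compute build dependency cache keys
  id: cache-key
  run: |
    # Each line publishes one step output, which later steps read as
    # steps.cache-key.outputs.<name> when building the cache key.
    echo "llvm=$(cut -c 1-8 path/to/llvm-hash.txt)" >> "$GITHUB_OUTPUT"
    echo "nvidia=$(sha256sum path/to/nvidia-toolchain-version.txt | cut -d ' ' -f 1)" >> "$GITHUB_OUTPUT"
    echo "pybind11=$(cat path/to/pybind11-version.txt)" >> "$GITHUB_OUTPUT"
```

Because each key is derived from a version file, bumping a dependency changes its key and forces a fresh download, while unchanged dependencies keep hitting the cache.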