[ci] Change to cache build dependencies #3562

antiagainst · 2024-04-04T00:34:02Z

This commit updates the CI to cache build dependencies (LLVM,
NVIDIA toolchain, and pybind11) to speed up the build process.

To do this, we need to expose the required versions of the NVIDIA
toolchain and pybind11, as we already do for LLVM.
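A minimal sketch of how the two workflow steps could fit together. It assumes the pinned versions live in text files under `cmake/` (the file names below are hypothetical) and that the build downloads its dependencies into `~/.triton`, as the cache paths in this PR suggest. The first step hashes each version file into a step output; the second restores and saves all three directories under one combined key, reusing the key expression from this PR:

```yaml
# Hypothetical sketch: the cmake/*.txt version-file paths are assumptions.
- name: Compute build dependency cache keys
  id: cache-key
  shell: bash
  run: |
    # Hash each pinned-version file; any edit to a file changes its key fragment.
    echo "llvm=$(sha256sum cmake/llvm-hash.txt | cut -c 1-16)" >> "$GITHUB_OUTPUT"
    echo "nvidia=$(sha256sum cmake/nvidia-toolchain-version.txt | cut -c 1-16)" >> "$GITHUB_OUTPUT"
    echo "pybind11=$(sha256sum cmake/pybind11-version.txt | cut -c 1-16)" >> "$GITHUB_OUTPUT"

- name: Cache build dependencies
  uses: actions/cache@v4
  with:
    # Directories the build is assumed to download dependencies into.
    path: |
      ~/.triton/llvm
      ~/.triton/nvidia
      ~/.triton/pybind11
    # One combined key: a change to any one version misses the whole bundle.
    key: ${{ runner.os }}-${{ runner.arch }}-llvm-${{ steps.cache-key.outputs.llvm }}-nvidia-${{ steps.cache-key.outputs.nvidia }}-pybind11-${{ steps.cache-key.outputs.pybind11 }}
```

Because the key embeds all three version hashes plus `runner.os` and `runner.arch`, bumping any single dependency causes a cache miss; the build then downloads everything and `actions/cache` saves a fresh entry for the whole bundle when the job finishes.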

With this change, we now see a much faster Triton install:

Comparing the CI run at https://github.com/openai/triton/actions/runs/8556465920/job/23446341337?pr=3565 with
the run for this PR at https://github.com/openai/triton/actions/runs/8558161016/job/23452293260?pr=3562:

NVIDIA A100: ~2.5min -> ~1.5min
NVIDIA H100: ~2min -> ~1min
AMD gfx90a: ~3min -> ~1min

More importantly, we reduce the chance of seeing
connection resets caused by frequently downloading build
dependencies from their original sources.

jlebar · 2024-04-04T00:35:17Z

Triggered CI for you!

antiagainst · 2024-04-04T00:40:33Z

Thanks @jlebar! :D I expect this one will need a few more pushes of the CI button, since I'll need to iterate on it a bit. No need to review until it's tested and working.

jlebar · 2024-04-04T18:47:03Z

.github/workflows/integration-tests.yml

+            ~/.triton/llvm
+            ~/.triton/nvidia
+            ~/.triton/pybind11
+          key: ${{ runner.os }}-${{ runner.arch }}-llvm-${{ steps.cache-key.outputs.llvm }}-nvidia-${{ steps.cache-key.outputs.nvidia }}-pybind11-${{ steps.cache-key.outputs.pybind11 }}


Do you think it's worth doing this as three cache steps rather than one, so that when the LLVM version changes we don't have to re-download the other two?

OTOH I'm fine if you want to leave it as-is because debugging this is such a pain, and you finally found something that worked. :)

The examples I've seen use only one actions/cache step with multiple paths, so I followed that to stay on the tested path. I don't know what would happen with multiple actions/cache steps; it may be doable. I can give it a try later.

I think only LLVM is likely to change frequently; the other dependencies will likely stay the same for a long time. So I'm not worried about cache invalidation caused by frequent changes to them. Packaging everything into one cache also helps compression/uploading/downloading a bit, I think.

> I think we likely would only change LLVM frequently; other parts likely would stay the same for long time.

Yeah, the problem is that when we change LLVM, we have to re-download the others from the Web (right?), and that exposes us to flakiness.

Anyway sgtm to leave it like this for now.
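For reference, the per-dependency split suggested in this thread could look roughly like the following. This is a hypothetical sketch, not what the PR merged: it reuses the `steps.cache-key` outputs from the combined key in the diff above, and each `actions/cache` step restores and saves its own entry, so an LLVM bump would only miss the LLVM cache:

```yaml
# Hypothetical split: one actions/cache step per dependency.
# Each key only changes when its own dependency's version changes.
- name: Cache LLVM
  uses: actions/cache@v4
  with:
    path: ~/.triton/llvm
    key: ${{ runner.os }}-${{ runner.arch }}-llvm-${{ steps.cache-key.outputs.llvm }}

- name: Cache NVIDIA toolchain
  uses: actions/cache@v4
  with:
    path: ~/.triton/nvidia
    key: ${{ runner.os }}-${{ runner.arch }}-nvidia-${{ steps.cache-key.outputs.nvidia }}

- name: Cache pybind11
  uses: actions/cache@v4
  with:
    path: ~/.triton/pybind11
    key: ${{ runner.os }}-${{ runner.arch }}-pybind11-${{ steps.cache-key.outputs.pybind11 }}
```

The trade-off is the one discussed here: three smaller archives to compress and transfer instead of one, in exchange for not re-downloading the NVIDIA toolchain and pybind11 from the Web on every LLVM bump.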

jlebar · 2024-04-04T18:47:38Z

.github/workflows/integration-tests.yml

        with:
          submodules: 'true'

-      - name: Set ROCM ENV
+      - name: Compute build dependency cache keys


I don't suppose there's a way to avoid this code duplication?

Probably not for now. It's tied to the overall structure of the AMD and NVIDIA integration tests. We'd need to figure out how to unify them in general, if that's the path we want to take. But they are based on different approaches (base machine vs. Docker), so even though it looks like duplication, it should be fine, since we may need to take different steps down the road.
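If the two workflows are ever unified, one option for sharing the key computation is a local composite action. The sketch below is hypothetical (the action path, its name, and the `cmake/*.txt` version-file paths are all assumptions), and it only deduplicates the key computation; it does not address the base-machine vs. Docker differences:

```yaml
# Hypothetical file: .github/actions/build-dep-cache-keys/action.yml
name: Compute build dependency cache keys
description: Hash pinned dependency version files into cache-key fragments.
outputs:
  llvm:
    description: Short hash of the LLVM version file
    value: ${{ steps.keys.outputs.llvm }}
  nvidia:
    description: Short hash of the NVIDIA toolchain version file
    value: ${{ steps.keys.outputs.nvidia }}
  pybind11:
    description: Short hash of the pybind11 version file
    value: ${{ steps.keys.outputs.pybind11 }}
runs:
  using: composite
  steps:
    - id: keys
      shell: bash
      # Paths are assumed to be relative to the repository root.
      run: |
        echo "llvm=$(sha256sum cmake/llvm-hash.txt | cut -c 1-16)" >> "$GITHUB_OUTPUT"
        echo "nvidia=$(sha256sum cmake/nvidia-toolchain-version.txt | cut -c 1-16)" >> "$GITHUB_OUTPUT"
        echo "pybind11=$(sha256sum cmake/pybind11-version.txt | cut -c 1-16)" >> "$GITHUB_OUTPUT"
```

Each workflow would then replace its inline step with `uses: ./.github/actions/build-dep-cache-keys` under `id: cache-key`, placed after `actions/checkout`, since local actions are read from the checked-out repository.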

jlebar

This is awesome -- thank you so much!

ThomasRaoux

Thanks!

[ci] Change to cache build dependencies

98b2dcf

This commit updates the CI to cache build dependencies (LLVM, NVIDIA toolchain, and pybind11) to speed up build process.

antiagainst mentioned this pull request Apr 4, 2024

cache llvm library to save time for CI build #3537

Closed

ThomasRaoux and others added 11 commits April 3, 2024 18:42

Merge branch 'main' into cache-build-deps

854fcce

Fix container volumes issue

cb9cd85

Fix key to include arch and cache directory existence

5266e96

Avoid clear cache and fix docker volume again

0143a4f

Fix docker volumes and cache path

2a57cdf

Fix docker cache path again

7db5f38

Try again for dockers

8d0be94

Print more info

7c5c9e2

Another attempt

4a6c5c7

Change to avoid variables in cache action

9ad5d79

Fix and cleanup

712a9af

antiagainst marked this pull request as ready for review April 4, 2024 18:26

antiagainst requested a review from ptillet as a code owner April 4, 2024 18:26

jlebar reviewed Apr 4, 2024

View reviewed changes

ThomasRaoux approved these changes Apr 4, 2024

View reviewed changes

zahimoud approved these changes Apr 4, 2024

View reviewed changes

jlebar approved these changes Apr 4, 2024

View reviewed changes

jlebar merged commit 38e45bb into triton-lang:main Apr 4, 2024
5 checks passed

antiagainst deleted the cache-build-deps branch April 4, 2024 20:00

antiagainst mentioned this pull request Apr 5, 2024

[ci] Enable postsubmit integration if build dep changes #3581

Merged
