Use GCS for Windows ccache #13183
Merged
This reverts commit afe58a8.
GMNGeoffrey added the platform/windows 🚪 (Windows-specific build, execution, benchmarking, and deployment) label Apr 20, 2023
ScottTodd approved these changes Apr 20, 2023
GMNGeoffrey added a commit that referenced this pull request Apr 20, 2023
The GitHub-provided `actions/checkout` action is, for some reason, unusably slow on the large managed Windows runners. We assumed this was caused by some tricky IO issue, but I decided to just try using `git` commands directly, and lo, the checkout time goes from 10 minutes to 1.5 🚀 With the caching improvements from #13183, this gets the Windows build down under 10 minutes, which means we can run it on presubmit (left for a future PR).
Part of #11009
Tested: Enabled this workflow on push to my branch: https://github.com/openxla/iree/actions/runs/4750681034/jobs/8439091687
skip-ci: this only affects the Windows job, which isn't run on presubmit
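The commit message above does not list the `git` commands it switched to. As a rough sketch only, a shallow checkout of a single commit plus its submodules might look like the following. The choice of depth-1 fetches, the parallel submodule update, and the helper names `checkout_commands` and `run_checkout` are all assumptions made for illustration, not the workflow's actual steps.

```python
import subprocess
from typing import List


def checkout_commands(repo_url: str, commit: str, jobs: int = 8) -> List[List[str]]:
    """Build git invocations for a shallow checkout of a single commit.

    Assumption: depth-1 fetches and a parallel submodule update. The
    source only says that plain `git` commands replaced actions/checkout.
    """
    return [
        ["git", "init"],
        ["git", "remote", "add", "origin", repo_url],
        # Fetch only the commit under test, without any history.
        ["git", "fetch", "--depth=1", "origin", commit],
        ["git", "checkout", "--force", "FETCH_HEAD"],
        # Initialize submodules shallowly, several at a time.
        ["git", "submodule", "update", "--init", "--depth=1", f"--jobs={jobs}"],
    ]


def run_checkout(repo_url: str, commit: str, workdir: str) -> None:
    """Run the commands in order inside workdir, stopping at the first failure."""
    for cmd in checkout_commands(repo_url, commit):
        subprocess.run(cmd, cwd=workdir, check=True)
```

Keeping the command list separate from the code that runs it makes the sequence easy to inspect or log before anything touches the network.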
jpienaar pushed a commit that referenced this pull request May 1, 2023
We have found the GitHub Actions built-in caching mechanism to be extremely limiting: slow, small, and buggy. Switch instead to using our own remote ccache hosted on GCS. This matches our Linux builds on our self-hosted runners, except that we unfortunately have to do GCS auth through service account keys, which means that access is restricted to postsubmit runs. Luckily, for these builds we're generally doing everything in one job and just want caching (which we only write on postsubmit anyway) and don't need artifact storage (which we'd need on presubmit too).
Tested: Ran on this PR (hacked the workflow a bit). An [initial run](https://github.com/openxla/iree/actions/runs/4750257226/jobs/8438272681) with an empty cache took 28m total, 15.5m of which was in the build step. This includes writing the remote cache (minor overhead). A [rerun](https://github.com/openxla/iree/actions/runs/4750257226/jobs/8438619413) with a now-populated cache took 14m total, 6.5m of which was in the build step. 79% of compiler calls were cacheable, and of those 99% were remote cache hits. Contrast with a [recent post-submit run](https://github.com/openxla/iree/actions/runs/4748717136/jobs/8435229260) that ran on a docs-only change (so should've had a maximally populated cache), which took 20m, 7m of which was the build step, 2m of which was fetching the cache, and 1m of which was saving the cache. That's setting aside [runs like this one](https://github.com/openxla/iree/actions/runs/4741863995/jobs/8419465087) where fetching the cache just times out entirely (with no alerting unless you happen to look at the UI).
Tragically, most of the time in all of these jobs is spent just checking out the repository and submodules (see actions/checkout#1186).
Overall this seems like a marked improvement. The main wins are in avoiding tons of complexity futzing with cache compression levels and restoring and saving the cache (actual cached build time is ~unchanged).
Part of #13028
skip-ci: Windows builds don't run on presubmit
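To make the quoted numbers easier to compare, here is a small sketch that recomputes two derived figures from the description: the share of all compiler calls served from the remote cache, and the time each run spent outside the build step. The run labels and function names are hypothetical; only the minute and percentage values come from the text above.

```python
# Figures quoted in the description above, in minutes.
RUNS = {
    "gcs_empty_cache": {"total": 28.0, "build": 15.5},
    "gcs_populated_cache": {"total": 14.0, "build": 6.5},
    "actions_cache_docs_only": {"total": 20.0, "build": 7.0},
}


def remote_hit_share(cacheable: float, remote_hits: float) -> float:
    """Share of all compiler calls that were served from the remote cache."""
    return cacheable * remote_hits


def minutes_outside_build(run: dict) -> float:
    """Job time not spent in the build step (checkout, setup, cache handling)."""
    return run["total"] - run["build"]


if __name__ == "__main__":
    share = remote_hit_share(cacheable=0.79, remote_hits=0.99)
    print(f"remote cache served {share:.1%} of all compiler calls")
    for name, run in RUNS.items():
        print(f"{name}: {minutes_outside_build(run):.1f}m outside the build step")
```

With these inputs, roughly 78% of all compiler calls come from the remote cache, and the time outside the build step is 12.5m, 7.5m, and 13m for the three runs. That is consistent with the description's point that the cached build time itself barely changes, while the surrounding overhead does.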
jpienaar pushed a commit that referenced this pull request May 1, 2023
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this pull request Jul 6, 2023
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this pull request Jul 6, 2023
Labels
infrastructure: Relating to build systems, CI, or testing
platform/windows 🚪: Windows-specific build, execution, benchmarking, and deployment