-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bazel: use GCS remote cache #4050
Conversation
Signed-off-by: Lizan Zhou <zlizan@google.com>
ci/setup_gcs_cache.sh
Outdated
rm -rf /tmp/gcp_service_account.json | ||
} | ||
|
||
if [[ ! -z "${GCP_SERVICE_ACCOUNT_KEY}" && ! -z "${BAZEL_REMOTE_CACHE}" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to grab me on slack and I can setup the env variables for you so we can test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to add those env variables to CircleCI
Signed-off-by: Lizan Zhou <zlizan@google.com>
ci/do_circle_ci.sh
Outdated
@@ -18,6 +18,8 @@ export NUM_CPUS=8 | |||
# (see https://circleci.com/docs/2.0/executor-types/#using-machine) | |||
export BAZEL_EXTRA_TEST_OPTIONS="--test_env=ENVOY_IP_TEST_VERSIONS=v4only" | |||
|
|||
export BAZEL_REMOTE_CACHE="${BAZEL_REMOTE_CACHE:-https://storage.googleapis.com/envoy-circleci-bazel-cache/}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be more appropriate to set this in .circleci/config.yml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, can we make this end-user configurable and document how individual developers can also make use of it? Maybe in https://github.com/envoyproxy/envoy/blob/master/bazel/README.md.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Specifically for when working outside a Docker image)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, excited to see this land; any idea of speedups observed so far?
ci/setup_gcs_cache.sh
Outdated
|
||
gcp_service_account_cleanup() { | ||
echo "Deleting service account key file..." | ||
rm -rf /tmp/gcp_service_account.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TMPDIR
or something, probably bad to assume the existent of /tmp
.
ci/setup_gcs_cache.sh
Outdated
if [[ ! -z "${BAZEL_REMOTE_CACHE}" ]]; then | ||
|
||
if [[ ! -z "${GCP_SERVICE_ACCOUNT_KEY}" ]]; then | ||
echo "${GCP_SERVICE_ACCOUNT_KEY}" | base64 --decode > /tmp/gcp_service_account.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mktemp
ci/setup_gcs_cache.sh
Outdated
trap gcp_service_account_cleanup EXIT | ||
|
||
export BAZEL_BUILD_EXTRA_OPTIONS="${BAZEL_BUILD_EXTRA_OPTIONS} \ | ||
--remote_http_cache=${BAZEL_REMOTE_CACHE} --google_credentials=/tmp/gcp_service_account.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to use filesystem? If so, best to make sure that the file perms are appropriately restricted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, switched to using mktemp
also addresses the permission.
ci/do_circle_ci.sh
Outdated
@@ -18,6 +18,8 @@ export NUM_CPUS=8 | |||
# (see https://circleci.com/docs/2.0/executor-types/#using-machine) | |||
export BAZEL_EXTRA_TEST_OPTIONS="--test_env=ENVOY_IP_TEST_VERSIONS=v4only" | |||
|
|||
export BAZEL_REMOTE_CACHE="${BAZEL_REMOTE_CACHE:-https://storage.googleapis.com/envoy-circleci-bazel-cache/}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Specifically for when working outside a Docker image)
Signed-off-by: Lizan Zhou <zlizan@google.com>
Performance can be tracked at https://circleci.com/gh/envoyproxy/envoy/tree/ci_gcs_remote_cache , it doesn't looks like this helped a lot, perhaps the network is not fast enough compared to compilation? Needs more investigation. |
The cache hit ratio on that branch is pretty terrible, e.g. for From https://circleci.com/gh/envoyproxy/envoy/79682 (
~0.5% cache hit ratio, total time: 1:16:05. From https://circleci.com/gh/envoyproxy/envoy/79704 (
~8% cache hit ratio, total time: 1:10:28. From https://circleci.com/gh/envoyproxy/envoy/79745 (
~6% cache hit ratio, total time: 1:11:56. From https://circleci.com/gh/envoyproxy/envoy/80260 (
~17.5% cache hit ratio, total time: 55:33. The last one is quite surprising, because it should be close to 100%, so something seems busted. Anyway, the issue with caching is that any changes to the environment, |
The low cache hit rate is mostly due to
I would expect after multiple runs the hit rate will be improved. |
This won't be true unless we expose environment variables to PRs from forks (currently it doesn't and I don't feel it is safe), so the PRs from forks (like #4049 compared to this) will have read-only cache. |
From https://circleci.com/gh/envoyproxy/envoy/80286 (
~17.5% cache hit ratio (same as before), total time: 1:20:06.
Good point, and agreed wrt security, but looking at the PRs history, it seems that only Lyft folks are using the main repo for PRs, and everybody else is using private forks, so this approach won't result in huge improvements. |
Seems mac improves quite a bit during this PR, it might be because mac runs first (without downloading docker image) and fills GCS rate limit soon. https://circleci.com/gh/envoyproxy/envoy/80290 (total time: 55:35)
https://circleci.com/gh/envoyproxy/envoy/80368 (total time: 26:35)
|
The read-only build cache will be kept updated from master builds, so each PR (unless it changes |
Signed-off-by: Lizan Zhou <zlizan@google.com>
7feefdc
to
e49e7db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diff looks good, will continue discussion in PR on efficacy.
ci/setup_gcs_cache.sh
Outdated
set -e | ||
|
||
if [[ ! -z "${BAZEL_REMOTE_CACHE}" ]]; then | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove superfluous blank lines here and below.
I see evidence of rate limiting on the branch CI, e.g. https://circleci.com/gh/envoyproxy/envoy/80440. Is this killing our ability to effectively cache? Would we be better off with a dedicated cache server, e.g. Google Cloud Build? Are we should be hitting the 200 objects modifications per second at https://cloud.google.com/storage/quotas? |
Signed-off-by: Lizan Zhou <zlizan@google.com>
@htuch the 200 per second is for object composition. Per https://cloud.google.com/storage/docs/request-rate GCS doesn't really have a hard limit but need warm up starting from 1000 req/s write and 5000 req/s read, I would expect after we merge this, CI will use cache for every commit: read/write from master builds, read from each PR builds, the storage will be warmer. |
@lizan LGTM; is there anything holding us back on merging this? |
Signed-off-by: Lizan Zhou <zlizan@google.com>
@htuch I was investigating the coverage drop for runs on CircleCI, turns out the remote cache doesn't cache gcno files unless you do |
Disabled the cache for coverage, I'll address it in a follow up. Let's merge this first and see how it goes on master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look, up in the sky! It's a bird! It's a plane! It's Bazel remote caching!
* origin/master: (38 commits) test: add tests for corner-cases around sending requests before run() starts or after run() ends. (envoyproxy#4114) perf: reduce the memory usage of LC Trie construction (envoyproxy#4117) test: moving redundant code in websocket_integration_test to utilities (envoyproxy#4127) test: make YamlLoadFromStringFail less picky about error msg. (envoyproxy#4141) rbac: add rbac network filter. (envoyproxy#4083) fuzz: route lookup and header finalization fuzzer. (envoyproxy#4116) Set content-type and content-length (envoyproxy#4113) fault: use FractionalPercent for percent (envoyproxy#3978) test: Fix inverted exact match logic in IntegrationTcpClient::waitForData() (envoyproxy#4134) Added cluster_name to load assignment config for static cluster (envoyproxy#4123) ssl: refactor ContextConfig to use TlsCertificateConfig (envoyproxy#4115) syscall: refactor OsSysCalls for deeper errno latching (envoyproxy#4111) thrift_proxy: fix oneway bugs (envoyproxy#4025) Do not crash when converting YAML to JSON fails (envoyproxy#4110) config: allow unknown fields flag (take 2) (envoyproxy#4096) Use a jittered backoff strategy for handling HdsDelegate stream/connection failures (envoyproxy#4108) bazel: use GCS remote cache (envoyproxy#4050) Add thread local cache of overload action states (envoyproxy#4090) Added TCP healthcheck capabilities to the HdsDelegate (envoyproxy#4079) secret: add secret provider interface and use it for TlsCertificates (envoyproxy#4086) ... Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Lizan Zhou zlizan@google.com
Partially addresses #3741
Description:
Use GCS remote cache for CI builds when key is provided.
Risk Level: Low
Testing: CI
Docs Changes:
Release Notes: