bazel/ci: unify developer-local and CI build generation. #747
This patch dedupes the distinct approaches that existed previously, where we assembled hand-curated BUILD files for external dependencies to be used in developer-local builds and used prebuilt artifacts compiled under the external dependency's native build system for CI.
In the new approach, the CI flow continues to prebuild artifacts with the build recipes and recursive make in ci/build_container/{Makefile,build_recipes}, ahead of time and prior to any invocation of Bazel.
Developer-local builds will not prebuild, but instead invoke the same build recipes and recursive make under a Bazel repository_rule.
A significant improvement that is also provided by this patch is the automatic workspace population via repositories.bzl. See the changes to WORKSPACE and ci/{WORKSPACE,WORKSPACE.consumer}. Projects that consume Envoy, e.g. to link in additional filters, no longer need to track Envoy's dependencies and maintain their own bind rules; this is now automagic.
WARNING: Any external consumer of the Bazel build will need to update their WORKSPACE definitions after this patch is merged.
This is a simpler version of #716, I wish I had known about @mattklein123 for review and Lyft integration discussion. @lizan for Bazel review. @dnoe and @rlazarus for dogfooding with the new dependencies they plan on adding to gflags and backwards/elfutils.
Does this change require a new step to run tests? I checked out the SHA and did
@lizan Fix pushed, please try again.
bazel/repositories.bzl
# v1.8.0 release
commit = "ec44c6c1675c25b9827aacd08c02433cccde7780",
remote = "https://github.com/google/googletest.git",
envoy_repository(
Having one repository for all dependencies doesn't look good. Is it possible to split them to one repository per dependency and generalize the repository_rule? (furthermore, upstreaming the rule to bazel would be nice.)
I imagine it can be generalized to something like:
autoconf_cc_repository(
name = "nghttp2",
source_tar = "https://github.com/nghttp2/nghttp2/releases/download/v1.20.0/nghttp2-1.20.0.tar.gz",
)
With one large repository, it is hard for consumers to swap out a single dependency, and rebuilding all of the dependencies when upgrading just one of them takes longer.
I've been debating which way to go with this. On the one hand, one rule per dependency is cleaner for the reasons you state. OTOH, there is a performance hit, since the dependencies are built independently and can't coordinate under a single recursive make job server.
If we build with make under each distinct target, then either we have to be conservative and set -j <some small number> to avoid killing the machine, or set -j $NUM_CPUS and hope/pray they don't all try and run jobs at once. @mattklein123 has already pointed out situations where the Bazel build runs too many jobs for small build VMs.
So, as much as I'd like to do one target per repository, I feel the expedient thing to do is what is there today. This is really just a limitation of repository_rule (CC bazelbuild/bazel#2814). The right thing might be for Bazel to natively support acting as a make job server and coordinate each of the invoked make processes.
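To make the -j trade-off concrete, here is a small Python sketch of the arithmetic. The function names and the 8-CPU, 5-dependency figures are illustrative assumptions, not numbers from this PR.

```python
def worst_case_jobs(num_makes, jobs_per_make):
    """Upper bound on simultaneous jobs if every make runs flat out at once."""
    return num_makes * jobs_per_make


def conservative_jobs(num_cpus, num_makes):
    """Per-make -j value that never oversubscribes, even if all makes overlap."""
    return max(1, num_cpus // num_makes)


# Hypothetical machine and dependency count, chosen only for illustration.
cpus, makes = 8, 5

# Every make gets -j $NUM_CPUS: fine if they never overlap, heavy if they do.
print("aggressive worst case:", worst_case_jobs(makes, cpus), "jobs")

# Every make gets a small -j: safe, but each build uses little of the machine.
small = conservative_jobs(cpus, makes)
print("conservative -j:", small, "worst case:", worst_case_jobs(makes, small), "jobs")
```

Neither setting is good on its own, which is the gap a shared make job server fills: a single pool of job slots is handed out across all of the recursive make invocations.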
Here's a good reference on the make job server: http://make.mad-scientist.net/papers/jobserver-implementation/ BTW. I feel this kind of issue really speaks to the impedance mismatch that exists today in Bazel for handling non-Bazel builds.
A related chain of thought to this. It seems that repository_rules execute sequentially (although this does not seem to be a guaranteed aspect of the specification). This opens up the possibility of just doing -j $NUM_CPUS, since the make jobs don't overlap. However, this will still be much slower than what we have today, since one of the other latency-hiding aspects of parallel recursive make is that the automake/autoconf phase of each dependency's build, which is slow and single threaded, overlaps with other dependencies' CPU-intensive make phases.
So, we're still back to a single recursive make being the best way to get build time down to something like 2 minutes (on my workstation) instead of > 5 minutes.
I didn't mean that running make for each target should be parallel (of course that is possible). I just meant that if you change one dependency's version, this would require rebuilding all of the dependencies. The same goes for consumers: istio would consume the protobuf that comes from grpc, so we don't want the dependency script to build another protobuf.
Having a make job server is definitely an improvement, but even without that, given that repository_rules execute sequentially today, we wouldn't have too many jobs.
I agree that we want to avoid consuming projects having to build all of Envoy's deps if they have their own versions. I've fixed this in the latest update, with a skip_targets option to envoy_dependencies. This list gets plumbed all the way down to the underlying make invocation; it's not just a bind omission anymore.
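As a rough illustration of what plumbing skip_targets down to make could look like, here is a Python sketch (it is not the Starlark in repositories.bzl). The function make_command, its parameters, and the target list are hypothetical; only the idea of dropping skipped targets before make is invoked comes from the comment above.

```python
def make_command(all_targets, skip_targets, jobs):
    """Build a make argv that omits any targets the consumer wants to skip.

    All names here are hypothetical; this only models the filtering step.
    """
    skipped = set(skip_targets)
    unknown = skipped - set(all_targets)
    if unknown:
        # Fail loudly on typos instead of silently building everything.
        raise ValueError("unknown skip_targets: " + ", ".join(sorted(unknown)))
    wanted = [t for t in all_targets if t not in skipped]
    return ["make", "-j", str(jobs)] + wanted


# Illustrative dependency names taken from this conversation.
deps = ["googletest", "http-parser", "nghttp2", "protobuf"]

# A consumer that already gets protobuf elsewhere skips building it here.
print(" ".join(make_command(deps, ["protobuf"], jobs=8)))
```

Because the skipped dependency never reaches the make command line, it is never built, which is stronger than merely leaving out its bind.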
Having a monolithic dependency means that developers need to rebuild everything each time a dependency changes. However, this is rare. Much more frequent is the case that a developer checks out a new tree. So, I think we should optimize for the common case and make this process fast.
I've added code to the build system to profile how long each step in the build recipes takes, and to compare two different styles of build:
1. Build everything in parallel under recursive make, under a make jobserver with all CPUs available.
2. Build each dependency (via its build recipe) sequentially with all CPUs available at each build. This simulates what we would see if we had separate repository_rules for each dependency.
The total time for (1) is 1m 49s. The total time for (2) is 3m 9s, an increase of over 70% in build time and definitely noticeable.
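For reference, the relative slowdown implied by those two timings can be worked out as follows. The helper name is made up for this snippet; the two durations are the ones quoted above.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to whole seconds."""
    return minutes * 60 + seconds


parallel = to_seconds(1, 49)    # style (1): one recursive make
sequential = to_seconds(3, 9)   # style (2): one dependency at a time

# Relative increase of the sequential build over the parallel one.
increase = (sequential - parallel) / parallel
print(f"{parallel}s -> {sequential}s: +{increase:.0%}")  # 109s -> 189s: +73%
```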
To understand why it's slower, take a look at https://github.com/htuch/data-dump/tree/master/envoy-build-profiles. It's clear that the configure and buildconf steps take substantial time. In (1) they overlap with each other and with CPU-intensive work, hiding the latency; in (2) they become the bottleneck, and the machine sits mostly idle while they run.
So, I think we should stick with the monolithic dep.
As far as repository_rule goes and more general Bazel support, this is something that needs to be addressed foundationally. If repository_rules execute sequentially, then the machine will be underutilized during the autoconf steps. If they execute in parallel, there is a need to basically act like a make jobserver to coordinate them.
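The latency-hiding argument can be sketched with a toy simulation in Python. This is a deliberately simplified model, not the profiling code added in this PR: each dependency has a single-threaded configure phase followed by a perfectly parallel make phase, time advances in one-second ticks, and the dependency figures are invented.

```python
def simulate(deps, cpus, parallel):
    """Return wall-clock seconds for a toy build.

    deps: list of (configure_seconds, make_cpu_seconds) pairs (made-up data).
    parallel: True models one shared job pool across all dependencies,
              False models building one dependency at a time.
    """
    remaining = [[c, w] for c, w in deps]
    elapsed = 0
    while any(c > 0 or w > 0 for c, w in remaining):
        active = [r for r in remaining if r[0] > 0 or r[1] > 0]
        if not parallel:
            active = active[:1]  # only the first unfinished dependency runs
        # Classify before mutating so a dependency does one phase per tick.
        configuring = [r for r in active if r[0] > 0]
        making = [r for r in active if r[0] == 0]
        slots = cpus
        for r in configuring:
            if slots == 0:
                break
            r[0] -= 1  # configure is single threaded: exactly one slot
            slots -= 1
        for r in making:
            used = min(slots, r[1])  # make soaks up whatever slots remain
            r[1] -= used
            slots -= used
        elapsed += 1
    return elapsed


# Three hypothetical dependencies on a hypothetical 8-CPU machine.
toy_deps = [(10, 80), (30, 40), (20, 160)]
seq = simulate(toy_deps, cpus=8, parallel=False)
par = simulate(toy_deps, cpus=8, parallel=True)
print("sequential:", seq, "s, shared pool:", par, "s")
```

In the sequential mode seven of the eight slots sit idle during every configure tick, whereas in the shared-pool mode those ticks are spent on another dependency's make phase. That is the effect described above, under the simplifying assumptions of this model.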
Thanks for getting skip_targets plumbed all the way down, and for the analysis. For now I think this is good to go; let's revisit later once Bazel has better support.
bazel/repositories.bzl
envoy_repository = repository_rule(
implementation = _repository_impl,
local = debug_build,
environ = ["CC", "CXX", "LD_LIBRARY_PATH"],
Bazel doesn't set these env vars (at least CC) if they weren't set when you invoked bazel; my local build failed without setting them.
I think I've fixed this, could you try again?
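To illustrate the failure mode reported above, here is a Python sketch of one defensive way to assemble a build environment: forward a variable only if the caller set it, and otherwise fall back to a default where one exists. This is not the fix that was pushed to the PR; build_env and the fallback values "cc" and "c++" are assumptions made for the example.

```python
import os

# Variables named in the environ list of the rule under review.
FORWARDED = ("CC", "CXX", "LD_LIBRARY_PATH")

# Hypothetical fallbacks; a real build might probe for a compiler instead.
DEFAULTS = {"CC": "cc", "CXX": "c++"}


def build_env(environ=None):
    """Return the variables to pass to the build scripts.

    Unset variables are never read blindly: they either get a default or
    are left out, so the build scripts do not see empty values.
    """
    environ = os.environ if environ is None else environ
    env = {}
    for name in FORWARDED:
        if name in environ:
            env[name] = environ[name]
        elif name in DEFAULTS:
            env[name] = DEFAULTS[name]
    return env


# A shell where only CC was exported.
print(build_env({"CC": "clang"}))
```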
bazel/repositories.bzl
remote = "https://github.com/nodejs/http-parser.git",
commit = "9b0d5b33ebdaacff1dadd06bad4e198b11ff880e",
build_file_content = BUILD,
native.bind(
Don't bind forcibly, have a flag to control this.
I've added a skip_bind list parameter that can be used to exclude some binds.
Support to compare sequential vs. parallel builds of deps.
@@ -4,7 +4,8 @@ set -e

 # Setup basic requirements and install them.
 apt-get update
-apt-get install -y wget software-properties-common make cmake git python python-pip clang-format-3.6 bc libtool automake lcov zip
+apt-get install -y wget software-properties-common make cmake git python python-pip \
+  clang-format-3.6 bc libtool automake lcov zip time
do we need lcov?
Nup. Should I sneak the removal into this PR?
Sure, feel free to remove it. I can just merge once you update the PR.
After #747, we no longer have the BUILD_DISTINCT=1 set of dependencies in the CI image, so fix the gcovr path to solve the failures in #760. Also fixed a snafu in configs/configgen.sh that I hit when running Docker as non-root. It was trying to peek at the user's home directory, with no passwd entry.