Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel/ci: unify developer-local and CI build generation. #747

Merged
merged 10 commits into from
Apr 13, 2017

Conversation

htuch
Copy link
Member

@htuch htuch commented Apr 12, 2017

This patch dedupes the distinct approaches that existed previously, where we assembled hand curated
BUILD files for external dependencies to be used in developer-local builds and used prebuilt
artifacts compiled under the external dependency's native build system for CI.

In the new approach, the CI flow continues to prebuild artifacts with the build recipes and
recursive make in ci/build_container/{Makefile,build_recipes}, ahead of time and prior to any
invocation of Bazel.

Developer-local builds will not prebuild, but instead invoke the same build recipes and recursive
make under a Bazel repository_rule.

A significant improvement that is also provided by this patch is the automatic workspace population
via repositories.bzl. See the changes to WORKSPACE and ci/{WORKSPACE,WORKSPACE.consumer}. Projects
that consume Envoy, e.g. to link in additional filters, no longer need to track Envoy's
dependencies and maintain their own bind rules, this is now automagic.

WARNING: Any external consumer of the Bazel build will need to update their WORKSPACE definitions
after this patch is merged.

This patch dedupes the distinct approaches that existed previously, where we assembled hand curated
BUILD files for external dependencies to be used in developer-local builds and used prebuilt
artifacts compiled under the external dependency's native build system for CI.

In the new approach, the CI flow continues to prebuild artifacts with the build recipes and
recursive make in ci/build_container/{Makefile,build_recipes}, ahead of time and prior to any
invocation of Bazel.

Developer-local builds will not prebuild, but instead invoke the same build recipes and recursive
make under a Bazel repository_rule.

A significant improvement that is also provided by this patch is the automatic workspace population
via repositories.bzl. See the changes to WORKSPACE and ci/{WORKSPACE,WORKSPACE.consumer}. Projects
that consume Envoy, e.g. to link in additional filters, no longer need to track Envoy's
dependencies and maintain their own bind rules, this is now automagic.

WARNING: Any external consumer of the Bazel build will need to update their WORKSPACE definitions
after this patch is merged.
@htuch
Copy link
Member Author

htuch commented Apr 12, 2017

This is a simpler version of #716, I wish I had known about repository_rule beforehand, I'll update the Bazel team on the need for this kind of guidance.

@mattklein123 for review and Lyft integration discussion. @lizan for Bazel review. @dnoe and @rlazarus for dog fooding with the new dependencies they plan on adding to gflags and backwards/elfutils.

@lizan
Copy link
Member

lizan commented Apr 12, 2017

Is this change requires a new step to running tests? checked out the SHA and did bazel test //test/... failed while master doesn't, error log:

% bazel test //test/...
ERROR: /home/zlizan/github/lizan/envoy/bazel/repositories.bzl:122:5: no such package '@envoy_deps//': Not a file: /home/zlizan/github/lizan/envoy/ci/build_container/recipes.bzl and referenced by '//external:protobuf'.
ERROR: Analysis of target '//test/proto:helloworld_proto' failed; build aborted.
INFO: Elapsed time: 0.305s
ERROR: Couldn't start the build. Unable to run tests.

@htuch
Copy link
Member Author

htuch commented Apr 12, 2017

@lizan Fix pushed, please try again.

# v1.8.0 release
commit = "ec44c6c1675c25b9827aacd08c02433cccde7780",
remote = "https://github.com/google/googletest.git",
envoy_repository(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having one repository for all dependencies doesn't look good. Is it possible to split them to one repository per dependency and generalize the repository_rule? (furthermore, upstreaming the rule to bazel would be nice.)

I imagine it can be generalized to something like:

autoconf_cc_repository(
    name = "nghttp2",
    source_tar = "https://github.com/nghttp2/nghttp2/releases/download/v1.20.0/nghttp2-1.20.0.tar.gz",
)

With one large repository it is hard to swap one of them from consumers, and rebuilding whole dependencies while upgrading one of them will take longer time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been debating which way to go with this. On the one hand, one rule per dependency is cleaner for the reasons you state. OTOH, there is a performance hit, since the dependencies are built independently and can't coordinate under a single recursive make job server.

If we build with make under each distinct target, then either we have to be conservative and set -j <some small number> to avoid killing the machine, or set -j $NUM_CPUS and hope/pray they don't all try and run jobs at once. @mattklein123 has already pointed out situations where the Bazel build runs too many jobs for small build VMs.

So, as much as I'd like to do one target per repository, I feel the expedient thing to do is what is there today. This is really just limitation of repository_rule (CC bazelbuild/bazel#2814). The right thing might be for Bazel to natively support acting as a make job server and coordinate each of the invoked make processes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a good reference on the make job server: http://make.mad-scientist.net/papers/jobserver-implementation/ BTW. I feel this kind of issue really speaks to the impedance mismatch that exists today in Bazel for handling non-Bazel builds.

Copy link
Member Author

@htuch htuch Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A related chain of thought to this. It seems that repository_rules execute sequentially (although this does not seem a guaranteed aspect of the specification). This opens up the possibility of just doing -j $NUM_CPUS, since the make jobs don't overlap. However, this will still be much slower than what we have today, since one of the other latency hiding aspects of parallel recursive make is that the automake/autoconf phases of each dependency's build, which is slow and single threaded, overlaps with other dependency's CPU intensive make phase.

So, we're still back to a single recursive make being the best way to get build time down to something like 2 minutes (on my workstation) instead of > 5 minutes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean that to run make for each target should be parallel (of course it is possible), I just meant if you change a dependency version, this would require rebuild whole dependencies. Same for consumer, istio would consume protobuf that comes from grpc, so we don't want the dependency script build another protobuf.

Having a make job server is definitely an improvement, but even without that, given repository_rules execute sequentially today, so we wouldn't have too many jobs.

Copy link
Member Author

@htuch htuch Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we want to avoid consuming projects having to build all of Envoy's deps if they have their own versions. I've fixed this in the latest update, with a skip_targets option to envoy_dependencies. This list gets plumbed all the way down to the underlying make invocation, it's not just a bind omission anymore.

Having a monolithic dependency means that developers need to rebuild everything each time a dependency changes. However, this is rare. Much more frequent is the case that a developer checks out a new tree. So, I think we should optimize for the common case and make this process fast.

I've added code to the build system to profile how long each step in the build recipes takes, and to compare two different styles of build:

  1. Build everything in parallel under recursive make, under a make jobserver with all CPUs available.

  2. Build each dependency (via its build recipe) sequentially with all CPUs available at each build. This simulates what we would see if we had separate repository_rules for each dependency.

The total time for (1) is 1m 49s. The total time for (2) is 3m 9s, almost 70% increase in build time and definitely noticeable.

To understand why it's slower, take a look at https://github.com/htuch/data-dump/tree/master/envoy-build-profiles. It's clear that the configure and buildconf steps take substantial time, and in (1) they overlap with each other and CPU intensive work, hiding the latency and in (2) they bottleneck and the machine is sitting mostly idle while they happen.

So, I think we should stick with the monolithic dep.

As far as repository_rule goes and more general Bazel support, this is something that needs to be addressed foundationally. If repository_rules execute sequentially, then the machine will be underutilized during the autoconf steps. If they execute in parallel, there is a need to basically act like a make jobserver to coordinate them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting the skip_targets all the way down and the analysis. For now I think this is good to go here, let's revisit later once bazel have better support.

envoy_repository = repository_rule(
implementation = _repository_impl,
local = debug_build,
environ = ["CC", "CXX", "LD_LIBRARY_PATH"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bazel doesn't set these env vars (at least CC) if it wasn't there when you invoke bazel, my local build failed without setting them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've fixed this, could you try again?

remote = "https://github.com/nodejs/http-parser.git",
commit = "9b0d5b33ebdaacff1dadd06bad4e198b11ff880e",
build_file_content = BUILD,
native.bind(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't bind forcibly, have a flag to control this.

Copy link
Member Author

@htuch htuch Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a skip_bind list parameter that can be used to exclude some binds.

@@ -4,7 +4,8 @@ set -e

# Setup basic requirements and install them.
apt-get update
apt-get install -y wget software-properties-common make cmake git python python-pip clang-format-3.6 bc libtool automake lcov zip
apt-get install -y wget software-properties-common make cmake git python python-pip \
clang-format-3.6 bc libtool automake lcov zip time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need lcov?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nup. Should I sneak the removal into this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, feel free to remove it. I can just merge once you update the PR.

@mattklein123 mattklein123 merged commit ebbf7cd into envoyproxy:master Apr 13, 2017
htuch added a commit to htuch/envoy that referenced this pull request Apr 14, 2017
After envoyproxy#747, we no longer have the BUILD_DISTINCT=1 set of dependencies in the CI image, so fix the
gcovr path to solve the failures in envoyproxy#760.
htuch added a commit to htuch/envoy that referenced this pull request Apr 14, 2017
After envoyproxy#747, we no longer have the BUILD_DISTINCT=1 set of dependencies in the CI image, so fix the
gcovr path to solve the failures in envoyproxy#760.

Also fixed a snafu in configs/configgen.sh that I hit when running Docker as non-root. It was trying
to peek at the user's home directory, with no passwd entry.
mattklein123 pushed a commit that referenced this pull request Apr 17, 2017
After #747, we no longer have the BUILD_DISTINCT=1 set of dependencies in the CI image, so fix the
gcovr path to solve the failures in #760.

Also fixed a snafu in configs/configgen.sh that I hit when running Docker as non-root. It was trying
to peek at the user's home directory, with no passwd entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants