Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rules_go fails with bazel@HEAD, but not tied to a commit #732

Open
c-parsons opened this issue Jun 20, 2019 · 16 comments
Open

rules_go fails with bazel@HEAD, but not tied to a commit #732

c-parsons opened this issue Jun 20, 2019 · 16 comments

Comments

@c-parsons
Copy link
Contributor

rules_go fails with bazel@HEAD, but it doesn't seem tied to any particular bazel commit.
The failing tests are coverage-related, and seem to be that the coverage file is not found

https://buildkite.com/bazel/culprit-finder/builds/171#863978e5-2c65-4b87-8f9f-3aa1487bc77c

I also tried triggering the rules_go CI with release Bazel, and got partial same failures:

https://buildkite.com/bazel/rules-go-golang/builds/1196

There hasn't been a recent commit in rules_go, so I have to assume this is a CI issue.

@laurentlb
Copy link
Contributor

See also bazelbuild/rules_go#2100

@philwo
Copy link
Member

philwo commented Jun 21, 2019

@c-parsons I think this is caused by the release of Bazel 0.27.0.

We didn't change anything on CI itself in quite a while.

@c-parsons
Copy link
Contributor Author

Pardon my ignorance, but how would the release of Bazel 0.27.0 affect testing rules_go against Bazel@HEAD at various commits?

A Bazel release affecting rules_go CI for Bazel@Release would make sense, but I'd expect to be able to bisect the Bazel@HEAD failures and tie them to a Bazel commit...

@philwo
Copy link
Member

philwo commented Jun 21, 2019

@c-parsons That's a good question, but rules_go is definitely green with 0.26.1 and red with 0.27.0: https://buildkite.com/bazel/rules-go-golang/builds/1203 (upper "Ubuntu 18.04" is Bazel 0.26.1, lower "Ubuntu 18.04" is with Bazel 0.27.0).

The "known good" commit that culprit finder ran with in your linked build above (a44ea875254c5a630000f1838764e525cdb864ce) is only 7 days old, so maybe it's just not old enough? Maybe the breakage was much earlier?

@philwo
Copy link
Member

philwo commented Jun 21, 2019

I'm running another culprit finder here with good commit = 0.27.0 and bad commit = 0.26.1, let's see how that goes: https://buildkite.com/bazel/culprit-finder/builds/172

@philwo
Copy link
Member

philwo commented Jun 21, 2019

Culprit finder is failing, because it can't find the Bazel binaries it needs on GCS. :(

I have to leave the office now - maybe you can just bisect this locally on your machine?

@c-parsons
Copy link
Contributor Author

So, this is failed not tied to a release because these tests uses some details about the host machine, particularly it will, in test, run /usr/bin/bazel. I bet that when we released Bazel 0.27.0, the CI machines permanently have 0.27.0 at that location, even for bazel@HEAD tests.

I think we might be able to alter which bazel is used in this test runner by setting the BAZEL environment variable...

Still, what frustrating nondeterminism...

@laurentlb
Copy link
Contributor

cc @jayconrod

Good catch! This explains why the target was failing for me with old commits and old Bazel versions.

@jayconrod
Copy link

I opened bazelbuild/bazel#8670. I believe it's a bug in coverage.

@philwo
Copy link
Member

philwo commented Jun 21, 2019

Oh, wow. Yes, due to the Bazel detection logic in rules_go's bazel_tests.bzl, the "Bazel@HEAD" tests never ran these tests with HEAD, instead they always used the latest release. That's why we didn't detect this breakage before 0.27.0 was released.

I have to think about how to prevent this from happening on CI next week.

@laurentlb
Copy link
Contributor

laurentlb commented Jun 21, 2019

The test passes locally when I use 0.26.1 and I set the BAZEL variable:

BAZEL=~/.cache/bazelisk/bin/bazel-0.26.1-linux-x86_64 bazelisk test //tests/core/coverage:coverage_test_test 

So the breakage is caused by 0.27. But I cannot reproduce it with any incompatible flag, so it's a bug in Bazel.

@meteorcloudy
Copy link
Member

@laurentlb Did we find the bug in Bazel? Should we rollback the culprit?

@jayconrod
Copy link

@meteorcloudy The issue was bazelbuild/bazel#8670. The fix is cherry-picked in bazel 0.27.1.rc1, and I've verified it fixes rules_go CI (log).

@meteorcloudy
Copy link
Member

meteorcloudy commented Jun 28, 2019

@jayconrod Thanks for the notice!

@hlopko
Copy link
Member

hlopko commented Jul 2, 2019

For the reference, https://github.com/bazelbuild/rules_go/blob/master/tests/bazel_tests.bzl#L331 is the place where rules_go try to find Bazel binary. Bazel CI currently doesn't set BAZEL environment variable so this logic is incorrect when running on Bazel CI.

@philwo @jayconrod do you have ideas how we can make this work? Can Bazel CI set BAZEL env var? Can we communicate Bazel binary in some other way? Should we do it at all?

@jayconrod
Copy link

@hlopko The issue here was a regression in coverage support in 0.27.0. It should be fixed in 0.27.1. I think the tests were just picking up bazel from PATH.

I think the BAZEL environment variable was a vestige from Jenkins. It seems better to just get the target version of bazel from PATH though. Most tools and local tests will pick it up automatically without having to be customized for this environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants