[Bazel CI] Tests are slow and flaky on macOS arm64 #23726

Closed
meteorcloudy opened this issue Sep 24, 2024 · 10 comments
Labels
breakage; P1 (I'll work on this now. Assignee required); team-OSS (issues for the Bazel OSS team: installation, release process, Bazel packaging, website); type: bug

Comments

@meteorcloudy
Member

//src/test/shell/bazel:bazel_bootstrap_distfile_tar_test and //src/test/shell/bazel:bazel_determinism_test are extremely slow:
https://buildkite.com/bazel/google-bazel-presubmit/builds/84154#01922292-cca6-4e56-939b-05a5c7b59da1

Flaky tests are very frequent:
https://buildkite.com/bazel/google-bazel-presubmit/builds/84157#019222c1-653c-4a75-bdf7-7f76d5937946

//src/test/py/bazel:cc_import_test                                        FLAKY, failed in 1 out of 2 in 69.9s
  Stats over 2 runs: max = 69.9s, min = 55.5s, avg = 62.7s, dev = 7.2s
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/py/bazel/cc_import_test/test_attempts/attempt_1.log
//src/test/shell/integration:config_stripped_outputs_test                 FLAKY, failed in 1 out of 2 in 126.8s
  Stats over 2 runs: max = 126.8s, min = 78.5s, avg = 102.6s, dev = 24.2s
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/config_stripped_outputs_test/test_attempts/attempt_1.log
//src/test/shell/bazel:bazel_rules_java_override_test                     FLAKY, failed in 2 out of 3 in 52.4s
  Stats over 3 runs: max = 52.4s, min = 8.7s, avg = 31.2s, dev = 17.9s
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/bazel/bazel_rules_java_override_test/test_attempts/attempt_1.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/bazel/bazel_rules_java_override_test/test_attempts/attempt_2.log
//src/test/shell/bazel:build_files_test                                   FLAKY, failed in 2 out of 3 in 49.8s
  Stats over 3 runs: max = 49.8s, min = 25.1s, avg = 41.6s, dev = 11.6s
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/bazel/build_files_test/test_attempts/attempt_1.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/bazel/build_files_test/test_attempts/attempt_2.log
//src/test/shell/bazel:bazel_coverage_java_jdk21_toolchain_released_test FAILED in 3 out of 3 in 248.6s
  Stats over 3 runs: max = 248.6s, min = 199.0s, avg = 221.1s, dev = 20.6s
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/bazel/bazel_coverage_java_jdk21_toolchain_released_test/test.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/bazel/bazel_coverage_java_jdk21_toolchain_released_test/test_attempts/attempt_1.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/bazel/bazel_coverage_java_jdk21_toolchain_released_test/test_attempts/attempt_2.log
//src/test/shell/integration:test_test                                   FAILED in 3 out of 3 in 216.0s
  Stats over 3 runs: max = 216.0s, min = 158.5s, avg = 191.3s, dev = 24.2s
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/test_test/test.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/test_test/test_attempts/attempt_1.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/test_test/test_attempts/attempt_2.log
//src/test/java/com/google/devtools/build/lib/rules/config:ConfigRulesTests PASSED in 34.3s
  Stats over 5 runs: max = 34.3s, min = 17.8s, avg = 25.3s, dev = 7.0s
//src/test/shell/integration:bazel_sandboxed_worker_test                 FAILED in 6 out of 7 in 146.6s
  Stats over 7 runs: max = 146.6s, min = 98.7s, avg = 125.7s, dev = 15.4s
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/bazel_sandboxed_worker_test/shard_1_of_3/test.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/bazel_sandboxed_worker_test/shard_1_of_3/test_attempts/attempt_1.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/bazel_sandboxed_worker_test/shard_1_of_3/test_attempts/attempt_2.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/bazel_sandboxed_worker_test/shard_2_of_3/test.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/bazel_sandboxed_worker_test/shard_2_of_3/test_attempts/attempt_1.log
  /private/var/tmp/_bazel_buildkite/00e02099ed8d75d374b9c12be02eaf4c/execroot/_main/bazel-out/darwin_arm64-fastbuild/testlogs/src/test/shell/integration/bazel_sandboxed_worker_test/shard_2_of_3/test_attempts/attempt_2.log
//src/test/java/com/google/devtools/build/lib/query2/engine:AllTests     PASSED in 16.9s
  Stats over 10 runs: max = 16.9s, min = 6.2s, avg = 11.1s, dev = 3.1s
//src/test/shell/bazel/remote:remote_execution_test                      FAILED in 9 out of 12 in 256.6s
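To triage a summary like the one above, the status lines can be pulled apart with a small script. The following is a minimal sketch, not part of Bazel or Bazel CI: `SUMMARY_RE` and `parse_summary` are hypothetical names, and the regular expression only covers the three line shapes that appear in this log (`PASSED in …`, `FLAKY, failed in N out of M in …`, `FAILED in N out of M in …`).

```python
import re
import statistics

# Matches the status lines seen in the log above, for example:
#   //pkg:target   FLAKY, failed in 1 out of 2 in 69.9s
#   //pkg:target   FAILED in 3 out of 3 in 248.6s
#   //pkg:target   PASSED in 34.3s
SUMMARY_RE = re.compile(
    r"^(?P<label>//\S+)\s+(?P<status>PASSED|FLAKY|FAILED)"
    r"(?:,? failed)?(?: in (?P<failed>\d+) out of (?P<runs>\d+))?"
    r" in (?P<secs>[\d.]+)s$"
)


def parse_summary(text):
    """Return one dict per status line; 'Stats' and log-path lines are skipped."""
    results = []
    for line in text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m is None:
            continue
        results.append({
            "label": m["label"],
            "status": m["status"],
            # Lines without "N out of M" report no failed attempts.
            "failed": int(m["failed"]) if m["failed"] else 0,
            "runs": int(m["runs"]) if m["runs"] else None,
            "seconds": float(m["secs"]),
        })
    return results


sample = """\
//src/test/py/bazel:cc_import_test    FLAKY, failed in 1 out of 2 in 69.9s
  Stats over 2 runs: max = 69.9s, min = 55.5s, avg = 62.7s, dev = 7.2s
//src/test/shell/integration:test_test    FAILED in 3 out of 3 in 216.0s
"""
flaky = [r["label"] for r in parse_summary(sample) if r["status"] == "FLAKY"]

# Sanity check on the "Stats" line for cc_import_test: the mean and the
# population standard deviation of the two run times agree with the
# reported avg = 62.7s and dev = 7.2s after rounding to one decimal.
runs = [69.9, 55.5]
avg = statistics.mean(runs)
dev = statistics.pstdev(runs)
```

The two-run entries in the log are consistent with `dev` being a population standard deviation (half the gap between the two run times), which is why `pstdev` is used here rather than `stdev`.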
meteorcloudy added the type: bug, P1, breakage, and team-OSS labels on Sep 24, 2024
@meteorcloudy
Member Author

Seeing `gRPC server failed to bind to IPv4 and IPv6 localhosts on port 0: [IPv4] Failed to bind to address /127.0.0.1:0` in the test log again. Maybe related to #20743.
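One way to check whether an affected VM can bind to loopback at all, independently of Bazel, is a plain socket test. This is only a sketch under that assumption: it does not go through the gRPC or Netty code path that produced the error, and `can_bind_loopback` is a hypothetical helper name.

```python
import socket


def can_bind_loopback(family, host):
    """Try to bind a TCP socket to `host` on port 0 (OS-assigned port).

    Returns (True, port) on success or (False, error message) on failure.
    """
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))
            return True, sock.getsockname()[1]
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    checks = [
        ("IPv4", socket.AF_INET, "127.0.0.1"),
        ("IPv6", socket.AF_INET6, "::1"),
    ]
    for name, family, host in checks:
        ok, detail = can_bind_loopback(family, host)
        print(f"{name} {host}: {'ok, port' if ok else 'FAILED:'} {detail}")
```

If this fails on the same VMs where the tests are flaky, that would point at the host network configuration rather than at the server code; if it passes, the result says little either way.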

@meteorcloudy
Member Author

@fweikert Do you know if there is any potential infrastructure change that could cause this?

copybara-service bot pushed a commit that referenced this issue Sep 24, 2024
Before:
```
//src/test/shell/bazel:bazel_java_test                          (cached) PASSED in 746.9s
//src/test/shell/bazel:bazel_java_test_jdk11_toolchain_head     (cached) PASSED in 307.3s
//src/test/shell/bazel:bazel_java_test_jdk17_toolchain_head     (cached) PASSED in 370.7s
//src/test/shell/bazel:bazel_java_test_jdk21_toolchain_head     (cached) PASSED in 340.6s
//src/test/shell/bazel:bazel_proto_library_test                 (cached) PASSED in 709.5s

```

After:
```
//src/test/shell/bazel:bazel_java_test                                   PASSED in 340.3s
//src/test/shell/bazel:bazel_java_test_jdk11_toolchain_head              PASSED in 221.2s
//src/test/shell/bazel:bazel_java_test_jdk17_toolchain_head              PASSED in 218.0s
//src/test/shell/bazel:bazel_java_test_jdk21_toolchain_head              PASSED in 315.0s
//src/test/shell/bazel:bazel_proto_library_test                          PASSED in 312.3s
```

Related: #23726
PiperOrigin-RevId: 678209188
Change-Id: I3b51584ba2893fe01a7ac084c99961623f0e4b02
copybara-service bot pushed a commit that referenced this issue Sep 24, 2024
Mitigating #23726

PiperOrigin-RevId: 678213406
Change-Id: I99ea19f3dcf56a359e39274ce9043a6b4f64b6a4
@meteorcloudy
Member Author

The issue seems to be reproducible on some VMs, so it might be related to some infra issue.

@meteorcloudy
Member Author

I will not dig deeper since 355b000 mitigated the issue. We probably need to upgrade the gRPC and Netty versions and hope that helps. #22719

copybara-service bot pushed a commit that referenced this issue Sep 25, 2024
Those tests can sometimes take up to 1h to run for some reason. Disable them in presubmit and we can still monitor them in postsubmit.

Related #23726

PiperOrigin-RevId: 678608123
Change-Id: I783e1ba7b09855ee773ea066b8b49f1b36975a3b
meteorcloudy added a commit that referenced this issue Oct 1, 2024
Mitigating #23726

PiperOrigin-RevId: 678213406
Change-Id: I99ea19f3dcf56a359e39274ce9043a6b4f64b6a4

Backporting 355b000#diff-544556920c45b42cbfe40159b082ce8af6bd929e492d076769226265f215832fR85
@meteorcloudy
Member Author

meteorcloudy commented Oct 1, 2024

@meteorcloudy
Member Author

Found an even earlier flaky build: https://buildkite.com/bazel/google-bazel-presubmit/builds/84128, which might rule out f64cdea.

github-merge-queue bot pushed a commit that referenced this issue Oct 1, 2024
Mitigating #23726

PiperOrigin-RevId: 678213406
Change-Id: I99ea19f3dcf56a359e39274ce9043a6b4f64b6a4

Backporting
355b000#diff-544556920c45b42cbfe40159b082ce8af6bd929e492d076769226265f215832fR85
@Wyverald
Member

Do any of these fixes need to be cherry-picked back to 7.4.0 and/or 8.0.0?

@meteorcloudy
Member Author

efa0303 should be backported to 8.0.0

@meteorcloudy
Member Author

@bazel-io fork 8.0.0

bazel-io pushed a commit to bazel-io/bazel that referenced this issue Oct 11, 2024
Tests on macOS arm64 should be much more stable after bumping VM specs.

Also bumped the size of disk_cache_test as it sometimes times out on Linux.

Closes bazelbuild#23726

PiperOrigin-RevId: 684412631
Change-Id: I5485f839aee0c3a1012196c6628eda9535985b82
github-merge-queue bot pushed a commit that referenced this issue Oct 28, 2024
Tests on macOS arm64 should be much more stable after bumping VM specs.

Also bumped the size of disk_cache_test as it sometimes times out on
Linux.

Closes #23726

PiperOrigin-RevId: 684412631
Change-Id: I5485f839aee0c3a1012196c6628eda9535985b82

Commit
efa0303

Co-authored-by: Googler <pcloudy@google.com>
@iancha1992
Member

A fix for this issue has been included in Bazel 8.0.0 RC2. Please test out the release candidate and report any issues as soon as possible.
If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=8.0.0rc2. Thanks!
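For scripted checks of the release candidate, the `USE_BAZEL_VERSION` setting mentioned above can be applied per invocation instead of being exported globally. A minimal sketch, assuming a `bazelisk` binary is on `PATH`; `bazelisk_env` and `bazel_version` are hypothetical helper names, not part of Bazelisk.

```python
import os
import subprocess


def bazelisk_env(version, base_env=None):
    """Return a copy of the environment with USE_BAZEL_VERSION set."""
    env = dict(os.environ if base_env is None else base_env)
    env["USE_BAZEL_VERSION"] = version
    return env


def bazel_version(version, bazelisk="bazelisk"):
    """Run `bazelisk version` pinned to `version` and return its output.

    Assumes `bazelisk` is installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        [bazelisk, "version"],
        env=bazelisk_env(version),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example: build the environment for the release candidate named above.
rc_env = bazelisk_env("8.0.0rc2")
```

Copying the environment rather than mutating `os.environ` keeps the pin local to the one subprocess, so other Bazel invocations in the same script are unaffected.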
