[6.3.0] Implement failure circuit breaker #18541
Copy of bazelbuild#18120: I accidentally closed bazelbuild#18120 during a rebase and don't have permission to reopen it.

We have noticed that any problems with the remote cache have a detrimental effect on build times. On investigation we found that the interface for the circuit breaker was left unimplemented.

To address this, this change implements a failure circuit breaker and adds three new Bazel flags: 1) experimental_circuitbreaker_strategy, 2) experimental_remote_failure_threshold, and 3) experimental_remote_failure_window.

This implementation provides the failure strategy for the circuit breaker and uses the failure count to trip the circuit.

Reasoning behind using failure count instead of failure rate: measuring the failure rate also requires the success count. Both the failure and success counts would need to be an AtomicInteger, since both are modified concurrently by multiple threads. Even though getAndIncrement is a very lightweight operation, at very high request volume it might contribute to latency.

Reasoning behind using a failure circuit breaker: a new instance of Retrier.CircuitBreaker is created for each build. Therefore, if the circuit breaker trips during a build, the remote cache is disabled for that build. It is enabled again for the next build, because a new instance of Retrier.CircuitBreaker is created. If needed, a cool-down strategy may be added in the future, e.g. failure_and_cool_down_strategy.

closes bazelbuild#18136
Closes bazelbuild#18359.
PiperOrigin-RevId: 536349954
Change-Id: I5e1c57d4ad0ce07ddc4808bf1f327bc5df6ce704
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
@coeuvre could you approve this into release-6.3.0?
The configuration should be modified to merge the open source function code.
Baseline: 758b44d Release Notes: + Automatic code cleanup. (#18417) + Update CODEOWNERS for 6.3.0 (#18369) + Overrides specified by non-root modules no longer cause an error, and are silently ignored instead. They were originally treated as an error to allow for the future possibility of overrides in the transitive dependency graph working together; but we've deemed that infeasible (and even if it was, it'd be so complicated and confusing to users that it would not be a good addition). (#18388) + Add implementation deps support for Objective-C (#18372) + Update release notes scripts (#18400) + Prevent CredentialHelperEnvironment crash when invoking Bazel outside of a workspace. (#18430) + Use wall-time for credential helper invalidation (#18413) + blaze_util_posix: handle killpg failures (#18403) + Pass version to java_runtimes created by local_java_repository (#18415) + Add jsonproto option to query --output flag (#18438) + Don't eagerly flatten a `NestedSet` in `RepoMappingManifestAction` (#18419) + rules_go & rules_python are failing in Downstream CI with Bazel@HEAD (#18447) + Move credential helper setup into remote_helpers.sh so it can be reused by other shell tests. (#18453) + Wire credential helper to repository fetching. (#18429) + Updates/fixes to relnotes script (#18470) + Report percentual download progress in repository rules (#18471) + Support remote symlink outputs when building without the bytes. (#18476) + Enrich local BEP upload errors with file path and digest possible. (#18481) + Set `GTEST_SHARD_STATUS_FILE` in test setup (#18482) + Fix relnotes script (#18491) + Fix Xcode 14.3 compatibility (#18490) + Fix #18493. (#18514) + Extend the credential helper default timeout to 10s. 
(#18527) + Fix formatting of release notes (#18534) + Use extension rather than local names in ModuleExtensionMetadata (#18536) + [credentialhelper] Ignore all errors when writing stdin (#18540) + Improve error on invalid `-//foo` and `-@repo//foo` options (#18516) + Implement failure circuit breaker (#18541) + Actually check `TEST_SHARD_STATUS_FILE` has been touched (#18418) + Ignore hash string casing (#18414) + Error if repository name isn't supplied (#18425) + Track repo rule label attributes after the first non-existent one (#18412) + Add ServerCapabilities into RemoteExecutionClient (#18442) + RemoteExecutionService: support output_symlinks in ActionResult (#18441) + RemoteExecutionService: Action.Command to set output_paths (#18440) + Use local_termination_grace_seconds when testing LinuxSandbox availability (#18568) + Fix dangling string literal in `extension_metadata` docs (#18598) + Include actual MODULE.bazel location in stack traces (#18612) + Make cpp file extensions case sensitive again (#18552) + Fix error when script is run after the final tag is created. (#18638) + Fix WORKSPACE toolchain resolution with `--enable_bzlmod` (#18649) + Add `ActionExecutionMetadata` as a parameter to `ActionInputPrefetcher#prefetchFiles`. (#18656) + Use failure_rate instead of failure count for circuit breaker (#18559) + Update ignored_error logic for circuit_breaker (#18662) + Don't rewind the build if invocation id stays the same (#18670) + Fix potential memory leak in UI (#18659) + Test that a credential helper can supply credentials for bzlmod. (#18663) + Add flag --experimental_collect_code_coverage_for_generated_files. (#18664) + Options specified on the pseudo-command `common` in `.rc` files are now ignored by commands that do not support them as long as they are valid options for *any* Bazel command. Previously, commands that did not support all options given for `common` would fail to run. 
These previous semantics of `common` are now available via the new `always` pseudo-command. Closes #18130. (#18609) + Fix split post-processing of LLVM-based coverage (#18737) + Allow module extension usages to be isolated (#18727) + BEGIN_PUBLIC (#18729) + Declare credential helpers to be a stable feature. (#18752) + Add a new provider for injecting native libs in android_binary (#18753) + Properly handle invalid credential files (#18779) + The REPO.bazel and MODULE.bazel files are now also considered workspace boundary markers. (#18787) + Report remote execution messages as events (#18780) + Fail on isolated extension usages without imports (#18793) + Add changes to cc_shared_library from head to 6.3 (#18606) + Remove option to disable FJP. (#18791) + Update to latest turbine version (#18803) + None. None (#18808) + Wait for outputs downloads before emitting local BEP events that reference these outputs. (#18815) + Perform builtins injection for WORKSPACE-loaded bzl files. (#18819) + Fix non-declared symlink issue for local actions when BwoB. (#18817) + Make grep_includes optional inside cc_common.register_linkstamp_compile_action (#18823) + add feature on windows toolchain with right tag (#18654) + coverage_common.instrumented_files_info now has a metadata_files argument (#18838) + Download directory output for test actions (#18846) + Teach DexMapper to not separate synthetic classes from their context … (#18853) + **[Incompatible]** query --output=proto --order_output=deps now returns targets in topological order (previously there was no ordering). (#18870) + Revert "Don't eagerly flatten a `NestedSet` in `RepoMappingManifestAction` (#18419)" (#18886) + Additional source inputs can now be specified for compilation in cc_library targets using the additional_compiler_inputs attribute, and these inputs can be used in the $(location) function. Fixes #18766. 
(#18882) + Open-source Google test `ConvenienceSymlinkTest` (#18890) + Update Error Prone to 2.20.0 (#18885) + Check if json.gz files exist, not the gcov version. (#18889) + Lockfile updates (#18894) + handle exception instead of crashing (#18895) + Add a new provider for passing dex related artifacts in android_binary (#18899) + Prevent most side effects of yanked modules (#18908) + Restore the classic desugar tool in the Bazel 6.3.0 branch so that the Bazel Android tools can be built for 6.3.0 without breaking backwards compatibility (#18909) + Update java_tools to v12.5 (#18868) + Add ActionCacheStatistics to BEP (#18914) + Adjust --top_level_targets_for_symlinks (#18916) + Track dev/non-dev `use_extension` calls (#18918) + Overrides specified by non-root modules no longer cause an error, and are silently ignored instead. They were originally treated as an error to allow for the future possibility of overrides in the transitive dependency graph working together; but we've deemed that infeasible (and even if it was, it'd be so complicated and confusing to users that it would not be a good addition). (#18921) + Rollforward of https://github.com/bazelbuild/bazel/commit/482d2be27ab… (#18773) + Update Android tools to 0.27.2 for fixes to DexMapper for https://gith... 
(#18891) + Report dev/non-dev deps imported via non-dev/dev usages (#18922) + Add reverted 'isolate' changes (#18928) + Identify isolated extensions by exported name (#18923) + test-setup.sh: Attempt to raise the original signal once more (#18932) + Ignore broken classic desugar tests (#18933) + Disable UseCorrectAssertInTests by default (#18948) + Fix VS 2022 autodetection (#18960) + Fix absolute file paths showing up in lockfiles (#18993) + Add support for isolated extension usages to the lockfile (#19008) Acknowledgements: This release contains contributions from many people at Google, as well as amishra-u, Andreas Herrmann, Andy Hamon, andyrinne12, Benjamin Lee, Benjamin Peterson, Brentley Jones, Chirag Ramani, Christopher Rydell, Daniel Wagner-Hall, Ed Schouten, Fabian Brandstetter, Fabian Meumertzheim, Greg, Ivan Golub, Jon Landis, JY Lin, Kai Zhang, Keith Smiley, kotlaja, lripoche, oquenchil, Pavan Singh, Rasrack, Son Luong Ngoc, Takeo Sawada, Vertexwahn, Xùdōng Yáng, Yannic.
Issue
We have noticed that any problems with the remote cache have a detrimental effect on build times. On investigation we found that the interface for the circuit breaker was left unimplemented.
Solution
To address this, this change implements a failure circuit breaker and adds three new Bazel flags: 1) experimental_circuitbreaker_strategy, 2) experimental_remote_failure_threshold, and 3) experimental_remote_failure_window.
This implementation provides the failure strategy for the circuit breaker and uses the failure count to trip the circuit.
Reasoning behind using failure count instead of failure rate: measuring the failure rate also requires the success count. Both the failure and success counts would need to be an AtomicInteger, since both are modified concurrently by multiple threads. Even though getAndIncrement is a very lightweight operation, at very high request volume it might contribute to latency.
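To make the trade-off concrete, here is a minimal sketch of a count-based breaker, not the code in this PR. The class name `FailureCountBreaker`, its methods, the injected clock, the fixed-window reset, and the "trip when the count exceeds the threshold" rule are all assumptions made for illustration. The point it shows is that the success path touches no shared counter at all; only failures pay for an atomic increment.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a failure-count breaker; not Bazel's actual implementation. */
final class FailureCountBreaker {
  private final int failureThreshold; // assumed rule: trip once failures in a window exceed this
  private final long windowMillis;    // assumed fixed-window length
  private final LongSupplier clock;   // injected so the sketch can be exercised deterministically
  private final AtomicInteger failures = new AtomicInteger();
  private final AtomicLong windowStart;
  private final AtomicBoolean open = new AtomicBoolean(false);

  FailureCountBreaker(int failureThreshold, long windowMillis, LongSupplier clock) {
    this.failureThreshold = failureThreshold;
    this.windowMillis = windowMillis;
    this.clock = clock;
    this.windowStart = new AtomicLong(clock.getAsLong());
  }

  /** Successes need no bookkeeping, so the hot path does no atomic writes. */
  void recordSuccess() {}

  /** Counts a failure and trips the breaker when the window's count exceeds the threshold. */
  void recordFailure() {
    long now = clock.getAsLong();
    long start = windowStart.get();
    // Start a fresh window if the old one has elapsed. Only one thread wins the CAS.
    // The reset is approximate under contention, which is acceptable for a sketch.
    if (now - start >= windowMillis && windowStart.compareAndSet(start, now)) {
      failures.set(0);
    }
    if (failures.incrementAndGet() > failureThreshold) {
      open.set(true);
    }
  }

  /** Returns true while the circuit is closed, i.e. remote calls may proceed. */
  boolean allowsRequests() {
    return !open.get();
  }
}
```

A rate-based variant would additionally need an atomic success counter updated on every successful request, which is the extra cost the paragraph above argues against.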
Reasoning behind using a failure circuit breaker: a new instance of Retrier.CircuitBreaker is created for each build. Therefore, if the circuit breaker trips during a build, the remote cache is disabled for that build. It is enabled again for the next build, because a new instance of Retrier.CircuitBreaker is created. If needed, a cool-down strategy may be added in the future, e.g. failure_and_cool_down_strategy.
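The per-build lifecycle described above can be sketched as follows. `BuildBreaker` and `BuildSession` are hypothetical stand-ins, not Bazel classes, and for brevity the breaker here trips on the first failure rather than on a configurable threshold. The sketch only illustrates why no cool-down is needed: a tripped breaker is discarded with its build, and the next build starts with a fresh one.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a per-build breaker; not Bazel's Retrier.CircuitBreaker. */
final class BuildBreaker {
  private final AtomicBoolean tripped = new AtomicBoolean(false);

  void trip() {
    tripped.set(true);
  }

  boolean remoteCacheEnabled() {
    return !tripped.get();
  }
}

/** Hypothetical build session that owns exactly one breaker for its lifetime. */
final class BuildSession {
  private final BuildBreaker breaker = new BuildBreaker();

  /** Returns "remote" when the remote cache served the request, "local" when it was bypassed. */
  String fetch(boolean remoteCallFails) {
    if (!breaker.remoteCacheEnabled()) {
      return "local"; // breaker already tripped earlier in this build
    }
    if (remoteCallFails) {
      breaker.trip(); // simplification: a single failure trips the breaker
      return "local";
    }
    return "remote";
  }
}
```

Creating a second `BuildSession` after the first one has tripped yields a session whose remote cache is enabled again, which mirrors the behavior described for consecutive builds.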
Closes #18359
commit 5575ff2