util/grpcutil: Can no longer detect unambiguous failures #19708
Comments
SHA: https://github.com/cockroachdb/cockroach/commits/76c3503e963b26b0d31b6235339fe6397e448e20 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=396015&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/0b0fdcb5627b2c0d194fb9cf435f9cdc44cf89a6 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=396283&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/448667a522e29841b26d935927c96b2a4d127f31 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=397361&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/448667a522e29841b26d935927c96b2a4d127f31 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=397496&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/6f1f97e44b14a2eae93ef01a46c989bdc33d5b16 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=398576&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/6f1f97e44b14a2eae93ef01a46c989bdc33d5b16 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=398846&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/9c1c3f795dd0486a7fc1a35a277a8fdf5eef8906 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=399188&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/9c1c3f795dd0486a7fc1a35a277a8fdf5eef8906 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=399458&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/669c39e6cce611c2a178bbf27468d22f5a8f4482 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=400675&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/669c39e6cce611c2a178bbf27468d22f5a8f4482 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=400945&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/1debc0ebb16205371dd8231600eac14bb648618a Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=402079&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/b663a0cf109292ddbc22a872f0e301d92460b1aa Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=403370&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/b663a0cf109292ddbc22a872f0e301d92460b1aa Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=403648&tab=buildLog |
The gRPC updates in #19619 have changed how errors are reported here in some cases. Several of the assertions in TestRequestDidNotStart are failing, in both directions: we get an error that passes RequestDidNotStart when we didn't expect one, and we fail to get one when we should. Note that even though this has been failing every night since those changes went in, it takes 10+ minutes to reproduce on a gceworker. I haven't been able to identify exactly what's going on in these error paths, but we knew we were relying on implementation details because gRPC hasn't provided a clean way to tell whether the RPC has been sent or not (grpc/grpc-go#1443). This affects AmbiguousResultErrors: it appears that we could be returning AmbiguousResultErrors when we don't need to and returning unambiguous errors when we shouldn't. We could fix this conservatively by making all gRPC errors ambiguous, but I think that would make ambiguous errors far too common. We may need to downgrade gRPC to 1.6 instead.
SHA: https://github.com/cockroachdb/cockroach/commits/93b6827575a8cab4790106fc0851ec03d06ff190 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=404483&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/93b6827575a8cab4790106fc0851ec03d06ff190 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=404763&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/c3dcfe2883103317a16c266f5d6e2e0b310c5524 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=405854&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/c3dcfe2883103317a16c266f5d6e2e0b310c5524 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=406136&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/985cea5d75e09633077c42d85f0208741251e121 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=406516&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/985cea5d75e09633077c42d85f0208741251e121 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=406661&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/985cea5d75e09633077c42d85f0208741251e121 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=406806&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/82419c24cb7aaa2556477ca2ecc2a3ffa7e6a3f3 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=407406&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/10da44c9361d2f905cdf1ff2a7eacf8b4550e581 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=409015&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/87c681f94ba297726d692796e05889f315d83354 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=410073&tab=buildLog |
SHA: https://github.com/cockroachdb/cockroach/commits/87c681f94ba297726d692796e05889f315d83354 Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=410367&tab=buildLog |
In gRPC 1.6, we could identify certain unambiguous failures with string matching on the error. In 1.7, the same error message is used for both ambiguous and unambiguous cases, so we must assume it is ambiguous. Release note (bug fix): Fixed a case in which ambiguous errors were treated as unambiguous and led to inappropriate retries. Updates cockroachdb#19708
PR #20073 makes the conservative change to RequestDidNotStart to assume that all RPC errors are ambiguous, which fixes the potential bug here. However, this will increase the prevalence of ambiguous errors, so I'm leaving this issue open to track alternative solutions. A better fix will probably require either grpc/grpc-go#1443 upstream or reverting to gRPC 1.6.
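For readers following along, the conservative classification described above can be sketched roughly as follows. This is only an illustration using the standard library, not the code from PR #20073: `rpcError`, `errRequestNotSent`, and `requestDidNotStart` are hypothetical stand-ins, and the real function inspects gRPC's own error types.

```go
package main

import (
	"errors"
	"fmt"
)

// errRequestNotSent is a hypothetical sentinel for failures known to occur
// before any bytes are written to the connection.
var errRequestNotSent = errors.New("request not sent")

// rpcError is a hypothetical stand-in for an error surfaced by the RPC layer.
type rpcError struct{ msg string }

func (e *rpcError) Error() string { return e.msg }

// requestDidNotStart reports whether err proves that the request never
// reached the server, so that retrying it is safe.
func requestDidNotStart(err error) bool {
	var rpcErr *rpcError
	if errors.As(err, &rpcErr) {
		// Conservative choice: the RPC layer uses the same error for sent
		// and unsent requests, so every RPC error is treated as ambiguous.
		return false
	}
	// Only errors we generated ourselves before sending are unambiguous.
	return errors.Is(err, errRequestNotSent)
}

func main() {
	errs := []error{
		&rpcError{msg: "connection is unavailable"},
		fmt.Errorf("send failed: %w", errRequestNotSent),
	}
	for _, err := range errs {
		fmt.Printf("%v: didNotStart=%t\n", err, requestDidNotStart(err))
	}
}
```

The cost of this choice is visible in the first case: an error that may well have occurred before the request was sent is still reported as ambiguous.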
This is similar to what gRPC does internally for fail-fast RPCs, but lets us control the error returned to work around grpc/grpc-go#1443. Fixes cockroachdb#19708 Release note (performance improvement): Reduced the occurrence of ambiguous errors when a node is down.
Reopening, as a skipped test references this issue.
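One way to picture the approach in that commit message is a health check performed by the caller before handing the request to the RPC layer, so that a down node produces an error the caller itself defined. The sketch below is an assumption about the general shape only: `conn`, `call`, `errConnNotReady`, and `errTransport` are hypothetical names, not CockroachDB or gRPC APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnNotReady is a hypothetical error returned by our own code before
// anything is sent, so it is known to be unambiguous.
var errConnNotReady = errors.New("connection not ready; request not sent")

// errTransport is a hypothetical error standing in for an RPC-layer failure.
var errTransport = errors.New("transport is closing")

// conn is a hypothetical connection with a health flag and a send function.
type conn struct {
	healthy bool
	send    func(req string) (string, error)
}

// call checks connection health first. If the connection is down it fails
// with our own error and never invokes send. Any error from send itself is
// reported as ambiguous, since the request may have reached the server.
func call(c *conn, req string) (resp string, ambiguous bool, err error) {
	if !c.healthy {
		return "", false, errConnNotReady
	}
	resp, err = c.send(req)
	return resp, err != nil, err
}

func main() {
	down := &conn{healthy: false, send: func(string) (string, error) {
		return "", errTransport
	}}
	_, ambiguous, err := call(down, "put k=v")
	fmt.Printf("err=%v ambiguous=%t\n", err, ambiguous)

	flaky := &conn{healthy: true, send: func(string) (string, error) {
		return "", errTransport
	}}
	_, ambiguous, err = call(flaky, "put k=v")
	fmt.Printf("err=%v ambiguous=%t\n", err, ambiguous)
}
```

The check is inherently racy (the connection can fail right after it passes), which is why errors from `send` stay ambiguous; the check only reduces how often that path is taken when a node is already known to be down.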
SHA: https://github.com/cockroachdb/cockroach/commits/98f1623d60254505b65c457728617af5c5fec33b
Parameters:
Stress build found a failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=394944&tab=buildLog