renatolabs opened this issue on Nov 3, 2022 · 1 comment · Fixed by #96743
Labels
- A-testing: Testing tools and infrastructure
- C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
- T-testeng: TestEng Team
Currently, if a roachtest times out, there's a good chance the failure is reported as an SSH problem (after the improvements we introduced in #88492). When the test times out, roachtest stops the cluster (including every process running in it), which may cause long-running processes running over SSH to return exit code 255. Our test infrastructure then reports the failure as an SSH problem. We need to improve our handling of timed-out tests so that, in these cases, they are properly reported as a test failure to the owning team.
We should also take the chance to improve the reporting of the "time out" error added to the test, as it uses the Timeout field of the test spec, which will be 0 for tests that don't specify a custom timeout.
renatolabs added the C-bug, A-testing, and T-testeng labels on Nov 3, 2022
96743: roachtest: report timeout failures accordingly r=renatolabs,herkolategan a=smg260
Previously, a timeout failure would be deferred until after artifacts were collected, which sometimes resulted in subsequent failures being attributed as the primary cause.
- Timeout failures are now recorded at actual timeout, with subsequent failures secondary. Context cancellation occurs at the end of test teardown.
- `addFailure` accepts a depth parameter and no longer includes context cancellation, which is done separately.
Epic: none
Fixes: #91237
Release note: None
Co-authored-by: Miral Gadani <miral@cockroachlabs.com>
Timeout failures are recorded at actual timeout, with
subsequent failures secondary.
`addFailure` accepts a depth parameter and no longer
includes context cancellation, which is done separately.
Epic: none
Fixes: cockroachdb#91237
Release note: None
smg260 pushed a commit to smg260/cockroach that referenced this issue on Mar 7, 2023
Related failure: #90695 (comment)
Jira issue: CRDB-21167