release-22.2: roachtest: set default cluster settings, report timeout failures correctly, roachprod: capture ssh logs #96903
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
When a cluster is started with the `--skip-init` option, the caller can run `roachprod init` at any time to initialize the cluster. Unfortunately, the code used to initialize the cluster was duplicated: one copy existed in the `start` path, and another in the `init` path. Since the latter is used far less frequently, it had a bug that went unnoticed: it hardcoded the first node index as `0`, when node indices start at 1. This commit fixes the issue by updating the constant and sharing code between `init` and `start`. Fixes cockroachdb#88226. Release note: None
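The fix described above can be sketched as follows. This is a minimal illustration, not roachprod's actual code: the `cluster` type, `newCluster`, `Start`, `Init`, `initCluster`, and `firstNodeIdx` are all hypothetical names chosen for this example. It shows a single shared initialization helper that both entry points call, with the first node index defined once as `1`.

```go
package main

import "fmt"

// firstNodeIdx is the index of the first node. Node indices are 1-based,
// so this must be 1, not 0. (Hypothetical name, for illustration only.)
const firstNodeIdx = 1

// cluster is a hypothetical stand-in for a cluster handle.
type cluster struct {
	nodes         []int // 1-based node indices
	initializedOn int   // node used to initialize the cluster; 0 means "not initialized"
}

// newCluster builds a cluster with nodes numbered 1..n.
func newCluster(n int) *cluster {
	c := &cluster{}
	for i := 1; i <= n; i++ {
		c.nodes = append(c.nodes, i)
	}
	return c
}

// initCluster is the single shared code path used by both Start and Init.
func (c *cluster) initCluster() error {
	for _, n := range c.nodes {
		if n == firstNodeIdx {
			c.initializedOn = n
			return nil
		}
	}
	return fmt.Errorf("node %d not found", firstNodeIdx)
}

// Start initializes the cluster unless skipInit is set.
func (c *cluster) Start(skipInit bool) error {
	if skipInit {
		return nil
	}
	return c.initCluster()
}

// Init initializes a cluster that was started with skipInit.
func (c *cluster) Init() error {
	return c.initCluster()
}
```

Because both paths go through `initCluster`, a wrong constant would break the frequently used `Start` path as well, which makes this class of bug much harder to miss.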
In cockroachdb#88514, the cluster start logic was refactored to reuse the same code across `init` and `start`, fixing a bug in the former. However, the refactoring overlooked the fact that we previously always set the default cluster settings when there's more than one node in the cluster. This fixes that by setting the default cluster settings in that case; one particularly important cluster setting is the license key, necessary for some roachtests. Fixes cockroachdb#88660. Fixes cockroachdb#88665. Fixes cockroachdb#88666. Fixes cockroachdb#88710. Release note: None
Currently we do not capture SSH logs when a command fails, even though they can be useful for debugging issues, transient or otherwise. This commit enables logging via the ssh switch `-vvv` and specifying a log filename, to be stored under an `ssh/` directory in the test log root. The debug file is deleted upon successful (zero) exit of the command, but preserved for non-zero exits for further inspection. Additionally:
- The name of the log is consistent with the corresponding run log and encodes a node number and timestamp.
- SSH sessions must now be initialised with the command itself to reinforce their single-use nature.
- Debug-friendly command names can optionally be specified to influence the name of the run/ssh logs.
- Retry options can optionally be omitted from any call to `ParallelE` to disable retries.
Release note: None Epic: CRDB-21386
Wait commands are issued every 500ms, returning a non-zero exit code until nodes have started. This results in a large number of ssh debug logs during cluster creation. Also adopts functional options. Release note: None Epic: none
Timeout failures are recorded at actual timeout, with subsequent failures secondary. `addFailure` accepts a depth parameter and no longer includes context cancellation, which is done separately. Epic: none Fixes: cockroachdb#91237 Release note: None
Force-pushed from e621749 to 54871f4.
Thanks for backporting!
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
Backport:
- "roachprod: fix `roachprod init`." (#88514)

Please see individual PRs for details.
/cc @cockroachdb/release
Epic: None
Release note: None
Release justification: test only change