release-22.2: roachtest: set default cluster settings, report timeout failures correctly, roachprod: capture ssh logs #96903

Commits on Mar 7, 2023

  1. roachprod: fix roachprod init.

    When a cluster is started with the `--skip-init` option, the caller
    can run `roachprod init` at any time to initialize the
    cluster. Unfortunately, the code used to initialize the cluster was
    duplicated: one copy existed in the `start` path, and another in the
    `init` path. Since the latter is used far less frequently, it had a
    bug that went unnoticed: it hardcoded the first node index as `0`,
    when node indices start at 1.
    
    This commit fixes the issue by updating the constant and sharing code
    between `init` and `start`.
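
    The shape of this fix can be sketched as follows. This is a minimal illustration, not the actual roachprod code: the `cluster` type, the `firstNode` constant, and the `Start`, `Init`, and `initializeCluster` names are all hypothetical. It shows a single shared helper used by both paths, with the first node index set to 1.

```go
package main

import "fmt"

// firstNode is the node that receives the initialization request.
// Node indices start at 1, so this must not be 0.
const firstNode = 1

// cluster is a hypothetical stand-in for roachprod's cluster type.
type cluster struct {
	nodes       []int // 1-based node indices
	initialized bool
	initNode    int
}

// initializeCluster is the single helper shared by Start and Init,
// so the two paths cannot drift apart again.
func (c *cluster) initializeCluster() error {
	for _, n := range c.nodes {
		if n == firstNode {
			c.initNode = n
			c.initialized = true
			return nil
		}
	}
	return fmt.Errorf("node %d not found in cluster", firstNode)
}

// Start starts the nodes (elided here) and initializes the cluster
// unless skipInit is set.
func (c *cluster) Start(skipInit bool) error {
	if skipInit {
		return nil
	}
	return c.initializeCluster()
}

// Init corresponds to running `roachprod init` after a skipped init.
func (c *cluster) Init() error {
	return c.initializeCluster()
}

func main() {
	c := &cluster{nodes: []int{1, 2, 3}}
	if err := c.Start(true); err != nil {
		panic(err)
	}
	if err := c.Init(); err != nil {
		panic(err)
	}
	fmt.Println(c.initialized, c.initNode) // true 1
}
```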
    
    Fixes cockroachdb#88226.
    
    Release note: None
    renatolabs authored and Miral Gadani committed Mar 7, 2023
    82250a4
  2. roachprod: set default cluster settings when starting

    In cockroachdb#88514, the cluster start logic was refactored to reuse the same
    code across `init` and `start`, fixing a bug in the former. However,
    the refactoring overlooked the fact that we previously always set the
    default cluster settings when there's more than one node in the
    cluster.
    
    This commit restores that behavior by setting the default cluster
    settings in that case. One particularly important cluster setting is
    the license key, which some roachtests need.
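
    As a rough sketch of the restored behavior: the helper name `defaultSettingsStatements` and the placeholder values below are hypothetical, and the two settings shown (`cluster.organization` and `enterprise.license`) are assumed to be representative of the defaults rather than the exact list roachprod applies.

```go
package main

import "fmt"

// defaultSettingsStatements is a hypothetical helper that returns the
// SQL statements to run after start. Following the commit message, the
// defaults are only applied when the cluster has more than one node.
func defaultSettingsStatements(numNodes int, org, license string) []string {
	if numNodes <= 1 {
		return nil
	}
	return []string{
		fmt.Sprintf("SET CLUSTER SETTING cluster.organization = '%s'", org),
		fmt.Sprintf("SET CLUSTER SETTING enterprise.license = '%s'", license),
	}
}

func main() {
	// The organization and license values are placeholders.
	for _, stmt := range defaultSettingsStatements(3, "Example Org", "placeholder-license") {
		fmt.Println(stmt)
	}
}
```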
    
    Fixes cockroachdb#88660.
    Fixes cockroachdb#88665.
    Fixes cockroachdb#88666.
    Fixes cockroachdb#88710.
    
    Release note: None
    renatolabs authored and Miral Gadani committed Mar 7, 2023
    3765da1
  3. roachtest: capture verbose SSH logging on failures

    Currently we do not capture SSH logs when a command fails,
    even though they can be useful in debugging issues,
    transient or otherwise.
    
    This commit enables verbose logging via the ssh switch -vvv
    and specifies a log filename; the log is stored under an ssh/
    directory in the test log root. The debug file is deleted
    upon successful (zero) exit of the command, but preserved
    for non-zero exits for further inspection.
    
    Additionally,
    - The name of the log is consistent with the
      corresponding run log and encodes a node number and
      timestamp.
    - SSH sessions must now be initialised with the
      command itself to reinforce its single-use nature.
    - Debug-friendly command names can optionally be
      specified to influence the name of the run/ssh logs.
    - Retry options can optionally be omitted from any call
      to ParallelE to disable retries.
    
    Release note: None
    Epic: CRDB-21386
    Miral Gadani committed Mar 7, 2023
    44b6979
  4. roachtest: Make ssh debug logging optional for node-wait/monitor.

    Wait commands are issued every 500ms and return a non-zero exit
    code until the nodes have started, which results in a large
    number of ssh debug logs during cluster creation.
    
    Also adopts functional options.
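
    For readers unfamiliar with the pattern, functional options could be applied here along these lines. The option and type names (`WaitOption`, `WithoutSSHDebug`, `sshFlags`) are hypothetical and only illustrate the pattern; they are not the names used in the commit.

```go
package main

import "fmt"

// waitOptions holds the settings that options can modify.
type waitOptions struct {
	sshDebug bool
}

// WaitOption is a functional option: a function that mutates waitOptions.
type WaitOption func(*waitOptions)

// WithoutSSHDebug disables verbose SSH logging, for example for
// frequently repeated wait/monitor commands.
func WithoutSSHDebug() WaitOption {
	return func(o *waitOptions) { o.sshDebug = false }
}

// newWaitOptions applies the given options on top of the defaults.
func newWaitOptions(opts ...WaitOption) waitOptions {
	o := waitOptions{sshDebug: true} // debug logging is on by default
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

// sshFlags returns the extra ssh flags implied by the options.
func sshFlags(opts ...WaitOption) []string {
	if newWaitOptions(opts...).sshDebug {
		return []string{"-vvv"}
	}
	return nil
}

func main() {
	fmt.Println(sshFlags())                  // [-vvv]
	fmt.Println(sshFlags(WithoutSSHDebug())) // []
}
```

    Callers that do not care about the setting pass no options and get the default, so new options can be added later without changing existing call sites.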
    
    Release note: None
    Epic: none
    Miral Gadani committed Mar 7, 2023
    48c49cd
  5. roachtest: report timeout failures accordingly

    Timeout failures are now recorded when the timeout actually
    occurs, and any subsequent failures are treated as secondary.
    
    `addFailure` now accepts a depth parameter and no longer
    cancels the context; cancellation is done separately.
    
    Epic: none
    Fixes: cockroachdb#91237
    Release note: None
    Miral Gadani committed Mar 7, 2023
    54871f4