Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/roachtest: add registry.OperationSpec #119796

Merged
merged 1 commit into from
Mar 11, 2024
Merged

Conversation

itsbilal
Copy link
Member

@itsbilal itsbilal commented Mar 1, 2024

This change introduces a new type in the roachtest registry for Operations; smaller subunits of tests that work strictly on existing clusters and are meant to have zero logical side-effects to a running cluster (eg. should not delete data in an irrecoverable way), but can induce chaos events like crashing nodes. This new spec type, OperationSpec, specifies one operation and is registered with a registry.Registry. It can be filtered with a registry.TestFilter similar to TestSpecs.

Split from #118364.

Epic: none

Release note: None

@itsbilal itsbilal requested a review from renatolabs March 1, 2024 00:40
@itsbilal itsbilal requested a review from a team as a code owner March 1, 2024 00:40
@itsbilal itsbilal requested review from herkolategan and removed request for a team March 1, 2024 00:40
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for separating the spec into a new PR. Just some initial questions around mixing tests & operations logic. In some sense I envision that we can run operations individually, and eventually a composition of operations as a roachtest.


// ClusterFeatures, if set, restricts the set of operations to those that
// have dependencies satisfied by ClusterFeatures.
ClusterFeatures OperationDependency
Copy link
Collaborator

@herkolategan herkolategan Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if adding these to TestFilter might lead to eventual confusion since things like Suite and Cloud does not make sense for Operations (Cloud maybe, comment below), and ClusterFeatures does not make sense for Roach Tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. Removed TestFilter and we can do direct regex matches with operations instead of going through testFilter.

pkg/cmd/roachtest/registry/operation_spec.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/registry/operation_spec.go Outdated Show resolved Hide resolved

// CompatibleClouds is the set of clouds this test can run on (e.g. AllClouds,
// OnlyGCE, etc). Must be set.
CompatibleClouds CloudSet
Copy link
Collaborator

@herkolategan herkolategan Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the run operation will check if the target cluster matches this and then prevent me from running the operation if the target cluster is not in the compatible clouds set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's the idea. Or at least drt-run will do that. run-operation is going to be more relaxed as its' only one operation at a time.

pkg/cmd/roachtest/registry/operation_spec.go Outdated Show resolved Hide resolved
@itsbilal
Copy link
Member Author

itsbilal commented Mar 6, 2024

TFTR! Pushed an update simplifying this even more (with the assumption of a standalone, non-logging runner to start)

@itsbilal itsbilal changed the title cmd/roachtest: add registry.OperationSpec and filter.FilterOps cmd/roachtest: add registry.OperationSpec Mar 6, 2024
Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the smaller PR!

I believe Operation still has functions that are not necessary (things we would want to remove from roachtest if possible, and therefore shouldn't encourage).

Otherwise LGTM.

// ClusterCockroach returns the path to the Cockroach binary on the target
// cluster.
ClusterCockroach() string
Name() string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will this be used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't have a spec for any reason, being able to log / report on the Name becomes easy. My version of drt-run used this directly but that's before we moved towards spinning off subprocesses. I think it's worthwhile keeping this in but getting rid of Spec.

ClusterCockroach() string
Name() string
// Spec returns the *registry.OperationSpec as an interface{}.
Spec() interface{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, so I'll get rid of this especially if we're keeping Name.

Failed() bool

L() *logger.Logger
Progress(float64)
Copy link
Contributor

@renatolabs renatolabs Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, no roachtest currently uses this. I'm not convinced every operation should implement it, especially since the notion of "progress" is most likely undefined in the majority of cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


L() *logger.Logger
Progress(float64)
Status(args ...interface{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status is mostly unnecessary in roachtest, I don't think operations should have it. Logging using the provided logger should be sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually use this pretty heavily in the operations I wrote. It's a better semantic way of reporting what phase an operation is in, while letting verbose work go through the logger. The demo I did for drt-run showed statuses for each worker. I'd say we keep this, even if some operations abuse it as a shorthand to the logger.

pkg/cmd/roachtest/registry/operation_spec.go Outdated Show resolved Hide resolved
// CanRunConcurrently specifies whether this operation is safe to run
// concurrently with other operations that have CanRunConcurrently = true. For
// instance, a random-index addition is safe to run concurrently with most
// other operations like node kills, while a drop would need to run on its own
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a good example? If I'm reading it right, a node kill and a random-index are not safe to run concurrently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be, especially with the load balancer that we're planning on using. We do want the sort of interleaving of operations mentioned in the comment.

pkg/cmd/roachtest/test_registry.go Outdated Show resolved Hide resolved
@itsbilal
Copy link
Member Author

TFTR! Updated PR. Will update #120023 next.

This change introduces a new type in the roachtest registry for
Operations; smaller subunits of tests that work strictly on existing
clusters and are meant to have zero logical side-effects to a running
cluster (eg. should not delete data in an irrecoverable way), but can
induce chaos events like crashing nodes. This new spec type,
OperationSpec, specifies one operation and is registered with
a registry.Registry.

Split from cockroachdb#118364.

Epic: none

Release note: None
@itsbilal
Copy link
Member Author

TFTR!

@itsbilal
Copy link
Member Author

bors r=herkolategan

@craig
Copy link
Contributor

craig bot commented Mar 11, 2024

Build succeeded:

@craig craig bot merged commit 8ad5029 into cockroachdb:master Mar 11, 2024
19 checks passed
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 11, 2024
This change adds a new roachtest command, run-operation,
that runs a singular operation defined in OperationSpec
(see cockroachdb#119796), in a simplified harness, with existing
clusters only. The existing cluster is not touched (eg.
stopped or wiped), unless the operation explicitly
does an action like that.

An implementation of operation.Operation is also bundled in this
change to make these operations runnable.

Epic: none

Release note: None
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 11, 2024
This change adds a new roachtest command, run-operation,
that runs a singular operation defined in OperationSpec
(see cockroachdb#119796), in a simplified harness, with existing
clusters only. The existing cluster is not touched (eg.
stopped or wiped), unless the operation explicitly
does an action like that.

An implementation of operation.Operation is also bundled in this
change to make these operations runnable.

Epic: none

Release note: None
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 13, 2024
This change adds a new roachtest command, run-operation,
that runs a singular operation defined in OperationSpec
(see cockroachdb#119796), in a simplified harness, with existing
clusters only. The existing cluster is not touched (eg.
stopped or wiped), unless the operation explicitly
does an action like that.

An implementation of operation.Operation is also bundled in this
change to make these operations runnable.

Epic: none

Release note: None
cockroach-dev-inf pushed a commit that referenced this pull request Mar 18, 2024
This change adds a new roachtest command, run-operation,
that runs a singular operation defined in OperationSpec
(see #119796), in a simplified harness, with existing
clusters only. The existing cluster is not touched (eg.
stopped or wiped), unless the operation explicitly
does an action like that.

An implementation of operation.Operation is also bundled in this
change to make these operations runnable.

Epic: none

Release note: None
craig bot pushed a commit that referenced this pull request Mar 20, 2024
120023: roachtest: add run-operation command to run singular operation  r=HerkoLategan a=itsbilal

NB: This change builds on #119796. Review that first.

This change adds a new roachtest command, run-operation,
that runs a singular operation defined in OperationSpec
(see #119796), in a simplified harness, with existing
clusters only. The existing cluster is not touched (eg.
stopped or wiped), unless the operation explicitly
does an action like that.

An implementation of operation.Operation is also bundled in this
change to make these operations runnable.

Epic: none

Release note: None

120430: sql/sem/tree: add FmtForFingerprint functions to collapse long lists  r=xinhaoz a=xinhaoz

Please note only the latest commit should be reviewed here.

---------------
Add functions for the `FmtForFingerprint` flag to shorten long lists
to a representative two element list where the second element is simply
`__more__` to represent the rest of the list.

Specifically, where `FmtHideConstants` would shorten tuples, arrays and
VALUES clauses to its first two elements if it coontains only literals
and placeholders, `FmtCollapseLists` shortens these lists to its first
element if all items in the list are any combination of placeholders,
literals, or a simple subexpressions containing only literals and/or
placeholders.

Enabling this behaviour for statement fingerprint generation can be done
via setting the cluster setting `sql.stats.statement_fingerprint.format_mask`
to include `FmtCollapseLists`.

```
FmtHideConstants (previous behaviour):
VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) ->
VALUES ((_, _), (_, _)), ((_, _), (_, _)), ((_, _), (_, _))

ARRAY[1, 2, 3+4] ->
ARRAY[_, _, _+4]

SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) ->
SELECT * FROM foo WHERE a IN (1, 2, __more1_10__)

---

FmtHideConstants | FmtCollapseLists:
VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) ->
VALUES ((_, __more__), __more__), (__more__)

ARRAY[1, 2, 3+4] ->
ARRAY[_, __more__]

SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) ->
SELECT * FROM foo WHERE a IN (1, __more__)
```


Epic: none
Part of: #120409
Release note: none

120594: ccl/serverccl: skip flakey assertion r=msbutler a=stevendanna

Informs #119329
Release note: None

120686: plpgsql: implement CONTINUE/EXIT with condition r=DrewKimball a=DrewKimball

PL/pgSQL allows adding a `WHEN <condition>` clause to `EXIT` and `CONTINUE` statements. This is syntactic sugar for `EXIT` or `CONTINUE` within an `IF` statement. This commit add support for this syntax.

Informs #115271

Release note (sql change): It is now possible to specify a condition for the PL/pgSQL statemenets `EXIT` and `CONTINUE`.

120760: roachprod: remove `admin-ui-port` flag from `start-sql` r=DarrylWong a=renatolabs

This flag is only applicable to the system tenant (`roachprod start`); having it on `start-sql` is misleading.

Epic: none

Release note: None

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Mar 28, 2024
120024: roachtest: add operations to add column, indexes  r=HerkoLategan a=itsbilal

NB: the first two changes are in #119796 and #120023 respectively. Review those first.

This change builds on the operation framework added in
previous changes to add new operations that randomly add
columns and indexes to an existing cluster.

Epic: none

Release note: None

121269: dbconsole: remove internal cols from jobs table r=dt a=dt

These are internal jobs system state that few jobs use. These values are visible on the per-job detail page if one is interested but are just noise on the table, and drawing attention to them can be misleading as most jobs do retries themselves causing these numbers to be incorrect/irrelevant at best.

Release note (ui change): the jobs table page no longer includes two columns related to a deprecated internal implementation detail (last execution time and execution count).

Epic: none.

121278: go.mod: bump Pebble to e28836ed0fa4 r=jbowens a=jbowens

Changes:

 * [`e28836ed`](cockroachdb/pebble@e28836ed) sstable: handle synthetic suffix when using CopySpan
 * [`b6e563f6`](cockroachdb/pebble@b6e563f6) wal: synchronously verify secondary is writable

Release note: none.
Epic: none.

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants