-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/roachtest: add registry.OperationSpec #119796
Conversation
0c94113
to
5615f0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for separating the spec into a new PR. Just some initial questions around mixing tests & operations logic. In some sense I envision that we can run operations individually, and eventually a composition of operations as a roachtest.
pkg/cmd/roachtest/registry/filter.go
Outdated
|
||
// ClusterFeatures, if set, restricts the set of operations to those that | ||
// have dependencies satisfied by ClusterFeatures. | ||
ClusterFeatures OperationDependency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if adding these to TestFilter
might lead to eventual confusion since things like Suite
and Cloud
does not make sense for Operations (Cloud
maybe, comment below), and ClusterFeatures
does not make sense for Roach Tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Removed TestFilter
and we can do direct regex matches with operations instead of going through testFilter.
|
||
// CompatibleClouds is the set of clouds this test can run on (e.g. AllClouds, | ||
// OnlyGCE, etc). Must be set. | ||
CompatibleClouds CloudSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean the run
operation will check if the target cluster matches this and then prevent me from running the operation if the target cluster is not in the compatible clouds set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's the idea. Or at least drt-run
will do that. run-operation
is going to be more relaxed as its' only one operation at a time.
TFTR! Pushed an update simplifying this even more (with the assumption of a standalone, non-logging runner to start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the smaller PR!
I believe Operation
still has functions that are not necessary (things we would want to remove from roachtest if possible, and therefore shouldn't encourage).
Otherwise LGTM.
// ClusterCockroach returns the path to the Cockroach binary on the target | ||
// cluster. | ||
ClusterCockroach() string | ||
Name() string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will this be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't have a spec for any reason, being able to log / report on the Name becomes easy. My version of drt-run used this directly but that's before we moved towards spinning off subprocesses. I think it's worthwhile keeping this in but getting rid of Spec.
ClusterCockroach() string | ||
Name() string | ||
// Spec returns the *registry.OperationSpec as an interface{}. | ||
Spec() interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not, so I'll get rid of this especially if we're keeping Name.
Failed() bool | ||
|
||
L() *logger.Logger | ||
Progress(float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, no roachtest currently uses this. I'm not convinced every operation should implement it, especially since the notion of "progress" is most likely undefined in the majority of cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
|
||
L() *logger.Logger | ||
Progress(float64) | ||
Status(args ...interface{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status
is mostly unnecessary in roachtest, I don't think operations should have it. Logging using the provided logger should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually use this pretty heavily in the operations I wrote. It's a better semantic way of reporting what phase an operation is in, while letting verbose work go through the logger. The demo I did for drt-run
showed statuses for each worker. I'd say we keep this, even if some operations abuse it as a shorthand to the logger.
// CanRunConcurrently specifies whether this operation is safe to run | ||
// concurrently with other operations that have CanRunConcurrently = true. For | ||
// instance, a random-index addition is safe to run concurrently with most | ||
// other operations like node kills, while a drop would need to run on its own |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a good example? If I'm reading it right, a node kill and a random-index are not safe to run concurrently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be, especially with the load balancer that we're planning on using. We do want the sort of interleaving of operations mentioned in the comment.
TFTR! Updated PR. Will update #120023 next. |
This change introduces a new type in the roachtest registry for Operations; smaller subunits of tests that work strictly on existing clusters and are meant to have zero logical side-effects to a running cluster (eg. should not delete data in an irrecoverable way), but can induce chaos events like crashing nodes. This new spec type, OperationSpec, specifies one operation and is registered with a registry.Registry. Split from cockroachdb#118364. Epic: none Release note: None
TFTR! |
bors r=herkolategan |
Build succeeded: |
This change adds a new roachtest command, run-operation, that runs a singular operation defined in OperationSpec (see cockroachdb#119796), in a simplified harness, with existing clusters only. The existing cluster is not touched (eg. stopped or wiped), unless the operation explicitly does an action like that. An implementation of operation.Operation is also bundled in this change to make these operations runnable. Epic: none Release note: None
This change adds a new roachtest command, run-operation, that runs a singular operation defined in OperationSpec (see cockroachdb#119796), in a simplified harness, with existing clusters only. The existing cluster is not touched (eg. stopped or wiped), unless the operation explicitly does an action like that. An implementation of operation.Operation is also bundled in this change to make these operations runnable. Epic: none Release note: None
This change adds a new roachtest command, run-operation, that runs a singular operation defined in OperationSpec (see cockroachdb#119796), in a simplified harness, with existing clusters only. The existing cluster is not touched (eg. stopped or wiped), unless the operation explicitly does an action like that. An implementation of operation.Operation is also bundled in this change to make these operations runnable. Epic: none Release note: None
This change adds a new roachtest command, run-operation, that runs a singular operation defined in OperationSpec (see #119796), in a simplified harness, with existing clusters only. The existing cluster is not touched (eg. stopped or wiped), unless the operation explicitly does an action like that. An implementation of operation.Operation is also bundled in this change to make these operations runnable. Epic: none Release note: None
120023: roachtest: add run-operation command to run singular operation r=HerkoLategan a=itsbilal NB: This change builds on #119796. Review that first. This change adds a new roachtest command, run-operation, that runs a singular operation defined in OperationSpec (see #119796), in a simplified harness, with existing clusters only. The existing cluster is not touched (eg. stopped or wiped), unless the operation explicitly does an action like that. An implementation of operation.Operation is also bundled in this change to make these operations runnable. Epic: none Release note: None 120430: sql/sem/tree: add FmtForFingerprint functions to collapse long lists r=xinhaoz a=xinhaoz Please note only the latest commit should be reviewed here. --------------- Add functions for the `FmtForFingerprint` flag to shorten long lists to a representative two element list where the second element is simply `__more__` to represent the rest of the list. Specifically, where `FmtHideConstants` would shorten tuples, arrays and VALUES clauses to its first two elements if it coontains only literals and placeholders, `FmtCollapseLists` shortens these lists to its first element if all items in the list are any combination of placeholders, literals, or a simple subexpressions containing only literals and/or placeholders. Enabling this behaviour for statement fingerprint generation can be done via setting the cluster setting `sql.stats.statement_fingerprint.format_mask` to include `FmtCollapseLists`. ``` FmtHideConstants (previous behaviour): VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) -> VALUES ((_, _), (_, _)), ((_, _), (_, _)), ((_, _), (_, _)) ARRAY[1, 2, 3+4] -> ARRAY[_, _, _+4] SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) -> SELECT * FROM foo WHERE a IN (1, 2, __more1_10__) --- FmtHideConstants | FmtCollapseLists: VALUES ((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12)) -> VALUES ((_, __more__), __more__), (__more__) ARRAY[1, 2, 3+4] -> ARRAY[_, __more__] SELECT * FROM foo WHERE a IN (1, 2, 3, 4, 5) -> SELECT * FROM foo WHERE a IN (1, __more__) ``` Epic: none Part of: #120409 Release note: none 120594: ccl/serverccl: skip flakey assertion r=msbutler a=stevendanna Informs #119329 Release note: None 120686: plpgsql: implement CONTINUE/EXIT with condition r=DrewKimball a=DrewKimball PL/pgSQL allows adding a `WHEN <condition>` clause to `EXIT` and `CONTINUE` statements. This is syntactic sugar for `EXIT` or `CONTINUE` within an `IF` statement. This commit add support for this syntax. Informs #115271 Release note (sql change): It is now possible to specify a condition for the PL/pgSQL statemenets `EXIT` and `CONTINUE`. 120760: roachprod: remove `admin-ui-port` flag from `start-sql` r=DarrylWong a=renatolabs This flag is only applicable to the system tenant (`roachprod start`); having it on `start-sql` is misleading. Epic: none Release note: None Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com> Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Drew Kimball <drewk@cockroachlabs.com> Co-authored-by: Renato Costa <renato@cockroachlabs.com>
120024: roachtest: add operations to add column, indexes r=HerkoLategan a=itsbilal NB: the first two changes are in #119796 and #120023 respectively. Review those first. This change builds on the operation framework added in previous changes to add new operations that randomly add columns and indexes to an existing cluster. Epic: none Release note: None 121269: dbconsole: remove internal cols from jobs table r=dt a=dt These are internal jobs system state that few jobs use. These values are visible on the per-job detail page if one is interested but are just noise on the table, and drawing attention to them can be misleading as most jobs do retries themselves causing these numbers to be incorrect/irrelevant at best. Release note (ui change): the jobs table page no longer includes two columns related to a deprecated internal implementation detail (last execution time and execution count). Epic: none. 121278: go.mod: bump Pebble to e28836ed0fa4 r=jbowens a=jbowens Changes: * [`e28836ed`](cockroachdb/pebble@e28836ed) sstable: handle synthetic suffix when using CopySpan * [`b6e563f6`](cockroachdb/pebble@b6e563f6) wal: synchronously verify secondary is writable Release note: none. Epic: none. Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
This change introduces a new type in the roachtest registry for Operations; smaller subunits of tests that work strictly on existing clusters and are meant to have zero logical side-effects to a running cluster (eg. should not delete data in an irrecoverable way), but can induce chaos events like crashing nodes. This new spec type, OperationSpec, specifies one operation and is registered with a registry.Registry. It can be filtered with a registry.TestFilter similar to TestSpecs.
Split from #118364.
Epic: none
Release note: None