roachtest: backup-restore/online-restore failed #124330
Comments
I can repro this, and interestingly it only occurs after a cluster restore. I annotated the error and we're only missing about 2 MB of data. I wonder if this check is failing because we don't download the temp system db from the pre-restore phase.
My theory was correct. Fix here: #124348
roachtest.backup-restore/online-restore failed with artifacts on master @ 5d013285fa696f53df2abb39f44ffc777125fe1c:
Parameters:
119416: pkg/util/eventagg: general aggregation framework for reduction of event cardinality r=dhartunian a=abarganier

**Reviewer note: review commit-wise**

The eventagg package is (currently) a proof of concept ("POC") that aims to provide an easy-to-use library that standardizes the way in which we aggregate Observability event data in CRDB. The goal is to eventually emit that data as "exhaust" from CRDB, which downstream systems can consume to build Observability features that do not rely on CRDB's own availability to aid in debugging & investigations. Additionally, we want to provide facilities for code within CRDB to consume this same data, such that it can also power features internally.

This pull request contains work to create the aggregation mechanism in `pkg/util/eventagg`. These facilities provide a way of aggregating notable events to reduce cardinality, before performing further processing and/or structured logging.

In addition to the framework, a toy SQL Stats example is provided in `pkg/sql/sqlstats/aggregate.go`, which shows the current developer experience when using the APIs. See `pkg/util/eventagg/doc.go` for more details.

Since this feature is currently experimental, it's gated by the `COCKROACH_ENABLE_STRUCTURED_EVENTS` environment variable, which is disabled by default.

---

Release note: none
Epic: CRDB-35919

123120: ui: Highlight unavailable ranges in red on the summary bar with nonzero r=abarganier a=theloneexplorerquest

Modify the summary bar to change the color of unavailable ranges. When the unavailable range count is greater than zero, it will be displayed in red; if it is zero, it will be green.

Fix: #122014
Release note (ui): Changed the color of unavailable ranges on the summary bar to red when nonzero; ranges are green when zero.

124160: roachtest: add test for admission control disk bandwidth r=sumeerbhola a=aadityasondhi

This test runs a single node target cluster that has two workloads running on it. The lower priority (qos=background) workload is very bandwidth intensive, and without the AC bandwidth limiter would saturate the provisioned bandwidth (controlled using cgroups). This test shows how setting the cluster setting `kvadmission.store.provisioned-bandwidth` limits the disk bandwidth usage of lower priority work and shapes it at the value set in the setting.

Fixes #121576.
Release note: None

124293: tools: switch md5 cmd name based on existence r=dt a=dt

Release note: none.
Epic: none.

124348: backupccl: download pre restore data in cluster restore r=dt a=msbutler

This patch adds the pre restore data spans to the list of spans to download. While these pre restore spans map to data in the temporary system table database that is then rewritten to the actual system table, the download job ought to download all external data linked into the cluster out of principle.

Fixes #124330
Release note: none

124403: roachtest: use first transient error when checking for flakes r=srosenberg a=renatolabs

Previously, roachtest would only look at the outermost error in a chain that matched a `TransientError` (or `ErrorWithOwnership`) when checking for flakes. However, that is in most cases *not* what we want: if a transient error wraps another transient error, the actual reason for the failure is the original (wrapped) error.

Informs: #123887
Release note: None

124486: kvclient: add WithFiltering option to rangefeed client r=nvanbenschoten,msbutler a=stevendanna

This adds a WithFiltering option to the rangefeed client that passes through the option to the underlying rangefeed.

Epic: none
Release note: None

124491: raft: remove RawNode.TickQuiesced r=pav-kv a=nvanbenschoten

This commit removes the `(*RawNode).TickQuiesced` method. The method was deprecated back in etcd-io/raft#62 and has not been in use since 2018.

Epic: None
Release note: None

Co-authored-by: Alex Barganier <abarganier@cockroachlabs.com>
Co-authored-by: theloneexplorerquest <theloneexplorerquest@gmail.com>
Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
This patch adds the pre restore data spans to the list of spans to download. While these pre restore spans map to data in the temporary system table database that is then rewritten to the actual system table, the download job ought to download all external data linked into the cluster out of principle.

Fixes cockroachdb#124330
Release note: none
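
As a rough illustration of what the patch describes, the sketch below builds a download list that covers both the pre restore spans and the main data spans. It is not the backupccl code: `span` and `downloadSpans` are hypothetical names, and the keys in `main` are made-up placeholders.

```go
package main

import "fmt"

// span is a hypothetical, simplified key span [start, end). The real code
// uses CockroachDB's own span type; this one exists only for the sketch.
type span struct {
	start, end string
}

// downloadSpans returns every span the download job should fetch: the pre
// restore spans (temporary system table database) plus the main data spans.
// Before the fix described above, only the main data spans were listed.
func downloadSpans(dataSpans, preRestoreSpans []span) []span {
	out := make([]span, 0, len(preRestoreSpans)+len(dataSpans))
	out = append(out, preRestoreSpans...)
	out = append(out, dataSpans...)
	return out
}

func main() {
	pre := []span{{start: "/tmp-system/a", end: "/tmp-system/z"}}
	data := []span{
		{start: "/table/100", end: "/table/101"},
		{start: "/table/101", end: "/table/102"},
	}
	for _, s := range downloadSpans(data, pre) {
		fmt.Printf("download [%s, %s)\n", s.start, s.end)
	}
}
```

Leaving the pre restore spans out of this list would match the symptom reported earlier in the thread, where a small amount of data was still missing after the download.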
roachtest.backup-restore/online-restore failed with artifacts on master @ 855b9cc97afa3df4f7e17f928c04ab0834b2630c:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=4
ROACHTEST_encrypted=true
ROACHTEST_metamorphicBuild=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
This test on roachdash | Improve this report!
Jira issue: CRDB-38828