roachtest: disk-stalled/wal-failover/among-stores failed #122772

Closed
cockroach-teamcity opened this issue Apr 21, 2024 · 3 comments
Labels: branch-release-24.1, C-test-failure, O-roachtest, O-robot, T-storage

cockroach-teamcity commented Apr 21, 2024

roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on release-24.1 @ 3111a51bdb2cbf889c92f4440bf88b0eb682cb96:

(disk_stall.go:174).runDiskStalledWALFailover: unexpectedly high p99.99 latency 1.154394337s at 2024-04-21T13:17:00Z
(cluster.go:2348).Run: context canceled
test artifacts and logs in: /artifacts/disk-stalled/wal-failover/among-stores/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=16
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=2
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-38060

cockroach-teamcity added the branch-release-24.1, C-test-failure, O-roachtest, O-robot, release-blocker, and T-storage labels on Apr 21, 2024
cockroach-teamcity added this to the 24.1 milestone on Apr 21, 2024
cockroach-teamcity commented

roachtest.disk-stalled/wal-failover/among-stores failed with artifacts on release-24.1 @ 2873eb728d555695bf5a392d748142e9b05cd36d:

(disk_stall.go:174).runDiskStalledWALFailover: unexpectedly high p99.99 latency 1.021233281s at 2024-04-23T13:23:00Z
(cluster.go:2348).Run: context canceled
test artifacts and logs in: /artifacts/disk-stalled/wal-failover/among-stores/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=16
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=2

sumeerbhola self-assigned this on Apr 23, 2024
sumeerbhola commented

Oddly, there are no node logs from the latest failure here, nor from the one in #122364.

13:58:08 test_impl.go:414: test failure #1: full stack retained in failure_1.log: (disk_stall.go:174).runDiskStalledWALFailover: unexpectedly high p99.99 latency 1.021233281s at 2024-04-23T13:23:00Z

SQL and KV latencies were high, but Raft log commit latency did not spike, even at p100. This suggests that something in the KV code observed the disk stall.

[Three screenshots taken 2024-04-23; images not preserved]

KV QPS was unaffected, which is consistent with flushed bytes also being unaffected. Compactions, however, fell, which is odd.

[Three screenshots taken 2024-04-23; images not preserved]

sumeerbhola removed the release-blocker label on Apr 23, 2024
sumeerbhola commented

Closing this as a duplicate of #122364.

sumeerbhola closed this as not planned on Apr 23, 2024
Projects
Archived in project
Development

No branches or pull requests

3 participants