roachtest: programmatically determine cockroach block device in disk-stall roachtest #123080

Closed
itsbilal opened this issue Apr 25, 2024 · 0 comments · Fixed by #123506
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-storage Storage Team

itsbilal commented Apr 25, 2024

Currently, the disk-stall roachtest hardcodes the block device major:minor numbers (8:16) that it passes to the cgroup controller.

This works for GCE nodes with persistent disks (PD), but on nodes with local SSDs there is no 8:16 device, and the /mnt/data1 volume is on a different device:

ubuntu@bilal-test-1-0003:~$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0     7:0    0  63.4M  1 loop /snap/core20/1974
loop1     7:1    0   346M  1 loop /snap/google-cloud-cli/157
loop2     7:2    0 111.9M  1 loop /snap/lxd/24322
loop3     7:3    0  53.3M  1 loop /snap/snapd/19457
loop4     7:4    0    87M  1 loop /snap/lxd/28373
loop5     7:5    0  63.9M  1 loop /snap/core20/2264
loop6     7:6    0 351.3M  1 loop /snap/google-cloud-cli/233
sda       8:0    0    10G  0 disk
├─sda1    8:1    0   9.9G  0 part /
├─sda14   8:14   0     4M  0 part
└─sda15   8:15   0   106M  0 part /boot/efi
nvme0n1 259:0    0   375G  0 disk /mnt/data1
nvme0n2 259:1    0   375G  0 disk /mnt/data2
nvme0n3 259:2    0   375G  0 disk /mnt/data3
nvme0n4 259:3    0   375G  0 disk /mnt/data4

To fix this, the roachtest should programmatically determine the major:minor numbers of the block device that cockroach is running on, and use those numbers to stall the appropriate device.
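
One possible way to look up those numbers is to parse the same `lsblk` output shown above. The following is a minimal sketch, not the code that eventually landed in the roachtest: the function name `deviceNumbers` is hypothetical, and it assumes the default `lsblk` column layout (NAME, MAJ:MIN, RM, SIZE, RO, TYPE, MOUNTPOINTS) with a single mountpoint per row.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// deviceNumbers (hypothetical helper) scans lsblk output in its default
// column layout and returns the major and minor numbers of the block
// device mounted at mountpoint.
func deviceNumbers(lsblkOutput, mountpoint string) (major, minor int, err error) {
	for _, line := range strings.Split(lsblkOutput, "\n") {
		fields := strings.Fields(line)
		// Rows without a mountpoint have only six columns; skip them,
		// along with the header row and rows for other mountpoints.
		if len(fields) < 7 || fields[6] != mountpoint {
			continue
		}
		majStr, minStr, ok := strings.Cut(fields[1], ":")
		if !ok {
			return 0, 0, fmt.Errorf("malformed MAJ:MIN column %q", fields[1])
		}
		if major, err = strconv.Atoi(majStr); err != nil {
			return 0, 0, err
		}
		if minor, err = strconv.Atoi(minStr); err != nil {
			return 0, 0, err
		}
		return major, minor, nil
	}
	return 0, 0, fmt.Errorf("no device mounted at %s", mountpoint)
}

func main() {
	// A few rows copied from the lsblk output in this issue.
	sample := "NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS\n" +
		"sda       8:0    0    10G  0 disk\n" +
		"├─sda1    8:1    0   9.9G  0 part /\n" +
		"nvme0n1 259:0    0   375G  0 disk /mnt/data1\n"
	major, minor, err := deviceNumbers(sample, "/mnt/data1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d:%d\n", major, minor) // prints "259:0"
}
```

On the local-SSD node above this yields 259:0 for /mnt/data1 rather than the hardcoded 8:16.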

See #121912.

Jira issue: CRDB-38192

@itsbilal itsbilal added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-storage Relating to our storage engine (Pebble) on-disk storage. labels Apr 25, 2024
@blathers-crl blathers-crl bot added the T-storage Storage Team label Apr 25, 2024
craig bot pushed a commit that referenced this issue May 2, 2024
123506: roachtestutil: dynamically determine block device to stall r=RaduBerinde a=itsbilal

Previously, we hardcoded the block device on which to run the disk-stalled* roachtests and the disk-stall operation. This was a flaky approach, as sometimes we'd use a local SSD as the block device, which has very different device numbers than a Google persistent disk.

This change updates the cgroup disk staller to programmatically determine the major/minor device numbers for the block device to stall (the one mounted at /mnt/data1). It also updates the dmsetup disk staller to dynamically determine the block device name mounted at /mnt/data1.

Fixes #123080, #123054.

Epic: none

Release note: None

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
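
To illustrate why the cgroup staller in the commit message above needs the major/minor numbers: cgroup v2 throttles I/O per device through lines of the form `MAJ:MIN rbps=... wbps=...` written to a cgroup's `io.max` file. The sketch below only builds such a line. The helper names are hypothetical, the throughput values are illustrative, and the issue does not show which cgroup file or limits the roachtest actually uses.

```go
package main

import (
	"fmt"
	"strconv"
)

// limit renders a bytes-per-second limit for io.max; a negative value is
// rendered as "max", which removes the limit.
func limit(bytesPerSec int64) string {
	if bytesPerSec < 0 {
		return "max"
	}
	return strconv.FormatInt(bytesPerSec, 10)
}

// ioMaxLine (hypothetical helper) builds one cgroup v2 io.max entry for the
// device identified by major:minor.
func ioMaxLine(major, minor int, readBps, writeBps int64) string {
	return fmt.Sprintf("%d:%d rbps=%s wbps=%s", major, minor, limit(readBps), limit(writeBps))
}

func main() {
	// Throttle the local SSD from this issue to a few bytes per second.
	fmt.Println(ioMaxLine(259, 0, 4, 4)) // prints "259:0 rbps=4 wbps=4"
	// Lift the limits again.
	fmt.Println(ioMaxLine(259, 0, -1, -1)) // prints "259:0 rbps=max wbps=max"
}
```

With a hardcoded 8:16 prefix, such a line would target a device that does not exist on local-SSD nodes, so the intended disk would never be throttled.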
@craig craig bot closed this as completed in 8e81063 May 2, 2024
itsbilal added a commit to itsbilal/cockroach that referenced this issue May 7, 2024