admission: consider IOPS as a bottleneck resource, and shape read bandwidth consumption #107623

Open
Tracked by #121779
sumeerbhola opened this issue Jul 26, 2023 · 3 comments
Labels: A-admission-control; C-enhancement (Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)); P-3 (Issues/test failures with no fix SLA); T-admission-control (Admission Control)

Comments

sumeerbhola (Collaborator) commented Jul 26, 2023

Admission control should consider IOPS as a bottleneck resource, and should shape both reads and writes that consume IOPS and bandwidth.

See (internal link) https://github.com/cockroachlabs/support/issues/2395#issuecomment-1631471135 for motivation.

Preliminary design sketch (internal link) https://docs.google.com/document/d/1KelFCIUd9jaBkAev5G6CReI_kA-u_pv6CHHNjjD4-lk/edit

Jira issue: CRDB-30133

Epic CRDB-42949

sumeerbhola added the C-enhancement and A-admission-control labels Jul 26, 2023
jbowens (Collaborator) commented Aug 11, 2023

How does this relate to cockroachdb/pebble#18 or pacing of flushes/compactions, if at all? Even within some admitted set of operations, should we schedule the I/O carefully in order to avoid starving WAL writes?

sumeerbhola changed the title from "admission: consider IOPS as a bottleneck resource" to "admission: consider IOPS as a bottleneck resource, and shape read bandwidth consumption" Nov 17, 2023
aadityasondhi added the T-admission-control label Apr 4, 2024
joshimhoff (Collaborator) commented

This would help with a recent CC incident in which a customer exceeded their IOPS limit. The resulting disk stalls crashed nodes across the cluster, and throughput collapsed. I've marked the linked Jira issue as o-postmortem. If anyone wants more information on the postmortem, please let me know!

exalate-issue-sync bot added the P-3 label Aug 1, 2024
aadityasondhi (Collaborator) commented

Relevant internal thread.
