admission: enable disk bandwidth bottleneck resource #86857

sumeerbhola · 2022-08-25T11:21:53Z

Followup to #82898 which created the basic infrastructure, configuration scheme, and did experimentation with regular and elastic kv0 traffic.

- Eliminate the need to provide the disk-name and bandwidth fields for common environments, e.g. Linux with the store on EBS or GCP PD. This will make it possible to enable this by default for v23.1.
- Run more experiments with user-facing regular SQL traffic and lower-priority (and elastic) background work like index backfills. This overlaps with #85641 (admission: investigate TPC-E online index creation problem) and #85469 (admission: metrics and tests for evaluating improvements). Prior experimentation was done in a comment on #82813 ([WIP] admission: add support for disk bandwidth as a bottleneck resource).
- admission: roachtest for disk bandwidth limiter #121576
- Enable for v24.3

cc: @irfansharif @andrewbaptist

Jira issue: CRDB-18968

Epic: CRDB-37479

Part of cockroachdb#86857.

This commit eliminates the need to provide the disk-name in common environments, e.g. Linux with the store on EBS or GCP PD. To make use of AC's disk bandwidth tokens, users still need to specify the provisioned bandwidth, for now. So in a sense this machinery is still "disabled by default".

Next steps:
- automatically measure provisioned bandwidth, using something like github.com/irfansharif/probe, gate behind envvars or cluster settings;
- add roachtests that make use of these disk bandwidth tokens;
- roll it out in managed environments;
- roll it out elsewhere.

Release note: None

Part of cockroachdb#86857.

This commit eliminates the need to provide the disk-name in common environments, e.g. Linux with the store on EBS or GCP PD. To make use of AC's disk bandwidth tokens, users still need to specify the provisioned bandwidth, for now, either in the store spec or through the kvadmission.store.provisioned_bandwidth cluster setting. So in a sense this machinery is still "disabled by default".

Next steps:
- add roachtests that make use of these disk bandwidth tokens;
- automatically measure provisioned bandwidth, using something like github.com/irfansharif/probe, gate behind envvars or cluster settings;
- roll it out in managed environments;
- roll it out elsewhere.

Release note: None

Integration test for disk bandwidth tokens, copying over what we ran in cockroachdb#82813.

Part of cockroachdb#86857

Release note: None

sumeerbhola · 2024-02-29T16:31:53Z

We should remember to also extend the disk bandwidth control to encompass disk writes of sstables due to incoming range snapshots.

This commit cleans up changes from cockroachdb#119885. There is no longer a need for users to specify a disk name when specifying the provisioned bandwidth since we can now automatically infer disk names from the `StoreSpec.Path` and the underlying block device.

Informs: cockroachdb#86857.
Epic: None.

Release note (ops change): The provisioned-rate field, if specified, should no longer accept a disk-name or an optional bandwidth field. To use the disk bandwidth constraint, the store-spec must contain provisioned-rate=bandwidth=<bandwidth-bytes/s>; otherwise the cluster setting kvadmission.store.provisioned_bandwidth will be used.

120895: admission: remove `DiskName` from `StoreSpec.ProvisionedRateSpec` r=sumeerbhola a=CheranMahalingam

This commit cleans up changes from #119885. There is no longer a need for users to specify a disk name when specifying the provisioned bandwidth since we can now automatically infer disk names from the `StoreSpec.Path` and the underlying block device.

Informs: #86857.
Epic: None.

Release note (ops change): The provisioned-rate field, if specified, should no longer accept a disk-name or an optional bandwidth field. To use the disk bandwidth constraint, the store-spec must contain provisioned-rate=bandwidth=<bandwidth-bytes/s>; otherwise the cluster setting kvadmission.store.provisioned_bandwidth will be used.

Co-authored-by: Cheran Mahalingam <cheran.mahalingam@cockroachlabs.com>
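The fallback described in this release note (a per-store bandwidth from the store spec, otherwise the cluster setting) can be sketched roughly as follows. This is an illustrative sketch only, not the CockroachDB implementation: `storeSpec`, `effectiveBandwidth`, and the convention that a zero value means "not set" are all assumptions made for the example.

```go
package main

import "fmt"

// storeSpec is a hypothetical stand-in for a store's configuration.
// provisionedBandwidth is in bytes/s; 0 is assumed to mean "not specified".
type storeSpec struct {
	path                 string
	provisionedBandwidth int64
}

// effectiveBandwidth picks the bandwidth the limiter would use for a store:
// the store-spec value if present, otherwise the cluster-wide setting.
// If neither is set, the disk bandwidth limiter is treated as disabled.
func effectiveBandwidth(spec storeSpec, clusterSetting int64) (bytesPerSec int64, enabled bool) {
	if spec.provisionedBandwidth > 0 {
		return spec.provisionedBandwidth, true
	}
	if clusterSetting > 0 {
		return clusterSetting, true
	}
	return 0, false
}

func main() {
	const mib = 1 << 20
	withSpec := storeSpec{path: "/mnt/data1", provisionedBandwidth: 200 * mib}
	withoutSpec := storeSpec{path: "/mnt/data2"}

	fmt.Println(effectiveBandwidth(withSpec, 100*mib))    // store spec wins
	fmt.Println(effectiveBandwidth(withoutSpec, 100*mib)) // falls back to the cluster setting
	fmt.Println(effectiveBandwidth(withoutSpec, 0))       // nothing configured: disabled
}
```

Under these assumptions, a store with neither value configured stays in the "disabled by default" state mentioned earlier in the thread.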

jbowens · 2024-04-03T17:46:50Z

As a part of online restore, we'll need admission control to control downloading of external sstables so that we can download as fast as we can without affecting foreground workload latencies.

Previously, we would calculate elastic disk bandwidth tokens using arbitrary load thresholds and an estimate of incoming bytes into the LSM through flushes and ingestions. This calculation did not account for write amplification in the LSM.

This patch simplifies the disk bandwidth limiter to remove the disk load watcher and simply adjust tokens using the known provisioned disk bandwidth. For token deduction, we create a write-amp model that relates incoming LSM bytes to actual disk writes.

The token granting semantics are as follows:
- elastic writes: deduct tokens, and wait for positive count in bucket.
- regular writes: deduct tokens, but proceed even with no tokens available.

The requested write bytes are "inflated" using the estimated write amplification to account for async compactions in the LSM.

This patch also lays the framework for future integrations where we can account for range snapshot ingestions separately as those don't incur the same write amplification as flushed LSM bytes do.

Informs: cockroachdb#86857

Release note: None

129005: admission: account for write-amp in disk bandwidth limiter r=sumeerbhola a=aadityasondhi

Previously, we would calculate elastic disk bandwidth tokens using arbitrary load thresholds and an estimate of incoming bytes into the LSM through flushes and ingestions. This calculation did not account for write amplification in the LSM.

This patch simplifies the disk bandwidth limiter to remove the disk load watcher and simply adjust tokens using the known provisioned disk bandwidth. For token deduction, we create a write-amp model that relates incoming LSM bytes to actual disk writes.

The token granting semantics are as follows:
- elastic writes: deduct tokens, and wait for positive count in bucket.
- regular writes: deduct tokens, but proceed even with no tokens available.

The requested write bytes are "inflated" using the estimated write amplification to account for async compactions in the LSM.

This patch also lays the framework for future integrations where we can account for range snapshot ingestions separately as those don't incur the same write amplification as flushed LSM bytes do.

Informs: #86857

Release note: None

Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
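The token granting semantics in this commit message can be illustrated with a minimal Go sketch. It is not the code from #129005: the names `diskTokenBucket`, `writeAmpModel`, `tryAdmit`, and `refill` are hypothetical, the write-amp model is assumed to be a single constant multiplier, and "waiting" is reduced to returning `false` so the caller can retry later.

```go
package main

import "fmt"

// workClass separates latency-sensitive (regular) work from
// throttleable (elastic) work.
type workClass int

const (
	regularWork workClass = iota
	elasticWork
)

// writeAmpModel is an assumed linear model: disk bytes written are
// approximated as multiplier * bytes entering the LSM.
type writeAmpModel struct {
	multiplier float64
}

// inflate converts requested LSM bytes into estimated disk write bytes.
func (m writeAmpModel) inflate(lsmBytes int64) int64 {
	return int64(float64(lsmBytes) * m.multiplier)
}

// diskTokenBucket holds disk write tokens (in bytes) for one store.
// The count may go negative because regular work is never blocked.
type diskTokenBucket struct {
	tokens int64
	model  writeAmpModel
}

// refill adds tokens, e.g. derived from the provisioned bandwidth
// for one adjustment interval.
func (b *diskTokenBucket) refill(tokens int64) {
	b.tokens += tokens
}

// tryAdmit applies the semantics from the commit message:
//   - elastic: admitted only while the bucket is positive, then deducts;
//   - regular: always admitted, and still deducts.
func (b *diskTokenBucket) tryAdmit(class workClass, lsmBytes int64) bool {
	if class == elasticWork && b.tokens <= 0 {
		return false // elastic work must wait for a positive count
	}
	b.tokens -= b.model.inflate(lsmBytes)
	return true
}

func main() {
	b := &diskTokenBucket{tokens: 1000, model: writeAmpModel{multiplier: 5}}

	fmt.Println(b.tryAdmit(regularWork, 300), b.tokens) // regular proceeds, bucket goes negative
	fmt.Println(b.tryAdmit(elasticWork, 10), b.tokens)  // elastic has to wait
	b.refill(2000)
	fmt.Println(b.tryAdmit(elasticWork, 10), b.tokens) // elastic proceeds after refill
}
```

Letting regular work drive the bucket negative means elastic work pays back the deficit first, which is one way to read "proceed even with no tokens available".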

dshjoshi · 2024-09-29T23:17:22Z

@aadityasondhi @nicktrav Is this in progress or done or still in backlog? If it is in progress or done, can we update the status?

aadityasondhi · 2024-09-30T17:35:53Z

The functionality should be complete as of #129005.

The remaining work would be to:

- Enable on internal test clusters (DRT)
- Enable on cloud clusters
- Write public docs for users

The enable part means setting the cluster setting or store config flags for the provisioned bandwidth.

Informs cockroachdb#86857 Release note: None

This patch contains some small improvements to better test the bandwidth subtest of the snapshot ingest roachtest. Informs cockroachdb#86857 Release note: None

135019: roachtest: use disk stall utility to limit bandwidth in AC tests r=itsbilal a=aadityasondhi Informs #86857 Release note: None Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>

sumeerbhola added the C-enhancement and A-admission-control labels Aug 25, 2022

jbowens mentioned this issue Jan 17, 2023

schema: potential 2X index backfill perf regression during tpce init #95163

Open

irfansharif mentioned this issue Jan 20, 2023

[WIP] admission: add support for disk bandwidth as a bottleneck resource #82813

Closed

irfansharif mentioned this issue Aug 23, 2023

server,admission: automatically infer disk names for stores #109350

Closed

irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 2, 2023

roachtest: add admission-control/disk-bandwidth

ed8be57

Integration test for disk bandwidth tokens, copying over what we ran in cockroachdb#82813.

Part of cockroachdb#86857

Release note: None

nicktrav added the T-admission-control label Dec 21, 2023

This was referenced Mar 13, 2024

admission,kvserver: subject snapshot ingestion to admission control #80607

Closed

admission: control snapshot ingest disk write bandwidth #120708

Closed

CheranMahalingam mentioned this issue Mar 22, 2024

admission: remove DiskName from StoreSpec.ProvisionedRateSpec #120895

Merged

aadityasondhi mentioned this issue Mar 26, 2024

kvserver: "auto create stats" job should use lower priority for IO #82508

Open

aadityasondhi self-assigned this Apr 2, 2024

This was referenced Apr 4, 2024

admission: disk bandwidth limiter integrations #121779

Open

admission: disk bandwidth limiter for reads #121780

Closed

admission: integrate download operation with disk bandwidth limiter #122077

Open

aadityasondhi mentioned this issue Aug 14, 2024

admission: account for write-amp in disk bandwidth limiter #129005

Merged

sumeerbhola mentioned this issue Nov 6, 2024

admission: classify TTL writes as elastic work #124216

Closed

aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Nov 12, 2024

roachtest: use disk stall utility to limit bandwidth in AC tests

63e6d36

Informs cockroachdb#86857 Release note: None

aadityasondhi mentioned this issue Nov 12, 2024

roachtest: use disk stall utility to limit bandwidth in AC tests #135019

Merged

aadityasondhi mentioned this issue Nov 12, 2024

roachtest: snapshot ingest roachtest improvements #135022

Open
