storage: support per-store IO metrics with fine granularity #119885
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Force-pushed from e7dd5eb to 3a0538f.
Force-pushed from 3a0538f to 930890a.
Force-pushed from 053daaf to cd39477.
Reviewed 1 of 4 files at r4, 1 of 12 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @itsbilal)
bors r=jbowens,abarganier
Build failed (retrying...)
Build failed (retrying...)
bors r- @CheranMahalingam I think you need to rebase over master.
Canceled.
Currently, timeseries metrics are collected on a 10s interval, which hides momentary spikes in IO operations. This commit introduces a disk monitoring system that allows callers to subscribe to stats for a block device sampled every 100ms.

Epic: None.
Release note: None.
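A minimal Go sketch of the subscribe-and-poll pattern this commit message describes. It assumes hypothetical names (`Monitor`, `DiskStats`, `Subscribe`, `poll`) and a caller-supplied collector function standing in for the real OS-level stats source; it is an illustration, not the PR's actual implementation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// DiskStats is a hypothetical snapshot of cumulative counters for one block device.
type DiskStats struct {
	ReadBytes  uint64
	WriteBytes uint64
}

// Monitor samples devices through a caller-supplied collector and fans each
// sample out to the device's subscribers. All names here are illustrative.
type Monitor struct {
	collect func(device string) (DiskStats, error)
	mu      sync.Mutex
	subs    map[string][]chan DiskStats // device name -> subscriber channels
}

func NewMonitor(collect func(device string) (DiskStats, error)) *Monitor {
	return &Monitor{collect: collect, subs: make(map[string][]chan DiskStats)}
}

// Subscribe registers interest in a device and returns a buffered channel of samples.
func (m *Monitor) Subscribe(device string) <-chan DiskStats {
	ch := make(chan DiskStats, 16)
	m.mu.Lock()
	defer m.mu.Unlock()
	m.subs[device] = append(m.subs[device], ch)
	return ch
}

// poll collects one sample per subscribed device. Sends are non-blocking, so
// a slow subscriber drops samples instead of stalling the poller.
func (m *Monitor) poll() {
	m.mu.Lock()
	defer m.mu.Unlock()
	for dev, chans := range m.subs {
		s, err := m.collect(dev)
		if err != nil {
			continue // skip this device until the next tick
		}
		for _, ch := range chans {
			select {
			case ch <- s:
			default: // subscriber's buffer is full; drop this sample
			}
		}
	}
}

// Run calls poll every interval until ctx is canceled.
func (m *Monitor) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			m.poll()
		}
	}
}

func main() {
	var written uint64
	m := NewMonitor(func(string) (DiskStats, error) {
		written += 4096 // fake device: 4 KiB written between samples
		return DiskStats{WriteBytes: written}, nil
	})
	ch := m.Subscribe("nvme0n1")
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()
	m.Run(ctx, 100*time.Millisecond) // blocks until the timeout, sampling every 100ms
	for len(ch) > 0 {
		fmt.Println("cumulative write bytes:", (<-ch).WriteBytes)
	}
}
```

Making sends non-blocking is one reasonable choice for a shared poller: a stalled consumer loses samples but cannot delay sampling for every other subscriber.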
Currently, admission control consumes disk bandwidth stats aggregated across all block devices, including noise introduced by reads and writes from unrelated processes. This commit adds support for aggregating disk bandwidth stats only across the monitored block devices used by the cockroach process.

Epic: None.
Release note: None.
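A minimal Go sketch of aggregating only over monitored devices, assuming a hypothetical `DiskStats` type and a precomputed list of the block devices backing each store. Deduplicating by device name is an assumption worth noting: if two stores share one device, summing per store would double-count it.

```go
package main

import "fmt"

// DiskStats holds hypothetical cumulative byte counters for one block device.
type DiskStats struct {
	ReadBytes, WriteBytes uint64
}

// aggregateMonitored sums counters only for the block devices that back the
// node's stores. Each device is counted once even if several stores share it,
// and devices used only by other processes are ignored.
func aggregateMonitored(all map[string]DiskStats, storeDevices []string) DiskStats {
	seen := make(map[string]bool, len(storeDevices))
	var total DiskStats
	for _, dev := range storeDevices {
		if seen[dev] {
			continue // two stores on the same device must not double-count it
		}
		seen[dev] = true
		if s, ok := all[dev]; ok {
			total.ReadBytes += s.ReadBytes
			total.WriteBytes += s.WriteBytes
		}
	}
	return total
}

func main() {
	all := map[string]DiskStats{
		"nvme0n1": {ReadBytes: 100, WriteBytes: 200},
		"nvme1n1": {ReadBytes: 10, WriteBytes: 20},
		"sda":     {ReadBytes: 5000, WriteBytes: 9000}, // e.g. an OS disk, not a store
	}
	// Three stores, two of which share nvme0n1.
	total := aggregateMonitored(all, []string{"nvme0n1", "nvme1n1", "nvme0n1"})
	fmt.Printf("total: %+v\n", total) // total: {ReadBytes:110 WriteBytes:220}
}
```

A bandwidth figure would then come from the difference between two such totals divided by the sampling interval.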
Force-pushed from cd39477 to 3d72b2c.
Currently, disk metrics are computed and aggregated at the node level. However, a cockroach node can run multiple stores. This commit adds new timeseries disk metrics computed per store.

Epic: None.
Release note (general change): The following metrics were added for observability of per-store disk events:
- storage.disk.read.count
- storage.disk.read.bytes
- storage.disk.read.time
- storage.disk.write.count
- storage.disk.write.bytes
- storage.disk.write.time
- storage.disk.io.time
- storage.disk.weightedio.time
- storage.disk.iopsinprogress

The metrics match the definitions of the sys.host.disk.* system metrics.
Force-pushed from 3d72b2c to f33445b.
bors r=jbowens,abarganier
This commit cleans up changes from cockroachdb#119885. Users no longer need to specify a disk name when specifying the provisioned bandwidth, since disk names can now be inferred automatically from the `StoreSpec.Path` and the underlying block device.

Informs: cockroachdb#86857.
Epic: None.
Release note (ops change): The provisioned-rate field, if specified, no longer accepts a disk-name field, and its bandwidth field is no longer optional. To use the disk bandwidth constraint, the store-spec must contain provisioned-rate=bandwidth=<bandwidth-bytes/s>; otherwise, the cluster setting kv.store.admission.provisioned_bandwidth will be used.
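A minimal Go sketch of the validation rule in this release note: the value of provisioned-rate must be exactly `bandwidth=<bytes/s>`, and the cluster setting applies when the field is absent. The functions `parseProvisionedRate` and `effectiveBandwidth` are hypothetical, and the sketch assumes the bandwidth is a plain integer in bytes per second; CockroachDB's real store-spec parser may accept other forms.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseProvisionedRate validates the value that follows "provisioned-rate=".
// Only "bandwidth=<bytes/s>" is accepted; a disk-name key is rejected.
func parseProvisionedRate(value string) (int64, error) {
	key, val, ok := strings.Cut(value, "=")
	if !ok || key != "bandwidth" {
		return 0, fmt.Errorf("provisioned-rate must be bandwidth=<bytes/s>, got %q", value)
	}
	bw, err := strconv.ParseInt(val, 10, 64)
	if err != nil || bw <= 0 {
		return 0, fmt.Errorf("invalid bandwidth %q", val)
	}
	return bw, nil
}

// effectiveBandwidth falls back to the value of the cluster setting
// kv.store.admission.provisioned_bandwidth when the store spec has no
// provisioned-rate field.
func effectiveBandwidth(value string, present bool, clusterSetting int64) (int64, error) {
	if !present {
		return clusterSetting, nil
	}
	return parseProvisionedRate(value)
}

func main() {
	bw, err := parseProvisionedRate("bandwidth=524288000")
	fmt.Println(bw, err) // 524288000 <nil>

	_, err = parseProvisionedRate("disk-name=nvme0n1")
	fmt.Println(err != nil) // true

	bw, _ = effectiveBandwidth("", false, 100000000)
	fmt.Println(bw) // 100000000
}
```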
120895: admission: remove `DiskName` from `StoreSpec.ProvisionedRateSpec` r=sumeerbhola a=CheranMahalingam
Co-authored-by: Cheran Mahalingam <cheran.mahalingam@cockroachlabs.com>
Currently, timeseries metrics are collected on a 10s interval, which hides momentary spikes in IO. This commit introduces a central disk monitoring system that polls for disk stats at a 100ms interval. Additionally, the current system accumulates disk metrics across all block devices, which includes noise from unrelated processes. This commit also adds support for exporting per-store IO metrics (i.e., IO stats for the block devices that map to stores used by Cockroach).
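To make the granularity argument concrete, the Go sketch below uses synthetic data: a disk that is idle for 10s except for a 1s burst at 500 MB/s. Sampling cumulative counters every 100ms exposes the burst, while a single 10s sample reports only a 50 MB/s average. The `burstVisibility` helper is hypothetical and only illustrates the arithmetic.

```go
package main

import "fmt"

// burstVisibility returns, in bytes/s, the highest rate seen in any single
// sampling interval and the average rate over the whole window, given
// cumulative byte counters sampled every intervalSec seconds (len(cum) >= 2).
func burstVisibility(cum []uint64, intervalSec float64) (peak, avg float64) {
	for i := 1; i < len(cum); i++ {
		if r := float64(cum[i]-cum[i-1]) / intervalSec; r > peak {
			peak = r
		}
	}
	window := float64(len(cum)-1) * intervalSec
	avg = float64(cum[len(cum)-1]-cum[0]) / window
	return peak, avg
}

func main() {
	// Synthetic data: 101 samples, 100ms apart, span 10s. The disk is idle
	// except for a 1s burst writing 50 MB per 100ms tick (500 MB/s).
	cum := make([]uint64, 101)
	for i := 1; i < len(cum); i++ {
		cum[i] = cum[i-1]
		if i > 40 && i <= 50 {
			cum[i] += 50_000_000
		}
	}
	peak, avg := burstVisibility(cum, 0.1)
	fmt.Printf("peak over 100ms: %.0f MB/s\n", peak/1e6) // peak over 100ms: 500 MB/s
	fmt.Printf("average over 10s: %.0f MB/s\n", avg/1e6) // average over 10s: 50 MB/s
}
```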
These changes will be followed up by a PR that removes the need for customers to specify disk names when setting the provisioned bandwidth for each store, as described in #109350.
Fixes: #104114, #112898.
Informs: #89786.
Epic: None.
Release note: None.