
storage: granular IO metrics #112898

Closed
jbowens opened this issue Oct 23, 2023 · 4 comments · Fixed by #119885
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-storage Storage Team

Comments


jbowens commented Oct 23, 2023

Our existing timeseries metrics are collected on a 10s interval. This coarse granularity makes it impossible to detect events of high variance within the 10s interval. A momentary spike of 16k IO operations in 1 second can be presented as 1.6k IOPS over the 10s interval. A spike like this could force IO operations to queue, inducing latency beyond what our customers consider acceptable, without leaving a trace of the latency's source.

We should collect finer-grained, per-second measurements for select metrics rather than waiting for our timeseries infrastructure to support finer resolution. We can surface these measurements through a few strategies:

  • Export timeseries metrics that compute an aggregation function over the per-second deltas. For example, exporting the maximum per-second delta within each interval would give us a much better view into a momentary IOPS spike.
  • Write these metrics to the Pebble logs every N seconds, including each of the per-second measurements over the past N seconds.
  • Write finer metrics to logs only when there's a significant change (defining "significant" may be tricky, and probably needs to be done per-metric).
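To make the first strategy concrete, here is a minimal sketch of deriving the maximum per-second delta from cumulative counter samples. The name `maxPerSecondDelta` is hypothetical, not CockroachDB's API; it just shows how a spike that a 10s average flattens remains visible in the per-second deltas:

```go
package main

import "fmt"

// maxPerSecondDelta returns the largest one-second increase in a
// cumulative counter, given samples taken once per second.
// Hypothetical helper illustrating the aggregation-function strategy.
func maxPerSecondDelta(samples []uint64) uint64 {
	var max uint64
	for i := 1; i < len(samples); i++ {
		if d := samples[i] - samples[i-1]; d > max {
			max = d
		}
	}
	return max
}

func main() {
	// Cumulative IO-operation counts sampled each second. The jump
	// between seconds 4 and 5 is a 16k spike that the 10s average
	// (~1.7k IOPS here) would hide.
	samples := []uint64{0, 100, 200, 300, 400, 16400, 16500, 16600, 16700, 16800, 16900}
	fmt.Println(maxPerSecondDelta(samples)) // prints 16000
}
```

Exporting only the maximum keeps the timeseries cardinality unchanged while still capturing the worst one-second burst in each interval.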

Jira issue: CRDB-32668

@jbowens jbowens added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Oct 23, 2023

jbowens commented Oct 23, 2023

Would've helped with cockroachlabs/support#2673.

@jbowens jbowens added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Oct 24, 2023

jbowens commented Oct 24, 2023

As part of this, we should also do #104114. We should have one goroutine that knows the mapping of path to disk and is responsible for reading the disk stats file in /proc/. The stats it reads can power these granular IO metrics, the per-store metrics described in #104114, and the disk stats plumbed into admission control.
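The parsing step of that stats-reading goroutine could look roughly like the sketch below, assuming the standard Linux /proc/diskstats field layout (major, minor, device name, then per-device counters). `parseDiskStatsLine` and `diskStats` are illustrative names, not the merged implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// diskStats holds a subset of the fields in one /proc/diskstats row.
// Field positions follow the Linux kernel's iostats documentation:
// field 4 is reads completed, field 8 is writes completed.
type diskStats struct {
	name            string
	readsCompleted  uint64
	writesCompleted uint64
}

// parseDiskStatsLine extracts the device name and read/write op counts
// from a single line of /proc/diskstats.
func parseDiskStatsLine(line string) (diskStats, error) {
	f := strings.Fields(line)
	if len(f) < 8 {
		return diskStats{}, fmt.Errorf("short diskstats line: %q", line)
	}
	reads, err := strconv.ParseUint(f[3], 10, 64)
	if err != nil {
		return diskStats{}, err
	}
	writes, err := strconv.ParseUint(f[7], 10, 64)
	if err != nil {
		return diskStats{}, err
	}
	return diskStats{name: f[2], readsCompleted: reads, writesCompleted: writes}, nil
}

func main() {
	// A sample row for device nvme0n1 with 120 reads and 456 writes completed.
	s, err := parseDiskStatsLine("259 0 nvme0n1 120 4 5000 30 456 8 9000 70 0 95 100")
	if err != nil {
		panic(err)
	}
	fmt.Println(s.name, s.readsCompleted, s.writesCompleted) // prints nvme0n1 120 456
}
```

Centralizing this parsing in one goroutine means every consumer (granular metrics, per-store metrics, admission control) sees a consistent snapshot instead of each re-reading /proc independently.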

@jbowens jbowens self-assigned this Nov 7, 2023
@jbowens jbowens added the P-3 Issues/test failures with no fix SLA label Jan 10, 2024

jbowens commented Jan 10, 2024

This is scheduled for the 24.1 release cycle.


jbowens commented Jan 17, 2024

In November I started prototyping this and #104114 in jbowens@7103152.

The reading of mountpoints may not actually be necessary.

craig bot pushed a commit that referenced this issue Mar 18, 2024
115375: changefeedccl: reduce rebalancing memory usage from O(ranges) to O(spans) r=jayshrivastava a=jayshrivastava

### sql: count ranges per partition in PartitionSpans

This change updates span partitioning to count ranges while making
partitions. This allows callers to rebalance partitions based on
range counts without having to iterate over the spans to count
ranges.

Release note: None
Epic: None

### changefeedccl: reduce rebalancing memory usage from O(ranges) to O(spans) #115375

Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change
rewrites it to use range iterators, reducing the memory usage to O(spans).

This change also adds a randomized test to assert that all spans are accounted for after
rebalancing. It also adds one more unit test.

Informs: #113898
Epic: None

### changefeedccl: add rebalancing checks

This change adds extra test coverage for partition rebalancing in
changefeeds. It adds checks which are performed after rebalancing
to assert that the output list of spans covers exactly the same keys
as the input list of spans. These checks are expensive so they only
run if the environment variable `COCKROACH_CHANGEFEED_TESTING_REBALANCING_CHECKS`
is true. This variable is true in cdc roachtests and unit tests.

Release note: None
Epic: None

119885: storage: support per-store IO metrics with fine granularity r=jbowens,abarganier a=CheranMahalingam

Currently, timeseries metrics are collected on a 10s interval, which hides momentary spikes in IO. This commit introduces a central disk monitoring system that polls for disk stats at a 100ms interval. Additionally, the current system accumulates disk metrics across all block devices, which includes noise from unrelated processes. This commit also adds support for exporting per-store IO metrics (i.e. IO stats for the block devices that map to stores used by Cockroach).
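A poller on a 100ms interval needs to retain recent samples so consumers can derive deltas at fine granularity. One minimal sketch of that buffering, assuming a fixed-size ring of cumulative counters (illustrative only, not the code merged in this PR):

```go
package main

import "fmt"

// sampleRing keeps the most recent N cumulative counter samples so a
// monitor can derive fine-grained deltas. Hypothetical sketch of the
// kind of buffering a 100ms disk-stats poller might use.
type sampleRing struct {
	buf  []uint64
	next int
	full bool
}

func newSampleRing(n int) *sampleRing { return &sampleRing{buf: make([]uint64, n)} }

// add records one sample, overwriting the oldest when the ring is full.
func (r *sampleRing) add(v uint64) {
	r.buf[r.next] = v
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// maxDelta returns the largest increase between consecutive retained
// samples, walking them oldest-first.
func (r *sampleRing) maxDelta() uint64 {
	n, start := r.next, 0
	if r.full {
		n, start = len(r.buf), r.next
	}
	var max, prev uint64
	for i := 0; i < n; i++ {
		v := r.buf[(start+i)%len(r.buf)]
		if i > 0 {
			if d := v - prev; d > max {
				max = d
			}
		}
		prev = v
	}
	return max
}

func main() {
	r := newSampleRing(4)
	for _, v := range []uint64{10, 20, 200, 210, 215} {
		r.add(v)
	}
	fmt.Println(r.maxDelta()) // prints 180
}
```

Keeping the window bounded makes the monitor's memory cost constant regardless of how long the node runs.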

These changes will be followed up by a PR to remove the need for customers to specify disk names when setting the provisioned bandwidth for each store as described in #109350.

Fixes: #104114, #112898.
Informs: #89786.

Epic: None.
Release note: None.

120649: changefeedccl: avoid undefined behavior in distribution test r=wenyihu6 a=jayshrivastava

The `rangeDistributionTester` would sometimes calculate log(0) when determining the node to move a range to. Most of the time this produced a garbage value that was ignored, but it could occasionally return a valid node id, making the range distribution wrong and failing the test. This change updates the tester to handle this edge case.

Closes: #120470
Release note: None

Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
Co-authored-by: Cheran Mahalingam <cheran.mahalingam@cockroachlabs.com>
@craig craig bot closed this as completed in #119885 Mar 18, 2024
@jbowens jbowens added this to Storage Jun 4, 2024
@jbowens jbowens moved this to Done in Storage Jun 4, 2024