Merge pull request #2866 from aruniiird/add-new-CPU-usage-alerts-for-vertical-and-horizontal-scaling

Add alerts to notify vertical or horizontal scaling
openshift-merge-bot[bot] authored Nov 15, 2024
2 parents a568b2a + 377ceb8 commit 07cbcfa
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions metrics/deploy/prometheus-ocs-rules.yaml
@@ -407,13 +407,19 @@ spec:
 - alert: MDSCPUUsageHigh
 annotations:
 description: |-
-Ceph metadata server pod ({{ $labels.pod }}) has high cpu usage.
-Please consider increasing the CPU request for the {{ $labels.pod }} pod as described in the runbook.
+Ceph metadata server pod ({{ $labels.pod }}) has high cpu usage
+{{if query "rate(ceph_mds_request[6h]) >= 1000"}} and cannot cope
+up with the current rate of mds requests. Please consider Horizontal
+scaling, by adding another MDS pod{{else}}. Please consider Vertical
+scaling, by adding more resources to the existing MDS pod{{end}}.
+Please see 'runbook_url' for more details.
 message: Ceph metadata server pod ({{ $labels.pod }}) has high cpu usage
-runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCpuUsageHigh.md
+runbook_url: '{{if query "rate(ceph_mds_request[6h]) >= 1000"}}https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCpuUsageHighNeedsHorizontalScaling.md
+{{else}}https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCpuUsageHighNeedsVerticalScaling.md
+{{end}}'
 severity_level: warning
 expr: |
-pod:container_cpu_usage:sum{pod=~"rook-ceph-mds.*"}/ on(pod) kube_pod_resource_limit{resource='cpu',pod=~"rook-ceph-mds.*"} > 0.67
+label_replace(pod:container_cpu_usage:sum{pod=~"rook-ceph-mds.*"}/ on(pod, namespace) kube_pod_resource_limit{resource='cpu',pod=~"rook-ceph-mds.*"}, "ceph_daemon", "mds.$1", "pod", "rook-ceph-mds-(.*)-(.*)") + on (ceph_daemon, namespace) group_left(managedBy) (0 * (ceph_mds_metadata ==1)) > 0.67
 for: 6h
 labels:
 severity: warning
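
The decision logic in the updated rule can be hard to follow in diff form, so here is a minimal Python sketch of it. It is an illustration only, not code from the repository. The function names, the `RUNBOOK_BASE` constant, and the sample pod name used below are hypothetical. The thresholds (0.67 and 1000), the regex, and the runbook file names are taken from the diff. Two simplifications are assumed: the `for: 6h` hold period is not modeled, and the request rate is passed in as a single number, whereas the template's `query` call is true if any series returned by `rate(ceph_mds_request[6h]) >= 1000` exists.

```python
import re

# Thresholds copied from the rule in the diff.
CPU_RATIO_THRESHOLD = 0.67          # CPU usage / CPU limit must exceed this
MDS_REQUEST_RATE_THRESHOLD = 1000   # rate(ceph_mds_request[6h]) >= 1000

# Hypothetical constant: common prefix of the two runbook URLs in the diff.
RUNBOOK_BASE = (
    "https://github.com/openshift/runbooks/blob/master/alerts/"
    "openshift-container-storage-operator/"
)


def ceph_daemon_from_pod(pod):
    """Mimic label_replace(..., "ceph_daemon", "mds.$1", "pod", "rook-ceph-mds-(.*)-(.*)").

    Prometheus anchors the regex to the whole label value, hence fullmatch.
    The first group is greedy, so it captures everything up to the last hyphen.
    Returns None when the pod name does not match (Prometheus would leave the
    series without a ceph_daemon label in that case).
    """
    match = re.fullmatch(r"rook-ceph-mds-(.*)-(.*)", pod)
    return f"mds.{match.group(1)}" if match else None


def scaling_advice(cpu_ratio, mds_request_rate):
    """Return (kind, runbook_url) for a firing alert, or None if it would not fire."""
    if cpu_ratio <= CPU_RATIO_THRESHOLD:
        return None  # expr requires the ratio to be strictly greater than 0.67
    if mds_request_rate >= MDS_REQUEST_RATE_THRESHOLD:
        # High CPU and a high request rate: suggest adding another MDS pod.
        return ("horizontal",
                RUNBOOK_BASE + "CephMdsCpuUsageHighNeedsHorizontalScaling.md")
    # High CPU without a high request rate: suggest more resources for this pod.
    return ("vertical",
            RUNBOOK_BASE + "CephMdsCpuUsageHighNeedsVerticalScaling.md")
```

For example, `scaling_advice(0.9, 10)` selects the vertical-scaling runbook, and `ceph_daemon_from_pod("rook-ceph-mds-myfs-a-abc12")` yields `mds.myfs-a` for that made-up pod name.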