Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adds alerts for thin pool usage #151

Merged
merged 1 commit into from
Apr 21, 2022

Conversation

sp98
Copy link
Contributor

@sp98 sp98 commented Apr 13, 2022

Adds alerts for thin pool data and metadata usage.
Provides near full (75%) and critical (85%) usage alerts.

Note: We don't have thin pool name in the alerts since topolvm does not provide it as of now. But since we support only one thin pool per volume group, current alerts should suffice (IMO).
We can update it to include thin pool names once topolvm shows thin pool name in the metric labels.

Tested with lower thresholds.
Screenshot from 2022-04-20 12-40-22

Screenshot from 2022-04-20 12-41-04

Signed-off-by: Santosh Pillai sapillai@redhat.com

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 13, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 13, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@sp98 sp98 force-pushed the thin-pool-alerts branch from a9c657c to f82f855 Compare April 20, 2022 07:11
@sp98 sp98 marked this pull request as ready for review April 20, 2022 07:13
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 20, 2022
@sp98 sp98 force-pushed the thin-pool-alerts branch from f82f855 to 9d25b63 Compare April 20, 2022 07:44
@sp98 sp98 requested a review from nbalacha April 20, 2022 07:46
Copy link
Contributor

@nbalacha nbalacha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be an issue with the percentages displayed in the messages.

Copy link
Contributor

@aruniiird aruniiird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some values are not right. We have to correct it in the libsonnet file

config/prometheus/prometheus_rules.yaml Outdated Show resolved Hide resolved
config/prometheus/prometheus_rules.yaml Outdated Show resolved Hide resolved
config/prometheus/prometheus_rules.yaml Outdated Show resolved Hide resolved
- "alert": "ThinPoolMetaDataUsageAtThresholdCritical"
"annotations":
"description": "Thin pool metadata ultilization in the VolumeGroup is critically full. Data deletion or thin pool expansion is required."
"message": "Thin Pool metadata utilization in the VolumeGroup {{ $labels.device_class }} has crossed 8500 % on node {{ $labels.node }}. Free up some space or expand the thin pool immediately."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"message": "Thin Pool metadata utilization in the VolumeGroup {{ $labels.device_class }} has crossed 8500 % on node {{ $labels.node }}. Free up some space or expand the thin pool immediately."
"message": "Thin Pool metadata utilization in the VolumeGroup {{ $labels.device_class }} has crossed 85 % on node {{ $labels.node }}. Free up some space or expand the thin pool immediately."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Also marking other similar comments are resolved.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 20, 2022

@aruniiird: changing LGTM is restricted to collaborators

In response to this:

Some values are not right. We have to correct it in the libsonnet file

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Adds alerts for thin pool data and metadata usage.
Provides near full (75%) and critical (85%) usage alerts.

Signed-off-by: Santosh Pillai <sapillai@redhat.com>
@sp98 sp98 force-pushed the thin-pool-alerts branch from 9d25b63 to 1383957 Compare April 21, 2022 07:34
@sp98 sp98 requested a review from nbalacha April 21, 2022 07:36
@sp98
Copy link
Contributor Author

sp98 commented Apr 21, 2022

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 21, 2022

@sp98: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lvm-operator-bundle-e2e-aws 1383957 link false /test lvm-operator-bundle-e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

- "alert": "ThinPoolDataUsageAtThresholdCritical"
"annotations":
"description": "Thin pool in the VolumeGroup is critically full. Data deletion or thin pool expansion is required."
"message": "Thin Pool data utilization in the VolumeGroup {{ $labels.device_class }} has crossed 85 % on node {{ $labels.node }}. Free up some space or expand the thin pool immediately."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this generated by the jsonnet? I don't see any changes to the libsonnet file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. These changes are in the monitoring/alerts/vgalerts.libsonnet file.

@sp98 sp98 requested a review from nbalacha April 21, 2022 14:08
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 21, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nbalacha, sp98

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2022
@openshift-merge-robot openshift-merge-robot merged commit cba46e1 into openshift:main Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants