Fix or replace dashboards using Angular components #674

Merged: 1 commit, merged on Oct 24, 2024

roles/certmanager/files/grafana_dashboard.json: 1,877 changes (650 additions, 1,227 deletions)

Large diffs are not rendered by default.

roles/certmanager/tasks/main.yml: 38 changes (22 additions, 16 deletions)
@@ -36,22 +36,28 @@
     wait: yes
     wait_timeout: "{{ certmanager_wait_timeout }}"

-- name: Install Grafana dashboard for cert-manager metrics
-  command: kubectl apply -f -
-  args:
-    stdin: "{{ certmanager_dashboard_definition | to_nice_yaml }}"
-  vars:
-    certmanager_dashboard_definition:
-      apiVersion: v1
-      kind: ConfigMap
-      metadata:
-        name: cert-manager-grafana-dashboard
-        namespace: "{{ certmanager_release_namespace }}"
-        labels:
-          grafana_dashboard: "1"
-      data:
-        certmanager_dashboard.json: |-
-          {{ lookup("file", "grafana_dashboard.json") | from_json | to_nice_json }}
+- block:
+    - name: Install Grafana dashboard for cert-manager metrics
+      command: kubectl apply -f -
+      args:
+        stdin: "{{ certmanager_dashboard_definition | to_nice_yaml }}"
+      vars:
+        certmanager_dashboard_definition:
+          apiVersion: v1
+          kind: ConfigMap
+          metadata:
+            name: cert-manager-grafana-dashboard
+            namespace: "{{ certmanager_release_namespace }}"
+            labels:
+              grafana_dashboard: "1"
+          data:
+            certmanager_dashboard.json: |-
+              {{ lookup("file", "grafana_dashboard.json") | from_json | to_nice_json }}

+    - name: Configure custom alerting rules for cert-manager
+      command: kubectl apply -f -
+      args:
+        stdin: "{{ lookup('template', 'prometheusrule.yaml.j2') }}"
+  when: certmanager_monitoring_enabled

 - block:
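The dashboard task only creates a ConfigMap labelled `grafana_dashboard: "1"`; Grafana loads it only if its dashboard sidecar watches for that label in `certmanager_release_namespace`. Assuming Grafana is deployed through the kube-prometheus-stack Helm chart (the PR does not show how Grafana is deployed), the relevant chart values would look roughly like this sketch. `searchNamespace: ALL` is an assumption that only matters when Grafana runs in a different namespace from cert-manager.

```yaml
# Sketch of kube-prometheus-stack Helm values (assumed deployment, not part of this PR).
grafana:
  sidecar:
    dashboards:
      enabled: true
      # Must match the label key/value set on cert-manager-grafana-dashboard.
      label: grafana_dashboard
      labelValue: "1"
      # Watch all namespaces, since the ConfigMap is created in
      # certmanager_release_namespace rather than in Grafana's own namespace.
      searchNamespace: ALL
```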
roles/certmanager/templates/prometheusrule.yaml.j2: 53 changes (53 additions, 0 deletions)
@@ -0,0 +1,53 @@
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: cert-manager-alerting-rules
+  namespace: "{{ certmanager_release_namespace }}"
+  labels:
+    release: kube-prometheus-stack
+{% raw %}
+spec:
+  groups:
+    - name: cert-manager.rules
+      rules:
+        - alert: CertManagerAbsent
+          annotations:
+            description: >-
+              New certificates will not be able to be minted, and existing ones can't
+              be renewed until cert-manager is back.
+            runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagerabsent
+            summary: Cert Manager has disappeared from Prometheus service discovery.
+          expr: absent(up{job="cert-manager"})
+          for: 10m
+          labels:
+            severity: critical
+
+        - alert: CertManagerCertExpirySoon
+          annotations:
+            description: >-
+              The domain that this cert covers will be unavailable after {{ $value | humanizeDuration }}.
+              Clients using endpoints that this cert protects will start to fail in {{ $value | humanizeDuration }}.
+            runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagercertexpirysoon
+            summary: The cert `{{ $labels.name }}` is {{ $value | humanizeDuration }} from expiry.
+          expr: |
+            avg by (exported_namespace, namespace, name) (
+              certmanager_certificate_expiration_timestamp_seconds - time()
+            ) < (30 * 24 * 3600)
+          for: 1h
+          labels:
+            severity: warning
+
+        - alert: CertManagerHittingRateLimits
+          annotations:
+            description: >-
+              Depending on the rate limit, cert-manager may be unable to generate certificates for up to a week.
+            runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagerhittingratelimits
+            summary: Cert manager hitting LetsEncrypt rate limits.
+          expr: |
+            sum by (host) (
+              rate(certmanager_http_acme_client_request_count{status="429"}[5m])
+            ) > 0
+          for: 5m
+          labels:
+            severity: critical
+{% endraw %}
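The `release: kube-prometheus-stack` label matters because, with kube-prometheus-stack's default `ruleSelectorNilUsesHelmValues: true`, Prometheus only loads PrometheusRule objects whose `release` label equals the Helm release name. If the stack was installed under a different release name, one option (a sketch, assuming Prometheus is managed by that chart, which this PR does not show) is to set the rule selectors explicitly:

```yaml
# Sketch of kube-prometheus-stack Helm values (assumed deployment, not part of this PR).
prometheus:
  prometheusSpec:
    # Select PrometheusRules by the label the template sets,
    # independent of the Helm release name.
    ruleSelector:
      matchLabels:
        release: kube-prometheus-stack
    # An empty selector matches all namespaces, including
    # certmanager_release_namespace.
    ruleNamespaceSelector: {}
```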
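The rate-limit alert can be exercised offline with `promtool test rules`. promtool reads plain Prometheus rule files rather than PrometheusRule objects, so this sketch assumes the rendered template's `spec` has been extracted to a hypothetical `cert-manager-rules.yaml` (for example by selecting `.spec` with yq). The input series, `host` value, and timings are illustrative, not taken from the PR.

```yaml
# Hypothetical promtool test file, e.g. cert-manager-rules.test.yaml.
# Run with: promtool test rules cert-manager-rules.test.yaml
rule_files:
  - cert-manager-rules.yaml  # assumed: the template's spec saved as a plain rules file

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Illustrative data: a 429 counter increasing by 1 every minute for 20 minutes.
      - series: 'certmanager_http_acme_client_request_count{host="acme-v02.api.letsencrypt.org", status="429"}'
        values: '0+1x20'
    alert_rule_test:
      # At 3m the condition has held for less than `for: 5m`, so nothing fires yet.
      - eval_time: 3m
        alertname: CertManagerHittingRateLimits
        exp_alerts: []
      # At 15m the condition has held for more than 5m, so the alert fires.
      - eval_time: 15m
        alertname: CertManagerHittingRateLimits
        exp_alerts:
          - exp_labels:
              severity: critical
              host: acme-v02.api.letsencrypt.org
            exp_annotations:
              summary: Cert manager hitting LetsEncrypt rate limits.
              description: Depending on the rate limit, cert-manager may be unable to generate certificates for up to a week.
              runbook_url: https://github.com/imusmanmalik/cert-manager-mixin/blob/main/RUNBOOK.md#certmanagerhittingratelimits
```

promtool compares only firing alerts, which is why the 3m case expects an empty list even though the alert is already pending at that point.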