
Mimir mixin Disk space utilization panels broken for mimir helm chart #7515

Closed
jmichalek132 opened this issue Mar 1, 2024 · 14 comments · Fixed by #8212

Comments

@jmichalek132
Contributor

Describe the bug

The disk space utilization panels in the mimir mixin don't work with mimir deployed using the mimir-distributed helm chart.

Query example from one of the panels:

max by(persistentvolumeclaim) (
  kubelet_volume_stats_used_bytes{cluster=~"$cluster", namespace=~"$namespace"} /
  kubelet_volume_stats_capacity_bytes{cluster=~"$cluster", namespace=~"$namespace"}
)
and
count by(persistentvolumeclaim) (
  kube_persistentvolumeclaim_labels{
    cluster=~"$cluster", namespace=~"$namespace",
    label_name=~"(ingester).*"
  }
)

The problematic part is the label_name=~"(ingester).*" selector on the kube_persistentvolumeclaim_labels metric coming from kube-state-metrics.

There are two issues with it:

  • If you use the kube-state-metrics service monitor from metamonitoring, it will drop the kube_persistentvolumeclaim_labels metric due to this metric relabeling rule
  • The PVCs produced by the helm chart don't have a name label; they only carry the labels below (see the selector sketch after them):
  labels:
    app.kubernetes.io/component: ingester
    app.kubernetes.io/instance: metrics
    app.kubernetes.io/name: mimir
    rollout-group: ingester
    zone: zone-a
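
For illustration: kube-state-metrics exports Kubernetes labels on kube_persistentvolumeclaim_labels as label_<name>, with non-alphanumeric characters replaced by underscores, so these PVCs carry label_rollout_group="ingester" and label_app_kubernetes_io_component="ingester" rather than label_name. Assuming kube-state-metrics is configured to expose those labels at all (see the discussion about defaults further down), a selector that would actually match them looks something like:

# illustrative sketch, not the current mixin query
count by(persistentvolumeclaim) (
  kube_persistentvolumeclaim_labels{
    cluster=~"$cluster", namespace=~"$namespace",
    label_rollout_group=~"(ingester).*"
  }
)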

To Reproduce

Steps to reproduce the behavior:

  1. Deploy Mimir using the mimir-distributed helm chart
  2. Deploy the mimir mixin

Expected behavior

For the Disk space utilization panels to work.

Environment

  • Infrastructure: Kubernetes, AKS
  • Deployment tool: Helm

Additional Context

Not sure what would be the best way to fix this.
Things that come to mind:

  • Update the service monitor so it doesn't drop metrics that the mixin uses
  • Allow configuring which label is used instead of label_name in those panels

Willing to submit a PR to address this after feedback.

@dimitarvdimitrov
Contributor

dimitarvdimitrov commented Mar 1, 2024

Keeping the kube_persistentvolumeclaim_labels metric just for the pods from mimir looks tricky. Can we use the 'app.kubernetes.io/name: mimir' label inside the relabelling config to pick this up? To avoid scraping other mimir clusters in the same k8s cluster we can add another 'keep' relabelling which checks the namespace label (and we need to make sure it's present on all currently collected metrics).

Regarding the second problem of using different labels - is it possible to use label_replace in the panel query so we take either 'label_rollout_group' or 'label_name', whichever is present? I.e. replace label_name with the other only if the other is non-empty; maybe there's something with regex we can do?
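
A minimal sketch of that label_replace idea (illustrative only, not a final fix) could be:

# sketch: copy label_rollout_group into label_name when it is non-empty
count by(persistentvolumeclaim) (
  label_replace(
    kube_persistentvolumeclaim_labels{cluster=~"$cluster", namespace=~"$namespace"},
    "label_name", "$1", "label_rollout_group", "(.+)"
  )
)

The open question with this is how to then filter on the rewritten label_name, which comes up again further down in the thread.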

@QuentinBisson
Contributor

QuentinBisson commented Apr 17, 2024

@dimitarvdimitrov I'm just facing this issue - would using the app.kubernetes.io/component: ingester label not be a better choice than rollout-group for the query? I can work on that, as I really need this to be fixed.

@dimitarvdimitrov
Contributor

using app.kubernetes.io/name: mimir makes this slightly more reusable - we don't have to update the scraping rules every time a new component gets a disk (or we rename a component or add a disk to an existing component, etc). If that's not possible, then app.kubernetes.io/component should also suffice.

@QuentinBisson
Contributor

I think the issue with using app.kubernetes.io/name is that it would not work on specific dashboards like the write dashboard, because it would show the data of all Mimir components instead of just the ingester.

@QuentinBisson
Contributor

See PR here #7968

@QuentinBisson
Contributor

Let me know how I can speed things up :)

@dimitarvdimitrov
Contributor

Using a common label on the kube_persistentvolumeclaim_labels selector

My suggestion in the issue was to add support for both labels in the panel via the label_replace or label_join PromQL functions. But I realized this won't work, because we cannot filter on the series labels outside of the vector selector (kube_persistentvolumeclaim_labels{...}).

Another option for solving the selector problem is to add the standard kubernetes labels (app.kubernetes.io/*) to the jsonnet mixin and use them in the dashboards, like the ones here:

app.kubernetes.io/name: {{ include "mimir.name" .ctx }}
app.kubernetes.io/instance: {{ .ctx.Release.Name }}
{{- if .component }}
app.kubernetes.io/component: {{ .component }}
{{- end }}
{{- if .memberlist }}
app.kubernetes.io/part-of: memberlist
{{- end }}
{{- if .ctx.Chart.AppVersion }}
app.kubernetes.io/version: {{ .ctx.Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .ctx.Release.Service }}
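
If both the helm chart and the jsonnet library carried these labels, the panels could rely on a single common selector, e.g. (a sketch, assuming kube-state-metrics is set up to expose the label):

# sketch based on the standard app.kubernetes.io/component label
count by(persistentvolumeclaim) (
  kube_persistentvolumeclaim_labels{
    cluster=~"$cluster", namespace=~"$namespace",
    label_app_kubernetes_io_component=~"(ingester).*"
  }
)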

@QuentinBisson
Contributor

@dimitarvdimitrov I'm not sure what you mean by adding the label to the jsonnet mixin, but I'm fine implementing a working solution :D

The main issue with using kube_persistentvolumeclaim_labels is that newer versions of kube-state-metrics do not expose any labels by default (since 2.11 I think) and they need to be explicitly asked for.

@dimitarvdimitrov
Contributor

adding the label to the jsonnet mixin

I meant adding the label to the resources created by the jsonnet library to deploy Mimir. This will help with having a single label selector in the PromQL query, because it will match both jsonnet and helm deployments - they will have some label in common.

The main issue with using kube_persistentvolumeclaim_labels is that newer versions of kube-state-metrics do not expose any labels by default (since 2.11 I think) and they need to be explicitly asked for

That doesn't sound great. So the metric is just empty by default? I couldn't find this change in the changelog. Do you have a link to the PR or changelog entry?

@QuentinBisson
Contributor

I was wrong, this happened in kube-state-metrics 2.10 https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.10.0 (cf. the top message)

@QuentinBisson
Contributor

Regarding the fix, I would assume this would be needed under operations/mimir? In that case I'm not sure I would be the best person to fix it, because I'm definitely lost in that folder.

@QuentinBisson
Contributor

QuentinBisson commented May 14, 2024

@dimitarvdimitrov what do you think about using something like

kubelet_volume_stats_used_bytes{cluster_id=~"$cluster", namespace=~"$namespace", persistentvolumeclaim=~".*(ingester)-.*"}
/
kubelet_volume_stats_capacity_bytes{cluster_id=~"$cluster", namespace=~"$namespace", persistentvolumeclaim=~".*(ingester)-.*"}

instead of relying on the kube-state-metrics labels metric?

@dimitarvdimitrov
Contributor

dimitarvdimitrov commented May 24, 2024

That's a neat idea and is much simpler too. I don't see a reason why it wouldn't work. Happy to review a PR to carry out that change.
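
For reference, a sketch of how the full panel query could end up looking with that approach, keeping the existing max by(persistentvolumeclaim) aggregation (the exact cluster label - cluster vs cluster_id - depends on the setup; this is illustrative, not the contents of the final PR):

# illustrative sketch of the PVC-name pattern-matching approach
max by(persistentvolumeclaim) (
  kubelet_volume_stats_used_bytes{cluster=~"$cluster", namespace=~"$namespace", persistentvolumeclaim=~".*(ingester).*"}
  /
  kubelet_volume_stats_capacity_bytes{cluster=~"$cluster", namespace=~"$namespace", persistentvolumeclaim=~".*(ingester).*"}
)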

@QuentinBisson
Contributor

PR to fix it #8212
