Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes compatible dashboards (alt) #60

Merged
merged 9 commits into from
Mar 26, 2021
Merged

Kubernetes compatible dashboards (alt) #60

merged 9 commits into from
Mar 26, 2021

Conversation

mrfelton
Copy link
Contributor

Alternate version of #42 that allows filtering by node and namespace.

grafana/provisioning/dashboards/network.json Outdated Show resolved Hide resolved
"tags": [],
"text": "default",
"value": "default"
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be set below where the namespace label/name is actually defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure what you're referring to. The order of the filters is datasource, namespace, node which is a logical ordering. What specifically do you think should be moved lower?

@Roasbeef
Copy link
Member

Set of changes looks sound, but will need to test this out both in the normal config, as well as how we deploy this in our own k8s instance as well to ensure nothing breaks.

Copy link
Contributor

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried this our on our cluster and ran into some issues, but I think this is really close! Would be great to get this in for a new lndmon release.

grafana/provisioning/dashboards/peers.json Outdated Show resolved Hide resolved
@@ -435,7 +435,7 @@
"steppedLine": false,
"targets": [
{
"expr": "go_memstats_heap_objects{pod=\"$node\"}",
"expr": "go_memstats_heap_objects{namespace=\"$namespace\",pod=\"$node\"}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing this our on our cluster, and our metrics have kubernetes_namespace rather than namespace - versioning difference perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure about this. We're using prometheus operator to deploy prometheus and grafana and in all of the templates installed by that the variable namespace is used. I can't see any examples of kubernetes_namespace being used.Tried searching the web to understand the difference by wasn't able to find anything relevant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some testing on this. It doesn't matter what the variable is called, as long as the query behind it is label_values(namespace) as per https://github.com/lightninglabs/lndmon/pull/60/files#diff-c1ba9705a49adb4580928d95e75b7973c367a5b1f5142a44ac5a9f97c3fccd1cR427

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlaKC just FYI, I was able to solve this with a metric_relabel_configs setting in Prometheus, copying the kubernetes labels into ones named namespace and pod.
@mrfelton: we're using the BKPR stack which has different naming schemes for the prometheus labels apparently. But I don't think that should be the concern for this PR if we can solve it with a label rewrite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm, but I think we should add some docs to the PR noting the labeling in k8s and relabelling so that folks know how to use these additions?

grafana/provisioning/dashboards/chain.json Show resolved Hide resolved
@lispmeister
Copy link

Looking at this PR I'm wondering why this needs to be done by lndmon itself. Usually you want the data that's being collected by Prometheus to be as uniform as possible and then use labels to categorize multiple instances of the same data from different contexts.

Example from one of our Helm charts that exposes a scraping service to Prometheus:

apiVersion: v1
kind: Service
metadata:
  name: {{ template "nautilus.fullname" . }}-prometheus
  labels:
    app: {{ template "nautilus.name" . }}
    chart: {{ template "nautilus.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
    lnd_instance: {{ .Values.nautilus.prometheus.lndInstance }}
    cluster_name: {{ .Values.clusterName }}
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "{{ .Values.nautilus.prometheus.port }}"
spec:
  ports:
    - name: prometheus
      protocol: TCP
      port: {{ .Values.nautilus.prometheus.port }}
      targetPort: {{ .Values.nautilus.prometheus.port }}
  selector:
    app: {{ template "nautilus.name" . }}
    release: {{ .Release.Name }}

Notice you we label data collected from nautilus by lnd_instance and clusterName. The annotations tell fluentd how to scrape the data.

@mrfelton
Copy link
Contributor Author

mrfelton commented Dec 6, 2020

@lispmeister this PR doesn't change how lndmon collects the data, it alters how grafana makes the data available.

Copy link
Contributor

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the changes to our labelling, this works for vanilla lndmon and in our prod setup.
To proceed with this PR, I think it needs some docs noting the labels that promoperator adds, and some instructions on relabelling.

With those included, I'm happy for this to go in.

@guggero
Copy link
Member

guggero commented Jan 29, 2021

I agree that some doc would be nice.
Here's how we relabel for those dashboards in prometheus.yml:

scrape_configs:
  - job_name: kubernetes-service-endpoints
    metric_relabel_configs:                                                                                                                                                                                  
    - source_labels:
      - kubernetes_namespace
      target_label: namespace
    - source_labels:
      - lnd_instance
      target_label: pod

Feel free to add that as an example.

@mrfelton
Copy link
Contributor Author

I've rebased this PR, again. Would be good to see it merged. I don't fully understand the point about relabelling or how or where it should be documented. Maybe someone else that better understand that can add that documentation in a follow up PR.

Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

@guggero guggero requested a review from carlaKC March 26, 2021 10:23
Copy link
Contributor

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested out with docker-compose and on our infra, LGTM!

@carlaKC
Copy link
Contributor

carlaKC commented Mar 26, 2021

Thanks for sticking with this one @mrfelton! Nice improvement 🎉

@guggero guggero merged commit 3aa925a into lightninglabs:master Mar 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants