This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

incomplete install instructions? #58

Open
sensay-nelson opened this issue Jan 28, 2019 · 35 comments

Comments

@sensay-nelson

It appears that in addition to the Node Exporter and Kube State Metrics, a third component (a Prometheus scraper) must be manually added by the user for this to function.

A user must manually do the following for this to work:

  • install the configmap (provided in the Grafana Kubernetes App configuration interface)
  • run a Prometheus pod which uses that configmap to scrape metrics.

Without these steps, almost no metrics will work. These requirements are missing from the readme.

The readme says the Deploy button will deploy the following:
(1) A Prometheus configmap which contains the Prometheus jobs that collect metrics used by the dashboards in the Kubernetes App
  - Incorrect: the Grafana Kubernetes App does not do this ^^
(2) a Node Exporter deployment, and
(3) a Kube-State Metrics deployment

Unless I am missing something else, possibly?
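
Roughly, the missing steps look like this (the file names here are just placeholders for wherever you save the app-provided scrape config and a Prometheus deployment of your own):

    # 1. create the configmap shown under "Configuring Prometheus" in the app UI
    kubectl --namespace=kube-system apply -f prometheus-configmap.yaml
    # 2. run a Prometheus instance that mounts that configmap, expose it with a
    #    Service, and point the app's Prometheus data source at it
    kubectl --namespace=kube-system apply -f prometheus-deployment.yaml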

@sensay-nelson
Author

Looking closer, I'm seeing a lot of these errors in the kube-state-metrics pods, so perhaps my issue is permissions related.

Do we know what permissions this container requires? I do not see any serviceAccountName in the deployment configuration. Once we know the permissions, how are they to be assigned?

E0128 04:33:50.024112       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/replicaset.go:87: Failed to list *v1beta1.ReplicaSet: replicasets.extensions is forbidden: User "system:serviceaccount:kube-system:default" cannot list replicasets.extensions at the cluster scope
E0128 04:33:50.114785       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/resourcequota.go:67: Failed to list *v1.ResourceQuota: resourcequotas is forbidden: User "system:serviceaccount:kube-system:default" cannot list resourcequotas at the cluster scope

@sensay-nelson
Author

sensay-nelson commented Jan 28, 2019

I fixed the permission issues by adding a ServiceAccount to the Kubernetes configuration for kube-state-metrics with the following configs.
Unfortunately, it did not improve any of the information available in the dashboard, although the error messaging in the kube-state-metrics container was reduced significantly.

kubectl --namespace=kube-system create -f kube-state-metrics-role.yaml

kube-state-metrics-role.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-state-metrics
roleRef:
  kind: ClusterRole
  name: kube-state-metrics
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    grafanak8sapp: "true"
    k8s-app: kube-state-metrics
  name: kube-state-metrics
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      grafanak8sapp: "true"
      k8s-app: kube-state-metrics
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        grafanak8sapp: "true"
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - image: quay.io/coreos/kube-state-metrics:v1.1.0
        imagePullPolicy: IfNotPresent
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

@sensay-nelson
Author

I added a Service for the node-exporter to expose port 9100 and set the Prometheus data source in Grafana to it. I don't think this is what I'm supposed to do, and naturally it still doesn't work.
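
Something along these lines, for reference (the selector label is a guess based on the pod labels the scrape config keys on):

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    daemon: node-exporter   # assumed pod label; check the labels on the node-exporter pods
  ports:
  - name: metrics
    port: 9100
    targetPort: 9100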

@sensay-nelson
Author

I also created a Service for kube-state-metrics, but it's still not the metrics this app is looking for.
For instance, the "Total Memory Usage" panel of the "K8s Container" dashboard is trying to calculate the following:

sum(container_memory_usage_bytes{pod_name=~"$pod"}) by (pod_name)

If I examine it with the query inspector, it issues the request below through Grafana's data source proxy; api/v1/query_range is the Prometheus HTTP API rather than a Kubernetes route. Right now my Prometheus data source is set to the kube-state-metrics service, so obviously this query fails. I'm not sure how the data source is supposed to be set up; the way this app works is very confusing.

xhrStatus:"complete"
request:Object
method:"GET"
url:"api/datasources/proxy/17/api/v1/query_range?query=sum(container_memory_usage_bytes%7Bpod_name%3D~%22contact-bot-7779886947-pwfqd%7Cdfuse-events-77f74d44bc-n9lmm%7Ceos-monitor-568bf8688-sm96w%7Ckeosd-7f4d745d-lp45w%7Clogspout-papertrail-7ql54%7Clogspout-papertrail-wrsm2%7Cmake-sense-app-fdffd4495-nwzr4%7Cmetabase-9ff4bdf5c-vc75k%7Cnats-7958747d76-ccvrb%7Cnginx-7bc66d857-dz5hs%7Cnginx-7bc66d857-lkjql%7Cprometheus-7d584c557-4xc2j%7Csendy-59b94fb496-8rxxl%7Csense-registration-5bb8f77b5c-k7xkl%7Csense-registration-5bb8f77b5c-pnbqz%7Csensetoken-6554c74bb7-75qrl%7Csensetoken-6554c74bb7-zxkcw%7Cdns-controller-6f9fb9cf78-849nb%7Cetcd-server-events-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Cetcd-server-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-apiserver-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-controller-manager-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-dns-7c4d8456dd-hwq49%7Ckube-dns-7c4d8456dd-vnws9%7Ckube-dns-autoscaler-f4c47db64-wm4cs%7Ckube-proxy-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-proxy-ip-172-22-20-181%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-proxy-ip-172-22-30-172%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-scheduler-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-state-metrics-699cf64f48-8t46r%7Ckubernetes-dashboard-7798c48646-kfg4h%7Cnode-exporter-cz6hj%7Cnode-exporter-wrbzs%7Cweave-net-l2bqf%7Cweave-net-pzfwc%7Cweave-net-twgsk%22%7D)%20by%20(pod_name)&start=1548678030&end=1548679845&step=15"

@sensay-nelson
Author

w00t, some progress. As I suspected at the start, in addition to kube-state-metrics and node-exporter, you need to manually create a Prometheus pod that uses the provided configmap, expose that pod via a Service, and then use it as the Prometheus data source in the k8s app config for this cluster (a rough sketch of what I ended up with is at the end of this comment). I am now getting metrics for these dashboards:

  • K8s Cluster
  • K8s Deployments
  • K8s Nodes

The "K8s Container" dashboard shows the containers, but sadly none of its metrics are working.

I would love to get the per-pod metrics working, as these are probably the most useful stats for establishing resource constraints - one of the more challenging tasks in managing a k8s cluster. If I can get this last part figured out, I'll wrap up all these findings in a pull request (improving the readme, if nothing else).
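
For anyone following along, the Prometheus piece I ended up with looks roughly like this (image tag, resource names, and the service account are my own choices, not something the app ships):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus   # a service account bound to a ClusterRole that can list/watch nodes and pods
      containers:
      - name: prometheus
        image: prom/prometheus:v2.7.1
        args: ["--config.file=/etc/prometheus/prometheus.yml"]
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus             # the configmap provided by the app

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-system
spec:
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090

The app's Prometheus data source for the cluster then points at that Service, e.g. http://prometheus.kube-system.svc:9090.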

@sensay-nelson
Author

So far, I have narrowed it down to the cAdvisor scrape job: either its prometheus config is not collecting anything, the naming is off, or it is a permissions issue.
This is the query from the Grafana Kubernetes App (K8s Container dashboard, memory usage):

sum(container_memory_usage_bytes{pod_name=~"my-container-7779886947-pwfqd|other-container-77f74d44bc-n9lmm"}) by (pod_name)

However, when I query Prometheus directly, container_memory_usage_bytes is not among the available metrics.
The prometheus config looks correct though. If I hit the metrics/cadvisor route through kubectl proxy to the API, the Prometheus-format output is there:

http://127.0.0.1:8001/api/v1/nodes/my-hostname/proxy/metrics/cadvisor

...
container_memory_usage_bytes{container_name="",id="/",image="",name="",namespace="",pod_name=""} 5.430976512e+09
...
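
(For completeness, this is how I'm hitting that route, with my-hostname standing in for the node name:)

    kubectl proxy --port=8001 &
    curl -s http://127.0.0.1:8001/api/v1/nodes/my-hostname/proxy/metrics/cadvisor | grep container_memory_usage_bytes | head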

Prometheus configuration.

    scrape_configs:
    - job_name: 'kubernetes-kubelet'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-kube-state'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
        regex: .*true.*
        action: keep
      - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
        regex: 'node-exporter;(.*)'
        action: replace
        target_label: nodename

Queries for kube-state appear to be fine.
Perhaps it's a permission issue? My prometheus container is using the same role I posted above.
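
A quick sanity check of the RBAC side (assuming the scraper runs as the service account from the role I posted above):

    # can that service account reach the kubelet metrics through the API server proxy?
    kubectl auth can-i get nodes/proxy --as=system:serviceaccount:kube-system:kube-state-metrics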

@sensay-nelson
Author

Yep, permission issue.
I started Prometheus with log level debug:
command: ["prometheus","--config.file=/etc/prometheus/prometheus.yml","--log.level=debug"]
and there are the beautiful 403s.

level=debug ts=2019-01-29T05:34:37.90956911Z caller=scrape.go:825 component="scrape manager" scrape_pool=kubernetes-kubelet target=https://kubernetes.default.svc:443/api/v1/nodes/ip-172-22-20-181.us-west-2.compute.internal/proxy/metrics msg="Scrape failed" err="server returned HTTP status 403 Forbidden"
level=debug ts=2019-01-29T05:34:41.453264939Z caller=scrape.go:825 component="scrape manager" scrape_pool=kubernetes-cadvisor target=https://kubernetes.default.svc:443/api/v1/nodes/ip-172-22-10-5.us-west-2.compute.internal/proxy/metrics/cadvisor msg="Scrape failed" err="server returned HTTP status 403 Forbidden"

@sensay-nelson
Author

sensay-nelson commented Jan 29, 2019

Oh, this is one of those fun problems that make you question all of your life decisions.

This can't be fixed with simple RBAC rules; it requires flags to be set on the kubelet, which nicely reduces security. The problem and its solutions are summarized well in prometheus-operator/prometheus-operator#633.

While it's easy to accomplish manually on currently running nodes, for the change to persist you will need to dig into whatever tool you use to create/manage clusters.
I'm using kops. This is how to get it done on running nodes:

sudo vi /etc/sysconfig/kubelet

add --authentication-token-webhook=true --authorization-mode=Webhook to the DAEMON_ARGS

sudo systemctl restart kubelet

edit: nope, not quite. Adding those flags may have worked for the scrape, but it broke viewing logs via kubectl with cert authentication. Reading https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus#prerequisites carefully, it appears that when --authorization-mode=Webhook is set, cert authorization will not work - it's one or the other. The existing solutions assume a cluster set up with pure RBAC authorization, which unfortunately is not the case for my kops cluster.

I see some solutions around using an http rather than https request, kubernetes/kops#5176 (comment),
but I'm unsure how to manually alter the prometheus configs for that. With prometheus-operator, like all things Helm, it's difficult to stitch together what the final templates end up looking like.

@sensay-nelson
Author

sensay-nelson commented Jan 29, 2019

After 3 days of troubleshooting, I unfortunately must concede defeat. If anyone gets this to work on a k8s 1.8+ cluster, please do chime in.

edit: I CONCEDE NOTHING! Finally got the K8s Container dashboard to work and now have memory stats by container...yay!

I came across this post again (which, ironically, was one of the first things I read while troubleshooting):
prometheus/prometheus#2918
It made a little more sense to me this time. Mucking around with the routes via kubectl proxy, I was able to find a config that works.

Everything will vary a little bit based on your cluster setup. The material difference appears to be my use of kops vs kubeadm for cluster setup.

  • kops does not currently support the webhook authorization flag on the kubelet. That flag is required to access the metrics/cadvisor route via the API proxy to the kubelet on the standard port, and is basically the underlying issue.

  • kops also does not disable cAdvisor's insecure, no-auth port 4194 by default, which provides us with a viable solution: /api/v1/nodes/<node>:4194/proxy/metrics (on at least k8s 1.8) includes the stats that are normally accessed via metrics/cadvisor.

kubeadm does the opposite of both of those, for better or worse, which is why getting a straight answer has been challenging.

This is my final configmap for the Prometheus scraper. Only one line is modified, the kubernetes-cadvisor replacement: /api/v1/nodes/${1}:4194/proxy/metrics

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
  namespace: kube-system
data:
  prometheus.yml: |
    scrape_configs:
    - job_name: 'kubernetes-kubelet'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}:4194/proxy/metrics

    - job_name: 'kubernetes-kube-state'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
        regex: .*true.*
        action: keep
      - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
        regex: 'node-exporter;(.*)'
        action: replace
        target_label: nodename

The ServiceAccount and ClusterRole for Prometheus can be grabbed from:
https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml
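
The relevant part of that file is roughly the following (check the upstream file for the authoritative version); the nodes/proxy rule is what the /api/v1/nodes/<node>/proxy/... scrape targets need:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]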

If you're curious, the prometheus-kubernetes.yml example in that same directory has some conflicting information about the scrape configs depending on the version of k8s deployed. There is a versioning issue, but also the question of how your cluster and kubelet authorization/authentication flags are set, which it does not address. I'll aim for a pull request that will hopefully add clarity to the situation.

@illectronic

I keep getting "Query support not implemented yet" in the dashboard. But I can see all the pod metrics individually. Any ideas?

@sensay-nelson
Author

@illectronic that error comes from this repo

throw new Error("Query Support not implemented yet.");

It looks like it's related to the Kubernetes API data source, but I'm not sure what triggers it exactly.

@sensay-nelson
Author

Somewhere in my toiling I lost the node data :(. Not having an architecture diagram that describes how the sources are aggregated is driving me nuts.

@sakthishanmugam02

It appears that in addition to the Node Exporter and Kube State Metrics, a third component (a Prometheus scraper) must be manually added by the user for this to function. […]

Hi, I am stuck; I am not sure what I have to do with respect to Prometheus. Could you please help me with that?

I have a running k8s cluster and the Grafana configuration is set up; what do I have to do with respect to Prometheus? Please give me pointers, right from installation.

@sakthishanmugam02

It appears that in addition to the Node Exporter and Kube State Metrics, a third component (a Prometheus scraper) must be manually added by the user for this to function. […]

How do I install the configmap? Which one? Could you please elaborate?
Also, how do I download and run the Prometheus pod?

@sensay-nelson
Author

@sakthishanmugam02 I'm working on a pull request that should hopefully help you. Give me a few hours.

@sensay-nelson
Author

@sakthishanmugam02 here ya go: sensay-nelson#1

@sakthishanmugam02

sakthishanmugam02 commented Feb 3, 2019

@sensay-nelson when I try to deploy the configuration, the kube-state-metrics and prometheus deployments never become available:
kubectl get deploy -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
coredns 2/2 2 2 140m
kube-state-metrics 0/1 0 0 38m
metrics-server 1/1 1 1 57m
prometheus 0/1 0 0 34m

@sensay-nelson
Author

sensay-nelson commented Feb 3, 2019

@sakthishanmugam02 did you create the service account first? What does kubectl -n kube-system describe pod <pod> say the issue is?

@sakthishanmugam02

@sensay-nelson
I don't see any pods running for prometheus or kube-state-metrics:
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-86c58d9df4-6gkg4 1/1 Running 0 152m
coredns-86c58d9df4-xngh7 1/1 Running 0 152m
etcd-hwim-perf-test 1/1 Running 0 151m
kube-apiserver-hwim-perf-test 1/1 Running 0 58m
kube-controller-manager-hwim-perf-test 1/1 Running 2 151m
kube-flannel-ds-amd64-ndhkk 1/1 Running 0 131m
kube-proxy-hw2fs 1/1 Running 0 152m
kube-scheduler-hwim-perf-test 1/1 Running 2 151m
metrics-server-68d85f76bb-db22h 1/1 Running 0 60m
node-exporter-mvc2c 1/1 Running 0 48m

@sakthishanmugam02

sakthishanmugam02 commented Feb 3, 2019

@sensay-nelson output of describe deploy:

root@hwim-perf-test:~/prom-config# kubectl describe deploy kube-state-metrics -n kube-system
Name: kube-state-metrics
Namespace: kube-system
CreationTimestamp: Sun, 03 Feb 2019 13:56:01 +0000
Labels: grafanak8sapp=true
k8s-app=kube-state-metrics
Annotations: deployment.kubernetes.io/revision: 1
Selector: grafanak8sapp=true,k8s-app=kube-state-metrics
Replicas: 1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: grafanak8sapp=true
k8s-app=kube-state-metrics
Service Account: prometheus
Containers:
kube-state-metrics:
Image: quay.io/coreos/kube-state-metrics:v1.1.0
Port: 8080/TCP
Host Port: 0/TCP
Readiness: http-get http://:8080/healthz delay=5s timeout=5s period=10s #success=1 #failure=3
Environment:
Mounts:
Volumes:
Conditions:
Type Status Reason


Progressing True NewReplicaSetCreated
Available False MinimumReplicasUnavailable
ReplicaFailure True FailedCreate
OldReplicaSets:
NewReplicaSet: kube-state-metrics-6bdd878bd7 (0/1 replicas created)
Events:
Type Reason Age From Message


Normal ScalingReplicaSet 44s deployment-controller Scaled up replica set kube-state-metrics-6bdd878bd7 to 1

@sensay-nelson
Author

sensay-nelson commented Feb 3, 2019

Try checking the issue with the ReplicaSet, I guess. Isn't kube-state-metrics-6bdd878bd7 a pod id?

@sakthishanmugam02

kube-state-metrics-6bdd878bd7

No, that pod is not listed in the output of kubectl get pods -n kube-system.

@sakthishanmugam02

@sensay-nelson one update:
I will check further; it seems to be a permission issue:
kubectl get events -n kube-system -w
LAST SEEN TYPE REASON KIND MESSAGE
19m Warning FailedCreate ReplicaSet Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found
61s Warning FailedCreate ReplicaSet Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found
3s Warning FailedCreate ReplicaSet Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found
6m29s Normal ScalingReplicaSet Deployment Scaled up replica set kube-state-metrics-6bdd878bd7 to 1
8s Normal ScalingReplicaSet Deployment Scaled up replica set kube-state-metrics-6bdd878bd7 to 1
0s Warning FailedCreate ReplicaSet Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found

@sakthishanmugam02

I got it working; since the namespace was not specified in the service account, it was created in default. I updated the namespace to kube-system and the pod deployed. @sensay-nelson thanks for your support.
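
In other words, the service account manifest needs the namespace spelled out, e.g.:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system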

@sakthishanmugam02

@sensay-nelson now the Prometheus server is up and running. How do I configure the Grafana dashboard? I am getting a Bad Gateway HTTP error.

@sakthishanmugam02

sakthishanmugam02 commented Feb 3, 2019

@sensay-nelson how do I set up the data source and cluster? Detailed steps, please.

@sensay-nelson
Author

sensay-nelson commented Feb 4, 2019 via email

@sakthishanmugam02

sakthishanmugam02 commented Feb 4, 2019

(quoting @sensay-nelson's reply via email) http://docs.grafana.org/features/datasources/prometheus/

I added a Prometheus data source with the :30690 NodePort IP, selected CA auth and gave it the certificate details that are in the prometheus configmap, and enabled skip TLS verify (I also tried without these 2 options). The message says: Data source is working.

I set up the new cluster and chose the created data source, but there are no metrics; all metrics show as N/A and no node or namespace details are listed. An "Unexpected error" pop-up came up in between,

along with the following pop-up:
Templating init failed
Cannot read property 'length' of undefined

@sakthishanmugam02

sakthishanmugam02 commented Feb 4, 2019

@sensay-nelson update: some progress;

I am able to see metrics now, but no pod-level metrics... any idea?
I'm running a cluster created with kubeadm; your previous post explained something about kubeadm vs kops; could you please elaborate?
(screenshot)

@sakthishanmugam02

Thanks a lot; I changed the configmap 'replacement' properties and it started working :)

@sakthishanmugam02

@sensay-nelson one clarification: the pod-level metrics are slightly higher than what the kubectl top command reports? What could be the reason?

@sensay-nelson
Author

Different data sources will produce different values.

@BenRomberg

@sensay-nelson thank you so much for your detailed writeup! Would've given up long ago if it weren't for this thread!

I wanted to add something since this is the most comprehensive thread on setting up the Grafana Kubernetes App; maybe it helps someone. For people using Grafana Cloud, you can easily make Prometheus write to their hosted Prometheus endpoint by configuring remote_write in prometheus.yml:

    remote_write:
    - url: https://prometheus-us-central1.grafana.net/api/prom/push
      basic_auth:
        username: 7313
        password: <Your API token>

This way, you don't have to store any metrics in the cluster and don't have to add a data source for each cluster.

@i5okie

i5okie commented Aug 22, 2019

I've got a cluster I've created with kops.

Grafana:

  • enable the Kubernetes plugin

  • add a Prometheus data source and point it at the one in your cluster

  • add a new cluster

  • copy the decrypted client certs from ~/.kube/config (I'd recommend creating a new user with read-only permissions); paste them into the client cert and client key fields.

  • copy the basic-auth credentials and paste them into the basic auth fields
    (screenshot)

  • do NOT deploy anything.

  • at the bottom of the Add a new cluster page, expand the Configuring Prometheus tab and copy the configs to use in the following steps
    (screenshot)

Cluster:

I've installed prometheus-operator with the helm chart (stable).
Prometheus is exposed with an ingress rule.

To get this Kubernetes plugin working, modify your values.yaml file: add additionalScrapeConfigs and paste in the configs you copied from the app, like this:

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
       - job_name: 'kubernetes-kubelet'
...

The only thing that isn't working so far is the Nodes dashboard...

@i5okie

i5okie commented Sep 5, 2019

Update:
This also works with Rancher cluster-monitoring, without a separate prometheus-operator.
Adding additional scrape configs there is not documented.

  • Click on the cluster-monitoring app and upgrade.
  • Click to edit the answers as yaml.

additionalScrapeConfigs go in there like this:

prometheus:
  additionalScrapeConfigs:
     - job_name: 'kubernetes-kubelet'
...

Tip: if the app fails to install and you see a giant list of issues in red, it's probably the YAML. Edit it and remove the double quotes from every enabled value (enabled: true or false); Helm wants these as bools, not strings.
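
For example, in the answers yaml:

    # fails: Helm sees a string
    enabled: "true"
    # works: a plain bool
    enabled: true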

Unfortunately, node metrics still don't work for me even though kube-state-metrics is installed by the cluster-monitoring app.

If you have EKS clusters and Rancher, then this works too (aside from the node metrics dashboard). Take the URL/credentials from Rancher's kubeconfig file for each cluster.
