Configure Prometheus to scrape Kubelets directly
* Use Kubelet bearer token authn/authz to scrape metrics
* Drop RBAC permission from nodes/proxy to nodes/metrics
* Stop proxying kubelet scrapes through the apiserver, since this
  required higher privilege (nodes/proxy) and added load to the
  apiserver on large clusters
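
For context, a minimal sketch of a direct Kubelet scrape job using the Pod's mounted ServiceAccount token, assembled from the addons/prometheus/config.yaml hunk below; it assumes the Kubelet accepts ServiceAccount bearer tokens via webhook authn/authz, which this change relies on:

    # Sketch only: scrape each Kubelet's metrics endpoint directly (port 10250
    # by default for the node role), authenticating with the mounted
    # ServiceAccount token instead of proxying through the apiserver.
    - job_name: 'kubelet'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # Kubelet serving certs don't have fixed IP SANs
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)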
dghubble committed May 15, 2018
1 parent 37981f9 commit 1190dea
Showing 7 changed files with 39 additions and 37 deletions.
3 changes: 3 additions & 0 deletions CHANGES.md
@@ -22,6 +22,9 @@ Notable changes between versions.
#### Addons

* Fix Prometheus data directory location ([#203](https://github.com/poseidon/typhoon/pull/203))
* Configure Prometheus to scrape Kubelets directly with bearer token auth instead of proxying through the apiserver ([#217](https://github.com/poseidon/typhoon/pull/217))
* Security improvement: Drop RBAC permission from `nodes/proxy` to `nodes/metrics`
* Performance: Remove proxied scrape load from the apiserver
* Update Grafana from v5.0.4 to v5.1.2 ([#208](https://github.com/poseidon/typhoon/pull/208))
* Disable Grafana Google Analytics by default ([#214](https://github.com/poseidon/typhoon/issues/214))

43 changes: 11 additions & 32 deletions addons/prometheus/config.yaml
@@ -56,63 +56,42 @@ data:
target_label: job
# Scrape config for node (i.e. kubelet) /metrics (e.g. 'kubelet_'). Explore
# metrics from a node by scraping kubelet (127.0.0.1:10255/metrics).
#
# Rather than connecting directly to the node, the scrape is proxied through the
# Kubernetes apiserver. This means it will work if Prometheus is running out of
# cluster, or can't connect to nodes for some other reason (e.g. because of
# firewalling).
# metrics from a node by scraping kubelet (127.0.0.1:10250/metrics).
- job_name: 'kubelet'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# Kubelet certs don't have any fixed IP SANs
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for Kubelet cAdvisor. Explore metrics from a node by
# scraping kubelet (127.0.0.1:10255/metrics/cadvisor).
#
# This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
# (those whose names begin with 'container_') have been removed from the
# Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
# retrieve those metrics.
#
# Rather than connecting directly to the node, the scrape is proxied through the
# Kubernetes apiserver. This means it will work if Prometheus is running out of
# cluster, or can't connect to nodes for some other reason (e.g. because of
# firewalling).
# scraping kubelet (127.0.0.1:10250/metrics/cadvisor).
- job_name: 'kubernetes-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
metrics_path: /metrics/cadvisor
tls_config:
# Kubelet certs don't have any fixed IP SANs
insecure_skip_verify: true
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# Scrape etcd metrics from controllers
# Scrape etcd metrics from controllers via listen-metrics-urls
- job_name: 'etcd'
kubernetes_sd_configs:
- role: node
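As a worked illustration of the labelmap relabeling used by both jobs above (the node label is hypothetical):

    # A node labeled node-role.kubernetes.io/controller="true" is exposed by
    # node discovery as:
    #   __meta_kubernetes_node_label_node_role_kubernetes_io_controller="true"
    # and the labelmap rule copies it onto scraped series as:
    #   node_role_kubernetes_io_controller="true"
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
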
2 changes: 1 addition & 1 deletion addons/prometheus/rbac/cluster-role.yaml
@@ -6,7 +6,7 @@ rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- nodes/metrics
- services
- endpoints
- pods
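The hunk shows only the resource list; a sketch of the full rule with typical read verbs follows (the verbs are assumed, not shown in this diff). With Kubelet webhook authorization, scraping /metrics requires get on the nodes/metrics subresource, which this narrower grant covers:

    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
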
10 changes: 10 additions & 0 deletions aws/container-linux/kubernetes/security.tf
@@ -91,6 +91,16 @@ resource "aws_security_group_rule" "controller-node-exporter" {
source_security_group_id = "${aws_security_group.worker.id}"
}

resource "aws_security_group_rule" "controller-kubelet" {
security_group_id = "${aws_security_group.controller.id}"

type = "ingress"
protocol = "tcp"
from_port = 10250
to_port = 10250
source_security_group_id = "${aws_security_group.worker.id}"
}

resource "aws_security_group_rule" "controller-kubelet-self" {
security_group_id = "${aws_security_group.controller.id}"

10 changes: 10 additions & 0 deletions aws/fedora-atomic/kubernetes/security.tf
@@ -91,6 +91,16 @@ resource "aws_security_group_rule" "controller-node-exporter" {
source_security_group_id = "${aws_security_group.worker.id}"
}

resource "aws_security_group_rule" "controller-kubelet" {
security_group_id = "${aws_security_group.controller.id}"

type = "ingress"
protocol = "tcp"
from_port = 10250
to_port = 10250
source_security_group_id = "${aws_security_group.worker.id}"
}

resource "aws_security_group_rule" "controller-kubelet-self" {
security_group_id = "${aws_security_group.controller.id}"

4 changes: 2 additions & 2 deletions google-cloud/container-linux/kubernetes/network.tf
@@ -121,7 +121,7 @@ resource "google_compute_firewall" "internal-node-exporter" {
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}

# kubelet API to allow kubectl exec and log
# kubelet API to allow apiserver exec and log or metrics scraping
resource "google_compute_firewall" "internal-kubelet" {
name = "${var.cluster_name}-internal-kubelet"
network = "${google_compute_network.network.name}"
@@ -131,7 +131,7 @@ resource "google_compute_firewall" "internal-kubelet" {
ports = [10250]
}

source_tags = ["${var.cluster_name}-controller"]
source_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}

4 changes: 2 additions & 2 deletions google-cloud/fedora-atomic/kubernetes/network.tf
@@ -121,7 +121,7 @@ resource "google_compute_firewall" "internal-node-exporter" {
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}

# kubelet API to allow kubectl exec and log
# kubelet API to allow apiserver exec and log or metrics scraping
resource "google_compute_firewall" "internal-kubelet" {
name = "${var.cluster_name}-internal-kubelet"
network = "${google_compute_network.network.name}"
@@ -131,7 +131,7 @@ resource "google_compute_firewall" "internal-kubelet" {
ports = [10250]
}

source_tags = ["${var.cluster_name}-controller"]
source_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
target_tags = ["${var.cluster_name}-controller", "${var.cluster_name}-worker"]
}
