Skip to content

Latest commit

 

History

History
419 lines (361 loc) · 13.8 KB

README.md

File metadata and controls

419 lines (361 loc) · 13.8 KB

Kubernetes Inventory Input Plugin

This plugin generates metrics derived from the state of the following Kubernetes resources:

  • daemonsets
  • deployments
  • endpoints
  • ingress
  • nodes
  • persistentvolumes
  • persistentvolumeclaims
  • pods (containers)
  • services
  • statefulsets
  • resourcequotas

Kubernetes is a fast moving project, with a new minor release every 3 months. As such, we will aim to maintain support only for versions that are supported by the major cloud providers; this is roughly 4 release / 2 years.

This plugin supports Kubernetes 1.11 and later.

Series Cardinality Warning

This plugin may produce a high number of series which, when not controlled for, will cause high load on your database. Use the following techniques to avoid cardinality issues:

Global configuration options

In addition to the plugin-specific configuration settings, plugins support additional global and plugin configuration settings. These settings are used to modify metrics, tags, and field or create aliases and configure ordering, etc. See the CONFIGURATION.md for more details.

Configuration

# Read metrics from the Kubernetes api
[[inputs.kube_inventory]]
  ## URL for the Kubernetes API.
  ## If empty in-cluster config with POD's service account token will be used.
  # url = ""

  ## URL for the kubelet, if set it will be used to collect the pods resource metrics
  # url_kubelet = "http://127.0.0.1:10255"

  ## Namespace to use. Set to "" to use all namespaces.
  # namespace = "default"

  ## Node name to filter to. No filtering by default.
  # node_name = ""

  ## Use bearer token for authorization.
  ## Ignored if url is empty and in-cluster config is used.
  # bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"

  ## Set response_timeout (default 5 seconds)
  # response_timeout = "5s"

  ## Optional Resources to exclude from gathering
  ## Leave them with blank with try to gather everything available.
  ## Values can be - "daemonsets", deployments", "endpoints", "ingress",
  ## "nodes", "persistentvolumes", "persistentvolumeclaims", "pods", "services",
  ## "statefulsets"
  # resource_exclude = [ "deployments", "nodes", "statefulsets" ]

  ## Optional Resources to include when gathering
  ## Overrides resource_exclude if both set.
  # resource_include = [ "deployments", "nodes", "statefulsets" ]

  ## selectors to include and exclude as tags.  Globs accepted.
  ## Note that an empty array for both will include all selectors as tags
  ## selector_exclude overrides selector_include if both set.
  # selector_include = []
  # selector_exclude = ["*"]

  ## Optional TLS Config
  ## Trusted root certificates for server
  # tls_ca = "/path/to/cafile"
  ## Used for TLS client certificate authentication
  # tls_cert = "/path/to/certfile"
  ## Used for TLS client certificate authentication
  # tls_key = "/path/to/keyfile"
  ## Send the specified TLS server name via SNI
  # tls_server_name = "kubernetes.example.com"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## Uncomment to remove deprecated metrics.
  # fieldexclude = ["terminated_reason"]

Kubernetes Permissions

If using RBAC authorization, you will need to create a cluster role to list "persistentvolumes" and "nodes". You will then need to make an aggregated ClusterRole that will eventually be bound to a user or group.

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:cluster:viewer
  labels:
    rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes", "nodes"]
    verbs: ["get", "list"]

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:telegraf
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-to-view: "true"
rules: [] # Rules are automatically filled in by the controller manager.

Bind the newly created aggregated ClusterRole with the following config file, updating the subjects as needed.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: influx:telegraf:viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx:telegraf
subjects:
  - kind: ServiceAccount
    name: telegraf
    namespace: default

Quickstart in k3s

When monitoring k3s server instances one can re-use already generated administration token. This is less secure than using the more restrictive dedicated telegraf user but more convenient to set up.

# replace `telegraf` with the user the telegraf process is running as
$ install -o telegraf -m400 /var/lib/rancher/k3s/server/token /run/telegraf-kubernetes-token
$ install -o telegraf -m400 /var/lib/rancher/k3s/server/tls/client-admin.crt /run/telegraf-kubernetes-cert
$ install -o telegraf -m400 /var/lib/rancher/k3s/server/tls/client-admin.key /run/telegraf-kubernetes-key
[kube_inventory]
bearer_token = "/run/telegraf-kubernetes-token"
tls_cert = "/run/telegraf-kubernetes-cert"
tls_key = "/run/telegraf-kubernetes-key"

Metrics

  • kubernetes_daemonset

    • tags:
      • daemonset_name
      • namespace
      • selector (*varies)
    • fields:
      • generation
      • current_number_scheduled
      • desired_number_scheduled
      • number_available
      • number_misscheduled
      • number_ready
      • number_unavailable
      • updated_number_scheduled
  • kubernetes_deployment

    • tags:
      • deployment_name
      • namespace
      • selector (*varies)
    • fields:
      • replicas_available
      • replicas_unavailable
      • created
  • kubernetes_endpoints

    • tags:
      • endpoint_name
      • namespace
      • hostname
      • node_name
      • port_name
      • port_protocol
      • kind (*varies)
    • fields:
      • created
      • generation
      • ready
      • port
  • kubernetes_ingress

    • tags:
      • ingress_name
      • namespace
      • hostname
      • ip
      • backend_service_name
      • path
      • host
    • fields:
      • created
      • generation
      • backend_service_port
      • tls
  • kubernetes_node

    • tags:
      • node_name
      • status
      • condition
      • cluster_namespace
    • fields:
      • capacity_cpu_cores
      • capacity_millicpu_cores
      • capacity_memory_bytes
      • capacity_pods
      • allocatable_cpu_cores
      • allocatable_millicpu_cores
      • allocatable_memory_bytes
      • allocatable_pods
      • status_condition
      • spec_unschedulable
      • node_count
  • kubernetes_persistentvolume

    • tags:
      • pv_name
      • phase
      • storageclass
    • fields:
  • kubernetes_persistentvolumeclaim

    • tags:
      • pvc_name
      • namespace
      • phase
      • storageclass
      • selector (*varies)
    • fields:
  • kubernetes_pod_container

    • tags:
      • container_name
      • namespace
      • node_name
      • pod_name
      • node_selector (*varies)
      • phase
      • state
      • readiness
      • condition
    • fields:
      • restarts_total
      • state_code
      • state_reason
      • phase_reason
      • terminated_reason (string, deprecated in 1.15: use state_reason instead)
      • resource_requests_millicpu_units
      • resource_requests_memory_bytes
      • resource_limits_millicpu_units
      • resource_limits_memory_bytes
      • status_condition
  • kubernetes_service

    • tags:
      • service_name
      • namespace
      • port_name
      • port_protocol
      • external_name
      • cluster_ip
      • selector (*varies)
    • fields
      • created
      • generation
      • port
      • target_port
  • kubernetes_statefulset

    • tags:
      • statefulset_name
      • namespace
      • selector (*varies)
    • fields:
      • created
      • generation
      • replicas
      • replicas_current
      • replicas_ready
      • replicas_updated
      • spec_replicas
      • observed_generation
  • kubernetes_resourcequota

    • tags:
      • resource
      • namespace
    • fields:
      • hard_cpu_limits
      • hard_cpu_requests
      • hard_memory_limit
      • hard_memory_requests
      • hard_pods
      • used_cpu_limits
      • used_cpu_requests
      • used_memory_limits
      • used_memory_requests
      • used_pods
  • kubernetes_certificate

    • tags:
      • common_name
      • signature_algorithm
      • public_key_algorithm
      • issuer_common_name
      • san
      • verification
      • name
      • namespace
    • fields:
      • age
      • expiry
      • startdate
      • enddate
      • verification_code

kubernetes node status status

The node status ready can mean 3 different values.

Tag value Corresponding field value Meaning
ready 0 NotReady
ready 1 Ready
ready 2 Unknown

pv phase_type

The persistentvolume "phase" is saved in the phase tag with a correlated numeric field called phase_type corresponding with that tag value.

Tag value Corresponding field value
bound 0
failed 1
pending 2
released 3
available 4
unknown 5

pvc phase_type

The persistentvolumeclaim "phase" is saved in the phase tag with a correlated numeric field called phase_type corresponding with that tag value.

Tag value Corresponding field value
bound 0
lost 1
pending 2
unknown 3

Example Output

kubernetes_configmap,configmap_name=envoy-config,namespace=default,resource_version=56593031 created=1544103867000000000i 1547597616000000000
kubernetes_daemonset,daemonset_name=telegraf,selector_select1=s1,namespace=logging number_unavailable=0i,desired_number_scheduled=11i,number_available=11i,number_misscheduled=8i,number_ready=11i,updated_number_scheduled=11i,created=1527758699000000000i,generation=16i,current_number_scheduled=11i 1547597616000000000
kubernetes_deployment,deployment_name=deployd,selector_select1=s1,namespace=default replicas_unavailable=0i,created=1544103082000000000i,replicas_available=1i 1547597616000000000
kubernetes_node,host=vjain node_count=8i 1628918652000000000
kubernetes_node,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True status_condition=1i 1629177980000000000
kubernetes_node,cluster_namespace=tools,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True allocatable_cpu_cores=4i,allocatable_memory_bytes=7186567168i,allocatable_millicpu_cores=4000i,allocatable_pods=110i,capacity_cpu_cores=4i,capacity_memory_bytes=7291424768i,capacity_millicpu_cores=4000i,capacity_pods=110i,spec_unschedulable=0i,status_condition=1i 1628918652000000000
kubernetes_resourcequota,host=vjain,namespace=default,resource=pods-high hard_cpu=1000i,hard_memory=214748364800i,hard_pods=10i,used_cpu=0i,used_memory=0i,used_pods=0i 1629110393000000000
kubernetes_resourcequota,host=vjain,namespace=default,resource=pods-low hard_cpu=5i,hard_memory=10737418240i,hard_pods=10i,used_cpu=0i,used_memory=0i,used_pods=0i 1629110393000000000
kubernetes_persistentvolume,phase=Released,pv_name=pvc-aaaaaaaa-bbbb-cccc-1111-222222222222,storageclass=ebs-1-retain phase_type=3i 1547597616000000000
kubernetes_persistentvolumeclaim,namespace=default,phase=Bound,pvc_name=data-etcd-0,selector_select1=s1,storageclass=ebs-1-retain phase_type=0i 1547597615000000000
kubernetes_pod,namespace=default,node_name=ip-172-17-0-2.internal,pod_name=tick1 last_transition_time=1547578322000000000i,ready="false" 1547597616000000000
kubernetes_service,cluster_ip=172.29.61.80,namespace=redis-cache-0001,port_name=redis,port_protocol=TCP,selector_app=myapp,selector_io.kompose.service=redis,selector_role=slave,service_name=redis-slave created=1588690034000000000i,generation=0i,port=6379i,target_port=0i 1547597616000000000
kubernetes_pod_container,condition=Ready,host=vjain,pod_name=uefi-5997f76f69-xzljt,status=True status_condition=1i 1629177981000000000
kubernetes_pod_container,container_name=telegraf,namespace=default,node_name=ip-172-17-0-2.internal,node_selector_node-role.kubernetes.io/compute=true,pod_name=tick1,phase=Running,state=running,readiness=ready resource_requests_cpu_units=0.1,resource_limits_memory_bytes=524288000,resource_limits_cpu_units=0.5,restarts_total=0i,state_code=0i,state_reason="",phase_reason="",resource_requests_memory_bytes=524288000 1547597616000000000
kubernetes_statefulset,namespace=default,selector_select1=s1,statefulset_name=etcd replicas_updated=3i,spec_replicas=3i,observed_generation=1i,created=1544101669000000000i,generation=1i,replicas=3i,replicas_current=3i,replicas_ready=3i 1547597616000000000