kubelet tracing
Signed-off-by: Sally O'Malley <somalley@redhat.com>
sallyom committed Jul 21, 2021
1 parent 9d12734 commit 1bf3d00
Showing 2 changed files with 360 additions and 0 deletions.
329 changes: 329 additions & 0 deletions keps/sig-instrumentation/2831-kubelet-tracing/README.md
@@ -0,0 +1,329 @@
# KEP-2831: Kubelet Tracing

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Definitions](#definitions)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [User Stories](#user-stories)
- [Continuous Trace Collection](#continuous-trace-collection)
- [Tracing Requests and Exporting Spans](#tracing-requests-and-exporting-spans)
- [Running the OpenTelemetry Collector](#running-the-opentelemetry-collector)
- [Kubelet Configuration](#kubelet-configuration)
- [Test Plan](#test-plan)
- [Graduation Requirements](#graduation-requirements)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Alternatives Considered](#alternatives-considered)
- [Other OpenTelemetry Exporters](#other-opentelemetry-exporters)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

## Summary

This Kubernetes Enhancement Proposal (KEP) proposes enhancing the kubelet to allow tracing gRPC and HTTP API requests.
It proposes using OpenTelemetry libraries and exporting spans in the OpenTelemetry format. This is in line with the
[API Server enhancement](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/647-apiserver-tracing).

## Motivation

Along with metrics and logs, traces are a useful form of telemetry to aid with debugging incoming requests.
The kubelet can make use of distributed tracing to improve the ease of use and enable easier analysis of trace data.
Trace data is structured, providing the detail necessary to debug requests across service boundaries.
As more core components are instrumented, Kubernetes becomes easier to monitor, manage, and troubleshoot.

### Definitions

* **Span**: The smallest unit of a trace. It has a start and end time. Spans are the building blocks of a trace.
* **Trace**: A collection of Spans which represents work being done by a system; a record of the path of requests through a system.
* **Trace Context**: A reference to a Trace that is designed to be propagated across component boundaries.

### Goals

* The kubelet generates and exports spans for incoming and outgoing requests.
* The kubelet propagates context from incoming requests to outgoing requests.

### Non-Goals

* Tracing in kubernetes controllers
* Replace existing logging or metrics
* Change metrics or logging (e.g. to support trace-metric correlation)
* Access control to tracing backends
* Add tracing to components outside kubernetes (e.g. etcd client library).

## Proposal

### User Stories

Since this feature is for diagnosing problems with the kubelet, it is targeted at Cluster Operators and Cloud Vendors that manage kubernetes control-planes.

For the following use-cases, I can deploy an OpenTelemetry collector agent as a DaemonSet to collect kubelet trace data from each node's host network. Then, I can deploy a single OpenTelemetry collector to consolidate the kubelet traces. From there, OpenTelemetry trace data can be exported to a tracing backend of my choice. I can use the `EnableOtelTracing` boolean to enable trace exports from the kubelet service, and the `OpenTelemetryConfig` to configure a tracing service name other than the node hostname or a collector endpoint other than the default `0.0.0.0:4317`.

#### Continuous Trace Collection

As a cluster operator or cloud provider, I would like to collect gRPC and HTTP trace data for requests from the API server to the kubelet to help debug control-plane problems. Depending on the symptoms I need to debug, I can search span metadata to find a trace that exhibits those symptoms. The sampling rate for trace exports can be configured based on my needs. I can collect each node's kubelet trace data as a distinct tracing service to diagnose node issues.

### Tracing Requests and Exporting Spans

We will instrument the kubelet's [gRPC server](https://github.com/kubernetes/kubernetes/blob/release-1.21/pkg/kubelet/server/server.go#L190#L192) to intercept gRPC calls and export spans. Also, the HTTP server and HTTP clients will be wrapped with [otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib/tree/v0.21.0/instrumentation/net/http) to generate spans for sampled incoming requests and propagate context with client requests. The gRPC client in the kubelet's Container Runtime Interface (CRI) [Remote Runtime Service](https://github.com/kubernetes/kubernetes/blob/release-1.21/pkg/kubelet/cri/remote/remote_runtime.go) and [the CRI streaming package](https://github.com/kubernetes/kubernetes/tree/release-1.21/pkg/kubelet/cri/streaming) will be instrumented to export and propagate trace data. The [Go implementation of OpenTelemetry](https://github.com/open-telemetry/opentelemetry-go) will be used. An [OTLP exporter](https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/otlptracegrpc/exporter.go), an [OTLP trace provider](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/provider.go), and a [context propagator](https://opentelemetry.io/docs/go/instrumentation/#propagators-and-context) will be configured.
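
The sketch below is illustrative only, not the actual kubelet patch: assuming the opentelemetry-go packages linked above, it shows how an OTLP gRPC exporter, a sampling trace provider, W3C context propagation, and `otelhttp` wrapping could be wired together. The function and parameter names (`setupTracing`, `collectorEndpoint`, `samplingRatePerMillion`) are placeholders mirroring the proposed configuration fields.

```golang
package kubelettracing

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// setupTracing wires an OTLP gRPC exporter, a sampling TracerProvider, and
// W3C trace-context/baggage propagation. collectorEndpoint and
// samplingRatePerMillion mirror the proposed KubeletConfiguration fields.
func setupTracing(ctx context.Context, collectorEndpoint string, samplingRatePerMillion int32) (*sdktrace.TracerProvider, error) {
	// Export spans over gRPC to the collector agent (default 0.0.0.0:4317).
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(collectorEndpoint),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return nil, err
	}

	// Batch span export off the critical path and sample the configured
	// fraction of new traces, honoring any sampling decision from the caller.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithSampler(sdktrace.ParentBased(
			sdktrace.TraceIDRatioBased(float64(samplingRatePerMillion)/1_000_000),
		)),
	)
	otel.SetTracerProvider(tp)

	// Propagate trace context and baggage on outgoing requests.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{}, propagation.Baggage{},
	))
	return tp, nil
}

// wrapHTTP wraps an HTTP handler so sampled incoming requests produce spans,
// and returns a client transport that propagates context on outgoing requests.
func wrapHTTP(h http.Handler) (http.Handler, http.RoundTripper) {
	return otelhttp.NewHandler(h, "kubelet"), otelhttp.NewTransport(http.DefaultTransport)
}
```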

OpenTelemetry-Go provides the [propagation package](https://github.com/open-telemetry/opentelemetry-go/blob/main/propagation/propagation.go) with which you can add custom key-value pairs known as [baggage](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/baggage/api.md). Baggage data will be propagated across services within contexts.
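
As a rough illustration (not part of the kubelet changes proposed here), attaching baggage to a context with the current opentelemetry-go `baggage` API might look like the following; the `node.name` key and the `annotateContext` helper are hypothetical.

```golang
package kubelettracing

import (
	"context"

	"go.opentelemetry.io/otel/baggage"
)

// annotateContext attaches a hypothetical key-value pair as baggage so that
// it is carried along with the trace context to downstream services.
func annotateContext(ctx context.Context, nodeName string) (context.Context, error) {
	member, err := baggage.NewMember("node.name", nodeName)
	if err != nil {
		return ctx, err
	}
	bag, err := baggage.New(member)
	if err != nil {
		return ctx, err
	}
	return baggage.ContextWithBaggage(ctx, bag), nil
}
```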

### Running the OpenTelemetry Collector

The [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) can be run as a sidecar, a daemonset, a deployment, or a combination in which the daemonset buffers telemetry and forwards to the deployment for aggregation (e.g. tail-based sampling) and routing to a telemetry backend. To support these various setups, the kubelet should be able to send traffic either to a local collector agent (on the node's host network) or to a cluster service (on the cluster network).
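
For the DaemonSet case, a minimal collector-agent configuration might look like the following sketch. The aggregating collector's service address (`otel-collector.opentelemetry.svc`) is an assumption, and exact field names vary across collector versions.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # where the kubelet's OTLP exporter sends spans
exporters:
  otlp:
    endpoint: otel-collector.opentelemetry.svc:4317   # assumed aggregating collector service
    insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```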

### Kubelet Configuration

```golang
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// KubeletConfiguration contains the configuration for the Kubelet
type KubeletConfiguration struct {
	metav1.TypeMeta

	// ... existing KubeletConfiguration fields ...

	// EnableOtelTracing enables export of OpenTelemetry traces
	// +optional
	EnableOtelTracing bool
	// OpenTelemetryConfig specifies the configuration for OpenTelemetry trace exports
	// +optional
	OpenTelemetryConfig OpenTelemetryConfig
}

// OpenTelemetryConfig specifies configuration for opentelemetry tracing
type OpenTelemetryConfig struct {
	// +optional
	// TracingServiceName is the name of the tracing service.
	// Defaults to the node hostname. This results in a tracing service for each node.
	TracingServiceName string

	// +optional
	// CollectorEndpoint is the endpoint of the collector agent running on the node.
	// Defaults to 0.0.0.0:4317
	CollectorEndpoint string

	// +optional
	// SamplingRatePerMillion is the number of samples to collect per million spans.
	// Defaults to 0.
	SamplingRatePerMillion *int32
}
```
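
For illustration, the corresponding settings might appear in a kubelet configuration file roughly as follows. The YAML field names below are assumed from the Go struct above and are subject to API review; they are not final.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enableOtelTracing: true
openTelemetryConfig:
  tracingServiceName: node-a.example.com   # defaults to the node hostname if unset
  collectorEndpoint: 0.0.0.0:4317          # default collector agent endpoint
  samplingRatePerMillion: 1000000          # sample every span; defaults to 0
```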

### Test Plan

We will test tracing added by this feature with an integration test. The
integration test will verify that spans exported by the kubelet match what is
expected from the request.
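
As a sketch of the kind of assertion such a test could make, the snippet below uses the OpenTelemetry SDK's in-memory exporter in place of a real collector; the handler, endpoint, and test name are stand-ins, not kubelet code.

```golang
package kubelettracing_test

import (
	"net/http"
	"net/http/httptest"
	"testing"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/sdk/trace/tracetest"
)

// TestSpansMatchRequest is only a stand-in for the planned integration test:
// it wraps a dummy handler the way the kubelet server would be wrapped and
// checks that a span is produced for the request.
func TestSpansMatchRequest(t *testing.T) {
	exporter := tracetest.NewInMemoryExporter()
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSyncer(exporter),
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
	)

	h := otelhttp.NewHandler(
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		}),
		"kubelet",
		otelhttp.WithTracerProvider(tp),
	)

	srv := httptest.NewServer(h)
	resp, err := http.Get(srv.URL + "/pods")
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()
	srv.Close() // wait for in-flight requests so the span has been exported

	if spans := exporter.GetSpans(); len(spans) != 1 || spans[0].Name != "kubelet" {
		t.Fatalf("expected one span named %q, got %+v", "kubelet", spans)
	}
}
```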

## Graduation Requirements

Alpha

- [ ] Implement tracing of incoming and outgoing gRPC and HTTP requests in the kubelet
- [ ] Integration testing of tracing

Beta

- [ ] Tracing 100% of requests does not break scalability tests (this does not necessarily mean trace backends can handle all the data).
- [ ] OpenTelemetry reaches GA
- [ ] Publish examples of how to use the OpenTelemetry Collector with kubernetes
- [ ] Allow time for feedback
- [ ] Revisit the format used to export spans.

GA

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

* **How can this feature be enabled / disabled in a live cluster?**
- [X] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: OpenTelemetryTracing
- Components depending on the feature gate: kubelet
- [X] Other
- Describe the mechanism: Kubelet Configuration.
- Will enabling / disabling the feature require downtime of the control
plane? No. It will require restarting the kubelet service per node.
- Will enabling / disabling the feature require downtime or reprovisioning
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). No.

* **Does enabling the feature change any default behavior?**
No. The feature is disabled unless the feature gate is enabled and `EnableOtelTracing` is set to true in the kubelet configuration. When the feature is enabled, it doesn't change behavior from the users' perspective; it only adds tracing telemetry.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back
the enablement)?**
Yes.

* **What happens if we reenable the feature if it was previously rolled back?**
It will start generating and exporting traces again.

* **Are there any tests for feature enablement/disablement?**
Unit tests switching feature gates will be added.

### Rollout, Upgrade and Rollback Planning

_This section must be completed when targeting beta graduation to a release._

* **How can a rollout fail? Can it impact already running workloads?**
Try to be as paranoid as possible - e.g., what if some components will restart
mid-rollout?

* **What specific metrics should inform a rollback?**

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.

* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
fields of API types, flags, etc.?**
Even if applying deprecation policies, they may still surprise some users.

### Monitoring Requirements

_This section must be completed when targeting beta graduation to a release._

* **How can an operator determine if the feature is in use by workloads?**
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.

* **What are the SLIs (Service Level Indicators) an operator can use to determine
the health of the service?**
- [ ] Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- [ ] Other (treat as last resort)
- Details:

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
At a high level, this usually will be in the form of "high percentile of SLI
per day <= X". It's impossible to provide comprehensive guidance, but at the very
high level (needs more precise definitions) those may be things like:
- per-day percentage of API calls finishing with 5XX errors <= 1%
- 99th percentile over day of absolute value from (job creation time minus expected
  job creation time) for cron job <= 10%
- 99.9% of /health requests per day finish with 200 code

* **Are there any missing metrics that would be useful to have to improve observability
of this feature?**
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).

### Dependencies

_This section must be completed when targeting beta graduation to a release._

* **Does this feature depend on any specific services running in the cluster?**
Think about both cluster-level services (e.g. metrics-server) as well
as node-level agents (e.g. specific version of CRI). Focus on external or
optional services that are needed. For example, if this feature depends on
a cloud provider API, or upon an external software-defined storage or network
control plane.

For each of these, fill in the following—thinking about running existing user workloads
and creating new ones, as well as about cluster-level services (e.g. DNS):
- [Dependency name]
- Usage description:
- Impact of its outage on the feature:
- Impact of its degraded performance or high-error rates on the feature:


### Scalability

_For alpha, this section is encouraged: reviewers should consider these questions
and attempt to answer them._

_For beta, this section is required: reviewers must answer these questions._

_For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field._

* **Will enabling / using this feature result in any new API calls?**
This will not add any additional API calls.

* **Will enabling / using this feature result in introducing new API types?**
This will introduce an API type for the configuration. This is only for
loading configuration; users cannot create these objects.

* **Will enabling / using this feature result in any new calls to the cloud
provider?**
Not directly. Cloud providers could choose to send traces to their managed
trace backends, but this requires them to set up a telemetry pipeline as
described above.

* **Will enabling / using this feature result in increasing size or count of
the existing API objects?**
No.

* **Will enabling / using this feature result in increasing time taken by any
operations covered by [existing SLIs/SLOs]?**
It will increase kubelet request latency by a negligible amount (<1 microsecond)
for encoding and decoding the trace context from headers, and recording spans
in memory. Exporting spans is not in the critical path.

* **Will enabling / using this feature result in non-negligible increase of
resource usage (CPU, RAM, disk, IO, ...) in any components?**
The tracing client library has a small, in-memory cache for outgoing spans.

### Troubleshooting

The Troubleshooting section currently serves the `Playbook` role. We may consider
splitting it into a dedicated `Playbook` document (potentially with some monitoring
details). For now, we leave it here.

_This section must be completed when targeting beta graduation to a release._

* **How does this feature react if the API server and/or etcd is unavailable?**

* **What are other known failure modes?**
For each of them, fill in the following information by copying the below template:
- [Failure mode brief description]
- Detection: How can it be detected via metrics? Stated another way:
how can an operator troubleshoot without logging into a master or worker node?
- Mitigations: What can be done to stop the bleeding, especially for already
running user workloads?
- Diagnostics: What are the useful log messages and their required logging
levels that could help debug the issue?
Not required until feature graduated to beta.
- Testing: Are there any tests for failure mode? If not, describe why.

* **What steps should be taken if SLOs are not being met to determine the problem?**

[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

## Implementation History

## Alternatives Considered

### Other OpenTelemetry Exporters

This KEP suggests that we utilize the OpenTelemetry exporter format in all components. Alternative options include:

1. Add configuration for many exporters in-tree by vendoring multiple "supported" exporters. These exporters would be the only compatible tracing backends for kubernetes.
   a. This places the kubernetes community in the position of curating supported tracing backends.
2. Support *both* a curated set of in-tree exporters and the collector exporter.
31 changes: 31 additions & 0 deletions keps/sig-instrumentation/2831-kubelet-tracing/kep.yaml
@@ -0,0 +1,31 @@
title: Kubelet OpenTelemetry Tracing
kep-number: 2831
authors:
- "@husky-parul"
- "@somalley"
owning-sig: sig-instrumentation
participating-sigs:
- sig-architecture
- sig-api-machinery
- sig-scalability
status: provisional
creation-date: 2021-07-21
reviewers:
- "@dashpole"
- "TBD"
approvers:
- "@dashpole"
- "TBD"
see-also:
replaces:
stage: alpha
last-updated: 2021-07-21
latest-milestone:
milestone:
alpha:
feature-gates:
- name: KubeletTracing
components:
- kubelet
disable-supported: true
metrics:
