[Proposal] Core Metrics in Kubelet #252
# Core Metrics in kubelet

**Author**: David Ashpole (@dashpole)

**Last Updated**: 1/19/2017

**Status**: Proposal

This document proposes a design for the set of metrics included in an eventual Core Metrics Pipeline.

<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Core Metrics in kubelet](#core-metrics-in-kubelet)
  - [Introduction](#introduction)
    - [Definitions](#definitions)
    - [Background](#background)
    - [Motivations](#motivations)
    - [Proposal](#proposal)
    - [Non Goals](#non-goals)
  - [Design](#design)
    - [Metric Requirements](#metric-requirements)
    - [Proposed Core Metrics](#proposed-core-metrics)
    - [On-Demand Design](#on-demand-design)
  - [Implementation Plan](#implementation-plan)
  - [Rollout Plan](#rollout-plan)
  - [Implementation Status](#implementation-status)

<!-- END MUNGE: GENERATED_TOC -->

## Introduction

### Definitions

"Kubelet": The daemon that runs on every kubernetes node and controls pod and container lifecycle, among many other things.
["cAdvisor":](https://github.com/google/cadvisor) An open source container monitoring solution that monitors only containers and has no concept of kubernetes constructs like pods or volumes.
["Summary API":](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go) A kubelet API that currently exposes node metrics for use by both system components and monitoring systems.
["CRI":](https://github.com/kubernetes/community/blob/master/contributors/devel/container-runtime-interface.md) The Container Runtime Interface, designed to provide an abstraction over runtimes (docker, rkt, etc.).
"Core Metrics": A set of metrics described in the [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) whose purpose is to provide metrics for first-class resource isolation and utilization features, including [resource feasibility checking](https://github.com/eBay/Kubernetes/blob/master/docs/design/resources.md#the-resource-model) and node resource management.
"Resource": A consumable element of a node (e.g. memory, disk space, CPU time, etc.).
"First-class Resource": A resource critical for scheduling, whose requests and limits can be (or soon will be) set via the Pod/Container Spec.
"Metric": A measure of consumption of a Resource.

### Background

The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal contains a blueprint for a set of metrics referred to as "Core Metrics". The purpose of this proposal is to specify what those metrics are, so that work on the kubelet's collection of them can proceed.

Kubernetes vendors cAdvisor into its codebase, and the kubelet uses cAdvisor as a library to collect metrics on containers. The kubelet combines the container-level metrics from cAdvisor with its own knowledge of kubernetes constructs (e.g. pods) to produce the kubelet summary statistics, which are used by the kubelet itself and exposed to users through the [Summary API](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go). cAdvisor collects metrics at an interval (10 seconds, by default), and the kubelet simply queries these cached metrics whenever it needs them.

Currently, cAdvisor collects a large number of metrics related to system and container performance, but only some of them are consumed by the kubelet [Summary API](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go), which is served at the kubelet endpoint (stats/summary). Some of the metrics in the Summary API are consumed by kubernetes system components, but many are included solely to support monitoring.

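As a concrete illustration of this split, the sketch below shows the kind of aggregation the kubelet performs on top of cAdvisor's cached container metrics: grouping container-level memory usage by pod. The types and function are simplified assumptions for the example, not the actual kubelet or cAdvisor APIs.

```go
package main

import "fmt"

// ContainerStats is a simplified stand-in for cAdvisor's cached per-container data,
// tagged with the pod that the kubelet knows owns the container.
type ContainerStats struct {
	PodUID          string
	WorkingSetBytes uint64
}

// aggregateByPod combines container-level metrics with pod membership to produce
// pod-level memory usage, mirroring how summary statistics are assembled.
func aggregateByPod(cached []ContainerStats) map[string]uint64 {
	podMemory := map[string]uint64{}
	for _, c := range cached {
		podMemory[c.PodUID] += c.WorkingSetBytes
	}
	return podMemory
}

func main() {
	// In the kubelet these would come from cAdvisor's cache, refreshed every ~10s.
	cached := []ContainerStats{
		{PodUID: "pod-a", WorkingSetBytes: 64 << 20},
		{PodUID: "pod-a", WorkingSetBytes: 16 << 20},
		{PodUID: "pod-b", WorkingSetBytes: 128 << 20},
	}
	fmt.Println(aggregateByPod(cached))
}
```
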
### Motivations

The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal explains why a separate monitoring pipeline is required.

By publishing core metrics, the kubelet is relieved of its responsibility to provide metrics for monitoring, and the third party monitoring pipeline is relieved of any responsibility to provide these metrics to system components.

cAdvisor is structured to collect metrics on an interval, which is appropriate for a stand-alone metrics collector. However, many functions in the kubelet are latency-sensitive (eviction, for example) and would benefit from a more "On-Demand" metrics collection design.

### Proposal

This proposal defines a set of core metrics, collected by the kubelet and used solely by kubernetes system components to support "First-Class Resource Isolation and Utilization Features". This proposal does not define an API published by the kubelet, but rather the set of metrics the kubelet collects; those metrics will be transformed and published in the future.

The target "users" of this set of metrics are kubernetes components (though not necessarily directly). The set of metrics is not itself designed to be user-facing, but is designed to be general enough to support user-facing components.

### Non Goals

Everything covered in the [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) design doc is out of scope for this proposal, including the third party metrics pipeline and the methods by which the metrics in this proposal are provided to other kubernetes components.

Integration with the CRI is also out of scope. In future proposals, integrating with the CRI may provide a better abstraction of the information the core metrics pipeline needs in order to collect metrics.

The kubelet API endpoint that exposes these metrics, including its format, URL pattern, versioning strategy, and name, will be the topic of a follow-up proposal.

## Design

This design covers only the metrics to be included in the Core Metrics Pipeline.

High level requirements for the design are as follows:
- The kubelet collects the minimum possible number of metrics needed to provide "First-Class Resource Isolation and Utilization Features".
- Metrics can be fetched "On-Demand", giving the kubelet more up-to-date stats.

This proposal purposefully omits many metrics that may eventually become core metrics. Once a metric is needed to support First-Class Resource Isolation and Utilization Features, it can be added to the core metrics API.

### Metric Requirements

The core metrics API is designed to provide metrics for "First-Class Resource Isolation and Utilization Features" within kubernetes.

Many kubernetes system components currently support these features, and many more components that support them are in development.

> **Reviewer:** Which features?
>
> **Author:** Refers to the above list of "use-cases"; I changed the text above to explicitly call them features.

The following is not meant to be an exhaustive list, but gives the current set of use cases for these metrics.

Metrics requirements for "First-Class Resource Isolation and Utilization Features", based on the needs of kubernetes components, are as follows:

- Kubelet
  - Node-level usage metrics for Filesystems, CPU, and Memory
  - Pod-level usage metrics for Filesystems and Memory
- Metrics Server (outlined in [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md)), which exposes the [Resource Metrics API](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-metrics-api.md) to the following system components:
  - Scheduler
    - Node-level usage metrics for Filesystems, CPU, and Memory
    - Pod-level usage metrics for Filesystems, CPU, and Memory
    - Container-level usage metrics for Filesystems, CPU, and Memory
  - Horizontal-Pod-Autoscaler
    - Node-level usage metrics for CPU and Memory
    - Pod-level usage metrics for CPU and Memory
  - Cluster Federation
    - Node-level usage metrics for Filesystems, CPU, and Memory
  - kubectl top and Kubernetes Dashboard
    - Node-level usage metrics for Filesystems, CPU, and Memory
    - Pod-level usage metrics for Filesystems, CPU, and Memory
    - Container-level usage metrics for Filesystems, CPU, and Memory

> **Reviewer:** The scheduler uses container-level metrics? Or does this just mean as opposed to the metrics consumed by a Pod that aren't accounted to a container?
>
> **Author:** Good catch. It is the vertical pod autoscaler that will need container-level metrics. I was basing that off of David Oppenheim's scheduling requirements doc.

### Proposed Core Metrics

This section defines "usage metrics" for filesystems, CPU, and memory.
As stated in Non Goals, this proposal does not attempt to define the specific format in which these are exposed. For convenience, it may be necessary to also include static information such as start time and node capacities for CPU, memory, or filesystems.

```go
// CpuUsage holds statistics about the amount of CPU time consumed.
type CpuUsage struct {
	// The time at which these metrics were updated.
	Timestamp metav1.Time
	// Cumulative CPU usage (sum of all cores) since object creation.
	CumulativeUsageNanoSeconds *uint64
}

// MemoryUsage holds statistics about the quantity of memory consumed.
type MemoryUsage struct {
	// The time at which these metrics were updated.
	Timestamp metav1.Time
	// The amount of "working set" memory. This includes recently accessed memory,
	// dirty memory, and kernel memory.
	UsageBytes *uint64
}

// FilesystemUsage holds statistics about the quantity of local storage (e.g. disk) resources consumed.
type FilesystemUsage struct {
	// The time at which these metrics were updated.
	Timestamp metav1.Time
	// StorageIdentifier uniquely identifies the storage resource (device or partition)
	// that is consumed; it is unique within the scope of the node.
	StorageIdentifier string
	// UsedBytes represents the disk space consumed, in bytes.
	UsedBytes *uint64
	// UsedInodes represents the inodes consumed.
	UsedInodes *uint64
}
```

> **Reviewer** (on `CumulativeUsageNanoSeconds`)**:** At some point we needed to add container start time to the Summary API to know when this was measured from. I suppose the container ID could be used to cross-reference the ContainerStatus for start time, but we should at least consider whether it should be included here.

> **Reviewer** (on `CumulativeUsageNanoSeconds`)**:** Cumulative CPU usage needs to be accompanied by the time window from which it was collected.

> **Reviewer** (on `StorageIdentifier`)**:** Unique in what namespace? E.g. will this refer to the device/partition, or the subpath of the container FS?
>
> **Author:** This will refer to the device/partition; the namespace is the node. I can update the comment.

> **Reviewer** (on `UsedBytes`/`UsedInodes`)**:** nit: +1 for normalizing on "Used" or "Usage".
>
> **Author:** Done.

> **Reviewer:** You talked about gathering stats on demand, but some use cases (e.g. kubectl top) also require regular updates. I think it would be worth adding a section about how this will work. For example, every stat request could include a "required freshness", and stats could be pulled from cache if recent enough, or fetched fresh if not.

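As one reviewer notes above, a cumulative counter is only meaningful relative to a time window; a consumer can derive an average usage rate from two successive samples using the `Timestamp` and `CumulativeUsageNanoSeconds` fields. The helper below is an illustrative sketch, not part of the proposed types; it assumes the `CpuUsage` type defined above and the `metav1` import.

```go
// cpuCoresUsed returns the average number of CPU cores consumed between two
// samples, derived from the cumulative counters and their timestamps.
// Illustrative only: assumes prev was taken before curr and both counters are set.
func cpuCoresUsed(prev, curr CpuUsage) float64 {
	if prev.CumulativeUsageNanoSeconds == nil || curr.CumulativeUsageNanoSeconds == nil {
		return 0
	}
	window := curr.Timestamp.Sub(prev.Timestamp.Time)
	if window <= 0 {
		return 0
	}
	usedNanos := float64(*curr.CumulativeUsageNanoSeconds - *prev.CumulativeUsageNanoSeconds)
	return usedNanos / float64(window.Nanoseconds())
}
```
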
### On-Demand Design

Interface:
The interface for exposing these metrics within the kubelet contains methods for fetching each relevant metric. Each method takes a "recency" parameter which specifies how recently the metrics must have been computed. Kubelet components that require very up-to-date metrics (eviction, for example) use very low values; other components use higher values. A sketch of this interface follows.

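The sketch below is illustrative only: the interface name, method names, and the use of `time.Duration` for the recency parameter are assumptions for the example, not the actual kubelet interface. It reuses the usage types from the previous section.

```go
// MetricsProvider sketches the shape of an on-demand metrics interface within
// the kubelet. Each method takes the maximum acceptable age of the result.
type MetricsProvider interface {
	// GetCpuUsage returns CPU usage computed no more than maxAge ago.
	GetCpuUsage(maxAge time.Duration) (*CpuUsage, error)
	// GetMemoryUsage returns memory usage computed no more than maxAge ago.
	GetMemoryUsage(maxAge time.Duration) (*MemoryUsage, error)
	// GetFilesystemUsage returns filesystem usage computed no more than maxAge ago.
	GetFilesystemUsage(maxAge time.Duration) (*FilesystemUsage, error)
}
```

Under this shape, eviction might call `GetMemoryUsage(100*time.Millisecond)` while a periodic status update could tolerate `GetMemoryUsage(30*time.Second)`.
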
Implementation:
To keep performance bounded while still offering metrics "On-Demand", all calls to get metrics are cached, and a minimum recency is enforced to prevent repeated metrics computation. Before computing new metrics, the previously computed metrics are checked against the recency requirement of the caller. If their age meets the requirement, the cached metrics are returned; if not, new metrics are computed and cached. A sketch of this caching behavior follows.

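The sketch below shows this caching logic for memory metrics only; the type, fields, and `compute` callback are assumptions for illustration, not the kubelet's actual implementation. It assumes the `MemoryUsage` type above and the `sync` and `time` packages.

```go
// cachedMemoryProvider memoizes the expensive collection step and enforces a
// minimum recency so callers cannot trigger back-to-back recomputation.
type cachedMemoryProvider struct {
	mu      sync.Mutex
	minAge  time.Duration                // floor applied to every caller's recency request
	compute func() (*MemoryUsage, error) // the expensive metrics computation
	last    *MemoryUsage                 // most recently computed result
}

func (p *cachedMemoryProvider) GetMemoryUsage(maxAge time.Duration) (*MemoryUsage, error) {
	if maxAge < p.minAge {
		maxAge = p.minAge // enforce the minimum recency
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	// Return the cached result if it is recent enough for this caller.
	if p.last != nil && time.Since(p.last.Timestamp.Time) <= maxAge {
		return p.last, nil
	}
	// Otherwise compute fresh metrics and cache them.
	result, err := p.compute()
	if err != nil {
		return nil, err
	}
	p.last = result
	return result, nil
}
```
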
## Implementation Plan

> **Reviewer:** The implementation plan is rather poor. You should either expand or remove it.
>
> **Author:** Removed in favor of a Future Work section.

@dashpole will modify the structure of the metrics collection code to be "On-Demand".

> **Reviewer:** Filesystem metrics are expensive and it's better for us to collect them in the background. Collecting them on-demand would mean that we will either exec …
>
> **Author:** Removed references to people.

Suggested, tentative future work, which may be covered by future proposals:
- Publish these metrics in some form to a kubelet API endpoint.
- Obtain all runtime-specific information needed to collect metrics from the CRI.
- Kubernetes can be configured to run a default "third party metrics provider" as a daemonset, possibly standalone cAdvisor.

> **Reviewer:** This should be under a "Future Improvements" section rather than the implementation plan.
>
> **Author:** Done.

## Rollout Plan

Once this set of metrics is accepted, @dashpole will begin discussions on the format and design of the endpoint that exposes them. The node resource metrics endpoint (TBD) will be added alongside the current Summary API in an upcoming release. This should allow concurrent development of other portions of the system metrics pipeline (metrics-server, for example). Once this addition is made, all other changes will be internal and will not require any API changes.

> **Reviewer:** As we discussed offline, nobody works on metrics-server in Q1/1.6. Once metrics-server is the main consumer of the API, I think we should wait for it rather than add this in the upcoming release.
>
> **Reviewer:** s/an upcoming/a future
>
> **Author:** Makes sense.

@dashpole will also start discussions on integrating with the CRI, and on how to provide an out-of-the-box solution for the "third party monitoring" pipeline on the node. One current idea is a standalone version of cAdvisor, but any third party metrics solution could serve this function as well.

> **Reviewer:** Not sure if I understand it correctly. According to the mentioned Monitoring Architecture vision, there won't be any out-of-the-box solution for the 3rd party monitoring pipeline, but rather clear integration points.
>
> **Author:** I'll remove that from the proposal then. When you come in March, we can discuss the best way to transition from our current metrics situation to the Monitoring Architecture you outlined.

## Implementation Status

> **Reviewer:** nit: I'd get rid of this section. Proposals aren't the best medium for tracking status, and tend to get out of date. I'd use a feature issue instead.
>
> **Author:** OK.

The implementation goals of the first milestone are outlined below.
- [ ] Create the proposal
- [ ] Modify the structure of metrics collection code to be "On-Demand"


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/core-metrics-pipeline.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

> **Reviewer:** Another motivation: once we have defined the new metrics API, we can dedicate the current Summary API to the monitoring pipeline. This means it can be iterated on to provide richer monitoring / introspection stats. There have been many requests for a much richer set of monitoring stats from the node through the Summary API; the reason we punt on those enhancements is that the Summary API currently serves both purposes: system control and monitoring.