
[Proposal] Core Metrics in Kubelet #252

Merged 31 commits on Jan 25, 2017 (changes shown are from 30 of the 31 commits)

Commits:
- `d411d57` Moved from kubernetes/docs/proposals (dashpole, Jan 5, 2017)
- `3bad120` Addressed comments (dashpole, Jan 5, 2017)
- `1e69459` More comments (dashpole, Jan 5, 2017)
- `d0aaf9a` Changes (dashpole, Jan 5, 2017)
- `29a4bf8` s/WorkingsetBytes/UsedBytes (dashpole, Jan 5, 2017)
- `25abf52` s/I/@dashpole (dashpole, Jan 5, 2017)
- `11ea97f` node-level cpu metrics in kubelet requirements (dashpole, Jan 5, 2017)
- `13eee17` Improve memory and cpu documentation (dashpole, Jan 5, 2017)
- `919764e` summary API change, cpu description (dashpole, Jan 6, 2017)
- `04dc0aa` Include Resource Metrics API (dashpole, Jan 6, 2017)
- `fdcfed7` Configurable interval (dashpole, Jan 6, 2017)
- `40a60f2` Define summary API (dashpole, Jan 6, 2017)
- `d030ef9` removed proto annotation (dashpole, Jan 6, 2017)
- `54eaae8` update tentative future plans (dashpole, Jan 6, 2017)
- `53c3b18` Feature based (dashpole, Jan 10, 2017)
- `2f1d746` Add Definitions section. Add users description. (dashpole, Jan 10, 2017)
- `b059fd4` cleanup language, formatting; rollout plan (dashpole, Jan 11, 2017)
- `9e87a2f` Addressed comments about language (dashpole, Jan 11, 2017)
- `a106dd6` address comments, remove json (dashpole, Jan 12, 2017)
- `da190b6` formatting (dashpole, Jan 12, 2017)
- `6cf837e` Clarified only usage (dashpole, Jan 12, 2017)
- `d9fca71` On-demand (dashpole, Jan 12, 2017)
- `5f636c3` No longer about how metrics exposed; +formatting (dashpole, Jan 13, 2017)
- `2d5ebbb` changes (dashpole, Jan 13, 2017)
- `005669d` Refined language (dashpole, Jan 14, 2017)
- `cec5de5` removed capacity from requirements (dashpole, Jan 14, 2017)
- `ea0e796` internal to kubelet (dashpole, Jan 14, 2017)
- `38e2036` narrow scope of metrics (dashpole, Jan 18, 2017)
- `15e80b4` final changes (dashpole, Jan 19, 2017)
- `6c43ae8` s/UsageInodes/UsedInodes; updated rollout (dashpole, Jan 24, 2017)
- `33160be` addressed timstclair changes (dashpole, Jan 24, 2017)
contributors/design-proposals/core-metrics-pipeline.md: 168 additions & 0 deletions
# Core Metrics in kubelet

**Author**: David Ashpole (@dashpole)

**Last Updated**: 1/19/2017

**Status**: Proposal

This document proposes a design for the set of metrics included in an eventual Core Metrics Pipeline.

<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Core Metrics in kubelet](#core-metrics-in-kubelet)
- [Introduction](#introduction)
- [Definitions](#definitions)
- [Background](#background)
- [Motivations](#motivations)
- [Proposal](#proposal)
- [Non Goals](#non-goals)
- [Design](#design)
- [Metric Requirements:](#metric-requirements)
- [Proposed Core Metrics:](#proposed-core-metrics)
- [On-Demand Design:](#on-demand-design)
- [Implementation Plan](#implementation-plan)
- [Rollout Plan](#rollout-plan)
- [Implementation Status](#implementation-status)

<!-- END MUNGE: GENERATED_TOC -->

## Introduction

### Definitions
"Kubelet": The daemon that runs on every kubernetes node and controls pod and container lifecycle, among many other things.
["cAdvisor":](https://github.com/google/cadvisor) An open source container monitoring solution which only monitors containers, and has no concept of kubernetes constructs like pods or volumes.
["Summary API":](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go) A kubelet API which currently exposes node metrics for use by both system components and monitoring systems.
["CRI":](https://github.com/kubernetes/community/blob/master/contributors/devel/container-runtime-interface.md) The Container Runtime Interface designed to provide an abstraction over runtimes (docker, rkt, etc).
"Core Metrics": A set of metrics described in the [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) whose purpose is to provide metrics for first-class resource isolation and untilization features, including [resource feasibility checking](https://github.com/eBay/Kubernetes/blob/master/docs/design/resources.md#the-resource-model) and node resource management.
"Resource": A consumable element of a node (e.g. memory, disk space, CPU time, etc).
"First-class Resource": A resource critical for scheduling, whose requests and limits can be (or soon will be) set via the Pod/Container Spec.
"Metric": A measure of consumption of a Resource.

### Background
The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal contains a blueprint for a set of metrics referred to as "Core Metrics". The purpose of this proposal is to specify what those metrics are, so that work on collecting them in the kubelet can proceed.

Kubernetes vendors cAdvisor into its codebase, and the kubelet uses cAdvisor as a library that enables it to collect metrics on containers. The kubelet can then combine container-level metrics from cAdvisor with the kubelet's knowledge of kubernetes constructs (e.g. pods) to produce the kubelet Summary statistics, which provide metrics for use by the kubelet, or by users through the [Summary API](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go). cAdvisor collects metrics at a regular interval (10 seconds by default), and the kubelet simply queries these cached metrics whenever it needs them.

Currently, cAdvisor collects a large number of metrics related to system and container performance. However, only some of these metrics are consumed by the kubelet [Summary API](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/stats/types.go), which the kubelet serves at the stats/summary endpoint; many are not used at all. Some of the metrics provided by the Summary API are consumed by kubernetes system components, but many are included solely to support monitoring.

### Motivations
> **Reviewer (Member):** Another motivation is that once we define the new metrics API, we can dedicate the current Summary API to the monitoring pipeline. This means it can be iterated on to provide more monitoring/introspection stats. There have been many requests for a much richer set of monitoring stats from the node through the Summary API; the reason we punted on those enhancements is that the Summary API serves both purposes: system control and monitoring.

The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal explains why a separate monitoring pipeline is required.

By publishing core metrics, the kubelet is relieved of its responsibility to provide metrics for monitoring.
The third-party monitoring pipeline is likewise relieved of any responsibility to provide these metrics to system components.

cAdvisor is structured to collect metrics on an interval, which is appropriate for a stand-alone metrics collector. However, many functions in the kubelet are latency-sensitive (eviction, for example), and would benefit from a more "On-Demand" metrics collection design.

### Proposal
This proposal defines a set of core metrics, collected by the kubelet and used solely by kubernetes system components to support "First-Class Resource Isolation and Utilization Features". It does not define an API published by the kubelet, but rather a set of metrics collected by the kubelet that will be transformed and published in the future.

The target "Users" of this set of metrics are kubernetes components (though not neccessarily directly). This set of metrics itself is not designed to be user-facing, but is designed to be general enough to support user-facing components.

### Non Goals
This proposal does not cover anything already addressed in the [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) design doc. This includes the third-party metrics pipeline and the methods by which the metrics in this proposal are provided to other kubernetes components.

Integration with CRI will not be covered in this proposal. In future proposals, integrating with CRI may provide a better abstraction of information required by the core metrics pipeline to collect metrics.

The kubelet API endpoint, including the format, url pattern, versioning strategy, and name of the API will be the topic of a follow-up proposal to this proposal.

## Design
This design covers only metrics to be included in the Core Metrics Pipeline.

High level requirements for the design are as follows:
- The kubelet collects the minimum possible number of metrics to provide "First-Class Resource Isolation and Utilization Features".
- Metrics can be fetched "On Demand", giving the kubelet more up-to-date stats.

This proposal purposefully omits many metrics that may eventually become core metrics. This is by design. Once metrics are needed to support First-Class Resource Isolation and Utilization Features, they can be added to the core metrics API.

### Metric Requirements
The core metrics API is designed to provide the metrics needed by "First Class Resource Isolation and Utilization Features" within kubernetes.

Many kubernetes system components currently support these features. Many more components that support these features are in development.

> **Reviewer:** Which features?
>
> **Author (@dashpole):** Refers to the above list of "use cases"; I changed the text above to explicitly call them features.

The following is not meant to be an exhaustive list, but gives the current set of use cases for these metrics.

Metrics requirements for "First Class Resource Isolation and Utilization Features", based on kubernetes component needs, are as follows:

- Kubelet
  - Node-level usage metrics for Filesystems, CPU, and Memory
  - Pod-level usage metrics for Filesystems and Memory
- Metrics Server (outlined in [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md)), which exposes the [Resource Metrics API](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-metrics-api.md) to the following system components:
  - Scheduler
    - Node-level usage metrics for Filesystems, CPU, and Memory
    - Pod-level usage metrics for Filesystems, CPU, and Memory
    - Container-level usage metrics for Filesystems, CPU, and Memory

> **Reviewer:** The scheduler uses container-level metrics? Or does this just mean as opposed to the metrics consumed by a Pod that aren't accounted to a container?
>
> **Author (@dashpole):** Good catch. It is the vertical pod autoscaler that will need container-level metrics. I was basing that off of David Oppenheim's scheduling requirements doc.

  - Horizontal-Pod-Autoscaler
    - Node-level usage metrics for CPU and Memory
    - Pod-level usage metrics for CPU and Memory
  - Cluster Federation
    - Node-level usage metrics for Filesystems, CPU, and Memory
  - kubectl top and Kubernetes Dashboard
    - Node-level usage metrics for Filesystems, CPU, and Memory
    - Pod-level usage metrics for Filesystems, CPU, and Memory
    - Container-level usage metrics for Filesystems, CPU, and Memory

### Proposed Core Metrics:
This section defines "usage metrics" for filesystems, CPU, and Memory.
As stated in Non-Goals, this proposal does not attempt to define the specific format by which these are exposed. For convenience, it may be necessary to include static information such as start time, node capacities for CPU, Memory, or filesystems, and more.

```go
// CpuUsage holds statistics about the amount of cpu time consumed
type CpuUsage struct {
	// The time at which these metrics were updated.
	Timestamp metav1.Time
	// Cumulative CPU usage (sum of all cores) since object creation.
	CumulativeUsageNanoSeconds *uint64
}

// MemoryUsage holds statistics about the quantity of memory consumed
type MemoryUsage struct {
	// The time at which these metrics were updated.
	Timestamp metav1.Time
	// The amount of "working set" memory. This includes recently accessed memory,
	// dirty memory, and kernel memory.
	UsageBytes *uint64
}

// FilesystemUsage holds statistics about the quantity of local storage (e.g. disk) resources consumed
type FilesystemUsage struct {
	// The time at which these metrics were updated.
	Timestamp metav1.Time
	// This must uniquely identify the storage resource that is consumed.
	StorageIdentifier string
	// UsedBytes represents the disk space consumed, in bytes.
	UsedBytes *uint64
	// UsedInodes represents the inodes consumed.
	UsedInodes *uint64
}
```

> **Reviewer** (on the cumulative CPU usage field): At some point we needed to add container start time to the summary API to know when this was measured from. I suppose the container ID could be used to cross-reference the ContainerStatus for start time, but we should at least consider whether it should be included here.

> **Reviewer (Member):** Cumulative cpu usage needs to be accompanied by the time window from which it was collected.

> **Reviewer** (on `StorageIdentifier`): Unique in what namespace? E.g. will this refer to the device/partition, or the subpath of the container FS?
>
> **Author (@dashpole):** This will refer to the device/partition. The namespace is the node. I can update the comment.

> **Reviewer (Contributor)** (on `UsedBytes`): We need Capacity and Available too. I'd recommend sticking to the fields in the summary API. We have already spent a lot of time defining that. I don't see a point in debating the same fields again in this proposal.
>
> **Author (@dashpole):** Removed from the scope of the proposal.

> **Reviewer:** nit: +1 for normalizing on "Used" or "Usage".
>
> **Author (@dashpole):** Done.
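As an aside on how the cumulative counter would be consumed (and on the reviewer's point above about time windows), here is a minimal, hypothetical Go sketch showing how an average CPU rate could be derived from two samples. The `cpuSample` type and `cpuUsageRate` helper are simplified stand-ins invented for illustration; they are not part of the proposed types.

```go
package main

import (
	"fmt"
	"time"
)

// cpuSample is a simplified stand-in for the proposal's CpuUsage type,
// using plain values instead of metav1.Time and *uint64.
type cpuSample struct {
	Timestamp                  time.Time
	CumulativeUsageNanoSeconds uint64
}

// cpuUsageRate returns the average number of CPU cores used between two
// samples: delta(cumulative CPU nanoseconds) / elapsed wall-clock nanoseconds.
func cpuUsageRate(prev, curr cpuSample) (float64, error) {
	elapsed := curr.Timestamp.Sub(prev.Timestamp)
	if elapsed <= 0 || curr.CumulativeUsageNanoSeconds < prev.CumulativeUsageNanoSeconds {
		return 0, fmt.Errorf("samples must be ordered and the counter non-decreasing")
	}
	delta := curr.CumulativeUsageNanoSeconds - prev.CumulativeUsageNanoSeconds
	return float64(delta) / float64(elapsed.Nanoseconds()), nil
}

func main() {
	t0 := time.Now()
	prev := cpuSample{Timestamp: t0, CumulativeUsageNanoSeconds: 5000000000}
	curr := cpuSample{Timestamp: t0.Add(10 * time.Second), CumulativeUsageNanoSeconds: 20000000000}
	rate, err := cpuUsageRate(prev, curr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("average cores used over the window: %.2f\n", rate) // prints 1.50
}
```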

> **Reviewer:** You talked about gathering stats on demand, but some use cases (e.g. kubectl top) also require regular updates. I think it would be worth adding a section about how this will work. For example, every stat request could include a "required freshness", and be served from cache if there are recent stats, or trigger a fresh collection if not.


### On-Demand Design
Interface:
The interface for exposing these metrics within the kubelet contains methods for fetching each relevant metric. These methods accept a "recency" parameter which specifies how recently the metrics must have been computed. Kubelet components which require very up-to-date metrics (eviction, for example) use very low values; other components use higher values.

Implementation:
To keep performance bounded while still offering metrics "On-Demand", the results of all metrics computations are cached, and a minimum recency is established to prevent repeated metrics computation. Before computing new metrics, the previously computed metrics are checked against the recency requirement of the caller. If their age meets the requirement, the cached metrics are returned; if not, new metrics are computed and cached.
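As a rough illustration of this caching scheme, here is a minimal Go sketch. It is not the kubelet's actual implementation; the `onDemandProvider` type, its fields, and the `Get(maxAge)` method are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nodeMetrics is a placeholder for whatever set of usage metrics is collected.
type nodeMetrics struct {
	CollectedAt time.Time
	// usage fields elided
}

// onDemandProvider caches the most recently computed metrics and recomputes
// them only when the cached copy is older than the caller's recency requirement.
type onDemandProvider struct {
	mu         sync.Mutex
	cached     *nodeMetrics
	minRecency time.Duration      // floor on maxAge; bounds how often collection can run
	collect    func() nodeMetrics // the (potentially expensive) metrics computation
}

// Get returns metrics no older than maxAge. Latency-sensitive callers
// (eviction, for example) pass a small maxAge; others tolerate staler data.
func (p *onDemandProvider) Get(maxAge time.Duration) nodeMetrics {
	if maxAge < p.minRecency {
		maxAge = p.minRecency // enforce the minimum recency to avoid repeated computation
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cached != nil && time.Since(p.cached.CollectedAt) <= maxAge {
		return *p.cached // cached metrics already meet the recency requirement
	}
	m := p.collect()
	m.CollectedAt = time.Now()
	p.cached = &m
	return m
}

func main() {
	p := &onDemandProvider{
		minRecency: 2 * time.Second,
		collect:    func() nodeMetrics { return nodeMetrics{} },
	}
	_ = p.Get(30 * time.Second) // monitoring-style caller: cached data is fine
	fresh := p.Get(0)           // eviction-style caller: clamped to minRecency, may recollect
	fmt.Println("metrics collected at:", fresh.CollectedAt)
}
```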

## Implementation Plan
> **Reviewer (Member):** The implementation plan is rather poor. You should either expand it or remove it.
>
> **Author (@dashpole):** Removed in favor of a Future Work section.

@dashpole will modify the structure of metrics collection code to be "On-Demand".
> **Reviewer (Contributor):** Filesystem metrics are expensive and it's better for us to collect them in the background. Collecting them on-demand would mean that we will either exec 100+ du binaries or take tens of seconds to respond to each stat query. I'd instead recommend adding a timestamp to filesystem metrics.

> **Reviewer (Member):** High-level comment: IMO the purpose of such a technical proposal is to define what should be done and how, rather than who will be working on this. From a technical POV it's irrelevant information here.
>
> **Author (@dashpole):** Removed references to people.


Suggested, tentative future work, which may be covered by future proposals:
> **Reviewer (Member):** This should be under a "Future Improvements" section rather than the implementation plan.
>
> **Author (@dashpole):** Done.

- Publish these metrics in some form to a kubelet API endpoint
- Obtain all runtime-specific information needed to collect metrics from the CRI.
- Kubernetes can be configured to run a default "third party metrics provider" as a daemonset, possibly a standalone cAdvisor.

## Rollout Plan
Once this set of metrics is accepted, @dashpole will begin discussions on the format and design of the endpoint that exposes them. The node resource metrics endpoint (TBD) will be added alongside the current Summary API in an upcoming release. This should allow concurrent development of other portions of the system metrics pipeline (metrics-server, for example). Once this addition is made, all other changes will be internal and will not require any API changes.
> **Reviewer (Member):** As we discussed offline, nobody is working on metrics-server in Q1/1.6. Since metrics-server will be the main consumer of the API, I think we should wait for it rather than add this in the upcoming release.
>
> **Author (@dashpole):** s/an upcoming/a future. It was not meant to mean this release, but an upcoming release. This should make that more explicit.
>
> **Reviewer (Member):** Makes sense.

@dashpole will also start discussions on integrating with the CRI, and discussions on how to provide an out-of-the-box solution for the "third party monitoring" pipeline on the node. One current idea is a standalone version of cAdvisor, but any third party metrics solution could serve this function as well.
> **Reviewer (Member):** Not sure if I understand it correctly. According to the mentioned Monitoring Architecture vision, there won't be any out-of-the-box solution for the third-party monitoring pipeline, but rather clear integration points. cc @fgrzadkowski
>
> **Author (@dashpole):** I'll remove that from the proposal then. When you come in March, we can discuss the best way to transition from our current metrics situation to the Monitoring Architecture you outlined.


## Implementation Status

> **Reviewer:** nit: I'd get rid of this section. Proposals aren't the best medium for tracking status, and they tend to get out of date. I'd use a feature issue instead.
>
> **Author (@dashpole):** OK.


The implementation goals of the first milestone are outlined below.
- [ ] Create the proposal
- [ ] Modify the structure of metrics collection code to be "On-Demand"



<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/core-metrics-pipeline.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->