
[Proposal] Core Metrics in Kubelet #252

Merged
merged 31 commits
Jan 25, 2017
Conversation

Contributor

@dashpole dashpole commented Jan 5, 2017

Continuation of this proposal.

The Monitoring Architecture proposal includes a set of "Core Metrics". The goal of this proposal is to fully define what those metrics are, and to provide a blueprint for how they will be collected on the node.

Issue: #39102

My analysis of which metrics we use and don't use in the kubelet can be found here.

cc: @kubernetes/sig-node-misc @kubernetes/sig-scheduling-misc @kubernetes/sig-autoscaling-misc @kubernetes/sig-instrumentation-misc
cc: @piosz @luxas @DirectXMan12

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 5, 2017
@dashpole dashpole requested a review from DirectXMan12 January 5, 2017 15:57
@dashpole dashpole assigned dashpole, dchen1107 and piosz and unassigned dashpole Jan 5, 2017
Member

@derekwaynecarr derekwaynecarr left a comment

I think we should document URL patterns for what is invoked to gather this information. I assume this is a new endpoint to the existing stats API.

### Background
The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal contains a blueprint for a set of metrics referred to as "Core Metrics". The purpose of this proposal is to specify what those metrics are, and how they will be collected on the node.

CAdvisor is an open source container monitoring solution which only monitors containers, and has no concept of k8s constructs like pods or volumes. Kubernetes vendors cAdvisor into its codebase, and uses cAdvisor as a library with functions that enable it to collect metrics on containers. The kubelet can then combine container-level metrics from cAdvisor with the kubelet's knowledge of k8s constructs like pods to produce the kubelet Summary statistics, which provides metrics for use by the kubelet, or by users through the Summary API. cAdvisor works by collecting metrics at an interval (10 seconds), and the kubelet then simply querries these cached metrics whenever it has a need for them.
Member

s/CAdvisor/cAdvisor
s/querries/queries

Contributor

Note: The 10 seconds is a configurable interval.


It is very cumbersome to make changes or bugfixes in cAdvisor, because those changes then need to be vendored back into kubernetes.

CAdvisor is structured to collect metrics on an interval, which is appropriate for a stand-alone metrics collector. However, many functions in the kubelet are latency-sensitive (eviction, for example), and would benefit from a more "On-Demand" metrics collection design.
Member

+1

// The memory capacity, in bytes
CapacityBytes *uint64 `json:"capacitybytes,omitempty"`
// The available memory, in bytes
AvailableBytes *uint64 `json:"availablebytes,omitempty"`
Member

i know the definitions are probably intended to be consistent with cAdvisor, but maybe include more detail on what this field means.

Contributor

+1. Is this based on working set or total usage?

Contributor Author

updated.
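
For context on the working-set question above, a minimal hedged sketch of the relationship the kubelet's summary stats typically use (capacity, working set, available); the names here are illustrative, not the proposal's final API:

```go
package stats

// availableBytes sketches the usual kubelet convention: available
// memory is capacity minus the working set (total usage minus
// reclaimable page cache), rather than capacity minus total usage.
// This assumes the proposal keeps the summary API's semantics.
func availableBytes(capacityBytes, workingSetBytes uint64) uint64 {
	if workingSetBytes > capacityBytes {
		return 0
	}
	return capacityBytes - workingSetBytes
}
```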

The code that provides this data currently resides in cAdvisor. I propose moving this to the kubelet as well.

```go
type CoreInfo struct {
```
Member

is this supposed to be MachineInfo?

Contributor Author

done.

Contributor

Yeah. I think so too.

## Introduction

### Background
The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal contains a blueprint for a set of metrics referred to as "Core Metrics". The purpose of this proposal is to specify what those metrics are, and how they will be collected on the node.
Member

I still dislike the term "Core Metrics". The monitoring architecture referred to this as a system metrics, and it split system metrics into core and non-core pieces. I would prefer we call this "System resource metrics API" and let the non-core stuff be treated as opaque resource metrics apis.

@smarterclayton

Contributor Author

This was meant to be a proposal for the core metrics portion. Are you saying that we should expose system metrics rather than just core metrics through any external URL endpoint?

Contributor

I think the argument is that the name "core metrics" is wrong, or not philosophically aligned with our general Kubernetes wide naming conventions (wearing my "what are we trying to build as a project" hat momentarily).

I have badgered Derek a bit about this because "core" isn't a great word to describe things. I was looking for a more precise name for this general approach, and in particular the name that this API group will have to align well to what it is surfacing. "metrics" is not appropriate (this is not generic metrics). "core-metrics" is not precise (it does not define what is core).

"resource-metrics", "system-resource-metrics", "utilization-metrics" are all more accurate. Before we merge this, I'd like to converge on a name that is not "core" to describe this subset of all possible metrics.

Contributor

"resource-metrics", "system-resource-metrics", "utilization-metrics"

So, the name of the API served to expose these metrics to Kubernetes components (the HPA controller, the scheduler in the future) is the "resource metrics API" (also known as the "master metrics API", but that's a bit of a misnomer). I, personally, would lean towards the "system resource metrics API", "kubelet metrics API", or something along those lines, to distinguish it from the "resource metrics API".

Member

Agreed with the argument with the term of core-metrics. How about kubelet-resource-metrics?

Member

I think basic metrics has the same ambiguous meaning as core metrics here. I heard @vishh's concern about kubelet- prefix for the naming. How about calling it node-resource-metrics or system-resource-metrics?

I will move all code pertaining to collection and processing of core metrics from cAdvisor into kubernetes.
I will vendor the new core metrics code back into cAdvisor.
I will modify volume stats collection so that it relies on this code.
I will modify the structure of stats collection code to be "On-Demand"
Member

We need a mechanism to rate limit or control how frequently on-demand stat collection can occur. Did you have thoughts here? Even at existing 15s intervals, file system usage stats are very expensive to compute.

Contributor Author

My current thinking is that queries would contain a recency time.Duration parameter. Pieces of the kubelet that need the most recent information can set it low, and pieces that can make do with approximately correct measures could set it higher. We could set a minimum to enforce that the system is not overwhelmed.
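
A minimal sketch of what such a recency-parameterized query could look like (all names hypothetical, not part of the proposal):

```go
package stats

import "time"

// CPUStats is a stand-in for the proposal's CPU metrics type.
type CPUStats struct {
	Timestamp                  time.Time
	CumulativeUsageNanoSeconds *uint64
}

// Provider is a hypothetical on-demand interface. Latency-sensitive
// callers (eviction, for example) would pass a small recency, while
// housekeeping could tolerate a much larger one. A system-wide minimum
// recency would be enforced so that frequent callers cannot force
// constant recomputation.
type Provider interface {
	GetCPUStats(recency time.Duration) (*CPUStats, error)
}
```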

Contributor

+1. A lot of thought has gone into cAdvisor's stats collection to make it performant and resource efficient.

s/querries/queries
s/CAdvisor/cAdvisor
s/CoreInfo/MachineInfo
### Background
The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal contains a blueprint for a set of metrics referred to as "Core Metrics". The purpose of this proposal is to specify what those metrics are, and how they will be collected on the node.

CAdvisor is an open source container monitoring solution which only monitors containers, and has no concept of k8s constructs like pods or volumes. Kubernetes vendors cAdvisor into its codebase, and uses cAdvisor as a library with functions that enable it to collect metrics on containers. The kubelet can then combine container-level metrics from cAdvisor with the kubelet's knowledge of k8s constructs like pods to produce the kubelet Summary statistics, which provides metrics for use by the kubelet, or by users through the Summary API. cAdvisor works by collecting metrics at an interval (10 seconds), and the kubelet then simply querries these cached metrics whenever it has a need for them.
Contributor

Note: The 10 seconds is a configurable interval.


CAdvisor is an open source container monitoring solution which only monitors containers, and has no concept of k8s constructs like pods or volumes. Kubernetes vendors cAdvisor into its codebase, and uses cAdvisor as a library with functions that enable it to collect metrics on containers. The kubelet can then combine container-level metrics from cAdvisor with the kubelet's knowledge of k8s constructs like pods to produce the kubelet Summary statistics, which provides metrics for use by the kubelet, or by users through the Summary API. cAdvisor works by collecting metrics at an interval (10 seconds), and the kubelet then simply querries these cached metrics whenever it has a need for them.

Currently, cAdvisor collects a large number of metrics related to system and container performance. However, only some of these metrics are consumed by the kubelet summary API, and many are not used. The kubelet summary API is published to the kubelet summary API endpoint. Some of the metrics provided by the summary API are consumed internally, but most are not used internally.
Contributor

nit: Define the summary API here.

Contributor Author

done

### Motivations

Giving the kubelet the role of both providing metrics for its own use, and providing metrics for users, has a couple of problems
- First, it is clearly inefficient to collect metrics and not use them. The kubelet uses only a small portion of the metrics it collects, and thus presents considerable extra overhead to any users who do not use them, or prefer a third party monitoring solution.
Contributor

what overhead? State it explicitly.

Contributor Author

removed in favor of leaving separate pipeline motivations to monitoring architecture proposal


Giving the kubelet the role of both providing metrics for its own use, and providing metrics for users, has a couple of problems
- First, it is clearly inefficient to collect metrics and not use them. The kubelet uses only a small portion of the metrics it collects, and thus presents considerable extra overhead to any users who do not use them, or prefer a third party monitoring solution.
- Second, as the number of metrics collected grows over time, the kubelet will gain more and more overhead for collecting, processing, and publishing these metrics. Since the set of metrics users may want is unbounded, the kubelet's resource overhead could easily grow to unreasonable levels.
Contributor

Why do you think the metrics collected will grow over time? Are you explaining the rationale behind wanting a separate "core metrics" pipeline here?

Contributor Author

Yes. I will remove this, as it is explained in the monitoring architecture.

- First, it is clearly inefficient to collect metrics and not use them. The kubelet uses only a small portion of the metrics it collects, and thus presents considerable extra overhead to any users who do not use them, or prefer a third party monitoring solution.
- Second, as the number of metrics collected grows over time, the kubelet will gain more and more overhead for collecting, processing, and publishing these metrics. Since the set of metrics users may want is unbounded, the kubelet's resource overhead could easily grow to unreasonable levels.

It is very cumbersome to make changes or bugfixes in cAdvisor, because those changes then need to be vendored back into kubernetes.
Contributor

This will be the case moving forward for the entire organization since more components are expected to move into separate projects.

Contributor Author

removed.

// MachineID reported by the node. For unique machine identification
// in the cluster this field is preferred. Learn more from man(5)
// machine-id: http://man7.org/linux/man-pages/man5/machine-id.5.html
MachineID string `json:"machineID" protobuf:"bytes,1,opt,name=machineID"`
Contributor

Why do we have a proto annotation here and not for other objects?

Contributor Author

removed

// SystemUUID reported by the node. For unique machine identification
// MachineID is preferred. This field is specific to Red Hat hosts
// https://access.redhat.com/documentation/en-US/Red_Hat_Subscription_Management/1/html/RHSM/getting-system-uuid.html
SystemUUID string `json:"systemUUID" protobuf:"bytes,2,opt,name=systemUUID"`
Contributor

IIRC, @dchen1107 and @philips had intended to drop MachineID in favor of SystemUUID or vice versa. Now might be the time to do that.


## Implementation Plan

I will move all code pertaining to collection and processing of core metrics from cAdvisor into kubernetes.
Contributor

nit: Instead of "I", suggest dashpole@.

Contributor

or just don't mention the first person:

  • move all code pertaining to collection and processing of core metrics...

Contributor Author

changed to @dashpole

I will move all code pertaining to collection and processing of core metrics from cAdvisor into kubernetes.
I will vendor the new core metrics code back into cAdvisor.
I will modify volume stats collection so that it relies on this code.
I will modify the structure of stats collection code to be "On-Demand"
Contributor

+1. A lot of thought has gone into cAdvisor's stats collection to make it performant and resource efficient.

Tentative future work, not included in this proposal:
Obtain all runtime-specific information needed to collect metrics from the CRI.
Create a third party metadata API, whose function is to provide third party monitoring solutions with kubernetes-specific data (pod-container relationships, for example).
Modify cAdvisor to be "stand alone", and run in a separate binary from the kubelet. It will consume the above metadata API, and provide the summary API.
Contributor

Is this a suggestion or a plan of record?

Contributor Author

Suggested, I think. I view this mostly as a worst-case proposal. It may be that there is a better and easier way to continue to provide the summary API, but at least providing stand-alone cAdvisor is a viable path that achieves the high-level requirements.

@vishh
Contributor

vishh commented Jan 5, 2017 via email

Addressed some nits,  
removed some of motivation, as it is covered by Architecture, 
s/DiskUsage/FilesystemUsage
s/DiskResources/FilesystemResources
cAdvisor vendoring no longer changed.
Clarified that standalone cAdvisor will take over providing summary api.
@dashpole
Contributor Author

dashpole commented Jan 5, 2017

Kubelet has to avoid overcommitting CPU, and for that it needs CPU capacity.
For any first-class resource that can be requested explicitly via the API,
the kubelet will have to track capacity and availability.

Ahh, right. CPU capacity is included in MachineInfo in cAdvisor's api, which I have moved to the CoreStats portion. But the kubelet doesn't consume any CPU usage or availability metrics.

@vishh
Contributor

vishh commented Jan 5, 2017 via email

@euank
Contributor

euank commented Jan 6, 2017

cc @lucab

@dashpole
Contributor Author

dashpole commented Jan 6, 2017

@DirectXMan12 I was not aware of the "Resource Metrics API". Does this api encompass all metrics used by components in Kubernetes other than the kubelet? Is this api endpoint exposed by all nodes, or just the master node?

Contributor

@DirectXMan12 DirectXMan12 left a comment

a couple of changes/nits.

## Introduction

### Background
The [Monitoring Architecture](https://github.com/kubernetes/kubernetes/blob/master/docs/design/monitoring_architecture.md) proposal contains a blueprint for a set of metrics referred to as "Core Metrics". The purpose of this proposal is to specify what those metrics are, and how they will be collected on the node.
Contributor

"resource-metrics", "system-resource-metrics", "utilization-metrics"

So, the name of the API served to expose these metrics to Kubernetes components (the HPA controller, the scheduler in the future) is the "resource metrics API" (also known as the "master metrics API", but that's a bit of a misnomer). I, personally, would lean towards the "system resource metrics API", "kubelet metrics API", or something along those lines, to distinguish it from the "resource metrics API".

This design covers only the internal Core Metrics Pipeline.

High level requirements for the design are as follows:
- Do not break existing users. We should continue to provide the full summary API by default.
Contributor

I think @vishh is saying that you should qualify this, e.g.:

We should continue to provide the full summary API as an optional addon. Once the monitoring pipeline is converted to use the new API, the summary API will be moved out into a component which can optionally be served using a standalone copy of cAdvisor

or something to that effect.

@DirectXMan12
Contributor

Does [the resource metrics API] api encompass all metrics used by components in Kubernetes other than the kubelet?

That's supposed to be the end goal. It's called the "master metrics API" in the monitoring vision, and the "resource metrics API" elsewhere (e.g. https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-metrics-api.md).

Is this api endpoint exposed by all nodes, or just the master node?

It is/will be exposed as a "normal" Kubernetes API server API -- it's not exposed by nodes. Currently, it's exposed several ways by Heapster, in the future, it will be exposed as a discoverable API by the metrics-server component.

@dashpole
Contributor Author

dashpole commented Jan 6, 2017

@DirectXMan12 Thanks! See the updated requirements section.

Contributor

@DirectXMan12 DirectXMan12 left a comment

looking pretty good, but we should address the general naming question.

@timstclair

timstclair commented Jan 24, 2017 via email

@dchen1107
Member

dchen1107 commented Jan 24, 2017

From @vishh:

Actually, UsageRateNanoCores or instantaneous usage is what we need. Use
cases include HPA, kubectl top, dashboard, etc.

We already export cumulative CPU usage and the timestamp at which it was collected. Shouldn't the usage rate be derivable from those stats by the server when needed?

From @timstclair:

Rate is not well defined in the current proposal, which uses on-demand stats
gathering. Why not push that to the service which needs it, polling on the
desired window (thereby enabling multiple averaging windows)?

Agreed.
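
For illustration, deriving a usage rate from two cumulative samples reduces to delta usage over delta time; a small sketch with hypothetical types, not Heapster's or the kubelet's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// cpuSample pairs a cumulative CPU usage reading with its collection time.
type cpuSample struct {
	cumulativeUsageNanoSeconds uint64
	timestamp                  time.Time
}

// usageNanoCores derives the average usage rate, in nanocores, over the
// window between two samples: delta CPU-nanoseconds / delta wall seconds.
func usageNanoCores(prev, cur cpuSample) uint64 {
	window := cur.timestamp.Sub(prev.timestamp)
	if window <= 0 || cur.cumulativeUsageNanoSeconds < prev.cumulativeUsageNanoSeconds {
		return 0
	}
	delta := cur.cumulativeUsageNanoSeconds - prev.cumulativeUsageNanoSeconds
	return uint64(float64(delta) / window.Seconds())
}

func main() {
	t0 := time.Now()
	prev := cpuSample{cumulativeUsageNanoSeconds: 1000000000, timestamp: t0}
	cur := cpuSample{cumulativeUsageNanoSeconds: 1500000000, timestamp: t0.Add(10 * time.Second)}
	// 0.5s of CPU over 10s of wall time = 0.05 cores = 50000000 nanocores.
	fmt.Println(usageNanoCores(prev, cur))
}
```

This is the calculation at issue: it can be done either on the node (yielding a metric like UsageRateNanoCores) or by the consumer from cumulative samples, over whatever window it chooses.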

@vishh
Contributor

vishh commented Jan 24, 2017 via email

@dchen1107
Member

From @vishh:

We have discussed and tried that in the past, and in the end realized that
calculating rate at the node level is much simpler across the stack for
resource management purposes.

This is not true in today's Heapster implementation which serves HPA, kubectl top, dashboard, etc. It appears that there are 2 types of cpu usage metrics:

  • .node.cpu.usageNanoCores (cAdvisor) -> StandardMetrics.MetricCpuUsage (Heapster) - current cpu usage metric ==> mapping to UsageRateNanoCores proposed here
  • .node.cpu.usageCoreNanoSeconds (cAdvisor) -> RateMetrics.MetricCpuUsageRate (Heapster) - cumulative cpu usage rate ==> mapping to CumulativeUsageNanoSeconds proposed here

In metrics/processors/rate_calculator.go https://github.com/kubernetes/heapster/blob/master/metrics/processors/rate_calculator.go#L60-L63 there is special handling for core.MetricCpuUsage, which has a strong assumption that it operates on cumulative values.

I searched heapster's codebase; I think usageNanoCores is ignored by heapster completely.

@dchen1107
Member

LGTM

@dchen1107 dchen1107 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 25, 2017
@dashpole dashpole dismissed stale reviews from derekwaynecarr, DirectXMan12, and timothysc January 25, 2017 18:19

addressed comments.

@timstclair

I searched heapster's codebase; I think usageNanoCores is ignored by heapster completely.

Yes, as far as I know the usageNanoCores metric isn't used anywhere.

@dchen1107
Member

@dashpole can you squash all commits, then we are ready to go.

@vishh
Contributor

vishh commented Jan 25, 2017 via email

@piosz
Member

piosz commented Jan 31, 2017

@dashpole can we please not merge PRs with 31 completely useless commits?

@piosz piosz changed the title [Proposal] Core Metrics API [Proposal] Core Metrics in Kubelet Jan 31, 2017
Member

@piosz piosz left a comment

Sorry for being late to the party.

The set of metrics being collected looks reasonable and should meet metrics-server requirements.

A few comments inline.

// The time at which these Metrics were updated.
Timestamp metav1.Time
// Cumulative CPU usage (sum of all cores) since object creation.
CumulativeUsageNanoSeconds *uint64
Member

Cumulative CPU usage needs to be accompanied by the time window over which it was collected.

## Implementation Plan
@dashpole will modify the structure of metrics collection code to be "On-Demand".

Suggested, tentative future work, which may be covered by future proposals:
Member

This should be under a "Future Improvements" section rather than "Implementation Plan".

Contributor Author

Done.

To keep performance bounded while still offering metrics "On-Demand", all calls to get metrics are cached, and a minimum recency is established to prevent repeated metrics computation. Before computing new metrics, the previous metrics are checked to see if they meet the recency requirements of the caller. If the age of the metrics meets the recency requirements, then the cached metrics are returned. If not, then new metrics are computed and cached.

## Implementation Plan
@dashpole will modify the structure of metrics collection code to be "On-Demand".
Member

High level comment:
IMO the purpose of such a technical proposal is to define what should be done and how, rather than who will be working on it. From a technical POV, that information is irrelevant here.

Contributor Author

removed references to people

Implementation:
To keep performance bounded while still offering metrics "On-Demand", all calls to get metrics are cached, and a minimum recency is established to prevent repeated metrics computation. Before computing new metrics, the previous metrics are checked to see if they meet the recency requirements of the caller. If the age of the metrics meets the recency requirements, then the cached metrics are returned. If not, then new metrics are computed and cached.

## Implementation Plan
Member

The implementation plan is rather poor. You should either expand or remove it.

Contributor Author

Removed in favor of a Future Work section.
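
A rough sketch of the cached, recency-bounded collection described in the quoted paragraph above (hypothetical names, memory-only for brevity; a sketch, not the kubelet's implementation):

```go
package stats

import (
	"sync"
	"time"
)

// minRecency is an assumed enforced floor so callers cannot trigger
// back-to-back recomputation of expensive stats.
const minRecency = 2 * time.Second

// MemoryStats is a stand-in for the proposal's memory metrics type.
type MemoryStats struct {
	Timestamp      time.Time
	AvailableBytes *uint64
}

// cachedProvider serves metrics on demand, recomputing only when the
// cached copy is older than the caller's recency requirement.
type cachedProvider struct {
	mu      sync.Mutex
	cached  *MemoryStats
	collect func() *MemoryStats // expensive collection, e.g. cgroup reads
}

func (p *cachedProvider) GetMemoryStats(recency time.Duration) *MemoryStats {
	if recency < minRecency {
		recency = minRecency
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	// Return the cached stats if they are fresh enough for this caller.
	if p.cached != nil && time.Since(p.cached.Timestamp) <= recency {
		return p.cached
	}
	// Otherwise recompute and cache.
	p.cached = p.collect()
	return p.cached
}
```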

- Kubernetes can be configured to run a default "third party metrics provider" as a daemonset. Possibly standalone cAdvisor.

## Rollout Plan
Once this set of metrics is accepted, @dashpole will begin discussions on the format and design of the endpoint that exposes them. The node resource metrics endpoint (TBD) will be added alongside the current Summary API in an upcoming release. This should allow concurrent development of other portions of the system metrics pipeline (metrics-server, for example). Once this addition is made, all other changes will be internal, and will not require any API changes.
Member

As we discussed offline, nobody is working on metrics-server in Q1/1.6. Since metrics-server will be the main consumer of the API, I think we should wait for it rather than add this in the upcoming release.

Contributor Author

s/an upcoming/a future
It was not meant to mean this release, but an upcoming release. This should make that more explicit.

Member

Makes sense.


## Rollout Plan
Once this set of metrics is accepted, @dashpole will begin discussions on the format and design of the endpoint that exposes them. The node resource metrics endpoint (TBD) will be added alongside the current Summary API in an upcoming release. This should allow concurrent development of other portions of the system metrics pipeline (metrics-server, for example). Once this addition is made, all other changes will be internal, and will not require any API changes.
@dashpole will also start discussions on integrating with the CRI, and discussions on how to provide an out-of-the-box solution for the "third party monitoring" pipeline on the node. One current idea is a standalone version of cAdvisor, but any third party metrics solution could serve this function as well.
Member

Not sure if I understand it correctly. According to the mentioned Monitoring Architecture vision, there won't be any out-of-the-box solution for the third-party monitoring pipeline, but rather clear integration points.

cc @fgrzadkowski

Contributor Author

I'll remove that from the proposal then. When you come in March, we can discuss the best way to transition from our current metrics situation to the Monitoring Architecture you outlined.

@dashpole
Contributor Author

@piosz I used the git "squash and merge" tool. It should only be one commit in the master branch, but you still see all the commits here.

@dashpole dashpole restored the core_metrics_api branch January 31, 2017 16:20
@dashpole dashpole deleted the core_metrics_api branch January 31, 2017 16:25
@dashpole dashpole mentioned this pull request Jan 31, 2017
@piosz
Member

piosz commented Feb 1, 2017

@dashpole thanks for the explanation. Indeed there is only one commit in the master branch. Magic.

ruebenramirez pushed a commit to ruebenramirez/community that referenced this pull request Apr 22, 2017
This document provides the contents and uses of core metrics.
shyamjvs pushed a commit to shyamjvs/community that referenced this pull request Sep 22, 2017
keep dates in order; add beta0 to details
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
This document provides the contents and uses of core metrics.
danehans pushed a commit to danehans/community that referenced this pull request Jul 18, 2023