
Provide support for other metrics sources than Heapster #1310

Closed
superdump opened this issue Oct 6, 2016 · 21 comments
Labels
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@superdump

Issue details

The new dashboard in 1.4.0 was announced to support metrics graphs that provide a quick look into the recent state of the various assets running on a cluster. However, Heapster is not everyone's choice of monitoring system and indeed Prometheus is an excellent match for Kubernetes. It would be very useful if Prometheus and others were supported as sources for the dashboard metrics.

Environment

Dashboard was deployed using the YAML description as noted in the README.md.

Dashboard version: 1.4.0
Kubernetes version: 1.4.0
Operating system: Ubuntu 16.04.1 LTS but not relevant.
Steps to reproduce
  1. Roll out a Kubernetes cluster
  2. Install the dashboard (a minimal manifest sketch follows this list)
  3. Observe there are no CPU/RAM graphs as in the announcement
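
To make step 2 concrete, here is a minimal sketch of the kind of Deployment the README's YAML describes; the API version, namespace, image tag and port are assumptions based on the 1.4.0 release, not copied from the actual manifest:

```yaml
# Hypothetical, trimmed-down Dashboard Deployment for illustration only;
# the real manifest in the README also defines a Service and more metadata.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kubernetes-dashboard
    spec:
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.4.0  # assumed tag
        ports:
        - containerPort: 9090  # assumed Dashboard HTTP port
```
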
Observed result

Only Heapster is supported as a source of metrics for the dashboard graphs.

Expected result

An easy way to configure the data source of metrics in the dashboard. Data sources should likely include the most used solutions (Prometheus, InfluxDB, Graphite, ...) and be pluggable, so people can implement their own data sources and easily extend the dashboard with them.

Comments

Not sure what the prioritisation should be for support. Consider this issue a request for Prometheus support.

@bryk
Contributor

bryk commented Oct 6, 2016

Cool, thanks for this feature request! Looks like something we'd love to have.

Is this something you can contribute to?

@cheld
Contributor

cheld commented Oct 7, 2016

As far as I understand we have three use cases:

  • One important use case is to provide historic data to Dashboard.

Heapster old-timer is intended to provide an interface for historic data, by reading data from DBs such as Hawkular, InfluxDB, GCM, and OpenTSDB

https://github.com/kubernetes/heapster/blob/master/docs/proposals/old-timer.md

Does this cover part of your request?

  • Another important use case is custom application metrics, e.g. active user sessions.

There have been suggestions that pods could expose custom metrics in Prometheus syntax. Not sure if this is still up to date:

https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/custom-metrics.md

I believe currently only cAdvisor can be used to read custom metrics.

  • The third use case is to replace Heapster altogether and make Kubernetes more pluggable.

@cheld
Contributor

cheld commented Oct 7, 2016

BTW: in my understanding the only way to use InfluxDB or Graphite is to install Heapster with a corresponding sink. How would you suggest installing without Heapster? Also, could you elaborate on your Prometheus setup?

@puja108
Member

puja108 commented Oct 7, 2016

At least from my perspective it is more about the 3rd use case. Not replacing Heapster, but making other things (like Dashboard or Autoscaling) not depend on it and instead be more pluggable. This might be a bit more overarching than only Dashboard.

The issue here in the Dashboard repo specifically, though, is quite well defined, as Dashboard uses a very specific set of Heapster metrics, which could just as well be provided by Prometheus, for example.

For users already running Prometheus (because of additional functionality, e.g. custom metrics, alerting, and more), running Heapster is a bit of a waste of resources.

@cheld
Contributor

cheld commented Oct 7, 2016

I have just learned that cAdvisor exposes the metrics in Prometheus format as well. Do you use cAdvisor or the node-exporter for scraping? Are the nodes manually registered? Anyway, the setup is pretty similar to Heapster with a few Prometheus benefits, I see.

@puja108
Member

puja108 commented Oct 7, 2016

We deploy node-exporter as a DaemonSet and there is Kubernetes service discovery in Prometheus (a minimal scrape-config sketch follows below). For an example you can check out this quick repo: https://github.com/giantswarm/kubernetes-prometheus or the CoreOS setup (which comes with a bit more explanation around it): https://coreos.com/blog/prometheus-and-kubernetes-up-and-running.html
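
For readers who want the concrete shape of such a setup, a minimal sketch of a Prometheus scrape configuration using the role-based Kubernetes service discovery found in newer Prometheus versions could look like the following; the job names and the node-exporter service name are illustrative assumptions, not taken from the linked repos:

```yaml
# Illustrative scrape_configs only; TLS/auth settings and relabelling
# will differ per cluster.
scrape_configs:
  # Scrape kubelets, which expose cAdvisor container metrics.
  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node
  # Scrape node-exporter pods discovered via their service endpoints.
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: node-exporter
        action: keep
```

With node-exporter running as a DaemonSet behind a service, the endpoints role picks up one target per node automatically.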

@superdump
Author

@cheld I'll try to answer your questions in order:

  • Access to historical data is one thing. Access to other metrics not exposed in the dashboard that may be related to diagnosis is another.
  • The third case you mentioned is the main goal - why run Heapster if you anyway intend to run Prometheus?
  • A standard Prometheus setup for k8s is probably better described by Prometheus project members. However, Prometheus has k8s service discovery which is being actively improved. I would deploy a node-exporter on every host (API servers, etcd members, nodes, whatever), and as far as I know the kubelet bakes in and exposes a Prometheus metrics endpoint from cAdvisor. Other services would use appropriate exporters and service discovery to expose themselves appropriately. (A minimal node-exporter DaemonSet sketch follows this comment.)

As I got to the bottom of the issue, I saw that @puja108 details the same aspects I have been considering, and indeed the same resources (the CoreOS blog articles about Prometheus and k8s, for example).

I would choose to deploy Prometheus, its Alertmanager component, Grafana for dashboards, and a Prometheus push gateway for monitoring batch job / version roll-out events and such. It makes sense to me to use the metrics from there and not deploy Heapster if possible. However, I'm a k8s newbie and don't know what else Heapster is used for. You're welcome to poke me here or on Slack about the wider-reaching consequences around Heapster.
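
For completeness, a minimal sketch of the node-exporter DaemonSet mentioned above; the API version, namespace, image tag and port are assumptions for illustration, not a recommended production manifest:

```yaml
# Illustrative DaemonSet only; real setups usually add host mounts for
# /proc and /sys plus tolerations so it also runs on master nodes.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true        # expose host-level metrics on the node IP
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.12.0   # assumed tag
        ports:
        - containerPort: 9100  # default node-exporter port
          hostPort: 9100
```
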

@floreks floreks added priority/P2 kind/feature Categorizes issue or PR as related to a new feature. labels Oct 8, 2016
@cheld
Contributor

cheld commented Oct 10, 2016

Thanks for explaining. The replacement of Heapster seems interesting. I am a bit worried about n Kubernetes components depending on m monitoring systems. BTW: AFAIK there is an attempt to add a monitoring API to the API server. This might solve all of these issues.

@puja108
Member

puja108 commented Oct 10, 2016

Agreed, the goal should not be n*m complexity. We should strive mid- to long-term for a solution that "harmonizes" the interface in between. IIRC SIG Instrumentation was also discussing this topic. We might want to sync this across SIGs, as this is not a Dashboard or UI issue per se.

If we had a common /metrics somewhere, we could all build on that, and only when you need specialized data would you actually have to go to the specific solution.

Here's also the SIG instrumentation meeting notes, with their plans for monitoring API: https://docs.google.com/document/d/1gWuAATtlmI7XJILXd31nA4kMq6U9u63L70382Y3xcbM/edit#heading=h.qpfxt91hdl2x

We just need to see how long such a harmonization effort would take, and whether we want an interim solution where at least the two main monitoring solutions (Heapster and Prometheus), which are also part of the CNCF, can be supported. Especially as a harmonized solution would most probably look close to either of those APIs, I'd say.

@jimmidyson
Member

I encourage you to join sig-instrumentation if you're interested in helping define these extension points. Sig-instrumentation has mostly been discussing what APIs need to be defined to allow this and prevent the n*m complexity that you mention, which we all want to avoid.

@cheld
Contributor

cheld commented Oct 11, 2016

@jimmidyson thanks for offering. I am a bit busy the next two weeks. After that I will have a look.

CC @floreks, @maciaszczykm, @kenan435 maybe interesting for you?

BTW: for the interested reader, the Kubernetes components that depend on Heapster which I know off the top of my head are:

  • Dashboard
  • Horizontal Pod Autoscaler
  • kubectl top

There might be more.

@bryk
Contributor

bryk commented Oct 20, 2016

I had a chat with @piosz and @mwielgus, and AFAIU the plan of sig-instrumentation is to provide a single API for the "current state of resource usage". This could easily be used by Dashboard and others. The historical data proposal is not settled on yet.

@piosz
Member

piosz commented Oct 20, 2016

There is a Kubernetes monitoring architecture proposal in flight.

Our intent is to address issues like this. The proposal is to:

  • have a core monitoring pipeline, installed by default on every cluster, which collects only the most important metrics required by core Kubernetes components (in fact CPU/memory/disk usage + resource estimation) and exports their latest values via a Master Metrics API available in the API server (a rough sketch of such an object follows this list)
  • provide an easy way to install various 3rd-party monitoring providers (like Heapster, Prometheus, Sysdig), and to integrate them with components which need more than just the base metrics available in the API server, like the UI or the HPA (to scale based on custom metrics)
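
To make the first bullet a bit more tangible, here is a rough, illustrative shape of a per-node usage object exposed through such a metrics API; the group/version, field names and values are assumptions, since the API was still being designed at the time:

```yaml
# Purely illustrative; not the final API shape.
apiVersion: metrics.k8s.io/v1beta1
kind: NodeMetrics
metadata:
  name: node-1
timestamp: "2016-10-20T10:00:00Z"
window: 1m0s
usage:
  cpu: 250m       # latest observed CPU usage
  memory: 1024Mi  # latest observed memory usage
```
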

I can imagine that the UI could work as follows:

  • display basic graphs/info in case there is no 3rd-party monitoring provider (or there is one which is not integrated with the UI)
  • display advanced, sophisticated graphs/info in case there is a 3rd-party monitoring provider integrated with the UI

Does it seem reasonable?

cc @davidopp @fabxc @fgrzadkowski

@fgrzadkowski
Contributor

Other options would be:

  • Read data from Infrastore (once it's available)
  • Keep some history in a dashboard-specific backend, populated from the metrics API.

@davidopp
Member

I think the diagram at the end of the doc Piotr posted here #1310 (comment) addresses this? kube-dashboard takes metrics from Infrastore, which gets its metrics from the master metrics API (core system metrics) and from a plugin that lets you feed it metrics from any monitoring system (the plugin is written by the monitoring system vendor).

@puja108
Member

puja108 commented Oct 20, 2016

I guess keeping to the metrics API specs will be key, no matter whether they come from the master (metrics) API (i.e. metrics server) or from Infrastore.

Looking at the roadmap for monitoring, Metrics Server will optimistically come OOTB with 1.6, but Infrastore will still be a PoC by then, so I guess historical data would have to wait a bit. Still, I would not build another dashboard-specific history backend just for the short time until we get Infrastore.

@bryk
Contributor

bryk commented Oct 21, 2016

Still, I would not build another dashboard-specific history backend just for the short time until we get Infrastore.

Yes. If we ever realize that the Infrastore release timeline does not work for us, I'd rather have us join that effort than develop something separate.

@maciaszczykm maciaszczykm added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 13, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 30, 2018
@maciaszczykm maciaszczykm added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 27, 2018
@maciaszczykm
Member

We will be switching to the metrics API, which can be tracked in #2986. Heapster integration will be removed.

/close

@k8s-ci-robot
Contributor

@maciaszczykm: Closing this issue.

In response to this:

We will be switching to the metrics API, which can be tracked in #2986. Heapster integration will be removed.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
