
Provide support for other metrics sources than Heapster #1310

Closed
superdump opened this issue Oct 6, 2016 · 21 comments
Labels
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/feature: Categorizes issue or PR as related to a new feature.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@superdump

Issue details

The new dashboard in 1.4.0 was announced to support metrics graphs that provide a quick look into the recent state of the various assets running on a cluster. However, Heapster is not everyone's choice of monitoring system and indeed Prometheus is an excellent match for Kubernetes. It would be very useful if Prometheus and others were supported as sources for the dashboard metrics.

Environment

Dashboard was deployed using the YAML description as noted in the README.md.

Dashboard version: 1.4.0
Kubernetes version: 1.4.0
Operating system: Ubuntu 16.04.1 LTS but not relevant.
Steps to reproduce
  1. Roll out a Kubernetes cluster
  2. Install the dashboard (a minimal manifest sketch follows this list)
  3. Observe there are no CPU/RAM graphs as in the announcement
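
To make step 2 concrete, here is a minimal sketch of the kind of Deployment the README's YAML describes; the API version, namespace, image tag and port are assumptions based on the 1.4.0 release, not copied from the actual manifest:

```yaml
# Hypothetical, trimmed-down Dashboard Deployment for illustration only;
# the real manifest in the README also defines a Service and more metadata.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kubernetes-dashboard
    spec:
      containers:
      - name: kubernetes-dashboard
        image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.4.0  # assumed tag
        ports:
        - containerPort: 9090  # assumed Dashboard HTTP port
```
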
Observed result

Only Heapster is supported as a source of metrics for the dashboard graphs.

Expected result

An easy way to configure the data source of metrics in the dashboard. Data sources should likely include the most used solutions (Prometheus, InfluxDB, Graphite, ...) and be pluggable, so people can implement their own data sources and easily extend the dashboard with them.

Comments

Not sure what the prioritisation should be for support. Consider this issue a request for Prometheus support.

@bryk
Contributor

bryk commented Oct 6, 2016

Cool, thanks for this feature request! Looks like something we'd love to have.

Is this something you can contribute to?

@cheld
Contributor

cheld commented Oct 7, 2016

As far as I understand we have three use cases:

  • One important use case is to provide historic data to Dashboard.

Heapster old-timer is intended to provide an interface for historic data, by reading data from DBs such as Hawkular, InfluxDB, GCM, and OpenTSDB

https://github.com/kubernetes/heapster/blob/master/docs/proposals/old-timer.md

Does this cover part of your request?

  • Another important use case is custom application metrics, e.g. active user sessions.

There have been suggestions that pods could expose custom metrics in Prometheus syntax. Not sure if this is still up to date:

https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/custom-metrics.md

I believe currently only cAdvisor can be used to read custom metrics.

  • The third use case is to replace Heapster altogether and make Kubernetes more pluggable.

@cheld
Contributor

cheld commented Oct 7, 2016

BTW: in my understanding the only way to use InfluxDB or Graphite is to install Heapster with a corresponding sink. How would you suggest installing without Heapster? Also, could you elaborate on your Prometheus setup?

@puja108
Member

puja108 commented Oct 7, 2016

At least from my perspective it is more about the 3rd use case. Not replacing Heapster, but making other things (like Dashboard or Autoscaling) not depend on it and instead be more pluggable. This might be a bit more overarching than only Dashboard.

The issue here in the Dashboard repo specifically, though, is quite well defined, as Dashboard uses a very specific set of Heapster metrics, which could just as well be provided by Prometheus, for example.

For users already running Prometheus (because of additional functionality, e.g. custom metrics, alerting, and more), running Heapster is a bit of a waste of resources.

@cheld
Contributor

cheld commented Oct 7, 2016

I have just learned that cAdvisor exposes the metrics in Prometheus format as well. Do you use cAdvisor or the node-exporter for scraping? Are the nodes manually registered? Anyway, the setup is pretty similar to Heapster with a few Prometheus benefits, I see.

@puja108
Member

puja108 commented Oct 7, 2016

We deploy node-exporter as a DaemonSet and there is Kubernetes service discovery in Prometheus (a minimal scrape-config sketch follows below). For an example you can check out this quick repo: https://github.com/giantswarm/kubernetes-prometheus or the CoreOS setup (which comes with a bit more explanation around it): https://coreos.com/blog/prometheus-and-kubernetes-up-and-running.html
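
For readers who want the concrete shape of such a setup, a minimal sketch of a Prometheus scrape configuration using the role-based Kubernetes service discovery found in newer Prometheus versions could look like the following; the job names and the node-exporter service name are illustrative assumptions, not taken from the linked repos:

```yaml
# Illustrative scrape_configs only; TLS/auth settings and relabelling
# will differ per cluster.
scrape_configs:
  # Scrape kubelets, which expose cAdvisor container metrics.
  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node
  # Scrape node-exporter pods discovered via their service endpoints.
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: node-exporter
        action: keep
```

With node-exporter running as a DaemonSet behind a service, the endpoints role picks up one target per node automatically.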

@superdump
Author

@cheld I'll try to answer your questions in order:

  • Access to historical data is one thing. Access to other metrics not exposed in the dashboard that may be related to diagnosis is another.
  • The third case you mentioned is the main goal - why run Heapster if you anyway intend to run Prometheus?
  • A standard Prometheus setup for k8s is probably better described by Prometheus project members. However, Prometheus has k8s service discovery which is being actively improved. I would deploy a node-exporter on every host (API servers, etcd members, nodes, whatever), and as far as I know the kubelet bakes in and exposes a Prometheus metrics endpoint from cAdvisor. Other services would use appropriate exporters and service discovery to expose themselves appropriately. (A minimal node-exporter DaemonSet sketch follows this comment.)

As I got to the bottom of the issue, I saw that @puja108 details the same aspects I have been considering, and indeed the same resources (the CoreOS blog articles about Prometheus and k8s, for example).

I would choose to deploy Prometheus, its Alertmanager component, Grafana for dashboards, and a Prometheus push gateway for monitoring batch job / version roll-out events and such. It makes sense to me to use the metrics from there and not deploy Heapster if possible. However, I'm a k8s newbie and don't know what else Heapster is used for. You're welcome to poke me here or on Slack about the wider-reaching consequences around Heapster.
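
For completeness, a minimal sketch of the node-exporter DaemonSet mentioned above; the API version, namespace, image tag and port are assumptions for illustration, not a recommended production manifest:

```yaml
# Illustrative DaemonSet only; real setups usually add host mounts for
# /proc and /sys plus tolerations so it also runs on master nodes.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true        # expose host-level metrics on the node IP
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.12.0   # assumed tag
        ports:
        - containerPort: 9100  # default node-exporter port
          hostPort: 9100
```
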

@floreks floreks added priority/P2 kind/feature Categorizes issue or PR as related to a new feature. labels Oct 8, 2016
@cheld
Contributor

cheld commented Oct 10, 2016

Thanks for explaining. The replacement of Heapster seems interesting. I am a bit worried about n Kubernetes components depending on m monitoring systems. BTW: AFAIK there is an attempt to add a monitoring API to the API server. This might solve all of these issues.

@puja108
Member

puja108 commented Oct 10, 2016

Agreed, the goal should not be n*m complexity. We should strive mid- to long-term for a solution that "harmonizes" the interface in between. IIRC SIG Instrumentation was also discussing this topic. We might want to sync this across SIGs, as this is not a Dashboard or UI issue per se.

If we had a common /metrics somewhere, we could all build on that, and only when you need specialized data would you actually have to go to the specific solution.

Here's also the SIG instrumentation meeting notes, with their plans for monitoring API: https://docs.google.com/document/d/1gWuAATtlmI7XJILXd31nA4kMq6U9u63L70382Y3xcbM/edit#heading=h.qpfxt91hdl2x

We just need to see how long such a harmonization effort would take, and whether we want an interim solution where at least the two main monitoring solutions (Heapster and Prometheus), which are also part of the CNCF, can be supported. Especially as a harmonized solution would most probably look close to either of those APIs, I'd say.

@jimmidyson
Member

I encourage you to join sig-instrumentation if you're interested in helping define these extension points. Sig-instrumentation has mostly been discussing what APIs need to be defined to allow this and prevent the n*m complexity that you mention, which we all want to avoid.

@cheld
Contributor

cheld commented Oct 11, 2016

@jimmidyson thanks for offering. I am a bit busy the next two weeks. After that I will have a look.

CC @floreks, @maciaszczykm, @kenan435 maybe interesting for you?

BTW: for the interested reader, the Kubernetes components that depend on Heapster which I know off the top of my head are:

  • Dashboard
  • Horizontal Pod Autoscaler
  • kubectl top

There might be more.

@bryk
Contributor

bryk commented Oct 20, 2016

I had a chat with @piosz and @mwielgus, and AFAIU the plan of sig-instrumentation is to provide a single API for the "current state of resource usage". This could easily be used by Dashboard and others. The historical data proposal is not settled on yet.

@piosz
Member

piosz commented Oct 20, 2016

There is a Kubernetes monitoring architecture proposal in flight.

Our intent is to address issues like this. The proposal is to:

  • have a core monitoring pipeline, installed by default on every cluster, which collects only the most important metrics required by core Kubernetes components (in fact CPU/memory/disk usage + resource estimation) and exports their latest values via a Master Metrics API available in the API server (a rough sketch of such an object follows this list)
  • provide an easy way to install various 3rd-party monitoring providers (like Heapster, Prometheus, Sysdig), and to integrate them with components which need more than just the base metrics available in the API server, like the UI or the HPA (to scale based on custom metrics)
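
To make the first bullet a bit more tangible, here is a rough, illustrative shape of a per-node usage object exposed through such a metrics API; the group/version, field names and values are assumptions, since the API was still being designed at the time:

```yaml
# Purely illustrative; not the final API shape.
apiVersion: metrics.k8s.io/v1beta1
kind: NodeMetrics
metadata:
  name: node-1
timestamp: "2016-10-20T10:00:00Z"
window: 1m0s
usage:
  cpu: 250m       # latest observed CPU usage
  memory: 1024Mi  # latest observed memory usage
```
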

I can imagine that the UI could work as follows:

  • display basic graphs/info in case there is no 3rd-party monitoring provider (or there is one which is not integrated with the UI)
  • display advanced, sophisticated graphs/info in case there is a 3rd-party monitoring provider integrated with the UI

Does it seem reasonable?

cc @davidopp @fabxc @fgrzadkowski

@fgrzadkowski
Contributor

Other options would be:

  • Read data from Infrastore (once it's available)
  • Keep some history in a dashboard-specific backend, populated from the metrics API.

@davidopp
Member

I think the diagram at the end of the doc Piotr posted here #1310 (comment) addresses this? kube-dashboard takes metrics from Infrastore, which gets its metrics from the master metrics API (core system metrics) and from a plugin that lets you feed it metrics from any monitoring system (the plugin is written by the monitoring system vendor).

@puja108
Member

puja108 commented Oct 20, 2016

I guess keeping to the metrics API specs will be key, no matter whether they come from the master (metrics) API (i.e. metrics server) or from Infrastore.

Looking at the roadmap for monitoring, Metrics Server will optimistically come OOTB with 1.6, but Infrastore will still be a PoC by then, so I guess historical data would have to wait a bit. Still, I would not build another dashboard-specific history backend just for the short time until we get Infrastore.

@bryk
Contributor

bryk commented Oct 21, 2016

Still, I would not build another dashboard-specific history backend just for the short time until we get Infrastore.

Yes. If we ever realize that the Infrastore release timeline does not work for us, I'd rather have us join that effort than develop something separate.

@maciaszczykm maciaszczykm added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 13, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 30, 2018
@maciaszczykm maciaszczykm added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 27, 2018
@maciaszczykm
Member

We will be switching to the metrics API, which can be tracked in #2986. Heapster integration will be removed.

/close

@k8s-ci-robot
Contributor

@maciaszczykm: Closing this issue.

In response to this:

We will be switching to the metrics API, which can be tracked in #2986. Heapster integration will be removed.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
