Unbundle Prometheus and Grafana #3406

andrew-waters · 2019-09-09T12:09:05Z

Feature Request

What problem are you trying to solve?

linkerd2 deployments come with Grafana and Prometheus bundled in. This is great for single-cluster setups and PoCs, but it comes at the cost of resources when an existing cluster already has Prometheus running. The documentation around exporting metrics is useful, and for the specific use case of running Thanos, the service monitor is the preferable route (the purpose of Thanos is to avoid federation, which has legitimate issues).

Below is an approximate topology of a multi cluster setup running thanos (as close as possible to the linkerd2 supplied one).

As you can see it should in theory be possible to remove prometheus and grafana from the linkerd deployment and maintain metric collection.

However, removing these causes errors with linkerd check and linkerd dashboard (they appear to check for their existence).

How should the problem be solved?

I'm not entirely sure of the scope of changes that are required, but adding a flag to the commands that would otherwise fail may be a useful starting point for discussion:

linkerd check --exclude=prometheus,grafana
linkerd install --exclude=prometheus,grafana
linkerd dashboard

Note that linkerd dashboard fails when prometheus isn't available, which suggests there may be some core validation logic shared between the CLI commands.

Any alternatives you've considered?

The only alternatives are to:

ignore this issue and force users to install prometheus and grafana
avoid using the CLI

How would users interact with this feature?

Referenced above in bash scripts.

Edit: updated diagram to have a white background for legibility


grampelberg · 2019-09-09T20:02:24Z

The bundled prometheus is pretty tuned just for Linkerd. We're doing enough things there and with the CLI/dashboard that a central system would likely crash, especially if retention is over 6 hours. Assuming that cost can be kept in check, it feels like an important piece of the puzzle.

Grafana, on the other hand, is 100% optional. There's some ongoing work around configuration that'll make it optional and not break when it isn't installed (dashboard links for example).

andrew-waters · 2019-09-09T20:21:12Z

That's interesting. Is linkerd using prometheus as its own storage? The most common (recommended) tsdb lifecycle in prometheus is 2 hours, so that shouldn't cause issues.

Being able to point dashboard links to a URL outside of the cluster would be helpful.

I still maintain that unbundling (optional) is valid if it's explicit what the ramifications may be.

grampelberg · 2019-09-09T20:43:53Z

Is linkerd using prometheus as its own storage?

Linkerd doesn't do any storage itself. So, it is either prometheus for metrics or k8s for cluster state and configuration.

is valid if it's explicit what the ramifications may be.

You're totally right, at least some documentation on what happens would be helpful.

masterkain · 2019-11-01T17:38:33Z

Can someone kindly explain how linkerd2-prometheus relates to the HTTP metrics described here: https://github.com/weaveworks/flagger/blob/master/docs/gitbook/how-it-works.md#http-metrics

I'm asking because request-success-rate should be a prometheus metric, but it's not being picked up. I'm unsure whether I have to install prometheus or can reuse the linkerd one.

grampelberg · 2019-11-01T17:41:30Z

Flagger uses the linkerd prometheus.

masterkain · 2019-11-01T17:45:18Z

Flagger uses the linkerd prometheus.

Thanks. Does the nginx ingress need to be meshed too? When I do that, it stops working (502).

I'm trying to make canary releases work, but I have no idea how flagger queries linkerd-prometheus for these metrics during a deployment. I know this may not be the right place, but I'm stuck, so any advice would be appreciated: https://gist.github.com/masterkain/75c26bf239ad08400ac40c0a45714b28

I tested from inside the load balancer pod with the command below, and packets are being sent OK:

hey -z 2m -q 10 -c 2 http://bstore-stag-puma:3000/elb-status

grampelberg · 2019-11-01T17:47:19Z

Why don't you jump into Slack? It will be easier to help you there. Also:

Instructions for meshing nginx-ingress - https://linkerd.io/2/tasks/using-ingress/
Tutorial on how to use flagger with linkerd end to end - https://linkerd.io/2/tasks/canary-release/
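
For readers wondering what a success-rate check against the Linkerd Prometheus might look like, here is a minimal Python sketch. It is not Flagger's actual implementation. The PromQL expression is loosely modeled on the Linkerd `response_total` proxy metric. The service address `linkerd-prometheus.linkerd.svc.cluster.local:9090` and the function names are assumptions made for illustration. Only the standard Prometheus HTTP endpoint `/api/v1/query` is relied on.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed in-cluster address of the bundled Prometheus; adjust for your install.
PROMETHEUS_URL = "http://linkerd-prometheus.linkerd.svc.cluster.local:9090"


def success_rate_query(namespace, deployment, window="1m"):
    """Build a PromQL expression for the percentage of non-failed inbound responses."""
    selector = f'namespace="{namespace}",deployment="{deployment}",direction="inbound"'
    ok = f'sum(rate(response_total{{{selector},classification!="failure"}}[{window}]))'
    total = f'sum(rate(response_total{{{selector}}}[{window}]))'
    return f"{ok} / {total} * 100"


def query_url(base_url, promql):
    """Build the instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_scalar(payload):
    """Return the first sample value of an instant-query response, or None if there is no data."""
    results = payload.get("data", {}).get("result", [])
    if payload.get("status") != "success" or not results:
        return None
    # Each vector sample is [unix_timestamp, "value-as-string"].
    return float(results[0]["value"][1])


def fetch_success_rate(namespace, deployment, base_url=PROMETHEUS_URL):
    """Query Prometheus over HTTP; requires network access to the Prometheus service."""
    url = query_url(base_url, success_rate_query(namespace, deployment))
    with urlopen(url, timeout=10) as resp:
        return parse_scalar(json.load(resp))
```

If `fetch_success_rate` returns None, the query matched no series, which usually points at the workload not being meshed or at label values that differ from the ones assumed here.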

Pothulapati · 2021-03-17T07:05:48Z

This is fixed now. Both Grafana and Prometheus are still enabled by default, but they are now optional. Check the configuration fields to see how to disable them during install.

grampelberg added the area/install and help wanted labels Sep 17, 2019

Pothulapati mentioned this issue Oct 16, 2019

Make Prometheus Pluggable #3590

Closed

andrew-waters mentioned this issue Dec 27, 2019

Add multicluster support for Grafana #3405

Closed

Pothulapati mentioned this issue May 8, 2020

Move Prometheus as an Add-On #4362

Merged

Pothulapati mentioned this issue Jul 7, 2020

Remove/Relax prometheus related checks #4724

Merged

Pothulapati closed this as completed Mar 17, 2021

github-actions bot locked as resolved and limited conversation to collaborators Jul 16, 2021
