
Bring your own prometheus #16 (Merged)

Merged 11 commits on May 18, 2020. File changed: `design/0003-byop.md` (+202 lines).
# Bring your own Prometheus

- @pothulapati
- Start Date: 2020-04-15
- Target Date:
- RFC PR: [linkerd/rfc#16](https://github.com/linkerd/rfc/pull/16)
- Linkerd Issue: [linkerd/linkerd2#3590](https://github.com/linkerd/linkerd2/issues/3590)
- Reviewers: @grampelberg @alpeb @siggy

## Summary

[summary]: #summary

This RFC proposes moving Prometheus to an add-on, i.e. recommended but not
mandated. This includes moving the linkerd-prometheus manifests into a
sub-chart and allowing more configuration (more on this later), along with
documentation and tooling for using Linkerd with an existing Prometheus
installation.

## Problem Statement (Step 1)

[problem-statement]: #problem-statement
> **Reviewer (Member):** I think this problem statement misses the point, at least as far as I understand it.
>
> Having an existing prometheus installation is not, on its own, a good reason to omit linkerd-prometheus. Furthermore, in the vast majority of cases, there is no significant resource or operational cost to running linkerd-prometheus. This is true when linkerd-prometheus is used on its own for short-term retention and when linkerd-prometheus is paired with another solution for long-term storage.
>
> As far as I'm aware, the only good reason for bringing your own prometheus is if you operate at a large scale or have very specific prometheus configuration requirements and need to do extensive tuning or configuration of prometheus.

Lots of users already have an existing prometheus installation in their
cluster and don't like managing another prometheus instance, especially
because it can be resource intensive and painful to operate. They usually
have a prometheus installation paired up with a long-term monitoring
solution like Cortex or InfluxDB, or a third-party solution like Grafana
Cloud. In those cases prometheus is just a temporary component that keeps
pushing metrics into their long-term solution, making users want to
re-use their existing prometheus.

> **Reviewer (Member):** I'm not sure I really understand what users don't like about the existence of more than one Prometheus in their cluster. What is the actual downside to having linkerd-prometheus as temporary storage for metrics before they are pulled into long-term storage via federation or remote-writes? Does linkerd-prometheus have a high resource cost? Can we quantify this? Is there some greater operational burden to having the linkerd-prometheus exist? What is that burden?
>
> **@Pothulapati (Contributor, Author), May 7, 2020:** Yep, linkerd-prometheus isn't any different from a normal installation, and doesn't get any Linkerd-specific tweaks either. I definitely think users should be using linkerd-prometheus, as it doesn't do anything different, and the operational burden shouldn't matter that much if there is already a Prometheus and a long-term solution. We will be suggesting the same in the documentation.
>
> On the resource cost, we have the following in HA mode:
>
> ```yaml
> prometheusResources:
>   cpu:
>     limit: "4"
>     request: 300m
>   memory:
>     limit: 8192Mi
>     request: 300Mi
> ```
>
> This isn't a lot, but we have definitely seen cases where prometheus was being OOM-killed, etc. I don't think this will be any better in the BYOP case, and the problem could be further amplified.
>
> I think it's just one of those obvious questions that users stumble upon initially ("Can I use my existing prometheus?"), but having a separate one for Linkerd makes more sense as they start using it.
>
> IMO the main value of this RFC is separating prometheus out from the control-plane components in charts and configuration, and giving it a life-cycle of its own for updating configuration, etc. BYOP is just another benefit that we are getting out of this. WDYT?

The following are the list of problems that we plan to solve:

- Hard requirement of linkerd-prometheus.
> **Reviewer (Member):** What's the actual problem that this solves, though? Why is a hard requirement of linkerd-prometheus bad?
>
> **@Pothulapati (Contributor, Author):** This was a very good decision technically, as there is a clear separation of concerns here, but a lot of people consider it a shortcoming and have been asking for the bring-your-own-prometheus use case.


- Configuring the current linkerd-prometheus for remote-writes, etc. is
  hard, because the prometheus configmap is overridden after upgrades.

> **Reviewer (Member):** 👍

- Not enough documentation and tooling for the bring-your-own-prometheus
  use case.

> **Reviewer (Member):** I think it's very important that we give a clear rubric in the documentation that allows users to figure out whether BYO prometheus is something they should do. I think that for the (vast?) majority of users, even users that are already running a prometheus, using linkerd-prometheus in conjunction with their own prometheus is the right thing to do.
>
> **@Pothulapati (Contributor, Author):** Yep, totally, and we will definitely have to highlight it in the documentation of the BYOP case.

Once the RFC is implemented, users should not only be able to use their
existing prometheus, but configuring linkerd-prometheus for
remote-writes, etc. should also be easy.

> **Reviewer (Contributor):** Would you mind outlining the UX side of this a little? We want to make sure that none of the existing linkerd tooling breaks (instead using their prometheus for it all).
>
> **@Pothulapati (Contributor, Author):** I added:
>
> > ### User Experience
> >
> > Currently, `public-api`, `grafana` and `heartbeat` are the only three components which are configured with a `prometheus-url` parameter. All the dashboards and the CLI are powered through the `public-api`. This allows us to swap prometheus without breaking any existing linkerd tooling, though it is expected that the `scrape-config` is configured correctly when this path is chosen.
>
> Are you looking for anything more?
>
> **Reviewer (Member):** During implementation, I think we should audit all Prometheus queries from Linkerd, to ensure they are scoped to `job="linkerd-proxy"`, or similar. We don't want Linkerd accidentally querying for non-Linkerd metrics.
>
> **@Pothulapati (Contributor, Author):** Hmm, that makes sense. I will definitely see if there are any places where we use metrics that are scraped from outside the scrape-config provided in the documentation, and raise issues (if there are any).

## Design proposal (Step 2)

[design-proposal]: #design-proposal

### Prometheus as an Add-On

By moving Prometheus to an add-on, we can make it an optional component
that can be installed both through Helm and the Linkerd CLI. The add-on model
works well with upgrades, etc., as per [#3955](https://github.com/linkerd/linkerd2/pull/3955).
This is done by moving the prometheus components into a separate sub-chart
in the linkerd repo and updating the code. The same health checks are also
expected, but as a separate check category, which would be transient, i.e.
it runs only when it recognises that the add-on is enabled.

By default, the prometheus add-on will be **enabled**.

This also provides a single place of configuration for all things related to
linkerd-prometheus. Though we are making linkerd-prometheus optional,
having it is great for Linkerd and external users, for reasons like
separation of concerns. By allowing `remote-write`, `alerts`, etc.
to be configured in linkerd-prometheus, we would decrease the need/want
for external users to tie their existing prometheus to Linkerd,
as it would be even easier for them to move the `remote-write`, etc.
configuration from their existing prometheus to linkerd-prometheus
right from the start.

The following default prometheus configuration is planned:

```yaml
prometheus:
  enabled: true
  name: linkerd-prometheus
  image:
  args:
    # - log.format: json
  global:
    # <global-prom-config>
    # https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file
  remote-write:
    # <remote-write-config>
    # https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write
  alertManagers:
    # <alert-manager-config>
    # https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alertmanager_config
  configMapMounts:
    # - name: alerting-rules
    #   subPath: alerting_rules.yml
    #   configMap: linkerd-prometheus-rules
```

> **Reviewer (Contributor):** Isn't it possible to load all these from files? Then folks could just manage configmaps and not actually change the chart in any way.
>
> **Reviewer (Member):** I think the first step is to extract our current Prometheus configmap into a separate yaml template (currently it lives together with the prom service and deployment). That configmap is currently being reworked in linkerd/linkerd2#4220, which adds the ability to plug in recording and alerting rules through separate configmaps (no other prom config items are pluggable like that, from what I can tell). Then this new Prometheus add-on `values.yaml` could just have a few minimal entries like `enabled`, `image`, etc., and the new entries from that PR to configure rules. We could also have an entry to entirely override the provided Prometheus configmap with an external one, in which case the user would need to be advised to follow the scrape configs from the default configmap. Makes sense?
>
> **@Pothulapati (Contributor, Author):** As @alpeb mentioned, I integrated stuff from linkerd/linkerd2#4220 into this `prometheus` field. It doesn't help a lot, as only rules, etc. can be added as files; other configuration like `remote_write`, `alerting`, etc. feels better taken directly.
>
> > Then this new Prometheus addon values.yaml could just have a few minimal entries like enabled, image, etc, and the new entries from that PR to configure rules.
>
> Users can always choose to ignore fields that they don't want to fill!
>
> > We could also have an entry to entirely override the provided Prometheus configmap with an external one, in which case the user would need to be advised to follow the scrape configs from the default configmap.
>
> This is interesting, but it essentially feels like a BYOP case, and it complicates maintaining the linkerd-addons configmap.

All of the current default configuration will be moved to `values.yaml` in the
above format, allowing users to override it using Helm or
`--addon-config` through the CLI.
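As a sketch of what such an override might look like under the proposed schema (the Cortex endpoint URL below is a hypothetical example, not a value from this RFC):

```yaml
# Override file passed via `helm install -f overrides.yaml` or
# `--addon-config overrides.yaml` through the CLI, per the proposal above.
# The remote-write URL is hypothetical.
prometheus:
  enabled: true
  args:
    log.format: json
  remote-write:
    - url: http://cortex.monitoring.svc.cluster.local:9009/api/prom/push
  configMapMounts:
    - name: alerting-rules
      subPath: alerting_rules.yml
      configMap: linkerd-prometheus-rules
```

Only the fields a user wants to change need to appear in the override; everything else keeps the defaults from `values.yaml`.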

### Using an Existing Prometheus

First, documentation will be provided at `linkerd.io` to guide users on
adding the `scrape-configs` required to get metrics from the linkerd
proxies, control plane components, etc. These can be found
[here](https://github.com/linkerd/linkerd2/blob/master/charts/linkerd2/templates/prometheus.yaml#L26).
Additional documentation on the storage requirements for Linkerd metrics
in prometheus can also be provided for existing installations with
longer retention times, scrape interval changes, etc.
This can also be automated, as we already do with Linkerd CLI help and args.
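To give a flavour of what that documentation would cover, here is an abridged sketch of the proxy scrape job. The linked `prometheus.yaml` template is authoritative; the exact label and port names below may differ by Linkerd version.

```yaml
# Abridged sketch of the linkerd-proxy scrape job for an existing
# Prometheus (see the linked template for the authoritative version).
scrape_configs:
  - job_name: 'linkerd-proxy'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only containers named linkerd-proxy, scraping their
      # admin port, where the proxy exposes its metrics.
      - source_labels:
          - __meta_kubernetes_pod_container_name
          - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: ^linkerd-proxy;linkerd-admin$
```

A similar job would be needed for the control plane components, which is why pointing users at the full template (or generating it, as proposed below for `linkerd prom-config`) is preferable to hand-copying.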

Once this is done, users should be able to see the golden metrics, etc. in
their existing prometheus instance. The `.Values.global.prometheusUrl` field
lets users point the Linkerd components that need prometheus access
(i.e. `public-api`, etc.) at the existing prometheus installation.

`prometheusUrl` lives in `.Values.global` instead of `.Values.prometheus` so
that the field is accessible to other add-on components like `grafana`. The
`linkerd-config-addons` ConfigMap will also be updated to store the same.

Flag interactions are defined below:

| `prometheus.enabled` | `global.prometheusUrl` | Result |
|----------------------|------------------------|--------|
| true | "" | Install linkerd-prometheus, and configure control-plane components to use it. |
| false | "prometheus.xyz" | No linkerd-prometheus; control-plane components will be configured to use the URL. |
| true | "prometheus.xyz" | Install linkerd-prometheus, but still configure the control plane to use the given URL. Useful when linkerd-prometheus is configured to remote-write into a long-term storage solution at that URL, thus powering the dashboard and CLI through long-term storage. |
| false | "" | No linkerd-prometheus, and no URL to power dashboards. (Should we even allow this?) |

> **Reviewer (Member):** This is really interesting. Would it make sense to have some validation in this case ensuring that remote-write is set up?
>
> **@Pothulapati (Contributor, Author):** Yep, that makes sense. Should we consider it in the hot path, i.e. make the public-api perform the check when initializing, or have Linkerd check tooling that checks this?
>
> **Reviewer (Member):** That can be checked at install/upgrade time by adding a Helm validation like what we do in `identity.yaml` for ensuring a cert is provided. I can't recall if that also gets applied during CLI install/upgrade; if not, then that'd be something to consider in the CLI scripts.
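As an illustration, the BYOP row and the remote-write row above might correspond to overrides like the following (all URLs are hypothetical examples):

```yaml
# BYOP: skip linkerd-prometheus entirely and point the control plane
# at an existing in-cluster Prometheus (hypothetical URL).
prometheus:
  enabled: false
global:
  prometheusUrl: http://prometheus.monitoring.svc.cluster.local:9090
---
# Keep linkerd-prometheus as the scraper, but serve dashboard/CLI
# queries from the long-term store it remote-writes into.
prometheus:
  enabled: true
  remote-write:
    - url: http://cortex.monitoring.svc.cluster.local:9009/api/prom/push
global:
  prometheusUrl: http://cortex.monitoring.svc.cluster.local:9009/api/prom
```

In the second case, the validation discussed above would warn if `prometheusUrl` is set alongside an enabled linkerd-prometheus that has no remote-write configured, since the dashboards would otherwise show no data.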

#### User Experience

Currently, `public-api`, `grafana` and `heartbeat` are the only three
components which are configured with a `prometheus-url` parameter.
All the dashboards and the CLI are powered through the `public-api`.
This allows us to swap prometheus without breaking any existing linkerd
tooling, though it is expected that the `scrape-config` is configured
correctly when this path is chosen.

If `prometheus-url` is not accessible, components like `public-api` should not fail,
but just log a warning, so that users can still access things like Tap, checks, etc. through the dashboard and CLI.

#### Linkerd CLI Tooling

When users bring their own Prometheus, it would be really helpful to have
tooling that allows them to check and troubleshoot any problems.

*linkerd check*

Checks can be added to report whether prometheus is configured correctly and is replying to queries:

- Check if the configured prometheus instance is healthy.
- Check if the relevant metrics are present in prometheus by querying it, and
  warn users if not. Users don't have to supply the `prometheusUrl`, as we can
  read it from the `linkerd-config-addons` ConfigMap.
- More checks can be added here, if there are known
  configurations that don't go well with linkerd.

*linkerd prom-config*

An additional command can be provided in the CLI that outputs the prometheus
config corresponding to that version of linkerd, with the relevant
scrape-configs and default configuration.


### Prior art
> **Reviewer (Member):** It would be helpful to describe alternative solutions that we considered and explored and why they were not suitable. For example:
>
> - aggressive tuning of linkerd-prometheus
> - exposing more tuning knobs of linkerd-prometheus
> - dropping labels
>
> **@Pothulapati (Contributor, Author):**
>
> > exposing more tuning knobs of linkerd-prometheus
>
> This RFC addresses that by exposing the `global` field at `prometheus.global`, where most tuning around retention period, etc. can be configured.
>
> All the tuning work, though very helpful for making linkerd-prometheus more stable and manageable and for avoiding situations like OOMs when there are sudden spikes in queries, still won't address the user's question of bringing their own prometheus, which we have definitely heard a lot (though it's definitely not what we'd suggest).
>
> I added this suggestion. Feel free to comment if you would like me to add anything!


[prior-art]: #prior-art

- [Prometheus Operator](https://github.com/coreos/prometheus-operator) is an interesting tool
  which could remove the problem of maintaining configuration across upgrades, etc. from Linkerd,
  but it comes with its own controller and CRDs.
- Specific to resource costs, we can consider things like tuning prometheus, dropping labels, etc., making it
  lighter still, but there will always be some users who want to use their existing prometheus. Irrespective
  of this solution, we should definitely try to keep linkerd-prometheus light and provide as many knobs for
  configuration as possible, making it easy to manage. This could be addressed separately; [rfc/21](https://github.com/linkerd/rfc/pull/21) has some similarities.
- Istio's [Bring your own Prometheus example](https://istiobyexample.dev/prometheus/). A similar Helm-based
  sub-chart model is used, but this was with Mixer v1.
- We already have docs for [exporting metrics](https://linkerd.io/2/tasks/exporting-metrics/).
  We could update those docs with remote-write, etc.

### Unresolved questions

[unresolved-questions]: #unresolved-questions

- Should we support all prometheus configs in `linkerd-prometheus`?

- How do we keep track of the metrics that the proxies expose, to have
  consistent check tooling?

- Most third-party storage solutions have some auth mechanisms; should we
  allow that? IMO we shouldn't focus on this in v1, but as folks start to call
  it out we can consider updating to handle that use case.

- Should we separate out each individual field of `prometheus`, i.e. `remoteWrite`, etc.,
  into separate configmaps?

### Future possibilities

[future-possibilities]: #future-possibilities

In this proposal, we separate the query path from linkerd-prometheus,
thus allowing users to use any third-party long-term storage solution that
follows the prometheus query API to power the dashboards and CLI.

Also, the same `bring your own prometheus` documentation can be re-used for
solutions like the [grafana cloud agent](https://grafana.com/blog/2020/03/18/introducing-grafana-cloud-agent-a-remote_write-focused-prometheus-agent-that-can-save-40-on-memory-usage/),
which follows the same prometheus scrape-config.