Unbundle Prometheus and Grafana #3406
Comments
The bundled Prometheus is tuned specifically for Linkerd. We're doing enough with it, and with the CLI/dashboard, that a central system would likely fall over, especially if retention is over 6 hours. Assuming its cost can be kept in check, it feels like an important piece of the puzzle. Grafana, on the other hand, is 100% optional. There's some ongoing work around configuration that will make it optional and keep things from breaking when it isn't installed (dashboard links, for example).
That's interesting. Is Linkerd using Prometheus as its own storage? The most common (recommended) TSDB lifecycle in Prometheus is 2 hours, so that shouldn't cause issues. Being able to point dashboard links to a URL outside of the cluster would be helpful. I still maintain that unbundling (making it optional) is valid as long as the ramifications are made explicit.
Linkerd doesn't do any storage itself. So it's either Prometheus for metrics or k8s for cluster state and configuration.
You're totally right, at least some documentation on what happens would be helpful.
Can someone kindly explain the relation between linkerd2-prometheus and https://github.com/weaveworks/flagger/blob/master/docs/gitbook/how-it-works.md#http-metrics? Asking because
Flagger uses the linkerd prometheus. |
Thanks. Does the nginx ingress need to be meshed too? When I do that, it stops working (502). I'm trying to make canary releases work, but I have no idea how Flagger asks linkerd-prometheus for these metrics during a deployment. I know this maybe isn't the right place, but I'm stuck; any advice would be appreciated: https://gist.github.com/masterkain/75c26bf239ad08400ac40c0a45714b28 I tested from inside the load balancer pod.
Why don't you jump into Slack? It will be easier to help you there. Also:
This is fixed now. Both
Feature Request
What problem are you trying to solve?
linkerd2 deployments come with Grafana and Prometheus bundled in. This is great for single-cluster setups and PoCs, but it comes at the cost of resources when an existing cluster already has Prometheus running. The documentation around exporting metrics is useful, and for the specific use case of running Thanos, the service monitor is the preferable route (the purpose of Thanos is to avoid federation, which has legitimate issues).
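To make the export route concrete, here is a minimal sketch of pulling Linkerd's proxy metrics into an existing, external Prometheus via federation. Every name below (the job name, the `job="linkerd-proxy"` match selector, and the in-cluster service address) is an illustrative assumption, not taken from the Linkerd docs:

```shell
# Minimal federation sketch, assuming an external Prometheus should scrape
# the bundled linkerd-prometheus. The job_name, match[] selector, and the
# service address are assumptions for illustration only.
cat >> prometheus.yml <<'EOF'
scrape_configs:
  - job_name: 'linkerd-federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]': ['{job="linkerd-proxy"}']
    static_configs:
      - targets: ['linkerd-prometheus.linkerd.svc.cluster.local:9090']
EOF
```

For a Thanos setup, as noted above, a ServiceMonitor pointed at the proxies would be preferable to federation; the snippet is only the simplest self-contained illustration of reusing an external Prometheus.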
Below is an approximate topology of a multi cluster setup running thanos (as close as possible to the linkerd2 supplied one).
As you can see, it should in theory be possible to remove Prometheus and Grafana from the Linkerd deployment and still maintain metric collection. However, removing them causes errors with linkerd check and linkerd dashboard (they appear to check for their existence).

How should the problem be solved?
I'm not entirely sure of the scope of changes that are required, but adding a flag to the commands that would otherwise fail may be a useful starting point for discussion:
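As a purely hypothetical sketch of what such an escape hatch could look like today (no such skip flag exists in the linkerd CLI; this wrapper only approximates one by probing for the bundled deployment, whose name is also an assumption):

```shell
# Hypothetical workaround sketch: run the Prometheus-dependent command only
# when the bundled deployment exists. The deployment name "linkerd-prometheus"
# is an assumption; the linkerd CLI has no skip flag for this today.
maybe_linkerd_check() {
  if kubectl -n linkerd get deploy linkerd-prometheus >/dev/null 2>&1; then
    linkerd check
  else
    echo "bundled prometheus not found; skipping metrics checks"
  fi
}
```

A real flag (rather than a wrapper) would presumably live inside the CLI's shared validation path, which is what makes the scope of the change unclear.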
Note that linkerd dashboard also fails when Prometheus isn't available, which suggests there may be some validation logic shared across the CLI.

Any alternatives you've considered?
The only alternatives are to:
How would users interact with this feature?
Referenced above in bash scripts.
Edit: updated diagram to have a white background for legibility