Alloy Test and Investigation for Metrics #3522
I managed to use Alloy to send metrics to Mimir.
Settings kept:
Settings abandoned:
Decision made:
Here is the values file I used to deploy Alloy as the metrics ingester: values.yaml.gz
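For context, here is a minimal sketch of what such a values file for the grafana/alloy Helm chart could look like when remote-writing into Mimir. The endpoint URL, external labels, and component layout are illustrative assumptions, not the contents of the attached values.yaml.gz:

```yaml
# Hypothetical values.yaml for the grafana/alloy Helm chart (placeholders only).
alloy:
  configMap:
    content: |
      // Discover scrape targets from existing ServiceMonitors and forward
      // the resulting samples to the Mimir remote-write component below.
      prometheus.operator.servicemonitors "default" {
        forward_to = [prometheus.remote_write.mimir.receiver]
      }

      prometheus.remote_write "mimir" {
        endpoint {
          // Placeholder endpoint; the real installation-specific URL differs.
          url = "https://mimir.example.test/api/v1/push"
        }
        external_labels = {
          cluster = "example-cluster",
        }
      }
```

Reusing the existing ServiceMonitors for discovery keeps the scrape configuration close to what Prometheus agent uses, which is what makes the metric-volume comparison below meaningful.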
The amount of metrics sent to Mimir stays the same as with Prometheus agent.
Here are the results of running Prometheus agent and Alloy as metrics agent. All tests have been run on the same installation. I used 4 different test cases:
Agents
Mimir
The amount of metrics, network and resource load on Mimir stayed approximately the same across all tests. Some Mimir ingesters restarted and had some impact on the values shown in the graphs here, but values are mostly within the same range.
Summary
Those tests showed that Alloy tends to consume about the same amount of resources as Prometheus agent, or less, and Mimir load stayed the same across all tests. Here are the results as Grafana dashboard screenshots: prometheus-agent_vs_alloy.tar.gz
Most of the work and testing was done in giantswarm/observability-operator#66. I decided to go with our current custom autoscaling solution, as it would otherwise differ too much from what we have currently, and it is also more complex to find a fit for every different installation size.
Deployment to an installation is currently blocked as this feature is only supported on CAPI installations and we need a new release to get the new observability-bundle out.
v29.1.0 is on its way; once this is released to a CAPA installation we can proceed with our live testing of Alloy as monitoring agent. We would then only need to toggle the monitoring agent flag for the observability-operator (example: https://github.com/giantswarm/giantswarm-configs/pull/135/files).
As an FYI, the release was merged :)
Now we need to have it deployed to MCs: https://github.com/giantswarm/giantswarm-management-clusters/pull/749
We can try it on a WC, right?
Oh wait, no, we cannot because of this: https://github.com/giantswarm/observability-operator/blob/09ddfe046e6a81cc6b874ac537941be9a495bc18/internal/controller/cluster_monitoring_controller.go#L181. Maybe the services should be created for each reconciliation, so the agent is always injected? Or passed as a function parameter.
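A rough sketch of that idea, assuming a standard controller-runtime reconciler; the type and helper names (`ClusterMonitoringReconciler`, `ensureMonitoringService`) are placeholders, not the actual code behind the link:

```go
// Illustrative sketch only — not the actual observability-operator code.
// Idea: ensure the monitoring agent Service on every reconciliation so the
// agent is always injected, instead of creating it only once.
package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ClusterMonitoringReconciler is a hypothetical stand-in for the real reconciler.
type ClusterMonitoringReconciler struct {
	client.Client
}

func (r *ClusterMonitoringReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	cluster := &clusterv1.Cluster{}
	if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Called on every reconciliation, not just the first one; the Service
	// could alternatively be built by the caller and passed in as a parameter.
	if err := r.ensureMonitoringService(ctx, cluster); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

// ensureMonitoringService creates the agent Service if it is missing and
// tolerates the case where it already exists.
func (r *ClusterMonitoringReconciler) ensureMonitoringService(ctx context.Context, cluster *clusterv1.Cluster) error {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cluster.Name + "-monitoring-agent", // hypothetical naming scheme
			Namespace: cluster.Namespace,
		},
	}
	if err := r.Create(ctx, svc); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}
```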
Yes we can test it out on the gazelle/cicddev cluster as it is running 29.1.0 :)
There were actually a few issues preventing this from being rolled out.
Those are all fixed now, but we now need to wait for an upgrade of observability-bundle to v1.6.2, most likely in CAPA v30.0.0 > giantswarm/releases#1357 (review)
This is running on
Reminder: make sure we make an announcement to customers before releasing alloy-metrics.
@TheoBrigitte as this is an investigation story and not the rollout, should this be put in tracking or closed?
Done on our side for now.
Motivation
We want to unify all of our agents to use the new OpenTelemetry agent from Grafana Labs: Alloy. For this we first need to test whether Alloy can deliver exactly the same capabilities as the Prometheus/Grafana agents when collecting metrics.
Todo
Outcome