[Metricbeat] Document Cluster Level vs. Node Level Metricsets #7110
Comments
Hi @scottcrespo, thank you for opening this one. We recently had conversations about this specific topic, and your proposal looks good to me. Any opinions here, @ruflin @jsoriano? |
Thanks a lot for creating this detailed proposal. I like the checklist of what each metricset should document. One thing I would like to add is that for some metricsets it can be configured whether they run at cluster level or node level. An example for RabbitMQ is here: #6971. The reason behind this is that it's not always possible to install Metricbeat on each node, for example when using RabbitMQ as a hosted service while still wanting to monitor all nodes. For such cases we should document both options and their pros and cons. One takeaway from the above proposal for me is also that we should have some docs outside the modules that describe the differences and the different approaches. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Pinging @elastic/integrations (Team:Integrations) |
TL/DR
Whether the metricset is node-level or cluster-level is important.
In each metricset's documentation, the following information should be provided:
Overview
This is a documentation request, with a specific documentation policy proposal.
When monitoring a clustered service with metricbeat, it is not clear if a particular metricset is a node-level metricset or cluster-level metricset. How the metricset is obtained impacts how the beat must be deployed and the integrity of the data that's reported.
I realized this issue while deploying a variety of metricbeat modules to collect data on clustered services running on Kubernetes.
Currently, to determine if a metricset is node-level or cluster-level, I review the metricbeat source code to find the specific endpoint being called. Then I head to that technology's official docs to figure out if I must call each node individually, or if I can address the cluster as a whole.
Determining if a metricset is node-level or cluster-level is important. For example, if the metricset is node-level but I treat it as cluster-level, I may just get a random node's report each time period, which results in inconsistent and incomplete data. Conversely, if I have a cluster-level metricset but treat it as node-level, each of my n agents may report the same cluster-wide, aggregated stats every time period, so I receive n copies. Then, I'm also likely to perform aggregations on the duplicate reports in Kibana, which results in a double aggregation, and potentially invalid results.
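To make both failure modes concrete, here is a minimal Python sketch. The node names and values are made up for illustration, and the helper functions are hypothetical; nothing here is Metricbeat or Kibana code. It only shows the arithmetic of summing duplicate cluster-wide reports, and what sampling a node-level metric through a single endpoint looks like.

```python
# Hypothetical per-node values for some metric (for example, data size in MB).
node_values = {"node-a": 100, "node-b": 250, "node-c": 50}


def cluster_total(values):
    """The true cluster-wide aggregate: each node counted exactly once."""
    return sum(values.values())


def duplicated_cluster_reports(values, agents):
    """Cluster-level metricset treated as node-level.

    Every agent queries the cluster and receives the same aggregate,
    so one collection period yields `agents` identical reports.
    """
    return [cluster_total(values)] * agents


def load_balanced_samples(values, picks):
    """Node-level metricset treated as cluster-level.

    A single agent behind a load balancer sees one node per period;
    `picks` is the (assumed) sequence of nodes the balancer chose.
    """
    return [values[name] for name in picks]


true_total = cluster_total(node_values)
reports = duplicated_cluster_reports(node_values, agents=len(node_values))
inflated_total = sum(reports)  # a naive "sum" aggregation over all reports

samples = load_balanced_samples(
    node_values, ["node-a", "node-c", "node-b", "node-a"]
)

print(true_total, inflated_total, samples)
```

With three agents, the summed value is three times the true total, and the single-endpoint samples jump between unrelated per-node values from one period to the next.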
You'll find that metricbeat's official docs touch on the need to deploy agents differently on Kubernetes (Daemonset vs. Deployment) depending on whether the metricset is node-level or cluster-level.
Below I define node-level and cluster-level, and propose some specific documentation policies that might provide more information to users as to how metricbeat will collect data from a clustered service.
Definitions
Node-level metricset
Definition
Metricbeat should contact each node in the cluster directly to obtain data.
Therefore, deployment of a node-level metricset requires that a metricbeat agent be placed on every node in the cluster.
Deployment
In the case of deploying on Kubernetes, you would likely create a DaemonSet.
Example
Mongo's dbstats metricset. Each node in a cluster may have different databases (in the case of a sharded cluster), and may have different data volumes (as a result of replication and eventual consistency).
Cluster-level metricset
Definition
A cluster-level metricset may contact one particular endpoint in the cluster (which may be an LB, master node, or any random node), and the response will either (a) report on each individual node, or (b) represent an aggregation of all nodes.
Deployment
Deployment will likely use a single agent, which contacts a single endpoint. In the case of a Kubernetes deployment, this would be a Deployment with one replica.
Example
When retrieving data from the RabbitMQ management plugin, all metricsets are cluster-level.
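The deployment guidance in the two definitions above can be summarized as a small lookup. This is only a sketch of the rule of thumb described in this issue: the Workload type and the workload_for function are hypothetical names, and a real deployment may need something different (for example, when a service node does not map one-to-one to a Kubernetes node).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workload:
    kind: str                # Kubernetes workload kind
    replicas: Optional[int]  # None: a DaemonSet runs one pod per Kubernetes node


def workload_for(level: str) -> Workload:
    """Map a metricset level to the Kubernetes workload suggested above."""
    if level == "node":
        # Node-level: an agent next to every node, so likely a DaemonSet.
        return Workload(kind="DaemonSet", replicas=None)
    if level == "cluster":
        # Cluster-level: a single agent contacting a single endpoint.
        return Workload(kind="Deployment", replicas=1)
    raise ValueError(f"unknown metricset level: {level!r}")


print(workload_for("node"))
print(workload_for("cluster"))
```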
Documentation Proposal
In each metricset's documentation, the following information should be provided:
Example
MongoDB Module
Metricsets
dbstats (beta)
Description
dbstats iterates through all databases on a particular node and calls the db.stats() command on each.
Endpoint
db.stats()
Node Level
When operating a MongoDB cluster, it is recommended that this metricset be treated as node-level.
Deployment Notes
When collecting dbstats from a MongoDB cluster, metricbeat should be configured to connect to each node in the cluster directly.