[Metricbeat] Document Cluster Level vs. Node Level Metricsets #7110

Closed
scottcrespo opened this issue May 15, 2018 · 5 comments

scottcrespo commented May 15, 2018

TL;DR

Whether the metricset is node-level or cluster-level is important.

In each metricset's documentation, the following information should be provided:

  1. The endpoint called to retrieve the data with a link to the official documentation on that endpoint
  2. Whether the metricset is node-level or cluster-level when the target is clustered
  3. (Optional) Deployment notes/recommendations

Overview

This is a documentation request, with a specific documentation policy proposal.

When monitoring a clustered service with metricbeat, it is not clear if a particular metricset is a node-level metricset or cluster-level metricset. How the metricset is obtained impacts how the beat must be deployed and the integrity of the data that's reported.

I realized this issue while deploying a variety of metricbeat modules to collect data on clustered services running on Kubernetes.

Currently, to determine if a metricset is node-level or cluster-level, I review the metricbeat source code to find the specific endpoint being called. Then I head to that technology's official docs to figure out if I must call each node individually, or if I can address the cluster as a whole.

Determining if a metricset is node-level or cluster-level is important. For example, if the metricset is node-level but I treat it as cluster-level, I may just get a random node's report each time period, which results in inconsistent and incomplete data. Vice-versa, if I have a cluster-level metricset, but I treat it as node-level, I may receive the same cluster-wide, aggregated stats each time period n times. Then, I'm also likely to perform aggregations on the duplicate reports in Kibana, which results in a double aggregation, and potentially invalid results.
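To make the two failure modes concrete, here is a minimal sketch in Python. The node names and document counts are invented for illustration, and the round-robin load balancer is an assumption; nothing here calls Metricbeat or any real service.

```python
from itertools import cycle

# Hypothetical per-node values for a three-node cluster (illustration only).
node_docs = {"node-a": 100, "node-b": 250, "node-c": 50}
true_total = sum(node_docs.values())


def poll_node_level_via_lb(periods):
    """Node-level metricset read through one shared endpoint.

    Assumes a round-robin load balancer, so each period returns the
    value of a single node rather than a view of the whole cluster.
    """
    backends = cycle(node_docs.values())
    return [next(backends) for _ in range(periods)]


def poll_cluster_level_from_every_node():
    """Cluster-level metricset read by one agent per node.

    Every agent receives the same cluster-wide total, so the period
    contains one duplicate report per node.
    """
    return [true_total for _ in node_docs]


# Mistake 1: node-level data treated as cluster-level.
lb_samples = poll_node_level_via_lb(3)

# Mistake 2: cluster-level data treated as node-level, then summed again.
duplicate_reports = poll_cluster_level_from_every_node()
double_aggregated = sum(duplicate_reports)

print(lb_samples)         # one node's value per period, never the total
print(double_aggregated)  # the true total multiplied by the node count
```

In the first case no single sample equals the cluster total; in the second, summing the reports in a dashboard overstates the total by a factor equal to the number of agents.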

You'll find that metricbeat's official docs touch on the need to deploy agents differently on Kubernetes (Daemonset vs. Deployment) depending on whether the metricset is node-level or cluster-level.

Below I define node-level and cluster-level, and propose some specific documentation policies that might provide more information to users as to how metricbeat will collect data from a clustered service.

Definitions

Node-level metricset

Definition
Metricbeat should contact each node in the cluster directly to obtain data.

Therefore, deployment of a node-level metricset requires that a metricbeat agent be placed on every node in the cluster.

Deployment
In the case of deploying on Kubernetes, you would likely create a DaemonSet.

Example
Mongo's dbstats metricset. Each node in a cluster may have different databases (in the case of a sharded cluster), and may have different data volumes (as a result of replication and eventual consistency).

Cluster-Level metricset

Definition
A cluster-level metricset may contact one particular endpoint in the cluster (which may be an LB, master node, or any random node), and the response will either (a) report on each individual node, or (b) represent an aggregation of all nodes.

Deployment
Deployment will likely be a single agent, which contacts a single endpoint. In the case of a Kubernetes deployment, this would be a Deployment with one replica.

Example
When retrieving data from the RabbitMQ management plugin, all metricsets are cluster-level.
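The mapping from metricset level to Kubernetes workload described in the two definitions can be sketched as a small lookup. The function name `recommended_workload` and the returned dictionary shape are hypothetical and only restate the recommendations above; they are not part of Metricbeat or the Kubernetes API.

```python
def recommended_workload(level: str) -> dict:
    """Return the Kubernetes workload suggested for a metricset level.

    "node"    -> one agent per node, so a DaemonSet
    "cluster" -> one agent for the whole cluster, so a Deployment
                 with a single replica
    """
    if level == "node":
        return {"kind": "DaemonSet"}
    if level == "cluster":
        return {"kind": "Deployment", "replicas": 1}
    raise ValueError(f"unknown metricset level: {level!r}")


# Levels taken from the two examples above.
dbstats_workload = recommended_workload("node")      # MongoDB dbstats
rabbitmq_workload = recommended_workload("cluster")  # RabbitMQ management plugin
```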

Documentation Proposal

In each metricset's documentation, the following information should be provided:

  1. The endpoint called to retrieve the data with a link to the official documentation on that endpoint
  2. Whether the metricset is node-level or cluster-level when the target is clustered
  3. (Optional) Deployment notes/recommendations

Example


MongoDB Module

Metricsets

dbstats (beta)

Description
DbStats iterates through all databases on a particular node and calls the db.stats() command on each.

Endpoint
db.stats()

Node Level
When operating a MongoDB cluster, it is recommended that this metricset be treated as node-level.

Deployment Notes
When collecting dbstats from a MongoDB cluster, metricbeat should be configured to connect to each node in the cluster directly.
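One way to keep the proposed checklist consistent across metricsets would be to hold the fields in a small record and render the section from it. The sketch below is only an illustration: `MetricsetDoc` and its field names are hypothetical and do not correspond to anything in the Beats documentation tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricsetDoc:
    """Hypothetical record holding the proposed documentation fields."""
    name: str
    description: str
    endpoint: str
    level: str                              # "node" or "cluster"
    deployment_notes: Optional[str] = None  # optional, as in the proposal

    def __post_init__(self):
        if self.level not in ("node", "cluster"):
            raise ValueError(f"level must be 'node' or 'cluster', got {self.level!r}")

    def render(self) -> str:
        """Render the fields in the plain-text layout of the example above."""
        heading = "Node Level" if self.level == "node" else "Cluster Level"
        lines = [
            self.name, "",
            "Description", self.description, "",
            "Endpoint", self.endpoint, "",
            heading,
            f"When the target is clustered, treat this metricset as {self.level}-level.",
        ]
        if self.deployment_notes:
            lines += ["", "Deployment Notes", self.deployment_notes]
        return "\n".join(lines)


dbstats_doc = MetricsetDoc(
    name="dbstats (beta)",
    description="Iterates through all databases on a node and calls db.stats().",
    endpoint="db.stats()",
    level="node",
    deployment_notes="Configure metricbeat to connect to each node directly.",
)
rendered = dbstats_doc.render()
print(rendered)
```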

Contributor

exekias commented May 15, 2018

Hi @scottcrespo, thank you for opening this one. We recently had conversations about this specific topic, and your proposal looks good to me. Any opinions here, @ruflin @jsoriano?

Contributor

ruflin commented May 16, 2018

Thanks a lot for creating this detailed proposal. I like the checklist on what each metricset should document.

One thing I would like to add is that for some metricsets it can be configured whether they should be cluster-level or node-level. An example for RabbitMQ is here: #6971

The reason behind this is that it's not always possible to install Metricbeat on each node, for example when RabbitMQ is used as a hosted service but you still want to monitor all nodes. For such metricsets we should document both modes and their pros and cons.

One take away from the above proposal for me is also that we should have some docs outside the modules that describe the differences and the different approaches.


botelastic bot commented Mar 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic bot added the Stalled and needs_team labels Mar 18, 2021
@jsoriano added the Team:Integrations label Mar 19, 2021
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic bot removed the Stalled and needs_team labels Mar 19, 2021

botelastic bot commented Mar 19, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Mar 19, 2022
@botelastic botelastic bot closed this as completed Sep 15, 2022