[Metricbeat] Document Cluster Level vs. Node Level Metricsets #7110

scottcrespo · 2018-05-15T20:01:32Z

TL;DR

Whether the metricset is node-level or cluster-level is important.

In each metricset's documentation, the following information should be provided:

The endpoint called to retrieve the data with a link to the official documentation on that endpoint
Whether the metricset is node-level or cluster-level when the target is clustered
(Optional) Deployment notes/recommendations

Overview

This is a documentation request, with a specific documentation policy proposal.

When monitoring a clustered service with Metricbeat, it is not clear whether a particular metricset is node-level or cluster-level. How the metricset is obtained affects both how the beat must be deployed and the integrity of the reported data.

I realized this issue while deploying a variety of metricbeat modules to collect data on clustered services running on Kubernetes.

Currently, to determine if a metricset is node-level or cluster-level, I review the metricbeat source code to find the specific endpoint being called. Then I head to that technology's official docs to figure out if I must call each node individually, or if I can address the cluster as a whole.

Determining whether a metricset is node-level or cluster-level is important. For example, if the metricset is node-level but I treat it as cluster-level, I may get an arbitrary node's report each time period, which results in inconsistent and incomplete data. Conversely, if I have a cluster-level metricset but treat it as node-level, I may receive the same cluster-wide, aggregated stats n times each time period (once per agent). I'm then also likely to aggregate the duplicate reports in Kibana, which results in a double aggregation and potentially invalid results.
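To make the two failure modes concrete, here is a minimal sketch in Python. The node names, the values, and the round-robin load balancer are all assumptions made up for illustration; nothing here calls Metricbeat or any real service.

```python
# Hypothetical per-node values for one collection period (e.g. a queue depth).
node_values = {"node-a": 10, "node-b": 20, "node-c": 30}


def cluster_total(values):
    """What a cluster-level endpoint would report: one aggregate over all nodes."""
    return sum(values.values())


def collect_cluster_level_as_node_level(values):
    """Mistake: one agent per node, each polling the same cluster-level endpoint."""
    return [cluster_total(values) for _ in values]


def collect_node_level_as_cluster_level(values, period):
    """Mistake: one agent polling a node-level endpoint through a load balancer.

    Round-robin stands in for whichever node the balancer happens to pick.
    """
    names = sorted(values)
    node = names[period % len(names)]
    return {node: values[node]}


duplicates = collect_cluster_level_as_node_level(node_values)
double_aggregated = sum(duplicates)  # the second aggregation, as done in a dashboard

print("true cluster total:", cluster_total(node_values))
print("duplicate reports:", duplicates)
print("sum of duplicates:", double_aggregated)
print("single-agent view per period:",
      [collect_node_level_as_cluster_level(node_values, p) for p in range(3)])
```

In this sketch the summed duplicates overstate the true total by a factor equal to the number of agents, and the single-agent view only ever sees one node per period.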

You'll find that metricbeat's official docs touch on the need to deploy agents differently on Kubernetes (Daemonset vs. Deployment) depending on whether the metricset is node-level or cluster-level.

Below I define node-level and cluster-level, and propose some specific documentation policies that might provide more information to users as to how metricbeat will collect data from a clustered service.

Definitions

Node-level metricset

Definition
Metricbeat must contact each node in the cluster directly to obtain data.

Therefore, deployment of a node-level metricset requires that a metricbeat agent be placed on every node in the cluster.

Deployment
In the case of deploying on Kubernetes, you would likely create a DaemonSet.

Example
MongoDB's dbstats metricset. Each node in a cluster may have different databases (in the case of a sharded cluster), and may have different data volumes (as a result of replication and eventual consistency).

Cluster-Level metricset

Definition
A cluster-level metricset may contact one particular endpoint in the cluster (which may be an LB, master node, or any random node), and the response will either (a) report on each individual node, or (b) represent an aggregation of all nodes.

Deployment
Deployment will likely use a single agent, which contacts a single endpoint. In the case of a Kubernetes deployment, this would be a Deployment with one replica.

Example
When retrieving data from the RabbitMQ management plugin, all metricsets are cluster-level.
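The deployment rule implied by the two definitions can be sketched as a small lookup. This is only an illustration of the reasoning above: `Scope`, `Placement`, and `recommended_placement` are hypothetical names, not part of Metricbeat or the Kubernetes API, and the mapping reflects the "likely" recommendations in this issue rather than a hard rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    NODE = "node-level"
    CLUSTER = "cluster-level"


@dataclass(frozen=True)
class Placement:
    kind: str                 # Kubernetes workload kind
    replicas: Optional[int]   # None when the count follows the number of nodes


def recommended_placement(scope: Scope) -> Placement:
    """Map a metricset's scope to the Kubernetes workload suggested above."""
    if scope is Scope.NODE:
        # One agent per node, so every node is contacted directly.
        return Placement(kind="DaemonSet", replicas=None)
    # A single agent contacting a single endpoint avoids duplicate reports.
    return Placement(kind="Deployment", replicas=1)


print(recommended_placement(Scope.NODE))
print(recommended_placement(Scope.CLUSTER))
```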

Documentation Proposal

In each metricset's documentation, the following information should be provided:

The endpoint called to retrieve the data with a link to the official documentation on that endpoint
Whether the metricset is node-level or cluster-level when the target is clustered
(Optional) Deployment notes/recommendations

Example

MongoDB Module

Metricsets

dbstats (beta)

Description
DbStats iterates through all databases on a particular node and calls the db.stats() command on each.

Endpoint
db.stats()

Node Level
When operating a MongoDB cluster, it is recommended that this metricset be treated as node-level.

Deployment Notes
When collecting dbstats from a MongoDB cluster, metricbeat should be configured to connect to each node in the cluster directly.
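If the proposed fields were kept in a structured form, each metricset's page could be rendered from the same template. The sketch below is one hypothetical way to do that; `MetricsetDoc` and its field names are assumptions for illustration, and the documentation link is left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricsetDoc:
    """The fields the proposal asks every metricset page to carry."""
    name: str
    description: str
    endpoint: str
    scope: str                                # "Node Level" or "Cluster Level"
    scope_note: str
    endpoint_docs_url: Optional[str] = None   # link to the official docs, if known
    deployment_notes: str = ""                # optional per the proposal

    def render(self) -> str:
        endpoint = self.endpoint
        if self.endpoint_docs_url:
            endpoint += " (" + self.endpoint_docs_url + ")"
        lines = [self.name, "",
                 "Description", self.description, "",
                 "Endpoint", endpoint, "",
                 self.scope, self.scope_note]
        if self.deployment_notes:
            lines += ["", "Deployment Notes", self.deployment_notes]
        return "\n".join(lines)


dbstats = MetricsetDoc(
    name="dbstats (beta)",
    description="Iterates through all databases on a node and calls db.stats().",
    endpoint="db.stats()",
    scope="Node Level",
    scope_note="Treat this metricset as node-level when operating a MongoDB cluster.",
    deployment_notes="Configure metricbeat to connect to each node directly.",
)
text = dbstats.render()
print(text)
```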


exekias · 2018-05-15T22:20:51Z

Hi @scottcrespo, thank you for opening this one. We recently had conversations about this specific topic. Your proposal looks good to me. Any opinions here, @ruflin @jsoriano?

ruflin · 2018-05-16T14:31:39Z

Thanks a lot for creating this detailed proposal. I like the checklist on what each metricset should document.

One thing I would like to add is that for some metricsets it can be configured whether they should be cluster-level or node-level. An example for RabbitMQ is here: #6971

The reason behind this is that it's not always possible to install Metricbeat on each node, for example when using RabbitMQ as a hosted service but still wanting to monitor all nodes. For such cases we should document both options and their pros and cons.

One takeaway from the above proposal for me is also that we should have some docs outside the modules that describe the differences and the different approaches.

botelastic · 2021-03-18T19:16:43Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

elasticmachine · 2021-03-19T09:51:24Z

Pinging @elastic/integrations (Team:Integrations)

botelastic · 2022-03-19T09:52:51Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

exekias added the enhancement, docs, and Metricbeat labels May 15, 2018

botelastic bot added the Stalled and needs_team labels Mar 18, 2021

jsoriano added the Team:Integrations label Mar 19, 2021

botelastic bot removed the Stalled and needs_team labels Mar 19, 2021

botelastic bot added the Stalled label Mar 19, 2022

botelastic bot closed this as completed Sep 15, 2022
