Stop collecting the beat state metricset as part of agent monitoring #4153

cmacknz · 2024-01-26T16:58:32Z

Our agent monitoring implementation currently uses the beat Metricbeat module to monitor Beat subprocesses. We collect both the stats and state metricsets.

elastic-agent/internal/pkg/agent/application/monitoring/v1_monitor.go

Lines 617 to 625 in b39b9af

    
    if isSupportedBeatsBinary(binaryName) {
        beatsStreams = append(beatsStreams, map[string]interface{}{
            idKey: "metrics-monitoring-" + name,
            "data_stream": map[string]interface{}{
                "type":      "metrics",
                "dataset":   fmt.Sprintf("elastic_agent.%s", name),
                "namespace": monitoringNamespace,
            },
            "metricsets": []interface{}{"stats", "state"},

It seems to me that nothing actually uses the data from the state metricset. We don't map the fields in the Elastic Agent integration. I believe we can remove this metricset and stop pointlessly storing this data for every Beat process we start.
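A minimal sketch of the proposed change, assuming the stream map keeps the shape shown in the snippet above. The helper `beatMetricsStream` and the literal `"id"` key are hypothetical stand-ins for illustration (the real code uses `idKey` and builds the map inline); the only intended difference is the `metricsets` value.

```go
package main

import "fmt"

// beatMetricsStream is a hypothetical helper that mirrors the stream map from
// v1_monitor.go, but requests only the "stats" metricset.
func beatMetricsStream(name, namespace string) map[string]interface{} {
	return map[string]interface{}{
		"id": "metrics-monitoring-" + name, // "id" stands in for idKey (assumption)
		"data_stream": map[string]interface{}{
			"type":      "metrics",
			"dataset":   fmt.Sprintf("elastic_agent.%s", name),
			"namespace": namespace,
		},
		// "state" is dropped: per this issue, nothing appears to use its data.
		"metricsets": []interface{}{"stats"},
	}
}

func main() {
	stream := beatMetricsStream("filebeat", "default")
	fmt.Println(stream["metricsets"])
}
```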

We currently store both the state and stats metricsets in the same data stream and therefore include the metricset name as a TSDB dimension, which could probably be removed after this change.

https://github.com/elastic/integrations/blob/a2c55c4cbf752e0490f9fe2d3e68698517c7b74d/packages/elastic_agent/data_stream/elastic_agent_metrics/fields/ecs.yml#L21-L23

- name: metricset.name
  type: keyword
  dimension: true

Acceptance Criteria:

The beat state documents are no longer queryable from the metrics-elastic_agent.* data stream, and any existing mappings in the Elastic Agent package are removed: https://github.com/elastic/integrations/tree/main/packages/elastic_agent
The data storage savings after removing this metricset are calculated and included in the release notes
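
One way to check the first criterion is to count documents whose `metricset.name` is `state`. The sketch below only builds the request body for Elasticsearch's `_count` API. The index pattern comes from the criteria above, while the helper name and the choice of a `term` query on `metricset.name` are assumptions, not something this issue prescribes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stateCountQuery returns the JSON body for a count request such as
// POST metrics-elastic_agent.*/_count. Hypothetical helper for illustration.
func stateCountQuery() (string, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"term": map[string]interface{}{
				"metricset.name": "state",
			},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	q, err := stateCountQuery()
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

Documents written before an upgrade would still match until they age out, so a check like this is probably best restricted to data ingested after the change.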

elasticmachine · 2024-01-26T16:58:33Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

nimarezainia · 2024-04-22T02:31:22Z

@pchila thanks for your diligence on this issue. Would it be possible to have a benchmark of the savings we could expect from this change?

cc: @pierrehilbert

ycombinator · 2024-05-02T22:00:07Z

Reopening this issue as the second part of the acceptance criteria isn't actually done yet AFAICT:

> The data storage savings after removing this metricset are calculated and included in the release notes

Also related to @nimarezainia's question in the previous comment.

pchila · 2024-05-03T06:25:37Z

> @pchila thanks for your diligence on this issue. Would it be possible to have a benchmark of the savings we could expect from this change?

@cmacknz did a quick check on the data savings here on the PR #4579 (comment)

I will re-run 2 versions of agent (with and without the change) and check the index size and document count

ycombinator · 2024-05-03T11:17:03Z

> @cmacknz did a quick check on the data savings here on the PR #4579 (comment)
>
> I will re-run 2 versions of agent (with and without the change) and check the index size and document count

Thanks. Could you make a small PR to update

elastic-agent/changelog/fragments/1713257367-Remove-beat-state-metricset-from-elastic-agent-monitoring.yaml

Line 19 in fd7984b

#description:

with these savings numbers?

pchila · 2024-05-03T14:30:38Z

@nimarezainia @ycombinator
Re-measured index size difference between commit 1e88a94 (commit just before the change) and commit 0d31445 (merge commit of the related PR) for a 10 min period after startup.

In both cases I used a policy that included the System Integration and agent logs and metrics collection.

Here are the sizes of the reindexed documents

Document count for metrics-elastic_agent.filebeat-* and metrics-elastic_agent.metricbeat-disksize.baseline is down by 50% (as expected, since we removed half of the metricsets), with an on-disk size reduction of ~13% for both indices.

I am going to put up a small PR that patches the changelog and link it to this issue.

cmacknz · 2024-05-03T14:38:18Z

In that same PR, can you add something under the doc directory describing how to reproduce these test results?

pchila · 2024-05-03T14:41:17Z

@cmacknz
I used a script that is part of PR #4633 for extracting and reindexing logs and metrics but it's not merged yet

cmacknz · 2024-05-03T14:47:05Z

Sure, doesn't matter when or how it gets documented then, as long as we have a way to remember what we did if we want to re-evaluate this again later.

strawgate · 2024-05-03T14:59:18Z

Isn't the number of metrics produced dependent on the number of components running under agent? I.e. something like x documents per beat per interval? So the % savings depends on the number of deployed integrations/managed beats?

cmacknz · 2024-05-03T15:20:08Z

That is correct, yes: more complex configurations will see greater savings. I assume @pchila tested this with the default system integration installed; I will comment on the changelog entry.

pchila · 2024-05-03T15:41:36Z

@strawgate @cmacknz I edited my comment to clarify which policy I used for the test. This is why I expressed the savings as a percentage: the absolute numbers will scale with the number of impacted indices.
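
The scaling argument above can be made concrete with a little arithmetic. The sketch assumes one document per metricset per Beat per collection period; the 60-second period and the Beat counts are illustrative values, not measurements from this issue.

```go
package main

import "fmt"

// docsPerWindow estimates monitoring documents written in a window, assuming
// one document per metricset per Beat per collection period (an assumption).
func docsPerWindow(beats, metricsets, windowSeconds, periodSeconds int) int {
	return beats * metricsets * (windowSeconds / periodSeconds)
}

func main() {
	const window, period = 600, 60 // 10 min window; the 60 s period is hypothetical
	for _, beats := range []int{2, 10} {
		before := docsPerWindow(beats, 2, window, period) // stats + state
		after := docsPerWindow(beats, 1, window, period)  // stats only
		fmt.Printf("beats=%d before=%d after=%d saved=%d%%\n",
			beats, before, after, 100*(before-after)/before)
	}
}
```

Under these assumptions the relative reduction in document count stays at 50% regardless of how many Beats run, while the absolute number of documents saved grows with the number of Beats.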

elasticmachine · 2024-05-04T02:24:43Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

cmacknz added the Team:Elastic-Agent label Jan 26, 2024

This was referenced Jan 26, 2024

Replace the beat/metrics input with http/metrics for collecting Beat process stats #4154

Open

TestFleet* seems to be flaky (again) elastic/cloud-on-k8s#7389

Closed

cmacknz changed the title ~~Stop collect the beat state metricset as part of agent monitoring~~ Stop collecting the beat state metricset as part of agent monitoring Jan 26, 2024

cmacknz self-assigned this Jan 26, 2024

cmacknz mentioned this issue Feb 20, 2024

elastic-agent and beats should handle last 30s metrics separately from regular log lines #3804

Open

cmacknz mentioned this issue Mar 5, 2024

Add new agentbeat with all beats shipped with Elastic Agent as a single beat elastic/beats#38183

Merged

11 tasks

cmacknz removed their assignment Apr 8, 2024

pierrehilbert assigned pierrehilbert and pchila and unassigned pierrehilbert Apr 8, 2024

pchila mentioned this issue Apr 16, 2024

Remove state beat state metricset #4579

Merged

7 tasks

cmacknz closed this as completed in #4579 Apr 17, 2024

ycombinator reopened this May 2, 2024

pchila mentioned this issue May 3, 2024

Patched changelog for PR 4579 #4671

Merged

ycombinator added the Team:Elastic-Agent-Control-Plane label May 4, 2024

pchila closed this as completed in #4671 May 6, 2024

mergify bot mentioned this issue May 6, 2024

[8.14](backport #4671) Patched changelog for PR 4579 #4677

Merged
