Reduce agent logs by default #4633

pchila · 2024-04-29T14:42:15Z

What does this PR do?

This PR drops the periodic "Non-zero metrics after 30s..." log messages before they are sent to Elasticsearch. It also lowers some intermediate verification error logs to debug level.
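As a rough illustration of the filtering, the sketch below assumes the drop is expressed as a Beats `drop_event` processor with a `regexp` condition on the `message` field, and mimics its effect in plain Python. The pattern `^Non-zero metrics`, the constant `DROP_NON_ZERO_METRICS`, and the helpers `should_drop` and `filter_events` are hypothetical and are not the exact configuration from this PR.

```python
import re

# Hypothetical processor definition, shaped like a Beats `drop_event`
# processor with a `regexp` condition. The exact pattern used by the PR
# may differ; this one only anchors on the message prefix.
DROP_NON_ZERO_METRICS = {
    "drop_event": {
        "when": {
            "regexp": {"message": "^Non-zero metrics"},
        },
    },
}


def should_drop(event: dict, processor: dict = DROP_NON_ZERO_METRICS) -> bool:
    """Return True if every field listed in the regexp condition matches."""
    conditions = processor["drop_event"]["when"]["regexp"]
    for field, pattern in conditions.items():
        value = event.get(field)
        # A missing or non-string field never matches, so the event is kept.
        if not isinstance(value, str) or re.search(pattern, value) is None:
            return False
    return True


def filter_events(events: list[dict]) -> list[dict]:
    """Keep only the events the processor would not drop."""
    return [event for event in events if not should_drop(event)]
```

Because the pattern is anchored with `^`, only messages that start with the prefix are dropped; a message that merely mentions "Non-zero metrics" later in the text is kept.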

Why is it important?

The goal is to reduce the number of elastic-agent monitoring events ingested into Elasticsearch and, with it, the index disk size.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files~~
~~[ ] I have added tests that prove my fix is effective or that my feature works~~
I have added an entry in ./changelog/fragments using the changelog tool
~~[ ] I have added an integration test or an E2E test~~

Author's Checklist

[ ]

How to test this PR locally

Run an Elastic Agent with and without this change on a deployment, noting the start time and agent ID of each run.
Then use the included Elasticsearch script to extract the documents ingested during each run and store them in dedicated indices.

In my tests I ran two Fleet-managed Elastic Agents with a default policy that includes the System integration, first without this change and then with it. I then used the included script to slice the first 10 minutes of logs and metrics after startup.
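The included script is not reproduced here. As a hedged sketch of the slicing step, the function below builds an Elasticsearch `_reindex` request body that copies one agent's documents from the first 10 minutes after startup into a dedicated index. The field names `elastic_agent.id` and `@timestamp`, the example data stream and index names, and the helper `build_reindex_body` are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def build_reindex_body(
    source_stream: str,
    dest_index: str,
    agent_id: str,
    start: datetime,
    window: timedelta = timedelta(minutes=10),
) -> dict:
    """Build a _reindex body copying one agent's documents in [start, start + window)."""
    end = start + window
    return {
        "source": {
            "index": source_stream,
            "query": {
                "bool": {
                    "filter": [
                        # Assumed field identifying the agent that shipped the document.
                        {"term": {"elastic_agent.id": agent_id}},
                        # Half-open time window beginning at the agent's start time.
                        {"range": {"@timestamp": {
                            "gte": start.isoformat(),
                            "lt": end.isoformat(),
                        }}},
                    ]
                }
            },
        },
        "dest": {"index": dest_index},
    }


# Example with hypothetical names: baseline run of the first agent.
body = build_reindex_body(
    "logs-elastic_agent.filebeat-default",
    "baseline-filebeat",
    "agent-a",
    datetime(2024, 4, 29, 14, 0, tzinfo=timezone.utc),
)
```

The body would then be sent with `POST _reindex`, once per agent and data stream, so each run ends up in its own index and the Index Management page can compare them side by side.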

This is a screenshot from the Index Management page, where we can see the disk size and document count for each index:

The measured impact on *logs-elastic_agent.filebeat* and *logs-elastic_agent.metricbeat* is:

~27% fewer documents for *logs-elastic_agent.filebeat*, with a ~18% reduction in disk size
~21% fewer documents for *logs-elastic_agent.metricbeat*, with a ~16% reduction in disk size
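The percentages above are presumably relative reductions, computed as (before - after) / before for each index. A minimal helper for that calculation follows; the numbers in the example are made up for illustration and are not the measured values.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Hypothetical document counts for an index before and after the change.
print(round(relative_reduction(1000, 730), 1))
```

The same helper applies to both document counts and disk sizes, which is why the two figures for each index can differ: dropped log lines are small, so removing them saves proportionally more documents than bytes.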

Related issues

Relates to #4252: Reduce the amount the agent logs by default

Use cases

Screenshots

Logs

Questions to ask yourself

How are we going to support this in production?
How are we going to measure its adoption?
How are we going to debug this?
What are the metrics I should take care of?
...

elasticmachine · 2024-04-29T14:42:18Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

elasticmachine · 2024-04-29T14:42:18Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

mergify · 2024-04-29T14:42:57Z

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.
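The label convention in this bot message maps a version label to a maintenance branch: later in this thread, the backport-v8.14.0 label produced the backport PR against 8.14. A small sketch of that mapping follows, assuming labels of the form backport-vMAJOR.MINOR.PATCH target branch MAJOR.MINOR. The function `backport_branch` is hypothetical; the real mapping lives in the repository's Mergify configuration.

```python
import re
from typing import Optional

# Assumed label shape: backport-v<major>.<minor>.<patch>
_BACKPORT_LABEL = re.compile(r"^backport-v(\d+)\.(\d+)\.(\d+)$")


def backport_branch(label: str) -> Optional[str]:
    """Return the target branch (e.g. "8.14") for a backport label, or None."""
    match = _BACKPORT_LABEL.match(label)
    if match is None:
        # Labels such as backport-skip do not name a branch.
        return None
    major, minor, _patch = match.groups()
    return f"{major}.{minor}"
```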

amitkanfer · 2024-04-29T15:05:18Z

@pchila does this really close #4252, or does it only address this suggestion by @cmacknz?

pchila · 2024-04-29T15:13:54Z

> @pchila does this really close #4252, or does it only address this suggestion by @cmacknz?

At this stage it addresses @cmacknz's suggestion and removes some unnecessary logs. I'm not sure whether we want to stop here for this PR or go deeper; that depends on how much data we are saving as it is and on the overall 8.14 timeframe. I will change "Closes" to "Relates".

pierrehilbert · 2024-04-30T07:22:01Z

I agree: we shouldn't try to close #4252 here, but rather get as many improvements as possible into 8.14 and continue in later versions with whatever we don't have time for in that timeframe.


elastic-sonarqube · 2024-05-03T06:59:29Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
9.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

cmacknz

I enrolled an agent built from this branch and observed the periodic 30s metrics messages in the local agent logs but, as expected, not in Fleet.

pierrehilbert · 2024-05-06T11:12:28Z

@cmacknz any objections with backporting this in 8.14?

cmacknz · 2024-05-06T15:54:43Z

No objections to backporting.


* set intermediate verification error logs to debug
* Drop non-zero metrics periodic logs in monitoring config
* Add script for elastic-agent logs and metrics disk size comparison
* changelog

(cherry picked from commit 584713c)

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

pchila added the enhancement, Team:Elastic-Agent-Control-Plane and Team:Elastic-Agent labels Apr 29, 2024

pchila self-assigned this Apr 29, 2024

pchila requested a review from a team as a code owner April 29, 2024 14:42

pchila requested review from blakerouse and michel-laterman April 29, 2024 14:42

mergify bot added the backport-skip label Apr 29, 2024

pchila force-pushed the reduce-agent-logs-by-default branch from 7931c95 to cc97fae April 30, 2024 07:06

pchila added 7 commits May 2, 2024 13:48

set intermediate verification error logs to debug

2b6c91d

Drop non-zero metrics periodic logs in monitoring config

a82e45e

Add script for elastic-agent logs and metrics disk size comparison

1c7b68c

fixup! Add script for elastic-agent logs and metrics disk size comparison

2361cda

Switch ES script to use datastreams

d21e4aa

fixup! Switch ES script to use datastreams

0280818

changelog

2af6f43

pchila force-pushed the reduce-agent-logs-by-default branch from cc97fae to 2af6f43 May 3, 2024 06:31

This was referenced May 3, 2024

Reduce the amount the agent logs by default #4252

Open

Stop collecting the beat state metricset as part of agent monitoring #4153

Closed

cmacknz approved these changes May 3, 2024


pchila merged commit 584713c into elastic:main May 6, 2024
9 checks passed

cmacknz added the backport-v8.14.0 label and removed the backport-skip label May 6, 2024

mergify bot mentioned this pull request May 6, 2024

[8.14](backport #4633) Reduce agent logs by default #4682

Merged


cmacknz mentioned this pull request Dec 13, 2024

agent.logging.metrics options for stand-alone agent not passed through to beats components #3011

Open
