Reduce agent logs by default #4633

Merged
merged 7 commits into from
May 6, 2024

Conversation

pchila
Member

@pchila pchila commented Apr 29, 2024

What does this PR do?

This PR drops the Non-zero metrics after 30s... logs before they get sent to Elasticsearch.
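As a rough illustration of the idea, the sketch below builds a Beats-style `drop_event` processor with a `when.regexp` condition and simulates its effect on a list of events. The message pattern, the helper names and the sample events are assumptions made for this example; the actual change lives in the agent's generated monitoring configuration and may look different.

```python
import re

# Assumption: the exact log message text may differ from this prefix.
PERIODIC_METRICS_PATTERN = r"^Non-zero metrics"


def drop_periodic_metrics_processor(pattern=PERIODIC_METRICS_PATTERN):
    """Return a Beats-style drop_event processor as a plain dict (hypothetical helper)."""
    return {"drop_event": {"when": {"regexp": {"message": pattern}}}}


def apply_processors(events, processors):
    """Simulate the drop locally: keep events that match no drop_event regexp condition."""
    kept = []
    for event in events:
        dropped = False
        for proc in processors:
            conditions = proc.get("drop_event", {}).get("when", {}).get("regexp", {})
            for field, pat in conditions.items():
                if re.search(pat, str(event.get(field, ""))):
                    dropped = True
        if not dropped:
            kept.append(event)
    return kept


# Sample events, made up for the example.
events = [
    {"message": "Non-zero metrics after 30s..."},
    {"message": "unrelated log line"},
]
kept = apply_processors(events, [drop_periodic_metrics_processor()])
print([e["message"] for e in kept])
```

Because the drop happens in the monitoring pipeline rather than in the logger, the periodic metrics lines would still be written to the local agent logs and only be filtered out of what is shipped to Elasticsearch.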

Why is it important?

The goal is to reduce the amount of elastic-agent monitoring events ingested in Elasticsearch in order to reduce index disk size.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

Run an Elastic Agent with and without this change on a deployment, noting the start time and elastic-agent id of each run.
Then use the included ES script to extract the documents ingested during those runs and store them in dedicated indices.

In my tests I ran 2 Elastic Agents managed by Fleet with a default policy that includes the system integration, first without this change and then with it. I then used the included script to slice the first 10 minutes of logs and metrics after startup.
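For readers who want to reproduce the slicing by hand, here is a minimal sketch of how a request body for the Elasticsearch `_reindex` API could be built for one agent and one time window. This is not the script included in the PR; the function name, the index names and the `elastic_agent.id` filter field are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_reindex_body(source_index, dest_index, agent_id, start, minutes=10):
    """Build a _reindex body copying one agent's documents from a time window.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    end = start + timedelta(minutes=minutes)
    return {
        "source": {
            "index": source_index,
            "query": {
                "bool": {
                    "filter": [
                        # Assumed field holding the agent id in monitoring documents.
                        {"term": {"elastic_agent.id": agent_id}},
                        {
                            "range": {
                                "@timestamp": {
                                    "gte": start.isoformat(),
                                    "lt": end.isoformat(),
                                }
                            }
                        },
                    ]
                }
            },
        },
        "dest": {"index": dest_index},
    }


# Example values are placeholders, not taken from the PR.
body = build_reindex_body(
    source_index="logs-elastic_agent.filebeat-*",
    dest_index="comparison-filebeat-baseline",
    agent_id="example-agent-id",
    start=datetime(2024, 5, 2, 16, 0, tzinfo=timezone.utc),
)
print(body["source"]["query"]["bool"]["filter"][1])
```

The resulting body would be sent with `POST _reindex`, once per agent run, so that each run ends up in its own index and the sizes can be compared side by side.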

This is a screenshot from the Index Management page, where we can see the disk size and document count for each index:
[Screenshot: Index Management - Elastic, 2024-05-02 18:48:16]

The measured impact on *logs-elastic_agent.filebeat* and *logs-elastic_agent.metricbeat* is:

  • reduction of ~27% of document count for *logs-elastic_agent.filebeat* with a disk size reduction of ~18%
  • reduction of ~21% of document count for *logs-elastic_agent.metricbeat* with a disk size reduction of ~16%
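The percentages above are relative reductions, computed as (before - after) / before. A small helper makes the arithmetic explicit; the counts used in the example call are invented for illustration and are not the measured values from the screenshot.

```python
def reduction_percent(before, after):
    """Relative reduction of `after` with respect to `before`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Made-up document counts, only to show the calculation.
baseline_docs = 1000
changed_docs = 730
print(round(reduction_percent(baseline_docs, changed_docs)))
```

The same function applies to disk size, as long as both values are expressed in the same unit.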

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team labels Apr 29, 2024
@pchila pchila self-assigned this Apr 29, 2024
@pchila pchila requested a review from a team as a code owner April 29, 2024 14:42
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Contributor

mergify bot commented Apr 29, 2024

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@amitkanfer
Contributor

@pchila does this really close #4252, or does it only address this suggestion by @cmacknz?

@pchila
Member Author

pchila commented Apr 29, 2024

> @pchila does this really close #4252, or does it only address this suggestion by @cmacknz?

At this stage it addresses @cmacknz's suggestion and removes some unnecessary logs. I'm not sure whether we want to stop here for this PR or go deeper; that depends on measuring how much data we save as it is and on the general 8.14 timeframe. I will change the "Closes" to "Relates".

@pchila pchila force-pushed the reduce-agent-logs-by-default branch from 7931c95 to cc97fae Compare April 30, 2024 07:06
@pierrehilbert
Contributor

I agree here: we shouldn't try to close #4252 with this PR, but get as many improvements as possible into 8.14 and then continue in the next versions with whatever we don't have time for in that timeframe.

@pchila pchila force-pushed the reduce-agent-logs-by-default branch from cc97fae to 2af6f43 Compare May 3, 2024 06:31
Member

@cmacknz cmacknz left a comment

I enrolled an agent built from this branch and observed the 30s metrics in the local agent logs but not in Fleet, as expected.

@pchila pchila merged commit 584713c into elastic:main May 6, 2024
9 checks passed
@pierrehilbert
Contributor

@cmacknz any objections to backporting this to 8.14?

@cmacknz
Member

cmacknz commented May 6, 2024

No objections to backporting.

@cmacknz cmacknz added backport-v8.14.0 Automated backport with mergify and removed backport-skip labels May 6, 2024
mergify bot pushed a commit that referenced this pull request May 6, 2024
* set intermediate verification error logs to debug

* Drop non-zero metrics periodic logs in monitoring config

* Add script for elastic-agent logs and metrics disk size comparison

* changelog

(cherry picked from commit 584713c)
pierrehilbert pushed a commit that referenced this pull request May 6, 2024
* set intermediate verification error logs to debug

* Drop non-zero metrics periodic logs in monitoring config

* Add script for elastic-agent logs and metrics disk size comparison

* changelog

(cherry picked from commit 584713c)

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Labels
backport-v8.14.0 Automated backport with mergify enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
5 participants