Some policy updates can cause duplicate Endpoint processes #2008

Closed
danielharada opened this issue Dec 23, 2022 · 5 comments
Labels
bug (Something isn't working), Team:Elastic-Agent (Label for the Agent team)

Comments

@danielharada

After some policy updates, the agent will report as unhealthy and errors showing listen tcp 127.0.0.1:6788: bind: address already in use will appear in the agent logs.

Steps to Reproduce:

  1. Create an Agent policy that has the Endpoint integration enabled; the output for the integrations should be Elasticsearch
  2. Create a Logstash output
  3. Switch from an Elasticsearch output to Logstash output for integrations for the Agent policy created in step 1
  4. See that Elastic Agent spawns an additional Endpoint process that causes this same issue.

OR

  1. Create an Agent policy that has the Endpoint integration enabled
  2. Deploy an Elastic Agent using the Agent policy in step 1
  3. Clone the Agent policy created in step 1
  4. Select the agent deployed in step 2 and reassign it to use the cloned policy from step 3.

Once one set of the steps above has been followed, we can check the main Elastic Agent logs and see messages like:

{"log.level":"error","@timestamp":"2022-12-23T21:13:35.641Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-12-23T21:13:35Z - message: Application: endpoint-security--8.4.0[d26c65bf-251a-456b-8880-0c6ec54751b1]: State changed to FAILED: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}

One Endpoint process has already bound to 127.0.0.1:6788, so the other process fails to bind. This leads to the agent reporting as unhealthy.
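
For context, this is the generic error Go's net package returns when a listener tries to bind a port another process already holds. Below is a minimal standalone sketch reproducing the condition; it is not taken from the Agent or Endpoint source, and the address simply mirrors the one in the log above.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// The first listener claims the loopback port, just as the original
	// Endpoint process claims its credentials listener.
	first, err := net.Listen("tcp", "127.0.0.1:6788")
	if err != nil {
		fmt.Println("first listen failed (something already holds the port):", err)
		return
	}
	defer first.Close()

	// A second listener on the same address fails the same way the
	// duplicate Endpoint process does:
	// "listen tcp 127.0.0.1:6788: bind: address already in use"
	if _, err := net.Listen("tcp", "127.0.0.1:6788"); err != nil {
		fmt.Println("second listen failed:", err)
	}
}
```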

Workaround steps:

  1. Remove Endpoint Integration from Agent policy
  2. Wait for the change to roll out to Agents (this will stop the "old" Endpoint security process/instance); the probe sketch after these steps can be used to confirm the old listener is gone
  3. Switch to Logstash output
  4. Re-add the Endpoint Integration to the Agent policy
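
To confirm that the old Endpoint instance has actually released the credentials listener before re-adding the integration in step 4, a small probe such as the sketch below can be used. This is illustrative only: the port number is taken from the error message above and is not a documented interface.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// If the dial succeeds, something (most likely the old Endpoint
	// process) is still listening on the credentials port.
	conn, err := net.DialTimeout("tcp", "127.0.0.1:6788", 2*time.Second)
	if err != nil {
		fmt.Println("port 6788 is free, the old Endpoint listener is gone:", err)
		return
	}
	conn.Close()
	fmt.Println("port 6788 is still in use; wait for the policy change to finish rolling out")
}
```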
@danielharada added the bug label Dec 23, 2022
@cmacknz added the Team:Elastic-Agent label Jan 4, 2023
@pierrehilbert
Contributor

@danielharada: do we still have the problem with the v2 control protocol (8.6+ version)?

@pushkar-todyl

Is this going to be in 8.7 release? We are experiencing this issue with 8.5.3.

@cmacknz added the QA:Ready For Testing label Mar 23, 2023
@cmacknz
Member

cmacknz commented Mar 23, 2023

The way the agent starts endpoint was completely changed in 8.6. We need to confirm this bug still exists in that release.

@dikshachauhan-qasource @amolnater-qasource can one of you follow the reproduction steps in the description in the latest 8.7.0 build candidate and report the results?

@amolnater-qasource

Hi @cmacknz

We have revalidated this on the latest 8.7.0 BC9 Kibana cloud environment and observed the following:

1. Create an Agent policy that has the Endpoint integration enabled; the output for the integrations should be Elasticsearch
2. Create a Logstash output
3. Switch from an Elasticsearch output to a Logstash output for integrations for the Agent policy created in step 1
4. See that Elastic Agent spawns an additional Endpoint process that causes this same issue.

OR

1. Create an Agent policy that has the Endpoint integration enabled
2. Deploy an Elastic Agent using the Agent policy in step 1
3. Clone the Agent policy created in step 1
4. Select the agent deployed in step 2 and reassign it to use the cloned policy from step 3.
  • We have followed these steps and observed that the agent remains Healthy throughout.
  • No errors related to State changed to FAILED: failed to start connection credentials listener were observed.

Please find below the detailed screenshots: [5 screenshots attached]

Logs:
elastic-agent-diagnostics-2023-03-24T09-55-50Z-00.zip

Build details:
VERSION: 8.7 BC9
BUILD: 61093
COMMIT: 8eda067283f541c673beb406ae5480da6dab9296

Please let us know if we are missing anything here.

Thanks!

@cmacknz
Member

cmacknz commented Mar 24, 2023

Thanks. The FAILED messages are from Endpoint attempting to disable and stop the Elastic Endpoint service before the install happens, which fails because this is the first time it has been installed on the system.

I'm going to close this since this appears to be working properly now.

@cmacknz closed this as completed Mar 24, 2023
@amolnater-qasource removed the QA:Ready For Testing label Apr 19, 2023