Some policy updates can cause duplicate Endpoint processes #2008

Closed
danielharada opened this issue Dec 23, 2022 · 5 comments
Labels
bug (Something isn't working), Team:Elastic-Agent (Label for the Agent team)

Comments

@danielharada

After some policy updates, the agent will report as unhealthy and errors showing listen tcp 127.0.0.1:6788: bind: address already in use will appear in the agent logs.

Steps to Reproduce:

  1. Create an Agent policy that has the Endpoint integration enabled; the output for the integrations should be Elasticsearch
  2. Create a Logstash output
  3. Switch from an Elasticsearch output to Logstash output for integrations for the Agent policy created in step 1
  4. See that Elastic Agent spawns an additional Endpoint process that causes this same issue.

OR

  1. Create an Agent policy that has the Endpoint integration enabled
  2. Deploy an Elastic Agent using the Agent policy in step 1
  3. Clone the Agent policy created in step 1
  4. Select the agent deployed in step 2 and reassign it to use the cloned policy from step 3.

Once one set of the steps above has been followed, we can check the main Elastic Agent logs and see messages like:

{"log.level":"error","@timestamp":"2022-12-23T21:13:35.641Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-12-23T21:13:35Z - message: Application: endpoint-security--8.4.0[d26c65bf-251a-456b-8880-0c6ec54751b1]: State changed to FAILED: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}

One Endpoint process has already bound to 127.0.0.1:6788, so the other process fails to bind. This leads to the agent reporting as unhealthy.
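
For context, this is the generic error Go's net package returns when a listener tries to bind a port another process already holds. Below is a minimal standalone sketch reproducing the condition; it is not taken from the Agent or Endpoint source, and the address simply mirrors the one in the log above.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// The first listener claims the loopback port, just as the original
	// Endpoint process claims its credentials listener.
	first, err := net.Listen("tcp", "127.0.0.1:6788")
	if err != nil {
		fmt.Println("first listen failed (something already holds the port):", err)
		return
	}
	defer first.Close()

	// A second listener on the same address fails the same way the
	// duplicate Endpoint process does:
	// "listen tcp 127.0.0.1:6788: bind: address already in use"
	if _, err := net.Listen("tcp", "127.0.0.1:6788"); err != nil {
		fmt.Println("second listen failed:", err)
	}
}
```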

Workaround steps:

  1. Remove Endpoint Integration from Agent policy
  2. Wait for the change to roll out to Agents (this will stop the "old" Endpoint security process/instance); the probe sketch after these steps can be used to confirm the old listener is gone
  3. Switch to Logstash output
  4. Re-add the Endpoint Integration to the Agent policy
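
To confirm that the old Endpoint instance has actually released the credentials listener before re-adding the integration in step 4, a small probe such as the sketch below can be used. This is illustrative only: the port number is taken from the error message above and is not a documented interface.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// If the dial succeeds, something (most likely the old Endpoint
	// process) is still listening on the credentials port.
	conn, err := net.DialTimeout("tcp", "127.0.0.1:6788", 2*time.Second)
	if err != nil {
		fmt.Println("port 6788 is free, the old Endpoint listener is gone:", err)
		return
	}
	conn.Close()
	fmt.Println("port 6788 is still in use; wait for the policy change to finish rolling out")
}
```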
@danielharada added the bug label Dec 23, 2022
@cmacknz added the Team:Elastic-Agent label Jan 4, 2023
@pierrehilbert
Contributor

@danielharada: do we still have the problem with the v2 control protocol (8.6+ version)?

@pushkar-todyl

Is this going to be in 8.7 release? We are experiencing this issue with 8.5.3.

@cmacknz added the QA:Ready For Testing label Mar 23, 2023
@cmacknz
Member

cmacknz commented Mar 23, 2023

The way the agent starts endpoint was completely changed in 8.6. We need to confirm this bug still exists in that release.

@dikshachauhan-qasource @amolnater-qasource can one of you follow the reproduction steps in the description in the latest 8.7.0 build candidate and report the results?

@amolnater-qasource

Hi @cmacknz

We have revalidated this on the latest 8.7.0 BC9 Kibana cloud environment and observed the following:

1. Create an Agent policy that has the Endpoint integration enabled; the output for the integrations should be Elasticsearch
2. Create a Logstash output
3. Switch from an Elasticsearch output to a Logstash output for integrations for the Agent policy created in step 1
4. See that Elastic Agent spawns an additional Endpoint process that causes this same issue.

OR

1. Create an Agent policy that has the Endpoint integration enabled
2. Deploy an Elastic Agent using the Agent policy in step 1
3. Clone the Agent policy created in step 1
4. Select the agent deployed in step 2 and reassign it to use the cloned policy from step 3.
  • We have followed these steps and observed that the agent remains Healthy throughout.
  • No errors related to State changed to FAILED: failed to start connection credentials listener were observed.

Please find below the detailed screenshots: [5 screenshots attached]

Logs:
elastic-agent-diagnostics-2023-03-24T09-55-50Z-00.zip

Build details:
VERSION: 8.7 BC9
BUILD: 61093
COMMIT: 8eda067283f541c673beb406ae5480da6dab9296

Please let us know if we are missing anything here.

Thanks!

@cmacknz
Member

cmacknz commented Mar 24, 2023

Thanks. The FAILED messages are from Endpoint attempting to disable and stop the Elastic Endpoint service before the install happens, which fails because this is the first time it has been installed on the system.

I'm going to close this since this appears to be working properly now.

@cmacknz closed this as completed Mar 24, 2023
@amolnater-qasource removed the QA:Ready For Testing label Apr 19, 2023