Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. #28260

blakerouse · 2021-10-05T18:05:59Z

What does this PR do?

It allows the metrics endpoint to run while Fleet Server is in bootstrap mode. It also adds configurable timeouts for waiting on the Elastic Agent daemon and on the Fleet Server bootstrap process; a negative value means wait indefinitely.

Why is it important?

This is needed by Cloud so that it can check the status of the Elastic Agent even when Fleet Server cannot complete the bootstrap process. Cloud will set the timeout to be indefinite, and after the exponential backoff the system will only check every 10 minutes to see whether it should continue.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files~~
~~[ ] I have added tests that prove my fix is effective or that my feature works~~
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Closes [elastic-agent] Elastic Agent shuts down when Fleet Server is unhealthy #28209

mergify · 2021-10-05T18:06:05Z

This pull request does not have a backport label. Could you fix it @blakerouse? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

elasticmachine · 2021-10-05T18:08:24Z

Pinging @elastic/agent (Team:Elastic-Agent)

elasticmachine · 2021-10-05T18:08:25Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

elasticmachine · 2021-10-05T18:10:28Z

💚 Build Succeeded

Build stats

Duration: 86 min 52 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.

simitt · 2021-10-06T10:15:22Z

@andresrc are you ok with backporting this as a fix to 7.15?

simitt

Generally looks good to me. I also want to test on ECE though, if you could hold back with merging until then.

blakerouse · 2021-10-06T19:35:43Z

/package

jlind23 · 2021-10-07T07:42:31Z

@simitt @blakerouse did you have a chance to test it yet?

simitt · 2021-10-11T11:50:11Z

I tested and created elastic/fleet-server#763 as a follow-up, since the observed behavior was not quite what was expected and the agent and fleet-server were very noisily logging the same errors.
Also, with the changes in this PR, the agent always returns a fleet-server process with a pid in the /processes response, even though the fleet-server is not healthy.

blakerouse · 2021-10-11T16:01:27Z

I have the fix for elastic/fleet-server#763 here: elastic/fleet-server#768. That will provide the behavior we need for this to work properly.

blakerouse · 2021-10-12T17:56:55Z

/package

mergify · 2021-10-13T09:02:15Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b http-metrics-in-bootstrap upstream/http-metrics-in-bootstrap
git merge upstream/master
git push upstream http-metrics-in-bootstrap

simitt · 2021-10-13T15:46:28Z

I retested with the fleet-server fix, and the agent and fleet-server work as expected now on cloud. The healthcheck endpoint is immediately exposed, the container is considered healthy, while fleet-server is still trying to start up. The agent returns status: STARTING for the fleet-server.

fleet-server still logs every ~5sec that it is waiting for the policy, but the agent logging is pretty silent.

…meouts for Fleet Server. (#28260) * Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. * Add changelog. * Add the persistent agent configuration to the fleet.yml in bootstrap mode. * Fix format issues. (cherry picked from commit 15366ff)

…meouts for Fleet Server. (#28260) (#28445) * Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. * Add changelog. * Add the persistent agent configuration to the fleet.yml in bootstrap mode. * Fix format issues. (cherry picked from commit 15366ff) Co-authored-by: Blake Rouse <blake.rouse@elastic.co>

…meouts for Fleet Server. (#28260) (#28444) * Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. * Add changelog. * Add the persistent agent configuration to the fleet.yml in bootstrap mode. * Fix format issues. (cherry picked from commit 15366ff) Co-authored-by: Blake Rouse <blake.rouse@elastic.co>

…meouts for Fleet Server. (elastic#28260) * Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. * Add changelog. * Add the persistent agent configuration to the fleet.yml in bootstrap mode. * Fix format issues.

blakerouse added Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Oct 5, 2021

blakerouse self-assigned this Oct 5, 2021

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 5, 2021

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 5, 2021

mergify bot added the backport-skip Skip notification from the automated backport with mergify label Oct 5, 2021

blakerouse added backport-v7.15.0 Automated backport with mergify backport-v7.16.0 Automated backport with mergify backport-v8.0.0 Automated backport with mergify labels Oct 5, 2021

mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Oct 5, 2021

blakerouse marked this pull request as ready for review October 5, 2021 18:08

simitt approved these changes Oct 6, 2021

View reviewed changes

blakerouse added 3 commits October 12, 2021 08:44

Allow HTTP metrics to run in bootstrap mode. Add ability to adjust ti…

92d6f86

…meouts for Fleet Server.

Add changelog.

8af1652

Add the persistent agent configuration to the fleet.yml in bootstrap …

68631ae

…mode.

blakerouse force-pushed the http-metrics-in-bootstrap branch from a29e234 to 68631ae Compare October 12, 2021 12:44

Merge branch 'master' into http-metrics-in-bootstrap

763bafc

Fix format issues.

c316133

blakerouse merged commit 15366ff into elastic:master Oct 14, 2021

blakerouse deleted the http-metrics-in-bootstrap branch October 14, 2021 13:05

mergify bot mentioned this pull request Oct 14, 2021

[master](backport #28260) Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. #28443

Closed

mergify bot mentioned this pull request Oct 14, 2021

[7.x](backport #28260) Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. #28444

Merged

mergify bot mentioned this pull request Oct 14, 2021

[7.15](backport #28260) Allow HTTP metrics to run in bootstrap mode. Add ability to adjust timeouts for Fleet Server. #28445

Merged

jlind23 mentioned this pull request Oct 18, 2021

Improve Agent container cmd and fleet-server logs #28492

Closed

dedemorton mentioned this pull request Dec 6, 2021

Add 7.16 release notes for Elastic Agent and Fleet elastic/observability-docs#1319

Merged

3 tasks
