Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add healthcheck for shibboleth-idp in idp-fixture #100369

Merged
merged 7 commits into from
Oct 6, 2023

Conversation

brianseeders
Copy link
Contributor

@brianseeders brianseeders commented Oct 5, 2023

With newer versions of docker-compose, like the version we currently have on Buildkite agents, this fixture fails a significant percentage of the time. It's the same failure described here.

I did a lot of debugging, and here's what I found:

  • If the container is in a restarting state while the docker-compose gradle plugin docker inspects the container, it receives incomplete networking information
  • The plugin caches the information it receives, so it keeps passing around this incomplete information
  • After docker-compose up finishes, the plugin doesn't capture appropriate networking information about the service, nor does it probe it's ports to ensure it started successfully, because of the incomplete information
  • When we check for the information we need, it's missing and we get this error

And, importantly: for some reason, the jetty server exits with exit code 0 immediately upon starting for the first time. So, when docker-compose up is called, the container starts once and exits immediately, and then starts back up again.

I'm guessing that newer versions of docker-compose are just a little faster, and are able to sometimes catch the container in it's restarting state more often. The restart window is pretty small.

So, here's what I've done here:

  • Add a docker-compose healthcheck for the shibboleth-idp service
  • Ensure the gradle plugin passes --wait to docker-compose up, so that it will wait until all of the services are healthy before moving on (it was already essentially doing this with the port probing, this just moves it to docker-compose)
  • Allow jetty to restart once inside the container, rather than having the entire container restart, if it exits quickly after start. I had to do this, because docker-compose will immediately consider the container unhealthy and stop waiting if the container restarts during the healthcheck period.

Here's what the Jetty logs look like without any changes:

{"log":"Running Jetty: \n","stream":"stdout","time":"2023-10-04T20:20:21.323711544Z"}
{"log":"MKDIR: ${jetty.base}/logs\n","stream":"stderr","time":"2023-10-04T20:20:21.602355421Z"}
{"log":"INFO: Base directory was modified\n","stream":"stderr","time":"2023-10-04T20:20:21.602788358Z"}
{"log":"Running Jetty: \n","stream":"stdout","time":"2023-10-04T20:20:22.191547668Z"}

The server exits with code 0 right after Base directory was modified. The final line is after the container restarts, and then Jetty continues normally.

@brianseeders brianseeders added >non-issue :Delivery/Tooling Developer tooliing and automation buildkite-opt-in Opts your PR into Buildkite instead of Jenkins v8.12.0 labels Oct 5, 2023
@brianseeders brianseeders force-pushed the fix-idp-fixture-issue branch from c298cf4 to f5dd550 Compare October 5, 2023 20:38
@brianseeders brianseeders force-pushed the fix-idp-fixture-issue branch from f5dd550 to 9c9b102 Compare October 5, 2023 20:49
@brianseeders brianseeders marked this pull request as ready for review October 5, 2023 22:08
@elasticsearchmachine elasticsearchmachine added the Team:Delivery Meta label for Delivery team label Oct 5, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@brianseeders brianseeders added v8.11.1 v7.17.15 v8.10.4 and removed Team:Delivery Meta label for Delivery team labels Oct 5, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Delivery Meta label for Delivery team label Oct 5, 2023
@brianseeders brianseeders merged commit 003912b into main Oct 6, 2023
1 check passed
@brianseeders brianseeders deleted the fix-idp-fixture-issue branch October 6, 2023 17:53
brianseeders added a commit to brianseeders/elasticsearch that referenced this pull request Oct 6, 2023
brianseeders added a commit to brianseeders/elasticsearch that referenced this pull request Oct 6, 2023
brianseeders added a commit to brianseeders/elasticsearch that referenced this pull request Oct 6, 2023
(cherry picked from commit 003912b)

# Conflicts:
#	x-pack/test/idp-fixture/build.gradle
@brianseeders
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.11
8.10
7.17

Questions ?

Please refer to the Backport tool documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buildkite-opt-in Opts your PR into Buildkite instead of Jenkins :Delivery/Tooling Developer tooliing and automation >non-issue Team:Delivery Meta label for Delivery team v7.17.15 v8.10.4 v8.11.1 v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants