Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop opensearch: Graceful close client connection #16091

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

tmanninger
Copy link

@tmanninger tmanninger commented Sep 26, 2024

Description

Adds support for graceful shutdown of the opensearch server.

New option:
http.graceful_shutdown
If 0, graceful shutdown is Disabled.

Graceful shutdown procedure:
1.) Opensearch will stop accepting new connections
2.) Opensearch will close all Connections after response is sent to the client
3.) After timeout http.graceful_shutdown, Opensearch will close all client connections

I tested this manually:

  • Stop opensearch without any connections
  • Stop opensearch with > 0 connections and wait for timeout / send requests within the timout

Any idea, how can i can create Test- Case for this?

Related Issues

[Resolves #[Issue number to be closed when this PR is merged]
#1304

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: tmanninger <t.manninger@ixolit.com>
Signed-off-by: tmanninger <t.manninger@ixolit.com>
@tmanninger tmanninger changed the title graceful shutdown Stop opensearch: Graceful close client connection Sep 26, 2024
Signed-off-by: tmanninger <t.manninger@ixolit.com>
Copy link
Contributor

❌ Gradle check result for f7b4ddd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@tmanninger
Copy link
Author

❌ Gradle check result for f7b4ddd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

I think the error is not related to my changes.

@owaiskazi19
Copy link
Member

@tmanninger the error is because the backport PR is mer BWC version upgrade one #16086. Once it's merged, you can rebase the change and push again.

@owaiskazi19
Copy link
Member

@tmanninger PR is merged. Please rebase your change and push again

Signed-off-by: tmanninger <t.manninger@ixolit.com>
Copy link
Contributor

❌ Gradle check result for 732d091: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 88b9aaf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@tmanninger
Copy link
Author

Now the test org.opensearch.indices.IndicesRequestCacheCleanupIT.testDynamicStalenessThresholdUpdate failed, looks like also not releated.

Is the change OK?

);

// Set all httpchannels to close gracefully
for (HttpChannel httpChannel : httpChannels) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HttpChannel is CloseableChannel,I believe it should be sufficient to call HttpChannel::close() to gracefully disconnect HTTP clients?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I testet it and yes, i can replace ((DefaultRestChannel) httpChannel).gracefulCloseConnection(); with httpChannel.close(); and it works.
I will remove the gracefulCloseConnection() stuff

Copy link
Collaborator

@reta reta Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what we do already on line 282:

 CloseableChannel.closeChannels(new ArrayList<>(httpChannels), true);

It seems like the current impl does block indefinitely, so we could use the newly introduced SETTING_HTTP_GRACEFUL_SHUTDOWN to time bound it

Copy link
Author

@tmanninger tmanninger Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 CloseableChannel.closeChannels(new ArrayList<>(httpChannels), true);

This close the TCP socket immediately without waiting. This must be the last step of the "shutdown" procedure.
We must close the connection gracefully in DefaultRestChannel::sendResponse() to ensure that the connection will be closed after the client received the response.

Signed-off-by: tmanninger <t.manninger@ixolit.com>
@tmanninger
Copy link
Author

tmanninger commented Sep 27, 2024

I see, "keep-alive" connections are not closed after responding in the shutdown phase...
I will check this weekend or next week.

Signed-off-by: tmanninger <t.manninger@ixolit.com>
Copy link
Contributor

❕ Gradle check result for 73c29d1: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 17.39130% with 19 lines in your changes missing coverage. Please review.

Project coverage is 71.93%. Comparing base (7caca26) to head (d308955).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...g/opensearch/http/AbstractHttpServerTransport.java 5.88% 15 Missing and 1 partial ⚠️
...java/org/opensearch/http/HttpHandlingSettings.java 50.00% 2 Missing ⚠️
...n/java/org/opensearch/http/DefaultRestChannel.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16091      +/-   ##
============================================
- Coverage     72.04%   71.93%   -0.11%     
+ Complexity    64441    64412      -29     
============================================
  Files          5281     5281              
  Lines        301088   301213     +125     
  Branches      43500    43522      +22     
============================================
- Hits         216918   216683     -235     
- Misses        66438    66724     +286     
- Partials      17732    17806      +74     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: tmanninger <t.manninger@ixolit.com>
Signed-off-by: tmanninger <t.manninger@ixolit.com>
Copy link
Contributor

✅ Gradle check result for d308955: SUCCESS

Signed-off-by: tmanninger <t.manninger@ixolit.com>
@tmanninger
Copy link
Author

tmanninger commented Sep 27, 2024

Now i have a new working version (manuell tested).

When i shutdown the opensearch server graceful:

  • Opensearch will not accept new connections
  • Open connections will be closed after the client receives the response
    • "Connection: close" will be added to the response to close http 1 "keep-alive" connections

How can i close http 2 connections? http 2 have a "goaway" signal, but i found no implementation.
Is there a way to add the "goaway" signal after the response?

Copy link
Contributor

❌ Gradle check result for 65df2af: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: tmanninger <t.manninger@ixolit.com>
Copy link
Contributor

❌ Gradle check result for d2c8c6a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

);

// Close connection after the client is getting a response.
handlingSettings.setForceCloseConnection();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @tmanninger , getting to it again just now. So I think this is what we want:

  • try to send the "in-flight" response to the clients within gracefulShutdownMillis timeframe
  • close the connection (forceably or not)

I believe we could make it part of the RestChannel (and leave AbstractHttpServerTransport out of it) by introducing RestChannel::closeGracefuly(long timeout) method and calling it in CloseableChannel.closeChannels (would need some adjustments).

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants