Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review Graceful shutdown in jetty-10 #4321

Closed
gregw opened this issue Nov 18, 2019 · 4 comments
Closed

Review Graceful shutdown in jetty-10 #4321

gregw opened this issue Nov 18, 2019 · 4 comments

Comments

@gregw
Copy link
Contributor

gregw commented Nov 18, 2019

Currently jetty-9 has a number of features associated with stopping and graceful shutdown that are somewhat duplicated/orthogonal/confused:

  • All LifeCycle objects have the stop() method, but AbstractLifeCycle objects have a setStopTimeout that imposes a maximum time that stop() can execute for, however this is not a cumulative time for aggregated LifeCycles!
  • We have a Graceful interface that has a Future<Void> shutdown() API that allows for cumulative timeout handling. It is implemented by Connectors, ContextHandler and the StatisticsHandler
  • The ContextHandler has an availability mechanism that includes the states: UNAVAILABLE, STARTING, AVAILABLE, SHUTDOWN. This mechanism is tied to both the LifeCycle stop/start and the Graceful API.
  • The NetworkConnector interface has open() and close() methods that can be invoked independently of start() and stop() to affect availability of the listening port.
  • The AbstractConnector class has the setAccepting(boolean) method that affects the availability of the connector to accept new connections.

All these mechanisms need to be rationalised into a single mechanism. To consider what design that mechanism needs, we need to consider the following active participants in the stop/start/graceful/available conversation:

  • QueuedThreadPool has an atomic mechanism to stop accepting new jobs and a doStop() implementation that waits half a stop timeout for jobs to stop naturally before interrupting remaining threads and waiting another half stop timeout for them to stop. Any jobs left in the queue that are Closeable are closed... which can also be an operation that takes time and could potentially even block.
  • The AbstractConnector.setAccepting(boolean) mechanism use a lock condition to hold up the acceptors to prevent them accepting new connections when not accepting.
  • The AbstractNetworkConnector.shutdown() does not use the setAccepting(boolean) mechanism, but does a more permanent close()` before declaring shutdown complete.
  • The StatisticHandler.shutdown() method is implemented to return a Future that is completed once the active request count goes to zero. The handler does not prevent new requests from being dispatched.
  • The ContextHandler.shutdown() method is implemented using it's availability state, which stops new requests from being accepted by the context by sending a 503 response (which can be redispatched in a ERROR dispatch... but that will then also get a sendError... so perhaps some refinement needed on that response)
  • The ManagedSelector.doStop() currently does an unbounded wait for connections to be closed, followed by an unbounded wait for the selectors to be stopped. Both of those probably should be time limited if possible.
  • Graceful shutdown is currently initiated either on the entire server (connectors, contexts, selectors and threadpool) or on individual contexts during hot undeploys.
@gregw
Copy link
Contributor Author

gregw commented Nov 18, 2019

@sbordet @janbartel @joakime @lachlan-roberts @olamy have a think about this one.... I'm going to ask for some design review before I commit too much code. Feel free to riff your thoughts here.

@gregw
Copy link
Contributor Author

gregw commented Nov 18, 2019

My current thoughts are that we should:

  • remove the stopTimeout from lifecycle. Only components such as server/context that can initiate a graceful shutdown should have a timeout. All stop methods that take time should be made brutal so that graceful should be used if graceful is wanted.
  • make an abstracted available mechanism that can apply to connectors, contexts and threadPool. This should be able to be atomically switched on/off without blocking. This will replace the accepting API in connectors.
  • retain the existing Graceful mechanism that entire purpose is to wait until there is no more activity in the component

To gracefully shutdown a content:

  1. context.setAvailable(true);
  2. context.shutdown().get(timeout); (this assumes a StatisticsHandler or similar is part of the context)
  3. context.stop();

To coordinate the shutdown of a server it is a little more complex:

  1. For all connectors: connector.setAvailable(false) to stop new connections
  2. For all contexts: context.setAvailable(false) to stop new requests on existing connections and to close existing connections
  3. For all handlers that are graceful: handler.shutdown().get(timeout-timespent) to drain out existing requests.
  4. For all selectors: selector.shutdown().get(timeout-timespent) to close existing idle connections
  5. For all connectors: connector.shutdown().get(timeout-timespent) to also wait for closed connections (or is the selector one called from this one?)
  6. The queuedThreadPool.setAvailable(false) to stop taking new tasks
  7. queuedThreadPool.shutdown().get(timeout-timespent) to allow exiting jobs to complete
  8. stop everything

@gregw
Copy link
Contributor Author

gregw commented Nov 18, 2019

The issue with the above is that the QTP graceful shutdown should not be done until everything else is shutdown, as threads may be needed to do graceful closing handshakes etc.
It is tempting to leave the QTP graceful behaviour in doStop, but that breaks the removal of stopTimeout.

I think the solution is to separate out the Graceful API which is something that can be gracefully shutdown from a Shutdown API, which is something that can be shutdown and can have bespoke coordination of shutdown - ie just the Server and Context would implement Shutdown

@joakime
Copy link
Contributor

joakime commented Nov 20, 2019

Do we want to tackle HttpClient as well? (Issue #1445)
And what about WebSocketClient?
Or even active WebSocket (server) connections?

gregw added a commit that referenced this issue Jan 15, 2020
Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 16, 2020
removed stopTimeout from all abstractLifeCycles.  It is on Graceful.LifeCycle, which is only implemented by components that can start a graceful shutdown (eg Server, ContextHandler and QueuedThreadPool)

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 17, 2020
cleanup after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 17, 2020
reinstate other stop tests (more work to do).

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 17, 2020
Fixes for stop test by improving LocalConnector shutdown handling

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 17, 2020
Removed broken test on LocalConnector that is already tested in GracefulStopTest

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 17, 2020
Fixed all stop tests

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 17, 2020
fixed checkstyle

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 20, 2020
No stopTimeout JMX attribute

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 20, 2020
Dump stopTimeout
test with default stopTimeout

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 21, 2020
USe sendError for 503

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 21, 2020
minor cleanups

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 22, 2020
Simplifications after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 28, 2020
after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 29, 2020
after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Jan 30, 2020
* Issue #4321 Refactored Graceful shutdown

removed stopTimeout from all abstractLifeCycles.  It is on Graceful.LifeCycle, which is only implemented by components that can start a graceful shutdown (eg Server, ContextHandler and QueuedThreadPool)

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

cleanup after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

reinstate other stop tests (more work to do).

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

Fixes for stop test by improving LocalConnector shutdown handling

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

Removed broken test on LocalConnector that is already tested in GracefulStopTest

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

Fixed all stop tests

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

fixed checkstyle

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

No stopTimeout JMX attribute

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

Dump stopTimeout
test with default stopTimeout

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

USe sendError for 503

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

minor cleanups

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

Simplifications after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>

* Issue #4321 Refactored Graceful shutdown

after review

Signed-off-by: Greg Wilkins <gregw@webtide.com>
@gregw gregw closed this as completed Sep 21, 2020
risdenk added a commit to markrmiller/solr that referenced this issue Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants