Document graceful shutdown of net.box connections #2633

Closed
5 tasks done
Tracked by #2646
TarantoolBot opened this issue Jan 26, 2022 · 3 comments · Fixed by #3100
Labels: feature (A new functionality); iproto (Related to the iproto protocol); reference ([location] Tarantool manual, Reference part); server ([area] Task relates to Tarantool's server (core) functionality)

Comments

TarantoolBot (Collaborator) commented Jan 26, 2022

Related dev. PR: tarantool/tarantool#6813

Product: Tarantool
Since: 2.10
Audience/target: developers
Root document:
https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/
https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_ctl/
https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_events/ (when it appears, after #2407)
https://www.tarantool.io/en/doc/latest/dev_guide/internals/box_protocol/ (check after #2408 if anything is to be added)
SME: @locker

Details

In Tarantool 2.10.0-beta2-78-g2e9cbec3091e, a new system event, 'box.shutdown', was introduced. A server generates this event with the value true when it is asked to exit (os.exit() is called or a SIGTERM signal is received). Essentially, the server calls box.broadcast('box.shutdown', true) from a box.ctl.on_shutdown() trigger callback. Like any other event, 'box.shutdown' is broadcast to all remote watchers subscribed to it (see IPROTO_WATCH). Connectors are expected to use this event to implement the graceful shutdown protocol:

  1. Server receives a shutdown request (os.exit() or SIGTERM).
  2. Server broadcasts 'box.shutdown' event with the value set to true.
  3. Server stops accepting new connections.
  4. Server waits for all connections that subscribed to the event to close.
  5. Client receives 'box.shutdown' event with the value true.
  6. Client does its housekeeping needed to gracefully close the connection. (It may send new requests.)
  7. Client closes the connection.
  8. Server exits once all connections that received the 'box.shutdown' event have been closed or a timeout occurs.

The timeout is configured with box.ctl.set_on_shutdown_timeout(). It's set to 3 seconds by default.
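
The server side of the steps above can be sketched as a small, self-contained toy model. This is not Tarantool code: the ToyServer class, its methods, and the poll/clock parameters are hypothetical names used only for illustration. It only shows the ordering of broadcast, refusal of new connections, and the wait that ends either when every subscribed connection has closed or when the timeout (3 seconds by default) expires.

```python
import time


class ToyServer:
    """Toy model of the server side of the graceful shutdown protocol.

    Hypothetical illustration only; none of these names are Tarantool APIs.
    """

    def __init__(self, shutdown_timeout=3.0):
        self.shutdown_timeout = shutdown_timeout  # mirrors the 3-second default
        self.accepting = True
        self.subscribed = set()  # connections watching 'box.shutdown'
        self.sent = []           # (connection, key, value) notifications

    def subscribe(self, conn_id):
        self.subscribed.add(conn_id)

    def close(self, conn_id):
        self.subscribed.discard(conn_id)

    def shutdown(self, poll=lambda: time.sleep(0.01), clock=time.monotonic):
        # Step 2: broadcast the event to every subscribed connection.
        for conn_id in sorted(self.subscribed):
            self.sent.append((conn_id, "box.shutdown", True))
        # Step 3: stop accepting new connections.
        self.accepting = False
        # Steps 4 and 8: wait until all subscribers close or the timeout hits.
        deadline = clock() + self.shutdown_timeout
        while self.subscribed and clock() < deadline:
            poll()
        return "all_closed" if not self.subscribed else "timeout"
```

Connections that never subscribed to the event are not tracked here, which matches step 4: the server waits only for connections that subscribed to 'box.shutdown'.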

The graceful shutdown protocol is implemented by the net.box connector as follows:

  1. Upon receiving a 'box.shutdown' event with the value true, a net.box connection invokes the user-defined triggers installed with the new connection method on_shutdown(). The method has the same API as the other connection methods that install triggers, for example, on_disconnect(). The on_shutdown() triggers are invoked in a new fiber. While they are running, the connection remains active, so a trigger callback may send new requests.
  2. After the on_shutdown() triggers return, the net.box connection switches to the new graceful_shutdown state. In this state, no new requests are allowed.
  3. Once all in-progress requests have completed, the net.box connection is closed. More precisely, it switches to the error or error_reconnect state, depending on whether the reconnect_after connection option is set, with the error message set to "Peer closed". This is the same outcome as before the graceful_shutdown state was introduced, when the server closed the connection immediately on shutdown.

If the server doesn't support the new 'box.shutdown' event (or doesn't support IPROTO_WATCH), on_shutdown() triggers will never be executed and the connection will be abruptly closed by the server.
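
The client-side behavior described above can be modeled with a short toy class. This is a sketch, not the net.box implementation: ToyConnection, begin_request(), complete_request(), and handle_event() are hypothetical names. It makes three simplifying assumptions: triggers run synchronously instead of in a new fiber, a request sent in the graceful_shutdown state raises an error, and error_reconnect is chosen when reconnect_after is set.

```python
class ToyConnection:
    """Toy model of a net.box connection handling 'box.shutdown'.

    Hypothetical illustration only; not the real net.box API.
    """

    def __init__(self, reconnect_after=None):
        self.state = "active"
        self.reconnect_after = reconnect_after
        self.error = None
        self.in_progress = 0
        self._on_shutdown = []

    def on_shutdown(self, trigger):
        self._on_shutdown.append(trigger)

    def begin_request(self, name):
        # Assumption: the toy rejects requests outside the active state.
        if self.state != "active":
            raise RuntimeError("no new requests in state " + self.state)
        self.in_progress += 1

    def complete_request(self):
        self.in_progress -= 1
        self._maybe_close()

    def handle_event(self, key, value):
        if key != "box.shutdown" or value is not True:
            return
        # Step 1: run triggers while the connection is still active.
        for trigger in self._on_shutdown:
            trigger(self)
        # Step 2: stop accepting new requests.
        self.state = "graceful_shutdown"
        self._maybe_close()

    def _maybe_close(self):
        # Step 3: close once every in-progress request has completed.
        if self.state == "graceful_shutdown" and self.in_progress == 0:
            reconnect = self.reconnect_after is not None
            self.state = "error_reconnect" if reconnect else "error"
            self.error = "Peer closed"
```

A trigger registered with on_shutdown() can still call begin_request() here because the state changes only after all triggers return, which mirrors the rule that the connection remains active while triggers are running.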

Don't forget to update the net.box state machine diagram on this page:

initial -> auth -> fetch_schema <-> active

fetch_schema, active -> graceful_shutdown

(any state, on error) -> error_reconnect -> auth -> ...
                                         \
                                          -> error
(any state, but 'error') -> closed
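
One way to sanity-check the updated diagram is to encode it as a transition table. The sketch below is one reading of the diagram, not code from net.box: the NORMAL and TERMINAL tables and can_transition() are hypothetical names. It assumes that 'error' and 'closed' are terminal, and that on error any other state may move to either error_reconnect or error, as the description of the graceful_shutdown state implies.

```python
# Transitions drawn explicitly in the diagram.
NORMAL = {
    "initial": {"auth"},
    "auth": {"fetch_schema"},
    "fetch_schema": {"active", "graceful_shutdown"},
    "active": {"fetch_schema", "graceful_shutdown"},
    "error_reconnect": {"auth", "error"},
}

# Assumption of this sketch: nothing leaves these two states.
TERMINAL = {"error", "closed"}


def can_transition(src, dst):
    """Return True if the diagram, as read here, allows src -> dst."""
    if src in TERMINAL:
        return False
    if dst in NORMAL.get(src, set()):
        return True
    # "(any state, on error)": assumed to cover both error states.
    if dst in ("error", "error_reconnect"):
        return True
    # "(any state, but 'error') -> closed"
    return dst == "closed"
```

For example, can_transition("active", "graceful_shutdown") holds, while can_transition("auth", "graceful_shutdown") does not, because only fetch_schema and active lead to the new state.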

Definition of done

Do this issue after the following:
Document box.watch and box.broadcast
Document IPROTO watchers

  • Document the box.shutdown system event
  • Document box.ctl.set_on_shutdown_timeout()
  • Document conn:on_shutdown()
  • Update the net.box state machine diagram
  • Check the translation

locker (Member) commented Jan 26, 2022

Related issue: #2632

locker (Member) commented Jan 28, 2022

Hold on! We are currently thinking about reworking the feature implementation, see tarantool/tarantool#6813. I'll update the ticket once we agree how to proceed.

locker (Member) commented Feb 2, 2022

tarantool/tarantool#6813 was merged. It reworked the graceful shutdown protocol using watch/event. The updated description is below (I can't update the issue description).

In Tarantool 2.10.0-beta2-78-g2e9cbec3091e, a new system event, 'box.shutdown', was introduced. A server generates this event with the value true when it is asked to exit (os.exit() is called or a SIGTERM signal is received). Essentially, the server calls box.broadcast('box.shutdown', true) from a box.ctl.on_shutdown() trigger callback. Like any other event, 'box.shutdown' is broadcast to all remote watchers subscribed to it (see IPROTO_WATCH). Connectors are expected to use this event to implement the graceful shutdown protocol:

  1. Server receives a shutdown request (os.exit() or SIGTERM).
  2. Server broadcasts 'box.shutdown' event with the value set to true.
  3. Server stops accepting new connections.
  4. Server waits for all connections that subscribed to the event to close.
  5. Client receives 'box.shutdown' event with the value true.
  6. Client does its housekeeping needed to gracefully close the connection. (It may send new requests.)
  7. Client closes the connection.
  8. Server exits once all connections that received the 'box.shutdown' event have been closed or a timeout occurs.

The timeout is configured with box.ctl.set_on_shutdown_timeout(). It's set to 3 seconds by default.

The graceful shutdown protocol is implemented by the net.box connector as follows:

  1. Upon receiving a 'box.shutdown' event with the value true, a net.box connection invokes the user-defined triggers installed with the new connection method on_shutdown(). The method has the same API as the other connection methods that install triggers, for example, on_disconnect(). The on_shutdown() triggers are invoked in a new fiber. While they are running, the connection remains active, so a trigger callback may send new requests.
  2. After the on_shutdown() triggers return, the net.box connection switches to the new graceful_shutdown state. In this state, no new requests are allowed.
  3. Once all in-progress requests have completed, the net.box connection is closed. More precisely, it switches to the error or error_reconnect state, depending on whether the reconnect_after connection option is set, with the error message set to "Peer closed". This is the same outcome as before the graceful_shutdown state was introduced, when the server closed the connection immediately on shutdown.

If the server doesn't support the new 'box.shutdown' event (or doesn't support IPROTO_WATCH), on_shutdown() triggers will never be executed and the connection will be abruptly closed by the server.

Don't forget to update the net.box state machine diagram on this page:

initial -> auth -> fetch_schema <-> active

fetch_schema, active -> graceful_shutdown

(any state, on error) -> error_reconnect -> auth -> ...
                                         \
                                          -> error
(any state, but 'error') -> closed

@patiencedaur patiencedaur added this to the Estimate [@patiencedaur] milestone Jul 8, 2022
@patiencedaur patiencedaur added server [area] Task relates to Tarantool's server (core) functionality reference [location] Tarantool manual, Reference part feature A new functionality 5sp labels Jul 9, 2022
@patiencedaur patiencedaur removed this from the Estimate [@patiencedaur] milestone Jul 14, 2022
@xuniq xuniq self-assigned this Aug 12, 2022
@patiencedaur patiencedaur added 3sp and removed 5sp labels Aug 17, 2022
@patiencedaur patiencedaur added 1sp and removed 3sp labels Aug 30, 2022
xuniq added a commit that referenced this issue Sep 7, 2022
Fixes #2633 

* Add ``box.shutdown`` event, ``box_ctl-on_shutdown_timeout`` function, and net.box method
* Add diagram
* Update .po files