-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VSR/Client: Add vsr.Client.register()
#1946
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Context Clients (at the level above `vsr.Client`) are responsible for automatically coalescing user operations into batches. The maximum batch size is currently `message_body_size_max`. In order to minimize replica memory usage when maximum throughput is not needed (e.g. during development), the replica is going to support running with a lower, _runtime-defined_ batch size limit. Since this batch size limit impacts the clients, the replica must communicate it to clients. This will occur via the result of the `register` request. (Note that this is not implemented in this PR – this PR is a prerequisite.) Previously, `vsr.Client` would send its `command=request operation=register` message when the first of the following occurred: 1. `vsr.Client.request()` is called, or 2. the client receives a `pong_client` message. But that means that the first batch created by the client would potentially precede the client actually knowing the batch limit. # Solution Add `vsr.Client.register(callback, user_data)`. `vsr.Client.request()` cannot be called until `.register()` has invoked its callback. `tb_client` invoked `vsr.Client.register` automatically during its setup – no waiting until `request()`/`on_pong()`. It will refrain from generating/sending any batches until the register reply arrives. # Future Work (Maybe?) The above opens up some nice possibilities for UX: - If `.register` takes too long, we could bubble up an exception/warning to the application suggesting that they might have misconfigured TB (e.g. wrong host/port/cluster_id). (Right now the client just hangs silently forever, which is not a great developer experience.) - We could expose `.register()`, to allow developers to (optionally) wait until the connection is established before they start generating work – since any generated work would just be buffered in memory until `register` arrives.
All clients begin to register when the simulation starts. But if one is partitioned/slow, then it might still be trying to register even after all of the simulator's `requests_max` have been completed. (Register messages are not counted by `requests_max`.)
sentientwaffle
force-pushed
the
dj-vsr-client-register
branch
from
May 13, 2024 16:05
1c61eba
to
068dca6
Compare
batiati
approved these changes
May 28, 2024
@@ -30,7 +30,6 @@ pub const Options = union(vsr.ProcessType) { | |||
sum += constants.replicas_max; // Connection.recv_message | |||
// Connection.send_queue: | |||
sum += constants.replicas_max * constants.connection_send_queue_max_client; | |||
sum += 1; // Client.register_inflight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Context
Clients (at the level above
vsr.Client
) are responsible for automatically coalescing user operations into batches. The maximum batch size is currentlymessage_body_size_max
.In order to minimize replica memory usage when maximum throughput is not needed (e.g. during development), the replica is going to support running with a lower, runtime-defined batch size limit. Since this batch size limit impacts the clients, the replica must communicate it to clients. This will occur via the result of the
register
request. (Note that this is not implemented in this PR – this PR is a prerequisite.)Previously,
vsr.Client
would send itscommand=request operation=register
message when the first of the following occurred:vsr.Client.request()
is called, orpong_client
message.But that means that the first batch created by the client would potentially precede the client actually knowing the batch limit.
Solution
Add
vsr.Client.register(callback, user_data)
.vsr.Client.request()
cannot be called until.register()
has invoked its callback.tb_client
invokedvsr.Client.register
automatically during its setup – no waiting untilrequest()
/on_pong()
. It will refrain from generating/sending any batches until the register reply arrives.Future Work (Maybe?)
The above opens up some nice possibilities for UX of the client API:
.register
takes too long, we could bubble up an exception/warning to the application suggesting that they might have misconfigured TB (e.g. wrong host/port/cluster_id). (Right now the client just hangs silently forever, which is not a great developer experience.).register()
, to allow developers to (optionally) wait until the connection is established before they start generating work – since any generated work would just be buffered in memory untilregister
arrives.Note that CI (replica_test.zig
) is failing on this branch. This is semi-unrelated; it is addressed in a separate PR.