SUNSUBSCRIBE corner case causes client out-of-sync #1066
Comments
Nice! I guess this could also be solved when we introduce atomic slot migration, right?
No, how would that solve it? This is about the subscribed client getting a spontaneous message when the slot migration happens.
Please note that, although cumbersome, the resulting local subscription state can still be tracked correctly: receiving a SUNSUBSCRIBE notification (regardless of whether it was caused by an active command or by slot migration) tells the client that it will no longer receive notifications on the channel in question.
Thus, the above does not really matter, although it does complicate the unsubscription logic.
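The state tracking described in this comment could look roughly like the sketch below. It is only an illustration: `ShardedSubscriptionTracker` and `on_push` are hypothetical names, not part of any client library, and push messages are assumed to arrive already parsed as `[kind, channel, count]` lists.

```python
class ShardedSubscriptionTracker:
    """Hypothetical helper that mirrors the server-side subscription state."""

    def __init__(self):
        self.channels = set()

    def on_push(self, message):
        # Assumption: message is a parsed push of the form [kind, channel, count].
        kind, channel = message[0], message[1]
        if kind == "ssubscribe":
            self.channels.add(channel)
        elif kind == "sunsubscribe":
            # Handled the same way whether the server sent it in response to
            # SUNSUBSCRIBE or spontaneously because the slot moved.
            self.channels.discard(channel)


tracker = ShardedSubscriptionTracker()
tracker.on_push(["ssubscribe", "ch", 1])
subscribed_before = set(tracker.channels)
tracker.on_push(["sunsubscribe", "ch", 0])
subscribed_after = set(tracker.channels)
```

This keeps the subscription set correct, but it does nothing about matching in-band replies to commands, which is the part the rest of the thread is concerned with.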
@ikolomi What do you mean it does not really matter? The -CLUSTERDOWN (or MOVED) is still in the reply pipeline. If the client sends a
Oh, I assumed that during atomic slot ownership the command would be blocked until we identify that the ownership was migrated, so a MOVED response would be sent. But you are right that it can still be a problem if we allow unblocking and running the command: the client would get a MOVED error, which can still be identified as a response to another command.
BTW, I wonder if suppressing the MOVED error and just replying [sunsubscribe, ch, 0] again would make things better. I mean, there is no real point in redirecting unsubscribe messages to a different node, IMO.
Ran, we already get a MOVED today in this race condition. (My example above uses CLUSTERDOWN instead because I deleted the slot instead of migrating it, but that's the only difference.) The client just doesn't understand that this MOVED is the reply to the SUNSUBSCRIBE command it just sent, because there is a 'sunsubscribe' push message in the reply pipeline which arrives at the client first. When the client is subscribed and doesn't do anything, and a slot is migrated, we send a 'sunsubscribe' push notification to the client.
Thank you @zuiderkwast, I understand the point. So the client would probably see a double reroute if it was pipelining more requests (like ssubscribe, or other commands in the case of RESP3), right? A double MOVED will probably also cause the client to perform a new topology refresh, which in turn can lead to high load on the server if there are many subscribers on the same slot(s). I was only wondering whether, instead of MOVED, we could just return that there are zero subscribers on the channel, but later I also thought it might still break some stateless clients. I currently have no good solution for that.
Load on the server is not the issue. The client getting out of sync and returning the wrong replies back to the user application is. The best way to avoid this is probably to just never send UNSUBSCRIBE commands. That's what GLIDE does. It seems a bit broken though and it'd be nicer if the behaviour were different, like if every command would always get one in-band reply (in RESP3). That'd have to be client opt-in though. If we do this, then we can send +OK as a reply to the subscribe/unsubscribe commands when successful and the push replies can be handled completely out-of-band as they were intended to.
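To illustrate why one in-band reply per command would help, here is a minimal sketch that assumes the opt-in behaviour proposed above existed (it does not today). `InBandReplyClient` is a hypothetical class, messages are assumed to be pre-parsed and tagged as `"push"` or `"reply"` (RESP3 distinguishes push messages from regular replies), and the error text is a placeholder.

```python
from collections import deque


class InBandReplyClient:
    """Hypothetical client for the proposed mode: every command gets exactly
    one in-band reply, and push messages never count as replies."""

    def __init__(self):
        self.pending = deque()  # commands awaiting their in-band reply
        self.replies = []       # (command, reply) pairs
        self.pushes = []        # out-of-band push messages

    def send(self, command):
        self.pending.append(command)

    def receive(self, kind, payload):
        if kind == "push":
            # Out-of-band: never consumes a pending command.
            self.pushes.append(payload)
        else:
            self.replies.append((self.pending.popleft(), payload))


client = InBandReplyClient()
client.send("SUNSUBSCRIBE ch")
client.send("GET foo")
client.receive("push", ["sunsubscribe", "ch", 0])   # spontaneous, slot gone
client.receive("reply", "-CLUSTERDOWN placeholder")  # in-band reply to SUNSUBSCRIBE
client.receive("reply", "bar")                       # in-band reply to GET
```

Under this assumption the error is attributed to SUNSUBSCRIBE and the GET reply stays aligned, regardless of when the spontaneous push arrives.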
To clarify what it means for the client to get out of sync: an async client sends a constant stream of commands, receives the stream of responses, and needs to match each response with the command it corresponds to, in order to invoke the right callback or otherwise match it with the code that sent each command. That's what GLIDE does (iiuc) and that's what hiredis does in the async API. If a client sends
I mean that the current subscription state could still be correctly constructed on the client side.
The problematic parts here are
Due to these complications, we don't use unsubscription commands in GLIDE, relying on transport teardown instead.
If no channels are subscribed, the server still sends one notification on the form
Yes, this part can still happen even with draining.
@zuiderkwast I understand the problem you refer to, and thank you for your patience in putting up with my questions :)
When a client is subscribed to a sharded-pubsub channel in cluster mode and the slot is moved to another shard, or deleted, the client receives a spontaneous `sunsubscribe` push message.

If a client has just sent a SUNSUBSCRIBE command, the client cannot know if the `sunsubscribe` message is a response to the command or a spontaneous message.

In the following scenario, client 1 believes that the `[sunsubscribe, ch, 0]` push message is received as a response to SUNSUBSCRIBE. The CLUSTERDOWN error reply remains to be read and it appears to be out of sync, i.e. to the client it doesn't appear to match a command it has sent. (If the client has sent another command in the pipeline, the CLUSTERDOWN appears to be its reply.)

Originally posted by @zuiderkwast in #759 (comment) (but edited)
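The ambiguity can be sketched with a small simulation. This is an illustration only, not the code of any real client: `NaiveClient` is a hypothetical class that matches replies to commands in FIFO order, messages are assumed to be pre-parsed, and the error text is a placeholder.

```python
from collections import deque


def is_sunsubscribe_push(msg):
    return isinstance(msg, list) and len(msg) == 3 and msg[0] == "sunsubscribe"


class NaiveClient:
    """Hypothetical client that treats a sunsubscribe push as the reply to a
    pending SUNSUBSCRIBE, since it cannot tell the two cases apart."""

    def __init__(self):
        self.pending = deque()  # commands sent, oldest first
        self.results = []       # (command, message) as the client pairs them
        self.spontaneous = []   # pushes with no pending SUNSUBSCRIBE
        self.unmatched = []     # messages left over with nothing pending

    def send(self, command):
        self.pending.append(command)

    def receive(self, msg):
        if is_sunsubscribe_push(msg):
            if self.pending and self.pending[0].startswith("SUNSUBSCRIBE"):
                self.results.append((self.pending.popleft(), msg))
            else:
                self.spontaneous.append(msg)
        elif self.pending:
            self.results.append((self.pending.popleft(), msg))
        else:
            self.unmatched.append(msg)


client = NaiveClient()
client.send("SUNSUBSCRIBE ch")
client.send("GET foo")
# What the server actually puts on the wire, in order:
client.receive(["sunsubscribe", "ch", 0])  # spontaneous push (slot deleted)
client.receive("-CLUSTERDOWN placeholder")  # real reply to SUNSUBSCRIBE
client.receive("bar")                       # real reply to GET foo
```

After this run the client has paired the push with SUNSUBSCRIBE and the CLUSTERDOWN error with `GET foo`, and the actual GET reply is left over with no command to match.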
Test case
This test case passes, i.e. it illustrates what the clients actually see.