Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decline unsubscribe related command in non-subscribed mode #759

Merged
merged 4 commits into from
Aug 22, 2024

Conversation

hwware
Copy link
Member

@hwware hwware commented Jul 8, 2024

Now, when clients run the unsubscribe, sunsubscribe and punsubscribe commands in the non-subscribed mode, it returns 0.
Indeed this is a bug, we should not allow client run these kind of commands here.

Thus, this PR fixes this bug, but it is a break change for existing clients

Update

This was introduced in Valkey8, and it is a breaking change, this will be revert in #1265

@hwware hwware added the breaking-change Indicates a possible backwards incompatible change label Jul 8, 2024
Copy link

codecov bot commented Jul 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.60%. Comparing base (b8dd4fb) to head (9002627).
Report is 6 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #759      +/-   ##
============================================
- Coverage     70.63%   70.60%   -0.03%     
============================================
  Files           112      112              
  Lines         61509    61512       +3     
============================================
- Hits          43444    43429      -15     
- Misses        18065    18083      +18     
Files Coverage Δ
src/server.c 88.55% <100.00%> (-0.03%) ⬇️

... and 10 files with indirect coverage changes

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. We can discuss the exact error message.

I'm not sure we can call it a bug, but it is at least unexpected behaviour when called with more than one channel. I prefer we decribe it as changing an unexpected behaviour rather than a bugfix.

src/server.c Outdated Show resolved Hide resolved
tests/unit/info.tcl Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
tests/unit/pubsub.tcl Outdated Show resolved Hide resolved
tests/unit/pubsub.tcl Outdated Show resolved Hide resolved
@hwware
Copy link
Member Author

hwware commented Jul 9, 2024

Great. We can discuss the exact error message.

I'm not sure we can call it a bug, but it is at least unexpected behaviour when called with more than one channel. I prefer we decribe it as changing an unexpected behaviour rather than a bugfix.

Whatever it is a bug or unexpected behavior, this is a design error. Thus we need to fix it and make the right turn.

@hwware hwware force-pushed the decline-unsubscribe-related-command branch from caa1874 to cd67cb8 Compare August 16, 2024 14:05
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
@hwware hwware force-pushed the decline-unsubscribe-related-command branch from cd67cb8 to 9002627 Compare August 21, 2024 15:53
@hwware hwware merged commit 959dd34 into valkey-io:unstable Aug 22, 2024
47 checks passed
@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Aug 22, 2024
madolson pushed a commit that referenced this pull request Sep 2, 2024
Now, when clients run the unsubscribe, sunsubscribe and punsubscribe
commands in the non-subscribed mode, it returns 0.
Indeed this is a bug, we should not allow client run these kind of
commands here.

Thus, this PR fixes this bug, but it is a break change for existing
clients

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
madolson pushed a commit that referenced this pull request Sep 3, 2024
Now, when clients run the unsubscribe, sunsubscribe and punsubscribe
commands in the non-subscribed mode, it returns 0.
Indeed this is a bug, we should not allow client run these kind of
commands here.

Thus, this PR fixes this bug, but it is a break change for existing
clients

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
@zuiderkwast
Copy link
Contributor

zuiderkwast commented Sep 20, 2024

This change introduced a race condition related to slot migration of sharded channel and at the same time client sending sunsubscribe, which can lead to the client getting out of sync in matching each reply to each command sent.

When a slot is migrated, the shard-channels are automatically usubscribed and clients will receive a spontaneous sunsubscribed message (push in RESP3) for each channel in the migrated slot. If the client also manually unsubscribes at the same time, the client can believe the spontaneous sunsubscribed response is a response to the SUNSUBSCRIBE command, but later it will receive an extra -NOSUB error after this PR. The client will believe this error is the response to the next command sent, such as GET foo, the -NOSUB error is matched with GET foo and returned as the reply to GET. Now the client's mapping of command-to-response is out of sync.

Scenario diagram:

Client                     Primary A       Primary B
  |                          |                |
  |  SUNSUBSCRIBE ch         |                |
  |------------------------->|                |
  |                        +-------------------------+
  |                        |  Slot migration A to B  |
  | [sunsubscribed, ch, 0] +-------------------------+
  |<-------------------------|                |
  |                          |                |
  | Error...                 |                |
  |<-------------------------|                |
  |                          |                |

An error reply is always in-band, i.e. it is always for a command.

If a client ignores errors for a command like subscribe and unsubscribe, then these commands get no in-band replies and one or more push replies. But there is an error, then subscribe/unsibscribe commands receives an in-band reply. Therefore, a correct client which sends [S|P][UN]SUBSCRIBE it must wait for either one push reply to confirm 'subscribed' or 'unsubscribed' or an error, before it can know that the command has been handled. If there are multiple channels, the remaining messages can just be handled as out-of-band.

But the above scenario with -NOSUB error breaks this logic. Now, there is no way for a client to make sure it stays in sync.

Therefore, I think we should maybe revert this PR. Without this PR, the client will instead in this scenario receive two 'sunsubscribed' push messages. Two push messages are easy to handle without getting out of sync.

@bjosv
Copy link
Contributor

bjosv commented Sep 23, 2024

As stated before this breaks clients, one example is hiredis when running with the async API.
In async, hiredis waits for a successful reply to know the servers state and how to handle reply callbacks.
The handling in hiredis is a bit flawed from start, but now it would need to determine the result of an unsubscribe to avoid getting out of sync.

bjosv added a commit to bjosv/libvalkey that referenced this pull request Sep 23, 2024
Perform the punsubscribe before entering non-pubsub mode since Valkey 8
dont allow several UNSUBSCRIBE commands executed under non-pubsub mode.
See valkey-io/valkey#759

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
@zuiderkwast
Copy link
Contributor

I made a test case to verify this. Actually, the corner case exists since before, ever since we added the sponaneous sunsubscribe message (when sharded pubsub was introduced?). After moving a slot and the client has received a sunsubscribe message, the SUNSUBSCRIBE command doesn't get a NOSUB error but a MOVED error, or a CLUSTERDOWN if the slot was deleted.

I simulated it with the following scenario, where Client 1 can be assumed to not have read the first message when it sends the SUNSUBSCRIBE command. Client 1 then believes that the [sunsubscribe, ch, 0] push message is received as a response to SUNSUBSCRIBE. The CLUSTERDOWN error reply remains to be read and it appears to be out of sync, i.e. to the client it doesn't appear to match a command it has sent. (If the client has sent another command in the pipeline, the CLUSTERDOWN appears to be its reply.)

Client 1                Client 2            Primary
  |                          |  DELSLOT        |
  |                          |---------------->|
  |                          |      OK         |
  | [sunsubscribe, ch, 0]    |<----------------|
  |<-------------------------------------------|
  |                          |                 |
  | SUNSUBSCRIBE ch          |                 |
  |------------------------------------------->|
  | -CLUSTERDOWN             |                 |
  |<-------------------------------------------|
  |                          |                 |

I don't see any way for a client to protect itself against this scenario. @bjosv do you?

@bjosv
Copy link
Contributor

bjosv commented Sep 24, 2024

No, I don't think there is a way to protect against this currently..

bjosv added a commit to valkey-io/libvalkey that referenced this pull request Sep 24, 2024
Perform the punsubscribe before entering non-pubsub mode since Valkey 8
dont allow several UNSUBSCRIBE commands executed under non-pubsub mode.
See valkey-io/valkey#759

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
nihohit added a commit to nihohit/redis-rs that referenced this pull request Oct 8, 2024
hwware added a commit that referenced this pull request Nov 8, 2024
…1265)

This PR goal is to revert the changes on PR
#759

Recently, we got some reports that in Valkey 8.0 the PR
#759 (Decline unsubscribe
related command in non-subscribed mode) causes break change.
(#1228)

Although from my thought, call commands "unsubscribeCommand",
"sunsubscribeCommand", "punsubscribeCommand" in request-response mode
make no sense. This is why I created PR
#759

But breaking change is always no good, @valkey-io/core-team How do you
think we revert this PR code changes?

Signed-off-by: hwware <wen.hui.ware@gmail.com>
proost pushed a commit to proost/valkey that referenced this pull request Nov 9, 2024
…alkey-io#1265)

This PR goal is to revert the changes on PR
valkey-io#759

Recently, we got some reports that in Valkey 8.0 the PR
valkey-io#759 (Decline unsubscribe
related command in non-subscribed mode) causes break change.
(valkey-io#1228)

Although from my thought, call commands "unsubscribeCommand",
"sunsubscribeCommand", "punsubscribeCommand" in request-response mode
make no sense. This is why I created PR
valkey-io#759

But breaking change is always no good, @valkey-io/core-team How do you
think we revert this PR code changes?

Signed-off-by: hwware <wen.hui.ware@gmail.com>
zvi-code pushed a commit to zvi-code/valkey-z that referenced this pull request Nov 13, 2024
…alkey-io#1265)

This PR goal is to revert the changes on PR
valkey-io#759

Recently, we got some reports that in Valkey 8.0 the PR
valkey-io#759 (Decline unsubscribe
related command in non-subscribed mode) causes break change.
(valkey-io#1228)

Although from my thought, call commands "unsubscribeCommand",
"sunsubscribeCommand", "punsubscribeCommand" in request-response mode
make no sense. This is why I created PR
valkey-io#759

But breaking change is always no good, @valkey-io/core-team How do you
think we revert this PR code changes?

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Indicates a possible backwards incompatible change release-notes This issue should get a line item in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants