Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 4 commands for sentinel and update most test cases and json files #789

Merged

Conversation

hwware
Copy link
Member

@hwware hwware commented Jul 15, 2024

Add 4 new commands for Sentinel (reference #36)

Sentinel GET-PRIMARY-ADDR-BY-NAME
Sentinel PRIMARY
Sentinel PRIMARIES
Sentinel IS-PRIMARY-DOWN-BY-ADDR

and deprecate 4 old commands:

Sentinel GET-MASTER-ADDR-BY-NAME
Sentinel MASTER
Sentinel MASTERS
Sentinel IS-MASTER-DOWN-BY-ADDR

and all sentinel tests pass here https://github.com/hwware/valkey/actions/runs/9962102363/job/27525124583

Note:

  1. runtest-sentinel pass all test cases

  2. I finished a sentinel rolling upgrade test: 1 primary 2 replicas 3 sentinel
    there are 4 steps in this test scenario:
    step 1: all 3 sentinel nodes run old sentinel, shutdown primary, and then new primary can be voted successfully.
    step 2: replace sentinel 1 with new sentinel bin file, and then shutdown primary, and then another new primary can be voted successfully
    step 3: replace sentinel 2 with new sentinel bin file, and then shutdown primary, and then another new primary can be voted successfully
    step 4: replace sentinel 3 with new sentinel bin file, and then shutdown primary, and then another new primary can be voted successfully

    We can see, even mixed version sentinel running, whole system still works.

@hwware hwware changed the title Update 4 commands json file and update most test cases Add 4 commands for sentinel and update most test cases and json files Jul 15, 2024
Copy link

codecov bot commented Jul 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 70.30%. Comparing base (6cb86ff) to head (87c9469).
Report is 65 commits behind head on unstable.

Files with missing lines Patch % Lines
src/sentinel.c 0.00% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #789      +/-   ##
============================================
- Coverage     70.33%   70.30%   -0.04%     
============================================
  Files           112      112              
  Lines         61489    61491       +2     
============================================
- Hits          43249    43230      -19     
- Misses        18240    18261      +21     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/sentinel.c 0.00% <0.00%> (ø)

... and 6 files with indirect coverage changes

@hwware hwware force-pushed the update-sentinel-commands-with-master-keyword branch from 142a1ad to 737cd6a Compare July 16, 2024 18:13
@hwware hwware marked this pull request as ready for review July 16, 2024 18:49
src/sentinel.c Outdated
@@ -4483,7 +4485,7 @@ void sentinelAskPrimaryStateToOtherSentinels(sentinelValkeyInstance *primary, in
/* Ask */
ll2string(port, sizeof(port), primary->addr->port);
retval = redisAsyncCommand(
ri->link->cc, sentinelReceiveIsPrimaryDownReply, ri, "%s is-master-down-by-addr %s %s %llu %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old sentinel does not recognize the new command, but i suppose it is ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, once we update this part, all sentinels should run on the same version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a rolling upgrade scenario for sentinels where the user runs mixed versions of sentinels temporarily?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is. For example, this Helm chart https://github.com/bitnami/charts/tree/main/bitnami/valkey sets up e.g. a k8s StatefulSet with three pods running a sentinel and a valkey container each. During a RollingUpdate of the StatefulSet, there will be mixed versions of sentinel and valkey temporarily. The sentinels will still need to find a quorum and initiate failovers during this time (if the primary gets updated)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. @hwware I think we should leave this command as it is. Rolling upgrades can take time and we need to be resilient to any failures during this transitional period.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. @hwware I think we should leave this command as it is. Rolling upgrades can take time and we need to be resilient to any failures during this transitional period.

Agree with you. Because @gmbnomis provides me a real world case, let us just leave these commands as they are, util I find a solution to solve the problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmbnomis @PingXie I just update the top comment, and describe a test scenario with mixed version sentinel. Please let me if you have more concern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, sentinel accepts the new command variants, but it still uses the old ones internally? (which is "IS-MASTER-DOWN-BY-ADDR" only, if I get this correctly)

That looks reasonable. We could get rid of the old forms in some later release when we can be sure that all sentinel versions that are supported during an upgrade understand the new form.

Thank you for explicitly testing the mixed sentinel version upgrade scenario. Highly appreciated!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, sentinel accepts the new command variants, but it still uses the old ones internally? (which is "IS-MASTER-DOWN-BY-ADDR" only, if I get this correctly)

That looks reasonable. We could get rid of the old forms in some later release when we can be sure that all sentinel versions that are supported during an upgrade understand the new form.

Thank you for explicitly testing the mixed sentinel version upgrade scenario. Highly appreciated!

I keep the IS-MASTER-DOWN-BY-ADDR only internal, for user, we still encourage them to use new command format. Thanks

@@ -57,16 +57,16 @@ test "SENTINEL PENDING-SCRIPTS returns the information about pending scripts" {
}

test "SENTINEL PRIMARIES returns a list of monitored primaries" {
assert_match "*mymaster*" [S 0 SENTINEL MASTERS]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess we also need some dummy tests that can cover the old commands, otherwise the reply-schemas-validator CI will fail (and we also need it to make sure the old commands can work).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check https://github.com/hwware/valkey/actions/runs/9962102363/job/27525129293, it looks like reply-schemas-validator CI can accept all changes, and it passes. But adding some tests to cover old commands is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have somewhat forgotten the logic, but it does exist, maybe it's just not as strict now

https://github.com/hwware/valkey/actions/runs/9962102363/job/27525129293#step:11:70

WARNING! The following commands were not hit at all:
  acl|cat
  acl|dryrun
  acl|genpass
  acl|getuser
  acl|help
  acl|list
  acl|load
  acl|log

Copy link
Member Author

@hwware hwware Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks BinBin,I find 5 sentinel deprecated commands are not hit as below
image

But I am curious why some many commands are not hit? it is normal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is because you just trigger a sentinel-test, it does not run the other tests, so those commands are not hit

@@ -200,7 +200,7 @@ test "SENTINEL RESET can resets the primary" {
for {set j 0} {$j < 10} {incr j} {
assert_equal 1 [S 0 SENTINEL RESET mymaster]
set res1 [llength [S 0 SENTINEL SENTINELS mymaster]]
set res2 [llength [S 0 SENTINEL SLAVES mymaster]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets keep it, i believe i add this for the old name coverage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, I will update it and add the comment for old name coverage.

@enjoy-binbin enjoy-binbin added the major-decision-pending Major decision pending by TSC team label Jul 18, 2024
@hwware hwware force-pushed the update-sentinel-commands-with-master-keyword branch 3 times, most recently from 969ecb1 to ef82e8e Compare July 18, 2024 20:49
Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall with just one pending question.

src/sentinel.c Outdated
@@ -4483,7 +4485,7 @@ void sentinelAskPrimaryStateToOtherSentinels(sentinelValkeyInstance *primary, in
/* Ask */
ll2string(port, sizeof(port), primary->addr->port);
retval = redisAsyncCommand(
ri->link->cc, sentinelReceiveIsPrimaryDownReply, ri, "%s is-master-down-by-addr %s %s %llu %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a rolling upgrade scenario for sentinels where the user runs mixed versions of sentinels temporarily?

@hwware
Copy link
Member Author

hwware commented Jul 23, 2024

LGTM overall with just one pending question.

According to the latest modification, there is no rolling upgrade scenario because for the redisAsyncCommand in old version, it will send is-master-down-by-addr command, however for new version sentinel, it will send is-primary-down-by-addr.

In the case, if one sentinel node just upgrades, and it is "lucky" it detects the primary down, then it will send command "SENTINEL IS-PRIMARY-DOWN-BY-ADDR", however, old version sentinel can not recognize this command. Failover will not happen.
Thus, when clients decide to upgrade sentinel version, they should upgrade all bin files in all sentinel nodes in case the mentioned scenario happens. But from my personal experience, it happens with very low probability as the below reasons:

  1. After sentinel bin file is replaced, sentinel starts with very short time, there is no data loaded.
  2. The sentinel conf file no changes, and since the old version sentinel and new version sentinel keep high compatible, it does not need to check primary nodes and replicas node. It just need to send PING message as usual.

@PingXie @zuiderkwast

@gmbnomis
Copy link
Contributor

In the case, if one sentinel node just upgrades, and it is "lucky" it detects the primary down, then it will send command "SENTINEL IS-PRIMARY-DOWN-BY-ADDR", however, old version sentinel can not recognize this command. Failover will not happen. ...

@hwware Please see my comment #789 (comment), which is a "real world" example of a rolling upgrade scenario which relies on failover to happen.

@hwware
Copy link
Member Author

hwware commented Jul 31, 2024

In the case, if one sentinel node just upgrades, and it is "lucky" it detects the primary down, then it will send command "SENTINEL IS-PRIMARY-DOWN-BY-ADDR", however, old version sentinel can not recognize this command. Failover will not happen. ...

@hwware Please see my comment #789 (comment), which is a "real world" example of a rolling upgrade scenario which relies on failover to happen.

Thanks for your information. Let us just leave these commands as they are now.

@hwware hwware force-pushed the update-sentinel-commands-with-master-keyword branch from ef82e8e to 211d133 Compare August 12, 2024 20:01
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
@hwware hwware force-pushed the update-sentinel-commands-with-master-keyword branch 2 times, most recently from ecd4b2b to 87c9469 Compare August 13, 2024 19:13
@madolson
Copy link
Member

I assume the major decision is the deprecation and renaming, which I am OK with. So just indicating my +1 on that.

Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Wen!

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure I understand correctly:

The sentinels still use the MASTER commands (GET-MASTER-ADDR-BY-NAME, etc.) when they talk to each other, so we can support a rolling upgrade of sentinels.

In the future (Valkey 9?) the sentinels can use the PRIMARY variant of the commands instead. We still can't delete the MASTER commands, because we still want to support rolling upgrade of Sentinel from 8 to 9. Even later (Valkey 10?) we can delete the MASTER variants.

Correct?

@hwware
Copy link
Member Author

hwware commented Aug 16, 2024

Just to be sure I understand correctly:

The sentinels still use the MASTER commands (GET-MASTER-ADDR-BY-NAME, etc.) when they talk to each other, so we can support a rolling upgrade of sentinels.

Yes, when the communication between 2 sentinel nodes, the MASTER command is used.

In the future (Valkey 9?) the sentinels can use the PRIMARY variant of the commands instead. We still can't delete the MASTER commands, because we still want to support rolling upgrade of Sentinel from 8 to 9. Even later (Valkey 10?) we can delete the MASTER variants.

I will find a way to delete the master command ASAP, maybe in Valkey 8.2 version or Valkey 9. Valkey 10 is too long (I hope not).

Correct?

@hwware hwware merged commit 33c7ca4 into valkey-io:unstable Aug 16, 2024
48 checks passed
@zuiderkwast
Copy link
Contributor

I see no hurry in deleting the deprecated commands, as long as there is a plan. I think we need to keep them as long as we support rolling upgrades from 8.0.

mapleFU pushed a commit to mapleFU/valkey that referenced this pull request Aug 21, 2024
…valkey-io#789)

Add 4 new commands for Sentinel (reference
valkey-io#36)

Sentinel GET-PRIMARY-ADDR-BY-NAME
Sentinel PRIMARY
Sentinel PRIMARIES
Sentinel IS-PRIMARY-DOWN-BY-ADDR

and deprecate 4 old commands:

Sentinel GET-MASTER-ADDR-BY-NAME
Sentinel MASTER
Sentinel MASTERS
Sentinel IS-MASTER-DOWN-BY-ADDR

and all sentinel tests pass here
https://github.com/hwware/valkey/actions/runs/9962102363/job/27525124583

Note:

1. runtest-sentinel pass all test cases
2. I finished a sentinel rolling upgrade test: 1 primary 2 replicas 3
sentinel
   there are 4 steps in this test scenario:
step 1: all 3 sentinel nodes run old sentinel, shutdown primary, and
then new primary can be voted successfully.
step 2: replace sentinel 1 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 3: replace sentinel 2 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 4: replace sentinel 3 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully

We can see, even mixed version sentinel running, whole system still
works.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
mapleFU pushed a commit to mapleFU/valkey that referenced this pull request Aug 22, 2024
…valkey-io#789)

Add 4 new commands for Sentinel (reference
valkey-io#36)

Sentinel GET-PRIMARY-ADDR-BY-NAME
Sentinel PRIMARY
Sentinel PRIMARIES
Sentinel IS-PRIMARY-DOWN-BY-ADDR

and deprecate 4 old commands:

Sentinel GET-MASTER-ADDR-BY-NAME
Sentinel MASTER
Sentinel MASTERS
Sentinel IS-MASTER-DOWN-BY-ADDR

and all sentinel tests pass here
https://github.com/hwware/valkey/actions/runs/9962102363/job/27525124583

Note:

1. runtest-sentinel pass all test cases
2. I finished a sentinel rolling upgrade test: 1 primary 2 replicas 3
sentinel
   there are 4 steps in this test scenario:
step 1: all 3 sentinel nodes run old sentinel, shutdown primary, and
then new primary can be voted successfully.
step 2: replace sentinel 1 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 3: replace sentinel 2 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 4: replace sentinel 3 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully

We can see, even mixed version sentinel running, whole system still
works.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
madolson pushed a commit that referenced this pull request Sep 2, 2024
…#789)

Add 4 new commands for Sentinel (reference
#36)

Sentinel GET-PRIMARY-ADDR-BY-NAME
Sentinel PRIMARY
Sentinel PRIMARIES
Sentinel IS-PRIMARY-DOWN-BY-ADDR

and deprecate 4 old commands:

Sentinel GET-MASTER-ADDR-BY-NAME
Sentinel MASTER
Sentinel MASTERS
Sentinel IS-MASTER-DOWN-BY-ADDR

and all sentinel tests pass here
https://github.com/hwware/valkey/actions/runs/9962102363/job/27525124583

Note: 

1. runtest-sentinel pass all test cases
2. I finished a sentinel rolling upgrade test: 1 primary 2 replicas 3
sentinel
   there are 4 steps in this test scenario: 
step 1: all 3 sentinel nodes run old sentinel, shutdown primary, and
then new primary can be voted successfully.
step 2: replace sentinel 1 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 3: replace sentinel 2 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 4: replace sentinel 3 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
   
We can see, even mixed version sentinel running, whole system still
works.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
madolson pushed a commit that referenced this pull request Sep 3, 2024
…#789)

Add 4 new commands for Sentinel (reference
#36)

Sentinel GET-PRIMARY-ADDR-BY-NAME
Sentinel PRIMARY
Sentinel PRIMARIES
Sentinel IS-PRIMARY-DOWN-BY-ADDR

and deprecate 4 old commands:

Sentinel GET-MASTER-ADDR-BY-NAME
Sentinel MASTER
Sentinel MASTERS
Sentinel IS-MASTER-DOWN-BY-ADDR

and all sentinel tests pass here
https://github.com/hwware/valkey/actions/runs/9962102363/job/27525124583

Note: 

1. runtest-sentinel pass all test cases
2. I finished a sentinel rolling upgrade test: 1 primary 2 replicas 3
sentinel
   there are 4 steps in this test scenario: 
step 1: all 3 sentinel nodes run old sentinel, shutdown primary, and
then new primary can be voted successfully.
step 2: replace sentinel 1 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 3: replace sentinel 2 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 4: replace sentinel 3 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
   
We can see, even mixed version sentinel running, whole system still
works.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
@enjoy-binbin enjoy-binbin added release-notes This issue should get a line item in the release notes and removed major-decision-pending Major decision pending by TSC team labels Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants