VTOrc: Refactor and reload of ephemeral information for remaining recovery functions #10150
…n analysis info to be provided for adding unit tests for GetReplicationAnalysis Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…ommon code path for repair functions Signed-off-by: Manan Gupta <manan@planetscale.com>
… have the same recovery function Signed-off-by: Manan Gupta <manan@planetscale.com>
@@ -929,7 +929,7 @@ func MakeCoPrimary(instanceKey *InstanceKey) (*Instance, error) {
	}
	log.Infof("Will make %+v co-primary of %+v", instanceKey, primary.Key)

	var gitHint OperationGTIDHint = GTIDHintNeutral
This is just a linter fix
Can you link the catch-all vtorc issue into this PR?
I went and looked at how we decide on `PrimaryHasPrimary` and now I'm having second thoughts about simply removing the replication. Even though the tablet type is PRIMARY, we should make sure that the underlying mysqld is writable as well. That doesn't seem to be something that is considered when setting `IsClusterPrimary` to true.
The PRIMARY tablet not being writable is a failure scenario in itself. If the designated primary is read only, it will spawn the …
…ical errors later Signed-off-by: Manan Gupta <manan@planetscale.com>
Description
This is the follow-up PR for #10115.
In that PR, we added functionality to VTOrc to refresh its ephemeral information before running any cluster operation. According to our discussions, however, we should be doing this for all failure scenarios. For example, we may see that a tablet has its replication stopped, but we might be using old information, and the tablet might actually have tablet_type BACKUP. In such cases we should not go ahead with the recovery to start replication, since it might break other parts of Vitess. This PR addresses that concern and adds this reload of information to the other recovery functions as well.
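The refresh-before-recovery idea above can be sketched roughly as follows. This is a minimal illustration and not VTOrc's actual code: `TabletInfo`, `analyze`, `recoverWithRefresh`, and the `refresh` and `recovery` callbacks are all hypothetical names made up for this example. It only shows the shape of the check: reload the tablet's information, re-run the analysis, and skip the recovery if the problem no longer holds (for instance because the tablet turned out to be a BACKUP tablet).

```go
package main

import "fmt"

// TabletType is a simplified, hypothetical stand-in for Vitess tablet types.
type TabletType string

const (
	TypeReplica TabletType = "REPLICA"
	TypeBackup  TabletType = "BACKUP"
)

// TabletInfo is a hypothetical snapshot of what the recovery logic knows about a tablet.
type TabletInfo struct {
	Alias              string
	Type               TabletType
	ReplicationStopped bool
}

// Problem is a hypothetical analysis result.
type Problem string

const (
	NoProblem          Problem = "NoProblem"
	ReplicationStopped Problem = "ReplicationStopped"
)

// analyze reports a problem only for REPLICA tablets whose replication is stopped.
// A BACKUP tablet with stopped replication is expected and is not a problem.
func analyze(t TabletInfo) Problem {
	if t.Type == TypeReplica && t.ReplicationStopped {
		return ReplicationStopped
	}
	return NoProblem
}

// recoverWithRefresh reloads the tablet information and re-runs the analysis
// before calling the recovery. It reports whether the recovery was actually run.
func recoverWithRefresh(
	cached TabletInfo,
	refresh func(alias string) (TabletInfo, error),
	recovery func(TabletInfo) error,
) (bool, error) {
	detected := analyze(cached)
	if detected == NoProblem {
		return false, nil
	}
	// Reload the (possibly stale) information before acting on it.
	fresh, err := refresh(cached.Alias)
	if err != nil {
		return false, fmt.Errorf("refreshing %s: %w", cached.Alias, err)
	}
	// Only recover if the same problem is still present after the refresh.
	if analyze(fresh) != detected {
		return false, nil
	}
	if err := recovery(fresh); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	// The cached view says REPLICA with stopped replication...
	stale := TabletInfo{Alias: "zone1-100", Type: TypeReplica, ReplicationStopped: true}
	// ...but the refreshed view shows the tablet is taking a backup.
	refresh := func(alias string) (TabletInfo, error) {
		return TabletInfo{Alias: alias, Type: TypeBackup, ReplicationStopped: true}, nil
	}
	startReplication := func(t TabletInfo) error {
		fmt.Println("starting replication on", t.Alias)
		return nil
	}
	ran, err := recoverWithRefresh(stale, refresh, startReplication)
	fmt.Println("recovery ran:", ran, "error:", err)
}
```

In this sketch the recovery is skipped for the BACKUP tablet, so replication is never restarted on stale information.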
In this PR, we have also added some unit tests to VTOrc. We have also refactored the replicationAnalysis codepath to lock the shard and refresh the information before calling the recoveryFunction, to avoid repeated code. This PR also deletes the unused failure scenarios that came from `orchestrator` legacy code. Vitess doesn't support co-primary or intermediate-primary configurations, so it is safe to remove these failure scenarios; we didn't have any associated recoveries for them anyway. Moreover, the codepath for the `PrimaryHasPrimary` failure has been refactored to only reset the replication on the primary instance, instead of running a `forceDeadPrimary` recovery.
Related Issue(s)
Checklist
Deployment Notes