This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

EnableMasterSSL with graceful-master-takeover-auto errors #1279

Closed
dtest opened this issue Dec 10, 2020 · 1 comment
@dtest
Contributor

dtest commented Dec 10, 2020

On orchestrator 3.2.3, running graceful-master-takeover-auto when AllowTLS is set for the instance fails with an error saying that SSL replication cannot be enabled because the replication threads are not stopped:

$ orchestrator-client -c topology -a $(orchestrator-client -c clusters)
mysql1:3306   [0s,ok,5.7.32-log,rw,ROW,>>,GTID]
+ mysql2:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
+ mysql3:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]

$ orchestrator-client -c graceful-master-takeover-auto -i mysql1 -d mysql2
EnableMasterSSL: Cannot enable SSL replication on mysql1:3306 because replication threads are not stopped
{ "Id": 1, "UID": "1607638055531065398:a0a3933a4dd565b60e3becf7cd4c1b6b27060e69c59a862a4b06e0ebcec5f744", "AnalysisEntry": { "AnalyzedInstanceKey": { "Hostname": "mysql1", "Port": 3306 }, "AnalyzedInstanceMasterKey": { "Hostname": "", "Port": 0 }, "ClusterDetails": { "ClusterName": "mysql1:3306", "ClusterAlias": "mysql1:3306", "ClusterDomain": "", "CountInstances": 3, "HeuristicLag": 0, "HasAutomatedMasterRecovery": true, "HasAutomatedIntermediateMasterRecovery": true }, "AnalyzedInstanceDataCenter": "", "AnalyzedInstanceRegion": "", "AnalyzedInstancePhysicalEnvironment": "", "AnalyzedInstanceBinlogCoordinates": { "LogFile": "8141a32e30e9-bin.000003", "LogPos": 644, "Type": 0 }, "IsMaster": true, "IsReplicationGroupMember": false, "IsCoMaster": false, "LastCheckValid": true, "LastCheckPartialSuccess": true, "CountReplicas": 1, "CountValidReplicas": 1, "CountValidReplicatingReplicas": 1, "CountReplicasFailingToConnectToMaster": 0, "CountDowntimedReplicas": 0, "ReplicationDepth": 0, "Replicas": [ { "Hostname": "mysql2", "Port": 3306 } ], "SlaveHosts": [ { "Hostname": "mysql2", "Port": 3306 } ], "IsFailingToConnectToMaster": false, "Analysis": "DeadMaster", "Description": "", "StructureAnalysis": null, "IsDowntimed": false, "IsReplicasDowntimed": false, "DowntimeEndTimestamp": "", "DowntimeRemainingSeconds": 0, "IsBinlogServer": false, "PseudoGTIDImmediateTopology": false, "OracleGTIDImmediateTopology": true, "MariaDBGTIDImmediateTopology": false, "BinlogServerImmediateTopology": false, "SemiSyncMasterEnabled": false, "SemiSyncMasterStatus": false, "SemiSyncMasterWaitForReplicaCount": 0, "SemiSyncMasterClients": 0, "CountSemiSyncReplicasEnabled": 0, "CountLoggingReplicas": 1, "CountStatementBasedLoggingReplicas": 0, "CountMixedBasedLoggingReplicas": 0, "CountRowBasedLoggingReplicas": 1, "CountDistinctMajorVersionsLoggingReplicas": 1, "CountDelayedReplicas": 0, "CountLaggingReplicas": 0, "IsActionableRecovery": true, "ProcessingNodeHostname": "6235160f51df", 
"ProcessingNodeToken": "690e32e28307d9520449c66844e0003960815b84d5eddc64bdb811fcf14e2f06", "CountAdditionalAgreeingNodes": 0, "StartActivePeriod": "", "SkippableDueToDowntime": false, "GTIDMode": "ON", "MinReplicaGTIDMode": "ON", "MaxReplicaGTIDMode": "ON", "MaxReplicaGTIDErrant": "", "CommandHint": "graceful-master-takeover", "IsReadOnly": false }, "SuccessorKey": { "Hostname": "mysql2", "Port": 3306 }, "SuccessorAlias": "", "IsActive": false, "IsSuccessful": true, "LostReplicas": [], "ParticipatingInstanceKeys": [], "AllErrors": [], "RecoveryStartTimestamp": "", "RecoveryEndTimestamp": "", "ProcessingNodeHostname": "", "ProcessingNodeToken": "", "Acknowledged": false, "AcknowledgedAt": "", "AcknowledgedBy": "", "AcknowledgedComment": "", "LastDetectionId": 0, "RelatedRecoveryId": 0, "Type": "MasterRecovery", "RecoveryType": "MasterRecoveryGTID" }

$ orchestrator-client -c topology -a $(orchestrator-client -c clusters)
mysql2:3306   [0s,ok,5.7.32-log,rw,ROW,>>,GTID]
+ mysql1:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
+ mysql3:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
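
Although the command printed an error, the second topology listing shows that mysql2 was promoted. A minimal sketch for checking this programmatically, assuming the flat one-level output format shown above (`parse_topology` is a hypothetical helper, not part of orchestrator or orchestrator-client):

```python
def parse_topology(text):
    """Return (primary, replicas) from topology-style output.

    Assumption: the first line without a leading '+' is the primary and
    every line starting with '+' is a replica. Nesting depth is ignored.
    """
    primary, replicas = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The host:port token is the first field after any '+' marker.
        host = line.lstrip("+ ").split()[0]
        if line.startswith("+"):
            replicas.append(host)
        elif primary is None:
            primary = host
    return primary, replicas


# Topology output captured after the takeover attempt in this report.
AFTER = """\
mysql2:3306   [0s,ok,5.7.32-log,rw,ROW,>>,GTID]
+ mysql1:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
+ mysql3:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
"""

primary, replicas = parse_topology(AFTER)
print(primary, replicas)
```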

Here are the relevant logs:

2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 62: write-recovery-step
2020-12-10 22:07:36 DEBUG PostponedFunctionsContainer: waiting on 1 postponed functions
2020-12-10 22:07:36 DEBUG PostponedFunctionsContainer: done waiting
2020-12-10 22:07:36 INFO topology_recovery: Executed 1 postponed functions
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 63: write-recovery-step
2020-12-10 22:07:36 INFO topology_recovery: Executed postponed functions: regroup-replicas-gtid mysql2:3306
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 64: write-recovery-step
2020-12-10 22:07:36 DEBUG ChangeMasterTo: will attempt changing master on mysql1:3306 to mysql2:3306, b83336727d97-bin.000001:3075084
2020-12-10 22:07:36 INFO ChangeMasterTo: Changed master on mysql1:3306 to: mysql2:3306, b83336727d97-bin.000001:3075084. GTID: true
2020-12-10 22:07:36 DEBUG ChangeMasterTo: will attempt changing master credentials on mysql1:3306
2020-12-10 22:07:36 INFO ChangeMasterTo: Changed master credentials on mysql1:3306
2020-12-10 22:07:36 INFO Started replication on mysql1:3306
2020-12-10 22:07:36 INFO topology_recovery: No PostGracefulTakeoverProcesses hooks to run
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 65: write-recovery-step
[martini] Completed 500 Internal Server Error in 845.505737ms

It seems replication is started and then another call fails, which produces the 500 (I'm fairly sure it is EnableMasterSSL). I traced it to these lines.

In my case, replication was successfully started anyway because I did not require SSL, but in environments where replication requires SSL I think it would fail.

Is it intentional that the EnableMasterSSL call comes after the 'auto' block?
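
To illustrate the ordering problem, here is a toy model in Python. It is only a sketch under assumptions: `ToyReplica` and both takeover functions are hypothetical names, and the guard simply mirrors the error message above. It is not orchestrator's actual code.

```python
class ToyReplica:
    """Hypothetical stand-in for the demoted master (mysql1 in this report)."""

    def __init__(self):
        self.threads_running = False
        self.ssl_enabled = False

    def start_replication(self):
        self.threads_running = True

    def enable_master_ssl(self):
        # Assumed guard, modeled on the reported error message.
        if self.threads_running:
            raise RuntimeError(
                "Cannot enable SSL replication because "
                "replication threads are not stopped"
            )
        self.ssl_enabled = True


def takeover_reported_order(replica, auto, allow_tls):
    # Order described in the report: the 'auto' block starts replication
    # first, then SSL is enabled, which trips the guard.
    if auto:
        replica.start_replication()
    if allow_tls:
        replica.enable_master_ssl()


def takeover_ssl_first(replica, auto, allow_tls):
    # One possible reordering: enable SSL while threads are still stopped.
    if allow_tls:
        replica.enable_master_ssl()
    if auto:
        replica.start_replication()


try:
    takeover_reported_order(ToyReplica(), auto=True, allow_tls=True)
except RuntimeError as err:
    print("reported order fails:", err)

fixed = ToyReplica()
takeover_ssl_first(fixed, auto=True, allow_tls=True)
print("ssl first:", fixed.ssl_enabled, fixed.threads_running)
```

In the toy model the first ordering leaves replication running without SSL, which matches the behavior described above: harmless when SSL is optional, but a failure where replication requires it.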

@shlomi-noach
Collaborator

Thank you for the report and for the PR! This makes perfect sense. I recall the code for enabling SSL was recently contributed and I didn't validate it well enough.
