EnableMasterSSL with graceful-master-takeover-auto errors #1279

dtest · 2020-12-10T22:41:02Z

On orchestrator 3.2.3, running graceful-master-takeover-auto when AllowTLS is set for the instance returns an error saying SSL replication cannot be enabled because the replication threads are not stopped:

$ orchestrator-client -c topology -a $(orchestrator-client -c clusters)
mysql1:3306   [0s,ok,5.7.32-log,rw,ROW,>>,GTID]
+ mysql2:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
+ mysql3:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]

$ orchestrator-client -c graceful-master-takeover-auto -i mysql1 -d mysql2
EnableMasterSSL: Cannot enable SSL replication on mysql1:3306 because replication threads are not stopped
{ "Id": 1, "UID": "1607638055531065398:a0a3933a4dd565b60e3becf7cd4c1b6b27060e69c59a862a4b06e0ebcec5f744", "AnalysisEntry": { "AnalyzedInstanceKey": { "Hostname": "mysql1", "Port": 3306 }, "AnalyzedInstanceMasterKey": { "Hostname": "", "Port": 0 }, "ClusterDetails": { "ClusterName": "mysql1:3306", "ClusterAlias": "mysql1:3306", "ClusterDomain": "", "CountInstances": 3, "HeuristicLag": 0, "HasAutomatedMasterRecovery": true, "HasAutomatedIntermediateMasterRecovery": true }, "AnalyzedInstanceDataCenter": "", "AnalyzedInstanceRegion": "", "AnalyzedInstancePhysicalEnvironment": "", "AnalyzedInstanceBinlogCoordinates": { "LogFile": "8141a32e30e9-bin.000003", "LogPos": 644, "Type": 0 }, "IsMaster": true, "IsReplicationGroupMember": false, "IsCoMaster": false, "LastCheckValid": true, "LastCheckPartialSuccess": true, "CountReplicas": 1, "CountValidReplicas": 1, "CountValidReplicatingReplicas": 1, "CountReplicasFailingToConnectToMaster": 0, "CountDowntimedReplicas": 0, "ReplicationDepth": 0, "Replicas": [ { "Hostname": "mysql2", "Port": 3306 } ], "SlaveHosts": [ { "Hostname": "mysql2", "Port": 3306 } ], "IsFailingToConnectToMaster": false, "Analysis": "DeadMaster", "Description": "", "StructureAnalysis": null, "IsDowntimed": false, "IsReplicasDowntimed": false, "DowntimeEndTimestamp": "", "DowntimeRemainingSeconds": 0, "IsBinlogServer": false, "PseudoGTIDImmediateTopology": false, "OracleGTIDImmediateTopology": true, "MariaDBGTIDImmediateTopology": false, "BinlogServerImmediateTopology": false, "SemiSyncMasterEnabled": false, "SemiSyncMasterStatus": false, "SemiSyncMasterWaitForReplicaCount": 0, "SemiSyncMasterClients": 0, "CountSemiSyncReplicasEnabled": 0, "CountLoggingReplicas": 1, "CountStatementBasedLoggingReplicas": 0, "CountMixedBasedLoggingReplicas": 0, "CountRowBasedLoggingReplicas": 1, "CountDistinctMajorVersionsLoggingReplicas": 1, "CountDelayedReplicas": 0, "CountLaggingReplicas": 0, "IsActionableRecovery": true, "ProcessingNodeHostname": "6235160f51df", 
"ProcessingNodeToken": "690e32e28307d9520449c66844e0003960815b84d5eddc64bdb811fcf14e2f06", "CountAdditionalAgreeingNodes": 0, "StartActivePeriod": "", "SkippableDueToDowntime": false, "GTIDMode": "ON", "MinReplicaGTIDMode": "ON", "MaxReplicaGTIDMode": "ON", "MaxReplicaGTIDErrant": "", "CommandHint": "graceful-master-takeover", "IsReadOnly": false }, "SuccessorKey": { "Hostname": "mysql2", "Port": 3306 }, "SuccessorAlias": "", "IsActive": false, "IsSuccessful": true, "LostReplicas": [], "ParticipatingInstanceKeys": [], "AllErrors": [], "RecoveryStartTimestamp": "", "RecoveryEndTimestamp": "", "ProcessingNodeHostname": "", "ProcessingNodeToken": "", "Acknowledged": false, "AcknowledgedAt": "", "AcknowledgedBy": "", "AcknowledgedComment": "", "LastDetectionId": 0, "RelatedRecoveryId": 0, "Type": "MasterRecovery", "RecoveryType": "MasterRecoveryGTID" }

$ orchestrator-client -c topology -a $(orchestrator-client -c clusters)
mysql2:3306   [0s,ok,5.7.32-log,rw,ROW,>>,GTID]
+ mysql1:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
+ mysql3:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]

Here are the relevant logs:

2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 62: write-recovery-step
2020-12-10 22:07:36 DEBUG PostponedFunctionsContainer: waiting on 1 postponed functions
2020-12-10 22:07:36 DEBUG PostponedFunctionsContainer: done waiting
2020-12-10 22:07:36 INFO topology_recovery: Executed 1 postponed functions
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 63: write-recovery-step
2020-12-10 22:07:36 INFO topology_recovery: Executed postponed functions: regroup-replicas-gtid mysql2:3306
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 64: write-recovery-step
2020-12-10 22:07:36 DEBUG ChangeMasterTo: will attempt changing master on mysql1:3306 to mysql2:3306, b83336727d97-bin.000001:3075084
2020-12-10 22:07:36 INFO ChangeMasterTo: Changed master on mysql1:3306 to: mysql2:3306, b83336727d97-bin.000001:3075084. GTID: true
2020-12-10 22:07:36 DEBUG ChangeMasterTo: will attempt changing master credentials on mysql1:3306
2020-12-10 22:07:36 INFO ChangeMasterTo: Changed master credentials on mysql1:3306
2020-12-10 22:07:36 INFO Started replication on mysql1:3306
2020-12-10 22:07:36 INFO topology_recovery: No PostGracefulTakeoverProcesses hooks to run
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 65: write-recovery-step
[martini] Completed 500 Internal Server Error in 845.505737ms

It seems replication is started and then another command runs and fails with a 500 (the EnableMasterSSL call, I'm pretty sure). I traced it to these lines.

In my case, replication was successfully started anyway because I did not require SSL, but in environments where replication requires SSL I think it would fail.

Is it intentional that the enableMasterSSL call comes after the 'auto' block?
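To make the ordering question concrete, here is a minimal, self-contained sketch. It is not orchestrator's code: the `replica` type, `enableSSL`, `startReplication`, and both `takeover...` functions are hypothetical stand-ins, and the guard in `enableSSL` is only inferred from the error message above. It assumes the remedy is to enable SSL while the replication threads are still stopped, that is, before replication is started.

```go
package main

import (
	"errors"
	"fmt"
)

// replica is a hypothetical stand-in for a MySQL replica.
// It is NOT orchestrator's instance type.
type replica struct {
	threadsRunning bool
	sslEnabled     bool
}

// errThreadsRunning mirrors the wording of the error in this report.
var errThreadsRunning = errors.New("cannot enable SSL replication: replication threads are not stopped")

// enableSSL refuses to run while replication threads are running
// (assumed guard, inferred from the reported error message).
func (r *replica) enableSSL() error {
	if r.threadsRunning {
		return errThreadsRunning
	}
	r.sslEnabled = true
	return nil
}

// startReplication marks the replication threads as running.
func (r *replica) startReplication() { r.threadsRunning = true }

// takeoverStartThenSSL follows the order seen in the logs:
// replication is started first, then SSL is enabled.
func takeoverStartThenSSL(r *replica, allowTLS bool) error {
	r.startReplication()
	if allowTLS {
		return r.enableSSL()
	}
	return nil
}

// takeoverSSLThenStart enables SSL while threads are still stopped,
// then starts replication (the assumed corrected order).
func takeoverSSLThenStart(r *replica, allowTLS bool) error {
	if allowTLS {
		if err := r.enableSSL(); err != nil {
			return err
		}
	}
	r.startReplication()
	return nil
}

func main() {
	a := &replica{}
	fmt.Println("start then SSL:", takeoverStartThenSSL(a, true), "| ssl enabled:", a.sslEnabled)

	b := &replica{}
	fmt.Println("SSL then start:", takeoverSSLThenStart(b, true), "| ssl enabled:", b.sslEnabled)
}
```

In the first order the replica ends up replicating without SSL, which matches what I saw; a master that requires SSL for its replication user would presumably reject that connection.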

shlomi-noach · 2020-12-16T07:12:54Z

Thank you for the report and for the PR! This makes perfect sense. I recall the code for enabling SSL was recently contributed and I didn't validate it well enough.

dtest mentioned this issue on Dec 11, 2020 in pull request #1280, "Fixes #1279" (merged)

shlomi-noach closed this as completed in 3a63659 Dec 16, 2020

shlomi-noach added a commit that referenced this issue Dec 16, 2020: 41fdb47, "Merge pull request #1280 from dtest/dtest/ssl_replication_issue_1279" (Fixes #1279)
