This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Hook for take-master / GracefulIntermediateMasterTakeover #799

Open
daniel-2647 opened this issue Feb 8, 2019 · 2 comments

Comments

@daniel-2647
Contributor

Hello Shlomi, we have the following test topology and db hosts:

1 MASTER --> shadowmaster (has all schemas)
3 SLAVES --> sm hosts (have only certain schemas being replicated, but they all have the same filter)

Initial topology:

shadowmaster:3306          [unknown,invalid,Unknown,rw,nobinlog,downtimed]
+ sm-ohq-applogdb-1:3306   [0s,ok,10.3.12-MariaDB-log,rw,MIXED,>>]
  + sm-atl-applogdb-3:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]
  + sm-ohq-applogdb-2:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]


We are using AutoPseudoGTID and everything is working as expected.

Here's a scenario I'm trying to make work, but so far have not been able to:

We would like to be able to drag/drop (promote) sm-atl-applogdb-3 so it becomes master of both sm-ohq-applogdb-1 and sm-ohq-applogdb-2, and have sm-atl-applogdb-3 replicate from shadowmaster, as shown below:

shadowmaster:3306          [unknown,invalid,Unknown,rw,nobinlog,downtimed]
+ sm-atl-applogdb-3:3306   [0s,ok,10.3.12-MariaDB-log,rw,MIXED,>>]
  + sm-ohq-applogdb-1:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]
  + sm-ohq-applogdb-2:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]


Unfortunately, this does not happen, and we end up with the following (notice that sm-ohq-applogdb-2 remains a slave of the old intermediate master, sm-ohq-applogdb-1):

shadowmaster:3306            [unknown,invalid,Unknown,rw,nobinlog,downtimed]
+ sm-atl-applogdb-3:3306     [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]
  + sm-ohq-applogdb-1:3306   [0s,ok,10.3.12-MariaDB-log,rw,MIXED,>>]
    + sm-ohq-applogdb-2:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]


I attempted to use a hook, as I thought this would fall under PostIntermediateMasterFailoverProcesses. I created a hook that would move all slaves of the old intermediate master (in this case sm-ohq-applogdb-1) under the new intermediate master (sm-atl-applogdb-3), but it never got called.
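For reference, the move that hook was meant to perform can be sketched as a single orchestrator API request. This is a minimal sketch, not the actual hook: the base URL is a hypothetical placeholder, and the `relocate-replicas` endpoint path is an assumption modelled on the `/api/take-master/<host>/<port>` pattern seen in the logs, so it should be checked against the orchestrator version in use.

```python
from urllib.parse import quote


def relocate_replicas_url(base_url, old_host, old_port, new_host, new_port):
    """Build the API URL asking orchestrator to move every replica of
    old_host:old_port below new_host:new_port.

    Assumption: the endpoint is /api/relocate-replicas/<host>/<port>/<belowHost>/<belowPort>.
    """
    return "{}/api/relocate-replicas/{}/{}/{}/{}".format(
        base_url.rstrip("/"), quote(old_host), old_port, quote(new_host), new_port
    )


# Hypothetical orchestrator address; the hosts are the ones from this issue.
url = relocate_replicas_url(
    "http://orchestrator.example:3000",
    "sm-ohq-applogdb-1", 3306,
    "sm-atl-applogdb-3", 3306,
)
print(url)
```

The sketch only builds the URL; issuing the GET request (and any authentication) is left out on purpose.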

While troubleshooting why the PostIntermediateMasterFailoverProcesses hook was not being called, I noticed that it never gets triggered. This may be because the operation is handled as a take-master call, and not as a graceful intermediate master promotion.

Here are the logs:

2019-02-08 17:05:18 DEBUG raft leader is 10.0.84.117:10008 (this host); state: Leader
[martini] Started GET /api/take-master/sm-atl-applogdb-3/3306 for 69.41.14.254:15162
2019-02-08 17:05:22 DEBUG TakeMaster: will attempt making sm-atl-applogdb-3:3306 take its master sm-ohq-applogdb-1:3306, now resolved as sm-ohq-applogdb-1:3306
2019-02-08 17:05:22 INFO Stopped replication on sm-ohq-applogdb-1:3306, Self:mysql-bin-sm-ohq-applogdb-1.000057:169115058, Exec:shadowmaster.027720:455315683
2019-02-08 17:05:23 DEBUG analysis: IsMaster: true, LastCheckValid: false, LastCheckPartialSuccess: true, CountReplicas: 1, CountValidReplicatingReplicas: 0, CountLaggingReplicas: 0, CountDelayedReplicas: 0,
2019-02-08 17:05:23 DEBUG raft leader is 10.0.84.117:10008 (this host); state: Leader
2019-02-08 17:05:23 DEBUG orchestrator/raft: applying command 1863: request-health-report
[martini] Started GET /api/raft-follower-health-report/4c36b85e/sm-ohq-proxysql-1/sm-ohq-proxysql-1 for 10.0.84.117:50200
[martini] Completed 200 OK in 582.534µs
[martini] Started GET /api/raft-follower-health-report/4c36b85e/sm-ohq-proxysql-2/sm-ohq-proxysql-2 for 10.0.84.118:10344
[martini] Completed 200 OK in 580.334µs
[martini] Started GET /api/raft-follower-health-report/4c36b85e/sm-atl-proxysql-3/sm-atl-proxysql-3 for 10.5.4.171:47266
[martini] Completed 200 OK in 566.785µs
2019-02-08 17:05:24 INFO Stopped replication on sm-atl-applogdb-3:3306, Self:mysql-bin-sm-atl-applogdb-3.000057:169115074, Exec:mysql-bin-sm-ohq-applogdb-1.000057:169115058
2019-02-08 17:05:24 INFO Will start replication on sm-atl-applogdb-3:3306 until coordinates: mysql-bin-sm-ohq-applogdb-1.000057:169115058
2019-02-08 17:05:26 INFO Stopped replication on sm-atl-applogdb-3:3306, Self:mysql-bin-sm-atl-applogdb-3.000057:169115074, Exec:mysql-bin-sm-ohq-applogdb-1.000057:169115058
2019-02-08 17:05:26 DEBUG ChangeMasterTo: will attempt changing master on sm-atl-applogdb-3:3306 to shadowmaster:3306, shadowmaster.027720:455315683
2019-02-08 17:05:26 INFO ChangeMasterTo: Changed master on sm-atl-applogdb-3:3306 to: shadowmaster:3306, shadowmaster.027720:455315683. GTID: false
2019-02-08 17:05:26 DEBUG ChangeMasterTo: will attempt changing master on sm-ohq-applogdb-1:3306 to sm-atl-applogdb-3:3306, mysql-bin-sm-atl-applogdb-3.000057:169115074
2019-02-08 17:05:26 INFO ChangeMasterTo: Changed master on sm-ohq-applogdb-1:3306 to: sm-atl-applogdb-3:3306, mysql-bin-sm-atl-applogdb-3.000057:169115074. GTID: false
2019-02-08 17:05:27 WARNING executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: AllIntermediateMasterSlavesNotReplicating; key: sm-atl-applogdb-3:3306
2019-02-08 17:05:27 INFO Started replication on sm-atl-applogdb-3:3306
2019-02-08 17:05:28 DEBUG raft leader is 10.0.84.117:10008 (this host); state: Leader
2019-02-08 17:05:28 INFO Started replication on sm-ohq-applogdb-1:3306
2019-02-08 17:05:29 INFO auditType:take-master instance:sm-atl-applogdb-3:3306 cluster:shadowmaster:3306 message:took master: sm-ohq-applogdb-1:3306

Would it be possible to create a GracefulIntermediateMasterTakeover hook or a hook for the take-master call above?
Thanks for your time, please let me know if you have any questions and I can try to explain more if needed.

@shlomi-noach
Collaborator

shlomi-noach commented Feb 12, 2019

@daniel-2647 is there anything you'd want to do other than relocating those replicas under the promoted server?
For relocating the replicas I'm happy to just make that behavior the new default, as it makes perfect sense.
I'm less enthusiastic about creating GracefulIntermediateMasterTakeover hook logic. While I agree you could benefit from that hook, users ask for all sorts of hooks for a lot of specific use cases, and I'm not yet sure what the correct approach would be. I have something in mind that is a generic response for dozens of actions, but this thought has not matured yet.

@daniel-2647
Contributor Author

daniel-2647 commented Feb 12, 2019

Thanks @shlomi-noach for the response, that would be wonderful. The reason for asking for a hook is that we use proxysql and, at times, also haproxy with consul / consul-template.
I use the hooks to update the consul store; consul-template in turn updates the config for the slave/master haproxy pools. We also update proxysql hostgroups where appropriate.

For instance, we would rather not have the intermediate master receive reads, and also, not have the secondary slaves receive any writes.

Let's say in the scenario above, sm-ohq-applogdb-1 was the original intermediate master. That host can receive writes, it belongs to the writer hostgroup in proxysql, and all its slaves (which are read only) can only receive reads and belong to the readers hostgroup in proxysql.

When we do a graceful promotion of sm-atl-applogdb-3 to become the new intermediate master, we would also like to make changes to proxysql, so that this host is moved from the readers hostgroup to the writers hostgroup, and the old intermediate master is moved to the readers hostgroup.
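As an illustration of the hostgroup swap described above, here is a minimal sketch that only generates the statements a hook might send to the ProxySQL admin interface. The hostgroup ids (10 for writers, 20 for readers) are hypothetical, and the hostnames are assumed to come from a trusted source, since they are interpolated directly into the SQL.

```python
def hostgroup_swap_statements(new_master, old_master, writer_hg, reader_hg):
    """Return ProxySQL admin statements that move new_master into the writer
    hostgroup and old_master into the reader hostgroup, then apply and
    persist the change."""
    move = "UPDATE mysql_servers SET hostgroup_id = {} WHERE hostname = '{}' AND hostgroup_id = {}"
    return [
        # Demote the old intermediate master first so that there is never
        # more than one host in the writer hostgroup.
        move.format(reader_hg, old_master, writer_hg),
        move.format(writer_hg, new_master, reader_hg),
        "LOAD MYSQL SERVERS TO RUNTIME",
        "SAVE MYSQL SERVERS TO DISK",
    ]


# Hypothetical hostgroup ids: 10 = writers, 20 = readers.
for statement in hostgroup_swap_statements("sm-atl-applogdb-3", "sm-ohq-applogdb-1", 10, 20):
    print(statement)
```

Demoting before promoting leaves a brief window with no writer rather than a window with two, which seems the safer default; a real hook may prefer a different order.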

We would also like to make changes to the consul store (via the hooks) when these changes take place. We can also make other policy changes and re-assign certain proxysql query rules to a given host via the hooks once the intermediate promotion takes place. Does that make sense?
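A minimal sketch of the consul update, assuming the hook records the new intermediate master under a key in Consul's KV store. The key name and the Consul address are hypothetical; the sketch builds the HTTP request but does not send it.

```python
import urllib.request


def consul_kv_put_request(consul_url, key, value):
    """Build (without sending) a PUT request that stores value under key
    in Consul's KV store via its HTTP API (/v1/kv/<key>)."""
    return urllib.request.Request(
        url="{}/v1/kv/{}".format(consul_url.rstrip("/"), key),
        data=value.encode("utf-8"),
        method="PUT",
    )


# Hypothetical key layout; consul-template would watch this key.
req = consul_kv_put_request(
    "http://127.0.0.1:8500",
    "mysql/applogdb/intermediate-master",
    "sm-atl-applogdb-3:3306",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the write; ACL tokens and error handling are omitted here.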

I also use the hooks to set the slaves read-only (including the old intermediate master) and the new intermediate master read/write.
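The read-only handling can be sketched as an ordered plan of statements per host. This is an illustration under assumptions, not the actual hook: the function name is hypothetical, and it only produces the statements rather than connecting to any server.

```python
def read_only_plan(new_master, replicas):
    """Return (host, statement) pairs in execution order: every replica is
    made read-only first, and only then is the new intermediate master
    opened for writes."""
    plan = [(replica, "SET GLOBAL read_only = 1") for replica in replicas]
    plan.append((new_master, "SET GLOBAL read_only = 0"))
    return plan


plan = read_only_plan(
    "sm-atl-applogdb-3:3306",
    ["sm-ohq-applogdb-1:3306", "sm-ohq-applogdb-2:3306"],
)
for host, statement in plan:
    print(host, statement)
```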

As always, thank you for your time. I appreciate all the work, dedication, and effort put into orchestrator.
