Master election should demote nodes which try to join the cluster for the first time #7558

bleskes · 2014-09-03T12:19:56Z

With the change in #7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes, by default, 3 seconds. During that window, a new node may try to join the cluster and start pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master, it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed.
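To illustrate the idea of demoting "new" nodes, here is a minimal sketch rather than the actual ZenDiscovery election code. The `Candidate` class, the `ElectionSketch.elect` method and the tie-break on node id are all assumptions made for this example; only the notion of a "has joined the cluster before" flag comes from this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a ping response; not the real PingResponse class.
final class Candidate {
    final String nodeId;
    final boolean hasJoinedOnce; // true once the node has joined a cluster

    Candidate(String nodeId, boolean hasJoinedOnce) {
        this.nodeId = nodeId;
        this.hasJoinedOnce = hasJoinedOnce;
    }
}

final class ElectionSketch {
    // Prefer nodes that have joined the cluster before. A "new" node is
    // only picked when no previously joined node is available.
    // Ties are broken by node id (an assumption for this sketch).
    static Candidate elect(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates to elect from");
        }
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
                .comparing((Candidate c) -> !c.hasJoinedOnce) // false sorts first
                .thenComparing(c -> c.nodeId));
        return sorted.get(0);
    }
}
```

With one restarted (new) node and one node that has already joined, `elect` returns the latter; with only new nodes present, one of them is still elected, which matches the "only if really needed" wording above.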

bleskes · 2014-09-04T20:58:51Z

@kimchy I pushed another update based on our discussion.

s1monw · 2014-09-05T07:32:26Z

src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java

@@ -139,6 +140,9 @@

    private volatile boolean rejoinOnMasterGone;

+    // will be set to true upon the first successful cluster join
+    private final AtomicBoolean hasJoinedClusterOnce = new AtomicBoolean();


Not sure if it will help, but maybe make that an integer so that we can log how often the node has joined the cluster? It might help when debugging. Just an idea.

It might, and it won't hurt. I'll add it.
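The suggestion above could look roughly like the following sketch. It is an assumption about the shape of the change, not the code that was merged: `JoinTracker` and its method names are hypothetical, and only the idea of replacing the boolean with a counter comes from the review comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; in the PR the field lives inside ZenDiscovery.
final class JoinTracker {
    // Counts successful cluster joins instead of storing a single boolean.
    private final AtomicInteger joinCount = new AtomicInteger();

    // Call after every successful join; returns the new count for logging.
    int onClusterJoined() {
        return joinCount.incrementAndGet();
    }

    // Same information the AtomicBoolean carried.
    boolean hasJoinedClusterOnce() {
        return joinCount.get() > 0;
    }

    int joinCount() {
        return joinCount.get();
    }
}
```

The counter keeps the original yes/no answer available while also giving trace logging a number to report.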

s1monw · 2014-09-05T07:41:23Z

I left some comments @bleskes

…to have extra trace info. Change pingRensponse.target to pingResponse.node, for clarity. Added comments and docs

s1monw · 2014-09-10T11:29:28Z

LGTM

bleskes · 2014-09-11T09:20:17Z

Re-opening as there is some BWC work to be done for 1.4

…same node elastic#5413 introduced a change where we prefer ping responses containing a master over those that don't. The same change also changed which ping is accepted when both pings have a master indication or when neither does. elastic#7558 added a new flag to the PingResponse, which changes after a node has joined the cluster for the very first time. Giving preference to older pings causes the wrong value of this flag to be used. This commit restores the preference to the original one.
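A minimal sketch of "latest ping wins" for pings describing the same node, assuming pings are keyed by node id. `UnicastPing` and `PingCollectionSketch` are hypothetical names for this example and do not mirror the real unicast ping classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ping record; the real PingResponse carries more fields.
final class UnicastPing {
    final String nodeId;
    final boolean hasJoinedOnce;

    UnicastPing(String nodeId, boolean hasJoinedOnce) {
        this.nodeId = nodeId;
        this.hasJoinedOnce = hasJoinedOnce;
    }
}

final class PingCollectionSketch {
    private final Map<String, UnicastPing> latestByNode = new LinkedHashMap<>();

    // A newer ping for the same node replaces the older one, so a flag
    // that flips after the first join is never shadowed by a stale value.
    void add(UnicastPing ping) {
        latestByNode.put(ping.nodeId, ping);
    }

    List<UnicastPing> pings() {
        return new ArrayList<>(latestByNode.values());
    }
}
```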

…e cluster for the first time With the change in elastic#7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes, by default, 3 seconds. During that window, a new node may try to join the cluster and start pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master, it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed. Closes elastic#7558

…he new ping on master gone introduced in elastic#7493 The change in elastic#7558 adds a flag to PingResponse. However, when unicast discovery is used, this extra flag cannot be serialized by the very initial pings, as they do not yet know the version of the node they ping (i.e., they have to default to 1.0.0, which rules out changing the serialization format). This commit bypasses this problem by adding a dedicated action which only exists on nodes of version 1.4 or up. Nodes first try to ping this endpoint using 1.4.0 as a serialization version. If that fails, they fall back to the pre-1.4.0 action. This is optimal if all nodes are on 1.4.0 or higher, with a small downside if the cluster has mixed versions, but this is a temporary state. Further, two bwc protections are added: 1) Disable the preference for nodes that previously joined the cluster if some of the pings are on version < 1.4.0 2) Disable the rejoin on master gone functionality if some nodes in the cluster are on version < 1.4.0
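The try-new-action-then-fall-back flow described in this commit message can be sketched as below. Everything here is hypothetical scaffolding for illustration: the `PingTransport` interface, the `ActionNotFoundException` type and both action names are invented and are not the Elasticsearch transport API.

```java
// Hypothetical error raised when the remote node does not know an action.
final class ActionNotFoundException extends RuntimeException {
    ActionNotFoundException(String action) {
        super("unknown action: " + action);
    }
}

// Hypothetical transport abstraction; returns a response description.
interface PingTransport {
    String send(String action);
}

final class FallbackPinger {
    // Both action names are made up for this sketch.
    static final String NEW_ACTION = "discovery/ping/v1_4";
    static final String LEGACY_ACTION = "discovery/ping/legacy";

    // Try the action that only exists on 1.4+ nodes first. If the remote
    // node does not know it, retry with the pre-1.4.0 action.
    static String ping(PingTransport transport) {
        try {
            return transport.send(NEW_ACTION);
        } catch (ActionNotFoundException e) {
            return transport.send(LEGACY_ACTION);
        }
    }
}
```

Against a node that knows the new action, this costs a single request; against an older node it costs one failed request plus the fallback, which is the small downside for mixed-version clusters mentioned above.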

bleskes · 2014-09-16T15:41:38Z

This is now backported to 1.x and 1.4.

bleskes added 3 commits September 3, 2014 11:52

initial work

1d541fe

pingRequests may contain client & data. nodes...

b6b44b7

comment update

1dd4ef9

bleskes added v1.4.0 labels Sep 3, 2014

bleskes force-pushed the master_prefer_non_initial_join branch 2 times, most recently from 399117d to d33b937 Compare September 4, 2014 20:55

move to a PingContextProvider implemented by ZenDiscovery

32083c5

bleskes force-pushed the master_prefer_non_initial_join branch from d33b937 to 32083c5 Compare September 4, 2014 20:57

logging improvement

0dedbaa

s1monw reviewed Sep 5, 2014

Rewrite firstClusterJoin to hasJoinedClusterOnce. Use a join cluster …

cfa0a21

…to have extra trace info. Change pingRensponse.target to pingResponse.node, for clarity. Added comments and docs

bleskes added blocker and removed review labels Sep 5, 2014

clintongormley changed the title ~~[Discovery] Master election should demotes nodes which try to join the cluster for the first time~~ Resiliency: Master election should demotes nodes which try to join the cluster for the first time Sep 8, 2014

clintongormley added the resiliency label Sep 8, 2014

bleskes added the review label Sep 10, 2014

bleskes removed review v1.4.0.Beta1 labels Sep 11, 2014

bleskes closed this in a50934e Sep 11, 2014

bleskes reopened this Sep 11, 2014

bleskes mentioned this pull request Sep 11, 2014

Discovery: back port #7558 to 1.x and add bwc protections of the new ping on master gone introduced in #7493 #7694

Closed

bleskes mentioned this pull request Sep 12, 2014

Restore preference to latest unicast pings describing the same node #7702

Closed

clintongormley added v1.4.0.Beta v1.4.0.Beta1 and removed v1.4.0.Beta labels Sep 12, 2014

bleskes closed this Sep 16, 2014

bleskes deleted the master_prefer_non_initial_join branch September 16, 2014 15:41

clintongormley added the :Cluster label Jun 7, 2015

clintongormley changed the title ~~Resiliency: Master election should demotes nodes which try to join the cluster for the first time~~ Master election should demotes nodes which try to join the cluster for the first time Jun 7, 2015

clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Cluster labels Feb 13, 2018
