Add voting-only master node #43410

Merged: 31 commits, merged on Jun 25, 2019. The diff below shows the changes from 26 of these commits.

Commits:
0567b49 Voting-only nodes (DaveCTurner, Jun 18, 2019)
7dd80ce Randomise good-quorum calculation in CoordinationStateTests (DaveCTurner, Jun 18, 2019)
c298508 State transfer only (ywelsch, Jun 19, 2019)
0d3750d Move to isElectionQuorum (ywelsch, Jun 19, 2019)
ee52a89 use JoinVoteCollection (ywelsch, Jun 19, 2019)
1a50872 fix test (ywelsch, Jun 19, 2019)
2cb2010 fix build and use transport intercepter (ywelsch, Jun 19, 2019)
8383823 move tests (ywelsch, Jun 19, 2019)
9ee9fdc move tests (ywelsch, Jun 19, 2019)
77c252f rest test (ywelsch, Jun 19, 2019)
9de65ea Add x-pack feature set (ywelsch, Jun 19, 2019)
03d249f add docs (ywelsch, Jun 19, 2019)
b170c9d fixup (ywelsch, Jun 19, 2019)
c11b2e6 fix docs tests (ywelsch, Jun 19, 2019)
d109686 Register usage action (ywelsch, Jun 20, 2019)
e410121 more fixups (ywelsch, Jun 20, 2019)
88f081d Merge remote-tracking branch 'elastic/master' into state-transfer-only (ywelsch, Jun 20, 2019)
5e260d5 more fixes (ywelsch, Jun 20, 2019)
65fda75 More fixups (ywelsch, Jun 20, 2019)
1b30c71 Merge remote-tracking branch 'elastic/master' into state-transfer-only (ywelsch, Jun 20, 2019)
fbba2c3 fix docs tests on OSS distrib (ywelsch, Jun 20, 2019)
e7c325e Fold JoinVoteCollection into VoteCollection (ywelsch, Jun 21, 2019)
1caa1b6 s/election type/election strategy/ (ywelsch, Jun 21, 2019)
6907d06 test adjustment (ywelsch, Jun 21, 2019)
e284426 Move ElectionStrategy from interface to class (ywelsch, Jun 21, 2019)
bcd6ec2 Have VotingOnlyNodePlugin always enabled (ywelsch, Jun 21, 2019)
264e5a3 Ryan feedback (ywelsch, Jun 21, 2019)
d8a0b9a checkstyle (ywelsch, Jun 21, 2019)
ccdb483 doc changes (ywelsch, Jun 21, 2019)
28efcf0 Add note about voting-only in default distrib (ywelsch, Jun 24, 2019)
7bb6fa2 Reword docs (DaveCTurner, Jun 25, 2019)

Empty file added.

14 changes: 8 additions & 6 deletions docs/reference/cluster.asciidoc
@@ -22,12 +22,14 @@ one of the following:
* an IP address or hostname, to add all matching nodes to the subset.
* a pattern, using `*` wildcards, which adds all nodes to the subset
whose name, address or hostname matches the pattern.
* `master:true`, `data:true`, `ingest:true` or `coordinating_only:true`, which
respectively add to the subset all master-eligible nodes, all data nodes,
all ingest nodes, and all coordinating-only nodes.
* `master:false`, `data:false`, `ingest:false` or `coordinating_only:false`,
which respectively remove from the subset all master-eligible nodes, all data
nodes, all ingest nodes, and all coordinating-only nodes.
* `master:true`, `data:true`, `ingest:true`, `voting_only:true` or
`coordinating_only:true`, which respectively add to the subset all
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, and all coordinating-only nodes.
* `master:false`, `data:false`, `ingest:false`, `voting_only:false` or
`coordinating_only:false`, which respectively remove from the subset all
master-eligible nodes, all data nodes, all ingest nodes, all voting-only
nodes, and all coordinating-only nodes.

Member:

I think the docs should clearly delineate that voting_only requires x-pack. We can wrap it in conditionals and add x-pack annotations so that it doesn't show up in the docs if someone builds the OSS-only docs, and so that it carries x-pack designations when the full docs are published.

Contributor Author:

I didn't know that OSS-only docs were even a thing. Are we publishing those somewhere? How do you build those? You will have to spell out the details on how to set up the conditionals; I'm not aware of any such infrastructure.

Contributor:

While we do have [x-pack] macros, I'm not aware of an OSS-only docs build functionality.

@debadair @lcawl Are you aware of an OSS-only docs build?

Member:

We don’t build them, but the option is available for users who want to build OSS-only docs.

@lcawl Can you help @ywelsch add the appropriate x-pack annotations here?

Contributor:

Content that requires the default distribution should be tagged with the [role="xpack"] directive. Now that all of the doc source is in the public repo, we no longer maintain two versions of the index.asciidoc file, so conditional statements based on the include_xpack attribute have no effect.

Inline references like this are tricky. If using the voting_only attribute throws an error in the OSS distro, I'd be inclined to add a note to that effect. Something like:

NOTE: Designating nodes as voting_only and using voting_only in node filters requires the default distribution of Elasticsearch.

@lcawl can correct me if I'm wrong, but I don't think there's (currently) any way to attach the xpack bug to an admonition block.

* a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
which adds to the subset all nodes with a custom node attribute whose name
and value match the respective patterns. Custom node attributes are

9 changes: 7 additions & 2 deletions docs/reference/cluster/stats.asciidoc
@@ -109,7 +109,8 @@ Will return, for example:
"data": 1,
"coordinating_only": 0,
"master": 1,
"ingest": 1
"ingest": 1,
"voting_only": 0
},
"versions": [
"{version}"
@@ -207,6 +208,7 @@ Will return, for example:
// TESTRESPONSE[s/"plugins": \[[^\]]*\]/"plugins": $body.$_path/]
// TESTRESPONSE[s/"network_types": \{[^\}]*\}/"network_types": $body.$_path/]
// TESTRESPONSE[s/"discovery_types": \{[^\}]*\}/"discovery_types": $body.$_path/]
// TESTRESPONSE[s/"count": \{[^\}]*\}/"count": $body.$_path/]
// TESTRESPONSE[s/"packaging_types": \[[^\]]*\]/"packaging_types": $body.$_path/]
// TESTRESPONSE[s/: true|false/: $body.$_path/]
// TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]
@@ -217,7 +219,10 @@ Will return, for example:
// see an exhaustive list anyway.
// 2. Similarly, ignore the contents of `network_types`, `discovery_types`, and
// `packaging_types`.
// 3. All of the numbers and strings on the right hand side of *every* field in
// 3. Ignore the contents of the (nodes) count object, as what's shown here
// depends on the license. For example, voting-only nodes are only shown
// when this test runs with a basic license.
// 4. All of the numbers and strings on the right hand side of *every* field in
// the response are ignored. So we're really only asserting things about the
// shape of this response, not the values in it.

31 changes: 31 additions & 0 deletions docs/reference/modules/node.asciidoc
@@ -134,6 +134,37 @@ cluster.remote.connect: false <4>
<3> Disable the `node.ingest` role (enabled by default).
<4> Disable {ccs} (enabled by default).

[float]
[[voting-only-node]]
==== Voting-only master-eligible node

A voting-only master-eligible node is a node that can participate in master
elections but will not act as a master in the cluster. In particular, a
voting-only node can help elect another master-eligible node as master, and
can serve as a tiebreaker in elections. To mark a master-eligible node as
voting-only, set:

[source,yaml]
-------------------
node.voting_only: true <1>
-------------------
<1> The `node.voting_only` role is disabled by default.

IMPORTANT: If you use the {oss-dist}, do not set `node.voting_only`. Otherwise,
the node fails to start. Also note that only master-eligible nodes can be
marked as voting-only.

High availability (HA) clusters require at least three master-eligible nodes,
so that if one of the three nodes is down, the remaining two can still elect a
master amongst themselves. This only requires one of the two remaining nodes
to be capable of acting as master, but both need to have voting powers. This
means that one of the three master-eligible nodes can be made voting-only. If
this voting-only node is a dedicated master, it can run on a less powerful
machine or with a smaller heap size. It still needs a fast disk and a
low-latency connection to the other master-eligible nodes in order to be
effective, and uses the same amount of disk space as any other dedicated
master node. Alternatively, a voting-only non-dedicated master node can play
the role of the third master-eligible node, which allows running an HA cluster
with only two dedicated master nodes.
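The quorum arithmetic behind the HA paragraph can be checked with a small sketch. It is a simplified model, not Elasticsearch code: `HaQuorumSketch`, its `Node` record and `canElectMaster` are hypothetical names, the voting configuration is treated as a plain set of node names, a quorum is assumed to be a strict majority of that set, and terms, versions and the committed-versus-accepted configuration distinction are ignored. It uses Java 16+ records.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the HA argument; not Elasticsearch code.
public class HaQuorumSketch {

    // A master-eligible node; in this model a voting-only node may vote but never becomes master.
    record Node(String name, boolean votingOnly) {}

    // True if the surviving nodes hold a strict majority of the voting configuration (assumed quorum rule).
    static boolean hasMajority(List<Node> up, Set<String> votingConfig) {
        long votes = up.stream().map(Node::name).filter(votingConfig::contains).count();
        return votes * 2 > votingConfig.size();
    }

    // An election can succeed only if the survivors form a majority AND at least one of
    // them is allowed to act as master (i.e. is not voting-only).
    static boolean canElectMaster(List<Node> up, Set<String> votingConfig) {
        boolean someoneCanLead = up.stream().anyMatch(n -> !n.votingOnly());
        return hasMajority(up, votingConfig) && someoneCanLead;
    }

    public static void main(String[] args) {
        Node m1 = new Node("m1", false);
        Node m2 = new Node("m2", false);
        Node v3 = new Node("v3", true); // voting-only tiebreaker
        Set<String> config = Set.of("m1", "m2", "v3");

        System.out.println(canElectMaster(List.of(m2, v3), config)); // true: m1 is down
        System.out.println(canElectMaster(List.of(m1, m2), config)); // true: v3 is down
        System.out.println(canElectMaster(List.of(v3), config));     // false: no majority, no leader
    }
}
```

With `m1`, `m2` and the voting-only `v3`, losing any one node still leaves a majority that contains a node able to lead; losing both full masters does not, because `v3` may vote but never becomes master in this model.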

Contributor:

I think we should make it clearer that voting-only nodes still need a fast disk and a low-latency connection to the true masters in order to be effective, and will use the same amount of disk space as any other dedicated master node. They should need less CPU and less heap.

Contributor Author:

I've tried to address this in ccdb483. Let me know if that's ok, or perhaps suggest an alternative wording.


[float]
[[data-node]]
=== Data Node

4 changes: 4 additions & 0 deletions docs/reference/rest-api/info.asciidoc
@@ -107,6 +107,10 @@ Example response:
"available" : true,
"enabled" : true
},
"voting_only" : {
"available" : true,
"enabled" : true
},
"watcher" : {
"available" : true,
"enabled" : true

@@ -121,14 +121,16 @@ static class ClusterFormationState {
private final List<TransportAddress> resolvedAddresses;
private final List<DiscoveryNode> foundPeers;
private final long currentTerm;
private final ElectionStrategy electionStrategy;

ClusterFormationState(Settings settings, ClusterState clusterState, List<TransportAddress> resolvedAddresses,
List<DiscoveryNode> foundPeers, long currentTerm) {
List<DiscoveryNode> foundPeers, long currentTerm, ElectionStrategy electionStrategy) {
this.settings = settings;
this.clusterState = clusterState;
this.resolvedAddresses = resolvedAddresses;
this.foundPeers = foundPeers;
this.currentTerm = currentTerm;
this.electionStrategy = electionStrategy;
}

String getDescription() {
@@ -185,7 +187,9 @@ String getDescription() {
final VoteCollection voteCollection = new VoteCollection();
foundPeers.forEach(voteCollection::addVote);
final String isQuorumOrNot
= CoordinationState.isElectionQuorum(voteCollection, clusterState) ? "is a quorum" : "is not a quorum";
= electionStrategy.isElectionQuorum(clusterState.nodes().getLocalNode(), currentTerm, clusterState.term(),
clusterState.version(), clusterState.getLastCommittedConfiguration(), clusterState.getLastAcceptedConfiguration(),
voteCollection) ? "is a quorum" : "is not a quorum";

return String.format(Locale.ROOT,
"master not discovered or elected yet, an election requires %s, have discovered %s which %s; %s",
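Before this change, the static `CoordinationState.isElectionQuorum` required the join votes to form a quorum of both the last committed and the last accepted voting configuration; the diff routes that decision through an `ElectionStrategy` instead. The sketch below assumes the strategy keeps that two-configuration rule as its baseline and exposes a hook for extra conditions. `ElectionStrategySketch`, `Strategy` and `extraConstraint` are hypothetical names, the method takes far fewer parameters than the real `isElectionQuorum` (which also receives the local node, current term, last accepted term and version), and a quorum is assumed to be a strict majority.

```java
import java.util.Set;

// Hypothetical model of a pluggable election-quorum rule; not the Elasticsearch API.
public class ElectionStrategySketch {

    // Assumed quorum rule: a strict majority of the configuration's members voted.
    static boolean isQuorum(Set<String> votes, Set<String> config) {
        long granted = config.stream().filter(votes::contains).count();
        return granted * 2 > config.size();
    }

    static class Strategy {
        // Baseline taken from the removed static CoordinationState.isElectionQuorum:
        // the votes must be a quorum of BOTH the committed and the accepted configuration.
        final boolean isElectionQuorum(Set<String> votes, Set<String> committed, Set<String> accepted) {
            return isQuorum(votes, committed) && isQuorum(votes, accepted) && extraConstraint(votes);
        }

        // Hook for strategy-specific conditions (assumed for illustration); the default adds none.
        boolean extraConstraint(Set<String> votes) {
            return true;
        }
    }

    public static void main(String[] args) {
        Strategy strategy = new Strategy();
        Set<String> committed = Set.of("a", "b", "c");
        Set<String> accepted = Set.of("c", "d", "e"); // a reconfiguration in flight

        // A majority of the old configuration only: not enough.
        System.out.println(strategy.isElectionQuorum(Set.of("a", "b", "c"), committed, accepted)); // false
        // Majorities of both configurations: the election can be won.
        System.out.println(strategy.isElectionQuorum(Set.of("a", "b", "c", "d"), committed, accepted)); // true
    }
}
```

Requiring both majorities matters while a reconfiguration is in flight: in `main`, the old members alone cannot win until enough nodes from the new configuration also vote.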

@@ -24,13 +24,14 @@
import org.elasticsearch.cluster.coordination.CoordinationMetaData.VotingConfiguration;
import org.elasticsearch.cluster.metadata.MetaData;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.settings.Settings;

import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
* The core class of the cluster state coordination algorithm, directly implementing the
@@ -42,6 +43,8 @@ public class CoordinationState {

private final DiscoveryNode localNode;

private final ElectionStrategy electionStrategy;

// persisted state
private final PersistedState persistedState;

@@ -53,11 +56,12 @@ public class CoordinationState {
private VotingConfiguration lastPublishedConfiguration;
private VoteCollection publishVotes;

public CoordinationState(Settings settings, DiscoveryNode localNode, PersistedState persistedState) {
public CoordinationState(DiscoveryNode localNode, PersistedState persistedState, ElectionStrategy electionStrategy) {
this.localNode = localNode;

// persisted state
this.persistedState = persistedState;
this.electionStrategy = electionStrategy;

// transient state
this.joinVotes = new VoteCollection();
@@ -100,13 +104,9 @@ public boolean electionWon() {
return electionWon;
}

public boolean isElectionQuorum(VoteCollection votes) {
return isElectionQuorum(votes, getLastAcceptedState());
}

static boolean isElectionQuorum(VoteCollection votes, ClusterState lastAcceptedState) {
return votes.isQuorum(lastAcceptedState.getLastCommittedConfiguration())
&& votes.isQuorum(lastAcceptedState.getLastAcceptedConfiguration());
public boolean isElectionQuorum(VoteCollection joinVotes) {
return electionStrategy.isElectionQuorum(localNode, getCurrentTerm(), getLastAcceptedTerm(), getLastAcceptedVersion(),
getLastCommittedConfiguration(), getLastAcceptedConfiguration(), joinVotes);
}

public boolean isPublishQuorum(VoteCollection votes) {
@@ -117,6 +117,11 @@ public boolean containsJoinVoteFor(DiscoveryNode node) {
return joinVotes.containsVoteFor(node);
}

// used for tests
boolean containsJoin(Join join) {
return joinVotes.getJoins().contains(join);
}

public boolean joinVotesHaveQuorumFor(VotingConfiguration votingConfiguration) {
return joinVotes.isQuorum(votingConfiguration);
}
@@ -243,7 +248,7 @@ public boolean handleJoin(Join join) {
throw new CoordinationStateRejectedException("rejecting join since this node has not received its initial configuration yet");
}

boolean added = joinVotes.addVote(join.getSourceNode());
boolean added = joinVotes.addJoinVote(join);
boolean prevElectionWon = electionWon;
electionWon = isElectionQuorum(joinVotes);
assert !prevElectionWon || electionWon; // we cannot go from won to not won
@@ -489,18 +494,28 @@ default void markLastAcceptedStateAsCommitted() {
}

/**
* A collection of votes, used to calculate quorums.
* A collection of votes, used to calculate quorums. Optionally records the Joins as well.
*/
public static class VoteCollection {

private final Map<String, DiscoveryNode> nodes;
private final Set<Join> joins;

public boolean addVote(DiscoveryNode sourceNode) {
return nodes.put(sourceNode.getId(), sourceNode) == null;
}

public boolean addJoinVote(Join join) {
final boolean added = addVote(join.getSourceNode());
if (added) {
joins.add(join);
}
return added;
}

public VoteCollection() {
nodes = new HashMap<>();
joins = new HashSet<>();
}

public boolean isQuorum(VotingConfiguration configuration) {
@@ -519,24 +534,31 @@ public Collection<DiscoveryNode> nodes() {
return Collections.unmodifiableCollection(nodes.values());
}

public Set<Join> getJoins() {
return Collections.unmodifiableSet(joins);
}

@Override
public String toString() {
return "VoteCollection{" + String.join(",", nodes.keySet()) + "}";
return "VoteCollection{votes=" + nodes.keySet() + ", joins=" + joins + "}";
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
if (!(o instanceof VoteCollection)) return false;

VoteCollection that = (VoteCollection) o;

return nodes.equals(that.nodes);
if (!nodes.equals(that.nodes)) return false;
return joins.equals(that.joins);
}

@Override
public int hashCode() {
return nodes.hashCode();
int result = nodes.hashCode();
result = 31 * result + joins.hashCode();
return result;
}
}
}
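The `VoteCollection` changes amount to three rules: a vote is still counted once per node, the `Join` that carried it is kept only if it added a new vote, and equality now compares the recorded joins too. The abdication check in `Coordinator` builds such a collection before calling `electionStrategy.isElectionQuorum`, presumably so that a strategy can inspect the joins themselves. A minimal sketch of that bookkeeping, where `VoteCollectionSketch` and the `JoinRecord` record (a stand-in for the real `Join`) are hypothetical:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical simplification of CoordinationState.VoteCollection; not Elasticsearch code.
public class VoteCollectionSketch {

    // Stand-in for a Join: who votes, for whom, in which term, with which last-accepted state.
    record JoinRecord(String sourceNode, String targetNode, long term,
                      long lastAcceptedTerm, long lastAcceptedVersion) {}

    private final Set<String> voters = new HashSet<>();
    private final Set<JoinRecord> joins = new HashSet<>();

    // Returns true only the first time a given node votes.
    public boolean addVote(String nodeId) {
        return voters.add(nodeId);
    }

    // Records the join only if it contributed a new vote, mirroring addJoinVote in the diff.
    public boolean addJoinVote(JoinRecord join) {
        boolean added = addVote(join.sourceNode());
        if (added) {
            joins.add(join);
        }
        return added;
    }

    public Set<String> voters() { return Collections.unmodifiableSet(voters); }

    public Set<JoinRecord> getJoins() { return Collections.unmodifiableSet(joins); }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VoteCollectionSketch)) return false;
        VoteCollectionSketch that = (VoteCollectionSketch) o;
        // Equal only if both the voters and the recorded joins match.
        return voters.equals(that.voters) && joins.equals(that.joins);
    }

    @Override
    public int hashCode() {
        return Objects.hash(voters, joins);
    }

    public static void main(String[] args) {
        VoteCollectionSketch votes = new VoteCollectionSketch();
        System.out.println(votes.addJoinVote(new JoinRecord("n1", "n2", 5, 4, 17))); // true
        // A second join from n1 adds no vote, so it is not recorded either.
        System.out.println(votes.addJoinVote(new JoinRecord("n1", "n2", 6, 4, 17))); // false
        System.out.println(votes.getJoins().size()); // 1
    }
}
```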

@@ -33,6 +33,7 @@
import org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper.ClusterFormationState;
import org.elasticsearch.cluster.coordination.CoordinationMetaData.VotingConfigExclusion;
import org.elasticsearch.cluster.coordination.CoordinationMetaData.VotingConfiguration;
import org.elasticsearch.cluster.coordination.CoordinationState.VoteCollection;
import org.elasticsearch.cluster.coordination.FollowersChecker.FollowerCheckRequest;
import org.elasticsearch.cluster.coordination.JoinHelper.InitialJoinAccumulator;
import org.elasticsearch.cluster.metadata.MetaData;
@@ -100,6 +101,7 @@ public class Coordinator extends AbstractLifecycleComponent implements Discovery

private final Settings settings;
private final boolean singleNodeDiscovery;
private final ElectionStrategy electionStrategy;
private final TransportService transportService;
private final MasterService masterService;
private final AllocationService allocationService;
@@ -150,13 +152,14 @@ public Coordinator(String nodeName, Settings settings, ClusterSettings clusterSe
NamedWriteableRegistry namedWriteableRegistry, AllocationService allocationService, MasterService masterService,
Supplier<CoordinationState.PersistedState> persistedStateSupplier, SeedHostsProvider seedHostsProvider,
ClusterApplier clusterApplier, Collection<BiConsumer<DiscoveryNode, ClusterState>> onJoinValidators, Random random,
Consumer<String> reroute) {
Consumer<String> reroute, ElectionStrategy electionStrategy) {
this.settings = settings;
this.transportService = transportService;
this.masterService = masterService;
this.allocationService = allocationService;
this.onJoinValidators = JoinTaskExecutor.addBuiltInJoinValidators(onJoinValidators);
this.singleNodeDiscovery = DiscoveryModule.SINGLE_NODE_DISCOVERY_TYPE.equals(DiscoveryModule.DISCOVERY_TYPE_SETTING.get(settings));
this.electionStrategy = electionStrategy;
this.joinHelper = new JoinHelper(settings, allocationService, masterService, transportService,
this::getCurrentTerm, this::getStateForMasterService, this::handleJoinRequest, this::joinLeaderInTerm, this.onJoinValidators,
reroute);
@@ -168,7 +171,7 @@ public Coordinator(String nodeName, Settings settings, ClusterSettings clusterSe
this.publishTimeout = PUBLISH_TIMEOUT_SETTING.get(settings);
this.random = random;
this.electionSchedulerFactory = new ElectionSchedulerFactory(settings, random, transportService.getThreadPool());
this.preVoteCollector = new PreVoteCollector(transportService, this::startElection, this::updateMaxTermSeen);
this.preVoteCollector = new PreVoteCollector(transportService, this::startElection, this::updateMaxTermSeen, electionStrategy);
configuredHostsResolver = new SeedHostsResolver(nodeName, settings, transportService, seedHostsProvider);
this.peerFinder = new CoordinatorPeerFinder(settings, transportService,
new HandshakingTransportAddressConnector(settings, transportService), configuredHostsResolver);
@@ -191,7 +194,7 @@ public Coordinator(String nodeName, Settings settings, ClusterSettings clusterSe
private ClusterFormationState getClusterFormationState() {
return new ClusterFormationState(settings, getStateForMasterService(), peerFinder.getLastResolvedAddresses(),
Stream.concat(Stream.of(getLocalNode()), StreamSupport.stream(peerFinder.getFoundPeers().spliterator(), false))
.collect(Collectors.toList()), getCurrentTerm());
.collect(Collectors.toList()), getCurrentTerm(), electionStrategy);
}

private void onLeaderFailure(Exception e) {
@@ -649,7 +652,7 @@ boolean publicationInProgress() {
protected void doStart() {
synchronized (mutex) {
CoordinationState.PersistedState persistedState = persistedStateSupplier.get();
coordinationState.set(new CoordinationState(settings, getLocalNode(), persistedState));
coordinationState.set(new CoordinationState(getLocalNode(), persistedState, electionStrategy));
peerFinder.setCurrentTerm(getCurrentTerm());
configuredHostsResolver.start();
final ClusterState lastAcceptedState = coordinationState.get().getLastAcceptedState();
@@ -1101,11 +1104,10 @@ protected void onFoundPeersUpdated() {
synchronized (mutex) {
final Iterable<DiscoveryNode> foundPeers = getFoundPeers();
if (mode == Mode.CANDIDATE) {
final CoordinationState.VoteCollection expectedVotes = new CoordinationState.VoteCollection();
final VoteCollection expectedVotes = new VoteCollection();
foundPeers.forEach(expectedVotes::addVote);
expectedVotes.addVote(Coordinator.this.getLocalNode());
final ClusterState lastAcceptedState = coordinationState.get().getLastAcceptedState();
final boolean foundQuorum = CoordinationState.isElectionQuorum(expectedVotes, lastAcceptedState);
final boolean foundQuorum = coordinationState.get().isElectionQuorum(expectedVotes);

if (foundQuorum) {
if (electionScheduler == null) {
@@ -1305,6 +1307,18 @@ public void onSuccess(String source) {
final List<DiscoveryNode> masterCandidates = completedNodes().stream()
.filter(DiscoveryNode::isMasterNode)
.filter(node -> nodeMayWinElection(state, node))
.filter(node -> {
// check if master candidate would be able to get an election quorum if we were to
// abdicate to it. Assume that every node that completed the publication can provide
// a vote in that next election and has the latest state.
final long futureElectionTerm = state.term() + 1;
final VoteCollection futureVoteCollection = new VoteCollection();
completedNodes().forEach(completedNode -> futureVoteCollection.addJoinVote(
new Join(completedNode, node, futureElectionTerm, state.term(), state.version())));
return electionStrategy.isElectionQuorum(node, futureElectionTerm,
state.term(), state.version(), state.getLastCommittedConfiguration(),
state.getLastAcceptedConfiguration(), futureVoteCollection);
})
.collect(Collectors.toList());
if (masterCandidates.isEmpty() == false) {
abdicateTo(masterCandidates.get(random.nextInt(masterCandidates.size())));
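The new filter in the abdication path only hands mastership to a master-eligible node that could win the next election, assuming every node that completed the publication votes for it in term `state.term() + 1`. A self-contained sketch of that selection step, with hypothetical names (`AbdicationSketch`, `chooseAbdicationTarget`), a `BiPredicate` standing in for `electionStrategy.isElectionQuorum`, and the diff's `nodeMayWinElection` filter omitted. Unlike the diff, which builds one `Join` per voter naming the candidate as target, the sketch passes only the voter ids.

```java
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

// Hypothetical sketch of choosing an abdication target; not Elasticsearch code.
public class AbdicationSketch {

    record Node(String id, boolean masterEligible) {}

    // Assumed quorum rule: a strict majority of the voting configuration.
    static boolean isMajority(Set<String> votes, Set<String> config) {
        return config.stream().filter(votes::contains).count() * 2 > config.size();
    }

    // Picks a random master-eligible node that would win the next election if every node
    // that completed the publication voted for it; empty if no such node exists.
    static Optional<Node> chooseAbdicationTarget(List<Node> completed,
                                                 BiPredicate<Node, Set<String>> wouldWin,
                                                 Random random) {
        Set<String> futureVotes = completed.stream().map(Node::id).collect(Collectors.toSet());
        List<Node> candidates = completed.stream()
            .filter(Node::masterEligible)
            .filter(candidate -> wouldWin.test(candidate, futureVotes))
            .collect(Collectors.toList());
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }

    public static void main(String[] args) {
        Set<String> config = Set.of("m1", "v2", "m3");
        // Illustrative rule only: v2 is treated as voting-only, so it can never be chosen.
        BiPredicate<Node, Set<String>> wouldWin =
            (candidate, votes) -> !candidate.id().equals("v2") && isMajority(votes, config);
        List<Node> completed = List.of(new Node("m1", true), new Node("v2", true), new Node("d3", false));
        // m3 did not complete the publication, so m1 is the only viable target.
        System.out.println(chooseAbdicationTarget(completed, wouldWin, new Random(42))
            .map(Node::id).orElse("none")); // m1
    }
}
```

Passing the quorum check in as a predicate keeps the selection independent of any particular rule, much as the diff asks `electionStrategy` instead of hard-coding the quorum definition.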
@@ -1345,7 +1359,7 @@ private void handleAssociatedJoin(Join join) {
}

@Override
protected boolean isPublishQuorum(CoordinationState.VoteCollection votes) {
protected boolean isPublishQuorum(VoteCollection votes) {
assert Thread.holdsLock(mutex) : "Coordinator mutex not held";
return coordinationState.get().isPublishQuorum(votes);
}