-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add voting-only master node #43410
Add voting-only master node #43410
Conversation
Adds a plugin that allows some master-eligible nodes to be used as "voting-only" nodes. Such nodes are never elected as the true master of the cluster, but they perform all the tasks needed of a master-eligible node to provide resilience.
Pinging @elastic/es-distributed |
35e8cc5
to
1b30c71
Compare
* `master:false`, `data:false`, `ingest:false`, `voting_only:true`, or | ||
`coordinating_only:false`, which respectively remove from the subset all | ||
master-eligible nodes, all data nodes, all ingest nodes, all voting-only | ||
nodes and all coordinating-only nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docs should clearly delineate that voting_only
requires x-pack, we can wrap it in conditionals and add x-pack annotations so that it doesn't show in the docs if someone builds the OSS-only docs, and has x-pack designations in the docs when the full docs are published.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that OSS-only docs were even a thing. Are we publishing those somewhere? How do you build those? You will have to spell out the details on how to set up the conditionals, I'm not aware of any such infrastructure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Content that requires the default distribution should be tagged with the [role="xpack"] directive. Now that all of the doc source is in the public repo we no longer maintain two versions of the index.asciidoc file, so conditional statements based on the include_xpack
attribute have no effect.
Inline references like this are tricky. If using the voting_only
attribute throws an error in the OSS distro, I'd be inclined to add a note to that effect. Something like:
NOTE: Designating nodes as
voting_only
and usingvoting_only
in node filters is requires the default distribution of Elasticsearch.
@lcawl can correct me if I'm wrong, but I don't think there's (currently) any way to attach the xpack bug to an admonition block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left a few minor comments but this all looks good.
I have read, but not properly reviewed, all the code that's needed to integrate this plugin with X-pack. That's not something with which I'm familiar enough to pass an opinion.
/** | ||
* A collection of votes, extending {@link VoteCollection}, which additionally records the Joins | ||
*/ | ||
public static class JoinVoteCollection extends VoteCollection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be folded into VoteCollection
. It's a bit strange to have it separate just to do that instanceof
call in the plugin. The ability to add a bare vote (without a join) is still going to be useful but I don't think we need a whole subclass to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in e7c325e
public static final String DEFAULT_ELECTION_TYPE = "default"; | ||
|
||
public static final Setting<String> ELECTION_TYPE_SETTING = | ||
new Setting<>("cluster.election.type", DEFAULT_ELECTION_TYPE, Function.identity(), Property.NodeScope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is the name for this setting from my PoC, but I think we should align the names type
and strategy
. I think it's probably ok to use strategy
everywhere, given that we don't expect users to adjust this setting directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 1caa1b6
assertThat(client().admin().cluster().prepareState().setMasterNodeTimeout("100ms") | ||
.execute().actionGet().getState().nodes().getMasterNodeId(), nullValue()); | ||
fail("should not be able to find master"); | ||
} catch (MasterNotDiscoveredException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not expectThrows()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 6907d06 (this was mostly copy-pasta)
|
||
// start a fresh full master node, which will be brought into the cluster as master by the voting-only nodes | ||
final String newMaster = internalCluster().startNode(); | ||
assertEquals(newMaster, internalCluster().getMasterName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert it's a different node ID from the original to emphasise it's a fresh node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 6907d06
docs/reference/modules/node.asciidoc
Outdated
machine or a smaller heap-size can be chosen for this node. Alternatively, | ||
a voting-only non-dedicated master node can play the role of the third | ||
master-eligible node, which allows running an HA cluster with only two | ||
dedicated master nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make it clearer that voting-only nodes still need a fast disk and a low-latency connection to the true masters in order to be effective, and will use the same amount of disk space as any other dedicated master node. They should need less CPU and less heap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to address this in ccdb483. Let me know if that's ok, or perhaps suggest an alternative wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple comments on the plugin api.
server/src/main/java/org/elasticsearch/cluster/coordination/ElectionStrategy.java
Outdated
Show resolved
Hide resolved
@@ -117,6 +117,11 @@ private XPackSettings() { | |||
/** Setting for enabling or disabling vectors. Defaults to true. */ | |||
public static final Setting<Boolean> VECTORS_ENABLED = Setting.boolSetting("xpack.vectors.enabled", true, Setting.Property.NodeScope); | |||
|
|||
/** Setting for enabling or disabling the voting-only-node functionality. Needs to be enabled on both voting-only nodes and regular | |||
* master-eligible nodes for the voting-only functionality to work correctly. Defaults to true. */ | |||
public static final Setting<Boolean> VOTING_ONLY_ENABLED = Setting.boolSetting("xpack.voting_only.enabled", true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it necessary to have a separate enabling of the feature? Why coudln't it just always be available with xpack (especially since you have the default as enabled)? Why would someone need to disable this feature, when they would just not set any nodes to voting only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. My thinking was the following: We have this setting for all other features as well. When the setting is disabled, the other plugins are not adding anything to the extension points. I thought this could be used as an extra safeguard that would allow completely disabling the functionality provided by this plugin in case something was wrong with it, given that it's always bundled with x-pack. I'm fine not having this setting if you think that that is not necessary. I did not see a strong reason to deviate from the norm here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it contributes to an overload of settings users could mess with, with no real added benefit. I see what you mean about all the other xpack plugins. I wonder if we really need that though. For eg security it makes sense, but why would one ever disable deprecation, or rollup, or ilm? These features do not actively do anything unless used, afaik. For features like that, I think we should just have it always "enabled", but I'm open to hearing others arguments for why we need the ability to disable across all xpack features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For features that provide API endpoints, it makes perhaps more sense (so that these API endpoints don't even become available). For voting_only nodes, the only thing that becomes visible is that the voting_only
role appears in several stats endpoints, even if there is no voting_only node in the cluster. The node counts, that are part of the cluster stats, for example, expose this info.
I don't hold a particularly strong opinion either way on this, so I've removed the setting in bcd6ec2. I can revert that if others feel differently about this.
lastAcceptedConfiguration, joinVotes); | ||
} | ||
|
||
protected boolean isCustomElectionQuorum(DiscoveryNode localNode, long localCurrentTerm, long localAcceptedTerm, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadocs on this would be good, explaining what a plugin author extending this should return. I would also reference the final method, explaining how the results are merged together. I might also name this with terminology like "additional" since it adds to the real quorum method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 264e5a3
/** | ||
* Allows plugging in a custom election strategy, restricting the notion of an election quorum. | ||
*/ | ||
public class ElectionStrategy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this abstract and the protected method abstract? The default instance can be an anonymous class returning true. This way someone extending it does not eg mispell the method, at the same time forgetting @OverRide, and not understanding why their method is not being called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. adapted in 264e5a3
@elasticmachine run elasticsearch-ci/docs-check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the plugin side
* Generalise the note on the node filters page to talk about the `voting_only` role, since it's not clear on this page what "designating" means. * Contrast `default-dist` to `oss-dist`, because in isolation the phrase "default distribution" is kinda mysterious to the general public. * Qualify statement about "any master eligible node may be elected" * Add excuse for the confusing terminology * Shrink explanation about HA clusters, because it involved concepts like "voting powers" and "capability to act as master" that aren't really defined. * Remove suggestion to use a data node as a voting-only node: although this might work in many cases, I think if we specifically mention it here we will see too many clusters with overloaded data nodes that can't reasonably be a master. I think this is best left implicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 7bb6fa2 to rework the docs a bit, which could do with a review from someone that isn't me. Everything else LGTM.
build succeeded, but Github hook failed, so I'm considering this as green to merge.
|
A voting-only master-eligible node is a node that can participate in master elections but will not act as a master in the cluster. In particular, a voting-only node can help elect another master-eligible node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two can still elect a master amongst them-selves. This only requires one of the two remaining nodes to have the capability to act as master, but both need to have voting powers. This means that one of the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a voting-only non-dedicated master node can play the role of the third master-eligible node, which allows running an HA cluster with only two dedicated master nodes. Closes #14340 Co-authored-by: David Turner <david.turner@elastic.co>
A voting-only master-eligible node is a node that can participate in master elections but will not act as a master in the cluster. In particular, a voting-only node can help elect another master-eligible node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two can still elect a master amongst them-selves. This only requires one of the two remaining nodes to have the capability to act as master, but both need to have voting powers. This means that one of the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a voting-only non-dedicated master node can play the role of the third master-eligible node, which allows running an HA cluster with only two dedicated master nodes.
Closes #14340
This is joint work with @DaveCTurner