[Zen2] Update documentation for Zen2 (#34714)
This commit overhauls the documentation of discovery and cluster coordination,
removing mention of the Zen Discovery module and replacing it with docs for the
new cluster coordination mechanism introduced in 7.0.

Relates #32006
DaveCTurner authored Dec 20, 2018
1 parent 08bcd83 commit 1a23417
Showing 27 changed files with 985 additions and 414 deletions.
14 changes: 8 additions & 6 deletions docs/plugins/discovery.asciidoc
@@ -1,8 +1,8 @@
[[discovery]]
== Discovery Plugins

-Discovery plugins extend Elasticsearch by adding new discovery mechanisms that
-can be used instead of {ref}/modules-discovery-zen.html[Zen Discovery].
+Discovery plugins extend Elasticsearch by adding new hosts providers that can be
+used to extend the {ref}/modules-discovery.html[cluster formation module].

[float]
==== Core discovery plugins
@@ -11,22 +11,24 @@ The core discovery plugins are:

<<discovery-ec2,EC2 discovery>>::

-The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API] for unicast discovery.
+The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API]
+for unicast discovery.

<<discovery-azure-classic,Azure Classic discovery>>::

-The Azure Classic discovery plugin uses the Azure Classic API for unicast discovery.
+The Azure Classic discovery plugin uses the Azure Classic API for unicast
+discovery.

<<discovery-gce,GCE discovery>>::

-The Google Compute Engine discovery plugin uses the GCE API for unicast discovery.
+The Google Compute Engine discovery plugin uses the GCE API for unicast
+discovery.

[float]
==== Community contributed discovery plugins

A number of discovery plugins have been contributed by our community:

* https://github.com/shikhar/eskka[eskka Discovery Plugin] (by Shikhar Bhushan)
* https://github.com/fabric8io/elasticsearch-cloud-kubernetes[Kubernetes Discovery Plugin] (by Jimmi Dyson, http://fabric8.io[fabric8])

include::discovery-ec2.asciidoc[]
2 changes: 2 additions & 0 deletions docs/reference/migration/migrate_7_0.asciidoc
@@ -11,6 +11,7 @@ See also <<release-highlights>> and <<es-release-notes>>.

* <<breaking_70_aggregations_changes>>
* <<breaking_70_cluster_changes>>
+* <<breaking_70_discovery_changes>>
* <<breaking_70_indices_changes>>
* <<breaking_70_mappings_changes>>
* <<breaking_70_search_changes>>
@@ -44,6 +45,7 @@ Elasticsearch 6.x in order to be readable by Elasticsearch 7.x.
include::migrate_7_0/aggregations.asciidoc[]
include::migrate_7_0/analysis.asciidoc[]
include::migrate_7_0/cluster.asciidoc[]
+include::migrate_7_0/discovery.asciidoc[]
include::migrate_7_0/indices.asciidoc[]
include::migrate_7_0/mappings.asciidoc[]
include::migrate_7_0/search.asciidoc[]
9 changes: 0 additions & 9 deletions docs/reference/migration/migrate_7_0/cluster.asciidoc
@@ -25,12 +25,3 @@ Clusters now have soft limits on the total number of open shards in the cluster
based on the number of nodes and the `cluster.max_shards_per_node` cluster
setting, to prevent accidental operations that would destabilize the cluster.
More information can be found in the <<misc-cluster,documentation for that setting>>.

-[float]
-==== Discovery configuration is required in production
-Production deployments of Elasticsearch now require at least one of the following settings
-to be specified in the `elasticsearch.yml` configuration file:
-
-- `discovery.zen.ping.unicast.hosts`
-- `discovery.zen.hosts_provider`
-- `cluster.initial_master_nodes`
40 changes: 40 additions & 0 deletions docs/reference/migration/migrate_7_0/discovery.asciidoc
@@ -0,0 +1,40 @@
[float]
[[breaking_70_discovery_changes]]
=== Discovery changes

[float]
==== Cluster bootstrapping is required if discovery is configured

The first time a cluster is started, `cluster.initial_master_nodes` must be set
to perform cluster bootstrapping. It should contain the names of the
master-eligible nodes in the initial cluster and be defined on every
master-eligible node in the cluster. See <<discovery-settings,the discovery
settings summary>> for an example, and see the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping reference
documentation>> for a more detailed description of this setting.
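
For example, a minimal sketch of this setting in `elasticsearch.yml`, assuming
three master-eligible nodes whose (hypothetical) node names are `master-a`,
`master-b` and `master-c`:

[source,yaml]
--------------------------------------------------
# Hypothetical example: these names are placeholders and must match the
# `node.name` of each master-eligible node in the initial cluster.
cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c
--------------------------------------------------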

The `discovery.zen.minimum_master_nodes` setting is required during a rolling
upgrade from 6.x, but can be removed in all other circumstances.

[float]
==== Removing master-eligible nodes sometimes requires voting exclusions

If you wish to remove half or more of the master-eligible nodes from a cluster,
you must first exclude the affected nodes from the voting configuration using
the <<modules-discovery-adding-removing-nodes,voting config exclusions API>>.
If you remove fewer than half of the master-eligible nodes at the same time,
voting exclusions are not required. If you remove only master-ineligible nodes
such as data-only nodes or coordinating-only nodes, voting exclusions are not
required. Likewise, if you add nodes to the cluster, voting exclusions are not
required.
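
For example, a sketch of adding an exclusion for a single master-eligible node
before decommissioning it (the node name `old-master` is a hypothetical
placeholder):

[source,js]
--------------------------------------------------
# Sketch: exclude the hypothetical node `old-master` from the voting
# configuration so that it can be shut down safely.
POST /_cluster/voting_config_exclusions/old-master
--------------------------------------------------
// CONSOLE
// TEST[skip:hypothetical node name]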

[float]
==== Discovery configuration is required in production

Production deployments of Elasticsearch now require at least one of the
following settings to be specified in the `elasticsearch.yml` configuration
file:

- `discovery.zen.ping.unicast.hosts`
- `discovery.zen.hosts_provider`
- `cluster.initial_master_nodes`
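
For example, a minimal `elasticsearch.yml` sketch that satisfies this
requirement by listing two hypothetical seed hosts:

[source,yaml]
--------------------------------------------------
# Sketch: the addresses below are placeholders for the other
# master-eligible nodes in the cluster.
discovery.zen.ping.unicast.hosts:
  - 192.168.1.10:9300
  - 192.168.1.11
--------------------------------------------------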
12 changes: 6 additions & 6 deletions docs/reference/modules.asciidoc
@@ -18,13 +18,13 @@ These settings can be dynamically updated on a live cluster with the

The modules in this section are:

-<<modules-cluster,Cluster-level routing and shard allocation>>::
+<<modules-discovery,Discovery and cluster formation>>::

-Settings to control where, when, and how shards are allocated to nodes.
+How nodes discover each other, elect a master and form a cluster.

-<<modules-discovery,Discovery>>::
+<<modules-cluster,Shard allocation and cluster-level routing>>::

-How nodes discover each other to form a cluster.
+Settings to control where, when, and how shards are allocated to nodes.

<<modules-gateway,Gateway>>::

@@ -85,10 +85,10 @@ The modules in this section are:
--


-include::modules/cluster.asciidoc[]

include::modules/discovery.asciidoc[]

+include::modules/cluster.asciidoc[]

include::modules/gateway.asciidoc[]

include::modules/http.asciidoc[]
2 changes: 1 addition & 1 deletion docs/reference/modules/cluster.asciidoc
@@ -1,5 +1,5 @@
[[modules-cluster]]
-== Cluster
+== Shard allocation and cluster-level routing

One of the main roles of the master is to decide which shards to allocate to
which nodes, and when to move shards between nodes in order to rebalance the
87 changes: 66 additions & 21 deletions docs/reference/modules/discovery.asciidoc
@@ -1,30 +1,75 @@
[[modules-discovery]]
-== Discovery
+== Discovery and cluster formation

-The discovery module is responsible for discovering nodes within a
-cluster, as well as electing a master node.
+The discovery and cluster formation module is responsible for discovering
+nodes, electing a master, forming a cluster, and publishing the cluster state
+each time it changes. It is integrated with other modules. For example, all
+communication between nodes is done using the <<modules-transport,transport>>
+module. This module is divided into the following sections:

-Note, Elasticsearch is a peer to peer based system, nodes communicate
-with one another directly if operations are delegated / broadcast. All
-the main APIs (index, delete, search) do not communicate with the master
-node. The responsibility of the master node is to maintain the global
-cluster state, and act if nodes join or leave the cluster by reassigning
-shards. Each time a cluster state is changed, the state is made known to
-the other nodes in the cluster (the manner depends on the actual
-discovery implementation).
+<<modules-discovery-hosts-providers>>::

-[float]
-=== Settings
+Discovery is the process where nodes find each other when the master is
+unknown, such as when a node has just started up or when the previous
+master has failed.

-The `cluster.name` allows to create separated clusters from one another.
-The default value for the cluster name is `elasticsearch`, though it is
-recommended to change this to reflect the logical group name of the
-cluster running.
+<<modules-discovery-bootstrap-cluster>>::

-include::discovery/azure.asciidoc[]
+Bootstrapping a cluster is required when an Elasticsearch cluster starts up
+for the very first time. In <<dev-vs-prod-mode,development mode>>, with no
+discovery settings configured, this is automatically performed by the nodes
+themselves. As this auto-bootstrapping is
+<<modules-discovery-quorums,inherently unsafe>>, running a node in
+<<dev-vs-prod-mode,production mode>> requires bootstrapping to be
+explicitly configured via the
+<<modules-discovery-bootstrap-cluster,`cluster.initial_master_nodes`
+setting>>.

-include::discovery/ec2.asciidoc[]
+<<modules-discovery-adding-removing-nodes,Adding and removing master-eligible nodes>>::

-include::discovery/gce.asciidoc[]
+It is recommended to have a small and fixed number of master-eligible nodes
+in a cluster, and to scale the cluster up and down by adding and removing
+master-ineligible nodes only. However, there are situations in which it may
+be desirable to add or remove some master-eligible nodes to or from a
+cluster. This section describes the process for adding or removing
+master-eligible nodes, including the extra steps that need to be performed
+when removing more than half of the master-eligible nodes at the same time.

+<<cluster-state-publishing>>::

+Cluster state publishing is the process by which the elected master node
+updates the cluster state on all the other nodes in the cluster.

+<<no-master-block>>::

+The no-master block is put in place when there is no known elected master,
+and can be configured to determine which operations should be rejected when
+it is in place.

+Advanced settings::

+There are settings that allow advanced users to influence the
+<<master-election-settings,master election>> and
+<<fault-detection-settings,fault detection>> processes.

+<<modules-discovery-quorums>>::

+This section describes the detailed design behind the master election and
+auto-reconfiguration logic.

+include::discovery/discovery.asciidoc[]

+include::discovery/bootstrapping.asciidoc[]

+include::discovery/adding-removing-nodes.asciidoc[]

+include::discovery/publishing.asciidoc[]

+include::discovery/no-master-block.asciidoc[]

+include::discovery/master-election.asciidoc[]

+include::discovery/fault-detection.asciidoc[]

+include::discovery/quorums.asciidoc[]

include::discovery/zen.asciidoc[]
125 changes: 125 additions & 0 deletions docs/reference/modules/discovery/adding-removing-nodes.asciidoc
@@ -0,0 +1,125 @@
[[modules-discovery-adding-removing-nodes]]
=== Adding and removing nodes

As nodes are added or removed, Elasticsearch maintains an optimal level of fault
tolerance by automatically updating the cluster's _voting configuration_, which
is the set of <<master-node,master-eligible nodes>> whose responses are counted
when making decisions such as electing a new master or committing a new cluster
state.
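
One way to see the effect of these updates is to inspect the voting
configuration in the cluster state. A sketch, assuming the
`metadata.cluster_coordination.last_committed_config` field (the exact field
name is an assumption, not part of the text above):

[source,js]
--------------------------------------------------
# Sketch: show the most recently committed voting configuration. The
# `last_committed_config` field name is assumed here.
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE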

It is recommended to have a small and fixed number of master-eligible nodes in a
cluster, and to scale the cluster up and down by adding and removing
master-ineligible nodes only. However, there are situations in which it may be
desirable to add or remove some master-eligible nodes to or from a cluster.

==== Adding master-eligible nodes

If you wish to add some master-eligible nodes to your cluster, simply configure
the new nodes to find the existing cluster and start them up. Elasticsearch will
add the new nodes to the voting configuration if it is appropriate to do so.
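
For example, a minimal `elasticsearch.yml` sketch for a new master-eligible
node, assuming the hypothetical hosts `existing-host-1` and `existing-host-2`
belong to the existing cluster:

[source,yaml]
--------------------------------------------------
# Sketch: point the new node at hosts in the existing cluster so that it can
# discover the elected master and join. Names below are placeholders.
cluster.name: my-cluster
discovery.zen.ping.unicast.hosts:
  - existing-host-1
  - existing-host-2
--------------------------------------------------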

==== Removing master-eligible nodes

When removing master-eligible nodes, it is important not to remove too many at
the same time. For instance, if there are currently seven master-eligible nodes
and you wish to reduce this to three, it is not possible simply to stop four of
the nodes at once: doing so would leave only three nodes remaining, fewer than
half of the voting configuration, and the cluster would be unable to take any
further actions.

As long as there are at least three master-eligible nodes in the cluster, as a
general rule it is best to remove nodes one at a time, allowing enough time for
the cluster to <<modules-discovery-quorums,automatically adjust>> the voting
configuration and adapt the fault tolerance level to the new set of nodes.

If there are only two master-eligible nodes remaining then neither node can be
safely removed since both are required to reliably make progress. You must first
inform Elasticsearch that one of the nodes should not be part of the voting
configuration, and that the voting power should instead be given to other nodes.
You can then take the excluded node offline without preventing the other node
from making progress. A node which is added to a voting configuration exclusion
list still works normally, but Elasticsearch tries to remove it from the voting
configuration so its vote is no longer required. Importantly, Elasticsearch
will never automatically move a node on the voting exclusions list back into the
voting configuration. Once an excluded node has been successfully
auto-reconfigured out of the voting configuration, it is safe to shut it down
without affecting the cluster's master-level availability. A node can be added
to the voting configuration exclusion list using the following API:

[source,js]
--------------------------------------------------
# Add node to voting configuration exclusions list and wait for the system to
# auto-reconfigure the node out of the voting configuration up to the default
# timeout of 30 seconds
POST /_cluster/voting_config_exclusions/node_name
# Add node to voting configuration exclusions list and wait for
# auto-reconfiguration up to one minute
POST /_cluster/voting_config_exclusions/node_name?timeout=1m
--------------------------------------------------
// CONSOLE
// TEST[skip:this would break the test cluster if executed]

The node that should be added to the exclusions list is specified using
<<cluster-nodes,node filters>> in place of `node_name` here. If a call to the
voting configuration exclusions API fails, you can safely retry it. Only a
successful response guarantees that the node has actually been removed from the
voting configuration and will not be reinstated.

Although the voting configuration exclusions API is most useful for down-scaling
a two-node cluster to a one-node cluster, it is also possible to use it to
remove multiple master-eligible nodes at the same time. Adding multiple nodes to
the exclusions list causes the system to try to auto-reconfigure all of these
nodes out of the voting configuration, allowing them to be safely shut down
while keeping the cluster available. In the example described above, shrinking a
cluster from seven master-eligible nodes down to three, you could add four nodes
to the exclusions list, wait for confirmation, and then shut them down
simultaneously.
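
A sketch of that scale-down, assuming four hypothetical node names `master-d`
through `master-g`:

[source,js]
--------------------------------------------------
# Sketch: exclude four hypothetical nodes in a single call; a successful
# response confirms they are out of the voting configuration and can be
# stopped together.
POST /_cluster/voting_config_exclusions/master-d,master-e,master-f,master-g?timeout=1m
--------------------------------------------------
// CONSOLE
// TEST[skip:hypothetical node names]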

NOTE: Voting exclusions are only required when removing at least half of the
master-eligible nodes from a cluster in a short time period. They are not
required when removing master-ineligible nodes, nor are they required when
removing fewer than half of the master-eligible nodes.

Adding an exclusion for a node creates an entry for that node in the voting
configuration exclusions list, which causes the system to try automatically to
reconfigure the voting configuration to remove that node and prevents it from
returning to the voting configuration once it has been removed. The current list of
exclusions is stored in the cluster state and can be inspected as follows:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
--------------------------------------------------
// CONSOLE

This list is limited in size by the following setting:

`cluster.max_voting_config_exclusions`::

Sets a limit on the number of voting configuration exclusions at any one
time. Defaults to `10`.
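
If a maintenance operation needs more simultaneous exclusions than this, the
limit can be raised with the cluster settings API, assuming the setting is
dynamically updatable (a sketch):

[source,js]
--------------------------------------------------
# Sketch: temporarily raise the exclusions limit before a large scale-down.
PUT /_cluster/settings
{
  "transient": {
    "cluster.max_voting_config_exclusions": 20
  }
}
--------------------------------------------------
// CONSOLE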

Since voting configuration exclusions are persistent and limited in number, they
must be cleaned up. Normally an exclusion is added when performing some
maintenance on the cluster, and the exclusions should be cleaned up when the
maintenance is complete. Clusters should have no voting configuration exclusions
in normal operation.

If a node is excluded from the voting configuration because it is to be shut
down permanently, its exclusion can be removed after it is shut down and removed
from the cluster. Exclusions can also be cleared if they were created in error
or were only required temporarily:

[source,js]
--------------------------------------------------
# Wait for all the nodes with voting configuration exclusions to be removed from
# the cluster and then remove all the exclusions, allowing any node to return to
# the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions
# Immediately remove all the voting configuration exclusions, allowing any node
# to return to the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions?wait_for_removal=false
--------------------------------------------------
// CONSOLE
5 changes: 0 additions & 5 deletions docs/reference/modules/discovery/azure.asciidoc

This file was deleted.
