[Zen2] Update documentation for Zen2 (#34714)
This commit overhauls the documentation of discovery and cluster coordination,
removing mention of the Zen Discovery module and replacing it with docs for the
new cluster coordination mechanism introduced in 7.0.

Relates #32006
DaveCTurner authored Dec 20, 2018
1 parent 08bcd83 commit 1a23417
Showing 27 changed files with 985 additions and 414 deletions.
14 changes: 8 additions & 6 deletions docs/plugins/discovery.asciidoc
@@ -1,8 +1,8 @@
[[discovery]]
== Discovery Plugins

-Discovery plugins extend Elasticsearch by adding new discovery mechanisms that
-can be used instead of {ref}/modules-discovery-zen.html[Zen Discovery].
+Discovery plugins extend Elasticsearch by adding new hosts providers that can be
+used to extend the {ref}/modules-discovery.html[cluster formation module].

[float]
==== Core discovery plugins
@@ -11,22 +11,24 @@ The core discovery plugins are:

<<discovery-ec2,EC2 discovery>>::

-The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API] for unicast discovery.
+The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API]
+for unicast discovery.

<<discovery-azure-classic,Azure Classic discovery>>::

-The Azure Classic discovery plugin uses the Azure Classic API for unicast discovery.
+The Azure Classic discovery plugin uses the Azure Classic API for unicast
+discovery.

<<discovery-gce,GCE discovery>>::

-The Google Compute Engine discovery plugin uses the GCE API for unicast discovery.
+The Google Compute Engine discovery plugin uses the GCE API for unicast
+discovery.

[float]
==== Community contributed discovery plugins

A number of discovery plugins have been contributed by our community:

* https://github.com/shikhar/eskka[eskka Discovery Plugin] (by Shikhar Bhushan)
* https://github.com/fabric8io/elasticsearch-cloud-kubernetes[Kubernetes Discovery Plugin] (by Jimmi Dyson, http://fabric8.io[fabric8])

include::discovery-ec2.asciidoc[]
2 changes: 2 additions & 0 deletions docs/reference/migration/migrate_7_0.asciidoc
@@ -11,6 +11,7 @@ See also <<release-highlights>> and <<es-release-notes>>.

* <<breaking_70_aggregations_changes>>
* <<breaking_70_cluster_changes>>
+* <<breaking_70_discovery_changes>>
* <<breaking_70_indices_changes>>
* <<breaking_70_mappings_changes>>
* <<breaking_70_search_changes>>
@@ -44,6 +45,7 @@ Elasticsearch 6.x in order to be readable by Elasticsearch 7.x.
include::migrate_7_0/aggregations.asciidoc[]
include::migrate_7_0/analysis.asciidoc[]
include::migrate_7_0/cluster.asciidoc[]
+include::migrate_7_0/discovery.asciidoc[]
include::migrate_7_0/indices.asciidoc[]
include::migrate_7_0/mappings.asciidoc[]
include::migrate_7_0/search.asciidoc[]
9 changes: 0 additions & 9 deletions docs/reference/migration/migrate_7_0/cluster.asciidoc
@@ -25,12 +25,3 @@ Clusters now have soft limits on the total number of open shards in the cluster
based on the number of nodes and the `cluster.max_shards_per_node` cluster
setting, to prevent accidental operations that would destabilize the cluster.
More information can be found in the <<misc-cluster,documentation for that setting>>.

-[float]
-==== Discovery configuration is required in production
-Production deployments of Elasticsearch now require at least one of the following settings
-to be specified in the `elasticsearch.yml` configuration file:
-
-- `discovery.zen.ping.unicast.hosts`
-- `discovery.zen.hosts_provider`
-- `cluster.initial_master_nodes`
40 changes: 40 additions & 0 deletions docs/reference/migration/migrate_7_0/discovery.asciidoc
@@ -0,0 +1,40 @@
[float]
[[breaking_70_discovery_changes]]
=== Discovery changes

[float]
==== Cluster bootstrapping is required if discovery is configured

The first time a cluster is started, `cluster.initial_master_nodes` must be set
to perform cluster bootstrapping. It should contain the names of the
master-eligible nodes in the initial cluster and be defined on every
master-eligible node in the cluster. See <<discovery-settings,the discovery
settings summary>> for an example, and see the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping reference
documentation>> for a more detailed description of this setting.
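
For example, a minimal sketch of this setting in `elasticsearch.yml`, assuming
three master-eligible nodes whose (hypothetical) node names are `master-a`,
`master-b` and `master-c`:

[source,yaml]
--------------------------------------------------
# Hypothetical example: these names are placeholders and must match the
# `node.name` of each master-eligible node in the initial cluster.
cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c
--------------------------------------------------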

The `discovery.zen.minimum_master_nodes` setting is required during a rolling
upgrade from 6.x, but can be removed in all other circumstances.

[float]
==== Removing master-eligible nodes sometimes requires voting exclusions

If you wish to remove half or more of the master-eligible nodes from a cluster,
you must first exclude the affected nodes from the voting configuration using
the <<modules-discovery-adding-removing-nodes,voting config exclusions API>>.
If you remove fewer than half of the master-eligible nodes at the same time,
voting exclusions are not required. If you remove only master-ineligible nodes
such as data-only nodes or coordinating-only nodes, voting exclusions are not
required. Likewise, if you add nodes to the cluster, voting exclusions are not
required.
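
For example, a sketch of adding an exclusion for a single master-eligible node
before decommissioning it (the node name `old-master` is a hypothetical
placeholder):

[source,js]
--------------------------------------------------
# Sketch: exclude the hypothetical node `old-master` from the voting
# configuration so that it can be shut down safely.
POST /_cluster/voting_config_exclusions/old-master
--------------------------------------------------
// CONSOLE
// TEST[skip:hypothetical node name]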

[float]
==== Discovery configuration is required in production

Production deployments of Elasticsearch now require at least one of the
following settings to be specified in the `elasticsearch.yml` configuration
file:

- `discovery.zen.ping.unicast.hosts`
- `discovery.zen.hosts_provider`
- `cluster.initial_master_nodes`
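
For example, a minimal `elasticsearch.yml` sketch that satisfies this
requirement by listing two hypothetical seed hosts:

[source,yaml]
--------------------------------------------------
# Sketch: the addresses below are placeholders for the other
# master-eligible nodes in the cluster.
discovery.zen.ping.unicast.hosts:
  - 192.168.1.10:9300
  - 192.168.1.11
--------------------------------------------------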
12 changes: 6 additions & 6 deletions docs/reference/modules.asciidoc
@@ -18,13 +18,13 @@ These settings can be dynamically updated on a live cluster with the

The modules in this section are:

-<<modules-cluster,Cluster-level routing and shard allocation>>::
+<<modules-discovery,Discovery and cluster formation>>::

-Settings to control where, when, and how shards are allocated to nodes.
+How nodes discover each other, elect a master and form a cluster.

-<<modules-discovery,Discovery>>::
+<<modules-cluster,Shard allocation and cluster-level routing>>::

-How nodes discover each other to form a cluster.
+Settings to control where, when, and how shards are allocated to nodes.

<<modules-gateway,Gateway>>::

@@ -85,10 +85,10 @@ The modules in this section are:
--


-include::modules/cluster.asciidoc[]

include::modules/discovery.asciidoc[]

+include::modules/cluster.asciidoc[]

include::modules/gateway.asciidoc[]

include::modules/http.asciidoc[]
2 changes: 1 addition & 1 deletion docs/reference/modules/cluster.asciidoc
@@ -1,5 +1,5 @@
[[modules-cluster]]
-== Cluster
+== Shard allocation and cluster-level routing

One of the main roles of the master is to decide which shards to allocate to
which nodes, and when to move shards between nodes in order to rebalance the
87 changes: 66 additions & 21 deletions docs/reference/modules/discovery.asciidoc
@@ -1,30 +1,75 @@
[[modules-discovery]]
-== Discovery
+== Discovery and cluster formation

-The discovery module is responsible for discovering nodes within a
-cluster, as well as electing a master node.
+The discovery and cluster formation module is responsible for discovering
+nodes, electing a master, forming a cluster, and publishing the cluster state
+each time it changes. It is integrated with other modules. For example, all
+communication between nodes is done using the <<modules-transport,transport>>
+module. This module is divided into the following sections:

-Note, Elasticsearch is a peer to peer based system, nodes communicate
-with one another directly if operations are delegated / broadcast. All
-the main APIs (index, delete, search) do not communicate with the master
-node. The responsibility of the master node is to maintain the global
-cluster state, and act if nodes join or leave the cluster by reassigning
-shards. Each time a cluster state is changed, the state is made known to
-the other nodes in the cluster (the manner depends on the actual
-discovery implementation).
+<<modules-discovery-hosts-providers>>::

-[float]
-=== Settings
+Discovery is the process where nodes find each other when the master is
+unknown, such as when a node has just started up or when the previous
+master has failed.

-The `cluster.name` allows to create separated clusters from one another.
-The default value for the cluster name is `elasticsearch`, though it is
-recommended to change this to reflect the logical group name of the
-cluster running.
+<<modules-discovery-bootstrap-cluster>>::

-include::discovery/azure.asciidoc[]
+Bootstrapping a cluster is required when an Elasticsearch cluster starts up
+for the very first time. In <<dev-vs-prod-mode,development mode>>, with no
+discovery settings configured, this is automatically performed by the nodes
+themselves. As this auto-bootstrapping is
+<<modules-discovery-quorums,inherently unsafe>>, running a node in
+<<dev-vs-prod-mode,production mode>> requires bootstrapping to be
+explicitly configured via the
+<<modules-discovery-bootstrap-cluster,`cluster.initial_master_nodes`
+setting>>.

-include::discovery/ec2.asciidoc[]
+<<modules-discovery-adding-removing-nodes,Adding and removing master-eligible nodes>>::

-include::discovery/gce.asciidoc[]
+It is recommended to have a small and fixed number of master-eligible nodes
+in a cluster, and to scale the cluster up and down by adding and removing
+master-ineligible nodes only. However, there are situations in which it may
+be desirable to add or remove some master-eligible nodes to or from a
+cluster. This section describes the process for adding or removing
+master-eligible nodes, including the extra steps that need to be performed
+when removing more than half of the master-eligible nodes at the same time.

+<<cluster-state-publishing>>::

+Cluster state publishing is the process by which the elected master node
+updates the cluster state on all the other nodes in the cluster.

+<<no-master-block>>::

+The no-master block is put in place when there is no known elected master,
+and can be configured to determine which operations should be rejected when
+it is in place.

+Advanced settings::

+There are settings that allow advanced users to influence the
+<<master-election-settings,master election>> and
+<<fault-detection-settings,fault detection>> processes.

+<<modules-discovery-quorums>>::

+This section describes the detailed design behind the master election and
+auto-reconfiguration logic.

+include::discovery/discovery.asciidoc[]

+include::discovery/bootstrapping.asciidoc[]

+include::discovery/adding-removing-nodes.asciidoc[]

+include::discovery/publishing.asciidoc[]

+include::discovery/no-master-block.asciidoc[]

+include::discovery/master-election.asciidoc[]

+include::discovery/fault-detection.asciidoc[]

+include::discovery/quorums.asciidoc[]

include::discovery/zen.asciidoc[]
125 changes: 125 additions & 0 deletions docs/reference/modules/discovery/adding-removing-nodes.asciidoc
@@ -0,0 +1,125 @@
[[modules-discovery-adding-removing-nodes]]
=== Adding and removing nodes

As nodes are added or removed, Elasticsearch maintains an optimal level of fault
tolerance by automatically updating the cluster's _voting configuration_, which
is the set of <<master-node,master-eligible nodes>> whose responses are counted
when making decisions such as electing a new master or committing a new cluster
state.
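
One way to see the effect of these updates is to inspect the voting
configuration in the cluster state. A sketch, assuming the
`metadata.cluster_coordination.last_committed_config` field (the exact field
name is an assumption, not part of the text above):

[source,js]
--------------------------------------------------
# Sketch: show the most recently committed voting configuration. The
# `last_committed_config` field name is assumed here.
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE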

It is recommended to have a small and fixed number of master-eligible nodes in a
cluster, and to scale the cluster up and down by adding and removing
master-ineligible nodes only. However, there are situations in which it may be
desirable to add or remove some master-eligible nodes to or from a cluster.

==== Adding master-eligible nodes

If you wish to add some master-eligible nodes to your cluster, simply configure
the new nodes to find the existing cluster and start them up. Elasticsearch will
add the new nodes to the voting configuration if it is appropriate to do so.
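
For example, a minimal `elasticsearch.yml` sketch for a new master-eligible
node, assuming the hypothetical hosts `existing-host-1` and `existing-host-2`
belong to the existing cluster:

[source,yaml]
--------------------------------------------------
# Sketch: point the new node at hosts in the existing cluster so that it can
# discover the elected master and join. Names below are placeholders.
cluster.name: my-cluster
discovery.zen.ping.unicast.hosts:
  - existing-host-1
  - existing-host-2
--------------------------------------------------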

==== Removing master-eligible nodes

When removing master-eligible nodes, it is important not to remove too many at
the same time. For instance, if there are currently seven master-eligible nodes
and you wish to reduce this to three, it is not possible simply to stop four of
the nodes at once: doing so would leave only three nodes remaining, fewer than
half of the voting configuration, and the cluster would be unable to take any
further actions.

As long as there are at least three master-eligible nodes in the cluster, as a
general rule it is best to remove nodes one at a time, allowing enough time for
the cluster to <<modules-discovery-quorums,automatically adjust>> the voting
configuration and adapt the fault tolerance level to the new set of nodes.

If there are only two master-eligible nodes remaining then neither node can be
safely removed since both are required to reliably make progress. You must first
inform Elasticsearch that one of the nodes should not be part of the voting
configuration, and that the voting power should instead be given to other nodes.
You can then take the excluded node offline without preventing the other node
from making progress. A node which is added to a voting configuration exclusion
list still works normally, but Elasticsearch tries to remove it from the voting
configuration so its vote is no longer required. Importantly, Elasticsearch
will never automatically move a node on the voting exclusions list back into the
voting configuration. Once an excluded node has been successfully
auto-reconfigured out of the voting configuration, it is safe to shut it down
without affecting the cluster's master-level availability. A node can be added
to the voting configuration exclusion list using the following API:

[source,js]
--------------------------------------------------
# Add node to voting configuration exclusions list and wait for the system to
# auto-reconfigure the node out of the voting configuration up to the default
# timeout of 30 seconds
POST /_cluster/voting_config_exclusions/node_name
# Add node to voting configuration exclusions list and wait for
# auto-reconfiguration up to one minute
POST /_cluster/voting_config_exclusions/node_name?timeout=1m
--------------------------------------------------
// CONSOLE
// TEST[skip:this would break the test cluster if executed]

The node that should be added to the exclusions list is specified using
<<cluster-nodes,node filters>> in place of `node_name` here. If a call to the
voting configuration exclusions API fails, you can safely retry it. Only a
successful response guarantees that the node has actually been removed from the
voting configuration and will not be reinstated.

Although the voting configuration exclusions API is most useful for down-scaling
a two-node cluster to a one-node cluster, it is also possible to use it to
remove multiple master-eligible nodes at the same time. Adding multiple nodes to
the exclusions list causes the system to try to auto-reconfigure all of these
nodes out of the voting configuration, allowing them to be safely shut down
while keeping the cluster available. In the example described above, shrinking a
cluster from seven master-eligible nodes down to three, you could add four nodes
to the exclusions list, wait for confirmation, and then shut them down
simultaneously.
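
A sketch of that scale-down, assuming four hypothetical node names `master-d`
through `master-g`:

[source,js]
--------------------------------------------------
# Sketch: exclude four hypothetical nodes in a single call; a successful
# response confirms they are out of the voting configuration and can be
# stopped together.
POST /_cluster/voting_config_exclusions/master-d,master-e,master-f,master-g?timeout=1m
--------------------------------------------------
// CONSOLE
// TEST[skip:hypothetical node names]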

NOTE: Voting exclusions are only required when removing at least half of the
master-eligible nodes from a cluster in a short time period. They are not
required when removing master-ineligible nodes, nor are they required when
removing fewer than half of the master-eligible nodes.

Adding an exclusion for a node creates an entry for that node in the voting
configuration exclusions list, which causes the system to try automatically to
reconfigure the voting configuration to remove that node and prevents it from
returning to the voting configuration once it has been removed. The current list of
exclusions is stored in the cluster state and can be inspected as follows:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
--------------------------------------------------
// CONSOLE

This list is limited in size by the following setting:

`cluster.max_voting_config_exclusions`::

Sets a limit on the number of voting configuration exclusions at any one
time. Defaults to `10`.
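
If a maintenance operation needs more simultaneous exclusions than this, the
limit can be raised with the cluster settings API, assuming the setting is
dynamically updatable (a sketch):

[source,js]
--------------------------------------------------
# Sketch: temporarily raise the exclusions limit before a large scale-down.
PUT /_cluster/settings
{
  "transient": {
    "cluster.max_voting_config_exclusions": 20
  }
}
--------------------------------------------------
// CONSOLE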

Since voting configuration exclusions are persistent and limited in number, they
must be cleaned up. Normally an exclusion is added when performing some
maintenance on the cluster, and the exclusions should be cleaned up when the
maintenance is complete. Clusters should have no voting configuration exclusions
in normal operation.

If a node is excluded from the voting configuration because it is to be shut
down permanently, its exclusion can be removed after it is shut down and removed
from the cluster. Exclusions can also be cleared if they were created in error
or were only required temporarily:

[source,js]
--------------------------------------------------
# Wait for all the nodes with voting configuration exclusions to be removed from
# the cluster and then remove all the exclusions, allowing any node to return to
# the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions
# Immediately remove all the voting configuration exclusions, allowing any node
# to return to the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions?wait_for_removal=false
--------------------------------------------------
// CONSOLE
5 changes: 0 additions & 5 deletions docs/reference/modules/discovery/azure.asciidoc

This file was deleted.
