diff --git a/docs/plugins/discovery.asciidoc b/docs/plugins/discovery.asciidoc index 46b61146b128d..926acead09ea1 100644 --- a/docs/plugins/discovery.asciidoc +++ b/docs/plugins/discovery.asciidoc @@ -1,8 +1,8 @@ [[discovery]] == Discovery Plugins -Discovery plugins extend Elasticsearch by adding new discovery mechanisms that -can be used instead of {ref}/modules-discovery-zen.html[Zen Discovery]. +Discovery plugins extend Elasticsearch by adding new hosts providers that can be +used to extend the {ref}/modules-discovery.html[cluster formation module]. [float] ==== Core discovery plugins @@ -11,22 +11,24 @@ The core discovery plugins are: <>:: -The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API] for unicast discovery. +The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API] +for unicast discovery. <>:: -The Azure Classic discovery plugin uses the Azure Classic API for unicast discovery. +The Azure Classic discovery plugin uses the Azure Classic API for unicast +discovery. <>:: -The Google Compute Engine discovery plugin uses the GCE API for unicast discovery. +The Google Compute Engine discovery plugin uses the GCE API for unicast +discovery. [float] ==== Community contributed discovery plugins A number of discovery plugins have been contributed by our community: -* https://github.com/shikhar/eskka[eskka Discovery Plugin] (by Shikhar Bhushan) * https://github.com/fabric8io/elasticsearch-cloud-kubernetes[Kubernetes Discovery Plugin] (by Jimmi Dyson, http://fabric8.io[fabric8]) include::discovery-ec2.asciidoc[] diff --git a/docs/reference/migration/migrate_7_0.asciidoc b/docs/reference/migration/migrate_7_0.asciidoc index 45f383435e4bc..9f99604318aa9 100644 --- a/docs/reference/migration/migrate_7_0.asciidoc +++ b/docs/reference/migration/migrate_7_0.asciidoc @@ -11,6 +11,7 @@ See also <> and <>. * <> * <> +* <> * <> * <> * <> @@ -44,6 +45,7 @@ Elasticsearch 6.x in order to be readable by Elasticsearch 7.x. include::migrate_7_0/aggregations.asciidoc[] include::migrate_7_0/analysis.asciidoc[] include::migrate_7_0/cluster.asciidoc[] +include::migrate_7_0/discovery.asciidoc[] include::migrate_7_0/indices.asciidoc[] include::migrate_7_0/mappings.asciidoc[] include::migrate_7_0/search.asciidoc[] diff --git a/docs/reference/migration/migrate_7_0/cluster.asciidoc b/docs/reference/migration/migrate_7_0/cluster.asciidoc index 732270706ff3d..bfe7d5df2d094 100644 --- a/docs/reference/migration/migrate_7_0/cluster.asciidoc +++ b/docs/reference/migration/migrate_7_0/cluster.asciidoc @@ -25,12 +25,3 @@ Clusters now have soft limits on the total number of open shards in the cluster based on the number of nodes and the `cluster.max_shards_per_node` cluster setting, to prevent accidental operations that would destabilize the cluster. More information can be found in the <>. 
- -[float] -==== Discovery configuration is required in production -Production deployments of Elasticsearch now require at least one of the following settings -to be specified in the `elasticsearch.yml` configuration file: - -- `discovery.zen.ping.unicast.hosts` -- `discovery.zen.hosts_provider` -- `cluster.initial_master_nodes` diff --git a/docs/reference/migration/migrate_7_0/discovery.asciidoc b/docs/reference/migration/migrate_7_0/discovery.asciidoc new file mode 100644 index 0000000000000..d568e7fe32c25 --- /dev/null +++ b/docs/reference/migration/migrate_7_0/discovery.asciidoc @@ -0,0 +1,40 @@ +[float] +[[breaking_70_discovery_changes]] +=== Discovery changes + +[float] +==== Cluster bootstrapping is required if discovery is configured + +The first time a cluster is started, `cluster.initial_master_nodes` must be set +to perform cluster bootstrapping. It should contain the names of the +master-eligible nodes in the initial cluster and be defined on every +master-eligible node in the cluster. See <> for an example, and the +<> describes this setting in more detail. + +The `discovery.zen.minimum_master_nodes` setting is required during a rolling +upgrade from 6.x, but can be removed in all other circumstances. + +[float] +==== Removing master-eligible nodes sometimes requires voting exclusions + +If you wish to remove half or more of the master-eligible nodes from a cluster, +you must first exclude the affected nodes from the voting configuration using +the <>. +If you remove fewer than half of the master-eligible nodes at the same time, +voting exclusions are not required. If you remove only master-ineligible nodes +such as data-only nodes or coordinating-only nodes, voting exclusions are not +required. Likewise, if you add nodes to the cluster, voting exclusions are not +required. + +[float] +==== Discovery configuration is required in production + +Production deployments of Elasticsearch now require at least one of the +following settings to be specified in the `elasticsearch.yml` configuration +file: + +- `discovery.zen.ping.unicast.hosts` +- `discovery.zen.hosts_provider` +- `cluster.initial_master_nodes` diff --git a/docs/reference/modules.asciidoc b/docs/reference/modules.asciidoc index 1dbe72c3b86e4..f8b6c2784a075 100644 --- a/docs/reference/modules.asciidoc +++ b/docs/reference/modules.asciidoc @@ -18,13 +18,13 @@ These settings can be dynamically updated on a live cluster with the The modules in this section are: -<>:: +<>:: - Settings to control where, when, and how shards are allocated to nodes. + How nodes discover each other, elect a master and form a cluster. -<>:: +<>:: - How nodes discover each other to form a cluster. + Settings to control where, when, and how shards are allocated to nodes. 
<>:: @@ -85,10 +85,10 @@ The modules in this section are: -- -include::modules/cluster.asciidoc[] - include::modules/discovery.asciidoc[] +include::modules/cluster.asciidoc[] + include::modules/gateway.asciidoc[] include::modules/http.asciidoc[] diff --git a/docs/reference/modules/cluster.asciidoc b/docs/reference/modules/cluster.asciidoc index c4b6445292726..810ed7c4a34b4 100644 --- a/docs/reference/modules/cluster.asciidoc +++ b/docs/reference/modules/cluster.asciidoc @@ -1,5 +1,5 @@ [[modules-cluster]] -== Cluster +== Shard allocation and cluster-level routing One of the main roles of the master is to decide which shards to allocate to which nodes, and when to move shards between nodes in order to rebalance the diff --git a/docs/reference/modules/discovery.asciidoc b/docs/reference/modules/discovery.asciidoc index 292748d1d7b90..546c347fa3bb8 100644 --- a/docs/reference/modules/discovery.asciidoc +++ b/docs/reference/modules/discovery.asciidoc @@ -1,30 +1,75 @@ [[modules-discovery]] -== Discovery +== Discovery and cluster formation -The discovery module is responsible for discovering nodes within a -cluster, as well as electing a master node. +The discovery and cluster formation module is responsible for discovering +nodes, electing a master, forming a cluster, and publishing the cluster state +each time it changes. It is integrated with other modules. For example, all +communication between nodes is done using the <> +module. This module is divided into the following sections: -Note, Elasticsearch is a peer to peer based system, nodes communicate -with one another directly if operations are delegated / broadcast. All -the main APIs (index, delete, search) do not communicate with the master -node. The responsibility of the master node is to maintain the global -cluster state, and act if nodes join or leave the cluster by reassigning -shards. Each time a cluster state is changed, the state is made known to -the other nodes in the cluster (the manner depends on the actual -discovery implementation). +<>:: -[float] -=== Settings + Discovery is the process where nodes find each other when the master is + unknown, such as when a node has just started up or when the previous + master has failed. -The `cluster.name` allows to create separated clusters from one another. -The default value for the cluster name is `elasticsearch`, though it is -recommended to change this to reflect the logical group name of the -cluster running. +<>:: -include::discovery/azure.asciidoc[] + Bootstrapping a cluster is required when an Elasticsearch cluster starts up + for the very first time. In <>, with no + discovery settings configured, this is automatically performed by the nodes + themselves. As this auto-bootstrapping is + <>, running a node in + <> requires bootstrapping to be + explicitly configured via the + <>. -include::discovery/ec2.asciidoc[] +<>:: -include::discovery/gce.asciidoc[] + It is recommended to have a small and fixed number of master-eligible nodes + in a cluster, and to scale the cluster up and down by adding and removing + master-ineligible nodes only. However there are situations in which it may + be desirable to add or remove some master-eligible nodes to or from a + cluster. This section describes the process for adding or removing + master-eligible nodes, including the extra steps that need to be performed + when removing more than half of the master-eligible nodes at the same time. 
+ +<>:: + + Cluster state publishing is the process by which the elected master node + updates the cluster state on all the other nodes in the cluster. + +<>:: + + The no-master block is put in place when there is no known elected master, + and can be configured to determine which operations should be rejected when + it is in place. + +Advanced settings:: + + There are settings that allow advanced users to influence the + <> and + <> processes. + +<>:: + + This section describes the detailed design behind the master election and + auto-reconfiguration logic. + +include::discovery/discovery.asciidoc[] + +include::discovery/bootstrapping.asciidoc[] + +include::discovery/adding-removing-nodes.asciidoc[] + +include::discovery/publishing.asciidoc[] + +include::discovery/no-master-block.asciidoc[] + +include::discovery/master-election.asciidoc[] + +include::discovery/fault-detection.asciidoc[] + +include::discovery/quorums.asciidoc[] -include::discovery/zen.asciidoc[] diff --git a/docs/reference/modules/discovery/adding-removing-nodes.asciidoc b/docs/reference/modules/discovery/adding-removing-nodes.asciidoc new file mode 100644 index 0000000000000..d40e903fa88f1 --- /dev/null +++ b/docs/reference/modules/discovery/adding-removing-nodes.asciidoc @@ -0,0 +1,125 @@ +[[modules-discovery-adding-removing-nodes]] +=== Adding and removing nodes + +As nodes are added or removed Elasticsearch maintains an optimal level of fault +tolerance by automatically updating the cluster's _voting configuration_, which +is the set of <> whose responses are counted +when making decisions such as electing a new master or committing a new cluster +state. + +It is recommended to have a small and fixed number of master-eligible nodes in a +cluster, and to scale the cluster up and down by adding and removing +master-ineligible nodes only. However there are situations in which it may be +desirable to add or remove some master-eligible nodes to or from a cluster. + +==== Adding master-eligible nodes + +If you wish to add some master-eligible nodes to your cluster, simply configure +the new nodes to find the existing cluster and start them up. Elasticsearch will +add the new nodes to the voting configuration if it is appropriate to do so. + +==== Removing master-eligible nodes + +When removing master-eligible nodes, it is important not to remove too many all +at the same time. For instance, if there are currently seven master-eligible +nodes and you wish to reduce this to three, it is not possible simply to stop +four of the nodes at once: to do so would leave only three nodes remaining, +which is less than half of the voting configuration, which means the cluster +cannot take any further actions. + +As long as there are at least three master-eligible nodes in the cluster, as a +general rule it is best to remove nodes one-at-a-time, allowing enough time for +the cluster to <> the voting +configuration and adapt the fault tolerance level to the new set of nodes. + +If there are only two master-eligible nodes remaining then neither node can be +safely removed since both are required to reliably make progress. You must first +inform Elasticsearch that one of the nodes should not be part of the voting +configuration, and that the voting power should instead be given to other nodes. +You can then take the excluded node offline without preventing the other node +from making progress. 
A node which is added to a voting configuration exclusion +list still works normally, but Elasticsearch tries to remove it from the voting +configuration so its vote is no longer required. Importantly, Elasticsearch +will never automatically move a node on the voting exclusions list back into the +voting configuration. Once an excluded node has been successfully +auto-reconfigured out of the voting configuration, it is safe to shut it down +without affecting the cluster's master-level availability. A node can be added +to the voting configuration exclusion list using the following API: + +[source,js] +-------------------------------------------------- +# Add node to voting configuration exclusions list and wait for the system to +# auto-reconfigure the node out of the voting configuration up to the default +# timeout of 30 seconds +POST /_cluster/voting_config_exclusions/node_name + +# Add node to voting configuration exclusions list and wait for +# auto-reconfiguration up to one minute +POST /_cluster/voting_config_exclusions/node_name?timeout=1m +-------------------------------------------------- +// CONSOLE +// TEST[skip:this would break the test cluster if executed] + +The node that should be added to the exclusions list is specified using +<> in place of `node_name` here. If a call to the +voting configuration exclusions API fails, you can safely retry it. Only a +successful response guarantees that the node has actually been removed from the +voting configuration and will not be reinstated. + +Although the voting configuration exclusions API is most useful for down-scaling +a two-node cluster to a one-node cluster, it is also possible to use it to remove +multiple master-eligible nodes all at the same time. Adding multiple nodes to +the exclusions list causes the system to try to auto-reconfigure all of these nodes +out of the voting configuration, allowing them to be safely shut down while +keeping the cluster available. In the example described above, shrinking a +seven-master-node cluster down to only three master nodes, you could add +four nodes to the exclusions list, wait for confirmation, and then shut them +down simultaneously. + +NOTE: Voting exclusions are only required when removing at least half of the +master-eligible nodes from a cluster in a short time period. They are not +required when removing master-ineligible nodes, nor are they required when +removing fewer than half of the master-eligible nodes. + +Adding an exclusion for a node creates an entry for that node in the voting +configuration exclusions list, which causes the system to automatically try to +reconfigure the voting configuration to remove that node, and prevents it from +returning to the voting configuration once it has been removed. The current list of +exclusions is stored in the cluster state and can be inspected as follows: + +[source,js] +-------------------------------------------------- +GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions +-------------------------------------------------- +// CONSOLE + +This list is limited in size by the following setting: + +`cluster.max_voting_config_exclusions`:: + + Sets a limit on the number of voting configuration exclusions at any one + time. Defaults to `10`. + +Since voting configuration exclusions are persistent and limited in number, they +must be cleaned up. Normally an exclusion is added when performing some +maintenance on the cluster, and the exclusions should be cleaned up when the +maintenance is complete.
Clusters should have no voting configuration exclusions +in normal operation. + +If a node is excluded from the voting configuration because it is to be shut +down permanently, its exclusion can be removed after it is shut down and removed +from the cluster. Exclusions can also be cleared if they were created in error +or were only required temporarily: + +[source,js] +-------------------------------------------------- +# Wait for all the nodes with voting configuration exclusions to be removed from +# the cluster and then remove all the exclusions, allowing any node to return to +# the voting configuration in the future. +DELETE /_cluster/voting_config_exclusions + +# Immediately remove all the voting configuration exclusions, allowing any node +# to return to the voting configuration in the future. +DELETE /_cluster/voting_config_exclusions?wait_for_removal=false +-------------------------------------------------- +// CONSOLE diff --git a/docs/reference/modules/discovery/azure.asciidoc b/docs/reference/modules/discovery/azure.asciidoc deleted file mode 100644 index 1343819b02a85..0000000000000 --- a/docs/reference/modules/discovery/azure.asciidoc +++ /dev/null @@ -1,5 +0,0 @@ -[[modules-discovery-azure-classic]] -=== Azure Classic Discovery - -Azure classic discovery allows to use the Azure Classic APIs to perform automatic discovery (similar to multicast). -It is available as a plugin. See {plugins}/discovery-azure-classic.html[discovery-azure-classic] for more information. diff --git a/docs/reference/modules/discovery/bootstrapping.asciidoc b/docs/reference/modules/discovery/bootstrapping.asciidoc new file mode 100644 index 0000000000000..4b5aa532d4874 --- /dev/null +++ b/docs/reference/modules/discovery/bootstrapping.asciidoc @@ -0,0 +1,105 @@ +[[modules-discovery-bootstrap-cluster]] +=== Bootstrapping a cluster + +Starting an Elasticsearch cluster for the very first time requires the initial +set of <> to be explicitly defined on one or +more of the master-eligible nodes in the cluster. This is known as _cluster +bootstrapping_. This is only required the very first time the cluster starts +up: nodes that have already joined a cluster store this information in their +data folder and freshly-started nodes that are joining an existing cluster +obtain this information from the cluster's elected master. This information is +given using this setting: + +`cluster.initial_master_nodes`:: + + Sets a list of the <> or transport addresses of the + initial set of master-eligible nodes in a brand-new cluster. By default + this list is empty, meaning that this node expects to join a cluster that + has already been bootstrapped. + +This setting can be given on the command line or in the `elasticsearch.yml` +configuration file when starting up a master-eligible node. Once the cluster +has formed this setting is no longer required and is ignored. It need not be set +on master-ineligible nodes, nor on master-eligible nodes that are started to +join an existing cluster. Note that master-eligible nodes should use storage +that persists across restarts. If they do not, and +`cluster.initial_master_nodes` is set, and a full cluster restart occurs, then +another brand-new cluster will form and this may result in data loss. + +It is technically sufficient to set `cluster.initial_master_nodes` on a single +master-eligible node in the cluster, and only to mention that single node in the +setting's value, but this provides no fault tolerance before the cluster has +fully formed. 
It is therefore better to bootstrap using at least three +master-eligible nodes, each with a `cluster.initial_master_nodes` setting +containing all three nodes. + +NOTE: In alpha releases, all listed master-eligible nodes are required to be +discovered before bootstrapping can take place. This requirement will be relaxed +in production-ready releases. + +WARNING: You must set `cluster.initial_master_nodes` to the same list of nodes +on each node on which it is set in order to be sure that only a single cluster +forms during bootstrapping and therefore to avoid the risk of data loss. + +For a cluster with 3 master-eligible nodes (with <> +`master-a`, `master-b` and `master-c`) the configuration will look as follows: + +[source,yaml] +-------------------------------------------------- +cluster.initial_master_nodes: + - master-a + - master-b + - master-c +-------------------------------------------------- + +Alternatively the IP addresses or hostnames (<>) can be used. If there is more than one Elasticsearch node +with the same IP address or hostname then the transport ports must also be given +to specify exactly which node is meant: + +[source,yaml] +-------------------------------------------------- +cluster.initial_master_nodes: + - 10.0.10.101 + - 10.0.10.102:9300 + - 10.0.10.102:9301 + - master-node-hostname +-------------------------------------------------- + +Like all node settings, it is also possible to specify the initial set of master +nodes on the command-line that is used to start Elasticsearch: + +[source,bash] +-------------------------------------------------- +$ bin/elasticsearch -Ecluster.initial_master_nodes=master-a,master-b,master-c +-------------------------------------------------- + +[float] +==== Choosing a cluster name + +The <> setting enables you to create multiple +clusters which are separated from each other. Nodes verify that they agree on +their cluster name when they first connect to each other, and Elasticsearch +will only form a cluster from nodes that all have the same cluster name. The +default value for the cluster name is `elasticsearch`, but it is recommended to +change this to reflect the logical name of the cluster. + +[float] +==== Auto-bootstrapping in development mode + +If the cluster is running with a completely default configuration then it will +automatically bootstrap a cluster based on the nodes that could be discovered to +be running on the same host within a short time after startup. This means that +by default it is possible to start up several nodes on a single machine and have +them automatically form a cluster which is very useful for development +environments and experimentation. However, since nodes may not always +successfully discover each other quickly enough this automatic bootstrapping +cannot be relied upon and cannot be used in production deployments. 
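+
+For example, a quick local experiment might start three nodes, each from its
+own Elasticsearch installation directory (the directory names below are purely
+illustrative), with no discovery settings configured at all:
+
+[source,bash]
+--------------------------------------------------
+# Each node runs from its own installation so that data paths do not clash;
+# with default settings the nodes discover each other on the local host and
+# auto-bootstrap a single development cluster.
+$ ./node-1/bin/elasticsearch
+$ ./node-2/bin/elasticsearch
+$ ./node-3/bin/elasticsearch
+--------------------------------------------------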
+ +If any of the following settings are configured then auto-bootstrapping will not +take place, and you must configure `cluster.initial_master_nodes` as described +in the <>: + +* `discovery.zen.hosts_provider` +* `discovery.zen.ping.unicast.hosts` +* `cluster.initial_master_nodes` diff --git a/docs/reference/modules/discovery/discovery.asciidoc b/docs/reference/modules/discovery/discovery.asciidoc new file mode 100644 index 0000000000000..dd2dc47a79dfb --- /dev/null +++ b/docs/reference/modules/discovery/discovery.asciidoc @@ -0,0 +1,195 @@ +[[modules-discovery-hosts-providers]] +=== Discovery + +Discovery is the process by which the cluster formation module finds other +nodes with which to form a cluster. This process runs when you start an +Elasticsearch node or when a node believes the master node failed and continues +until the master node is found or a new master node is elected. + +Discovery operates in two phases: First, each node probes the addresses of all +known master-eligible nodes by connecting to each address and attempting to +identify the node to which it is connected. Secondly it shares with the remote +node a list of all of its known master-eligible peers and the remote node +responds with _its_ peers in turn. The node then probes all the new nodes that +it just discovered, requests their peers, and so on. + +This process starts with a list of _seed_ addresses from one or more +<>, together with the addresses of +any master-eligible nodes that were in the last known cluster. The process +operates in two phases: First, each node probes the seed addresses by +connecting to each address and attempting to identify the node to which it is +connected. Secondly it shares with the remote node a list of all of its known +master-eligible peers and the remote node responds with _its_ peers in turn. +The node then probes all the new nodes that it just discovered, requests their +peers, and so on. + +If the node is not master-eligible then it continues this discovery process +until it has discovered an elected master node. If no elected master is +discovered then the node will retry after `discovery.find_peers_interval` which +defaults to `1s`. + +If the node is master-eligible then it continues this discovery process until it +has either discovered an elected master node or else it has discovered enough +masterless master-eligible nodes to complete an election. If neither of these +occur quickly enough then the node will retry after +`discovery.find_peers_interval` which defaults to `1s`. + +[[built-in-hosts-providers]] +==== Hosts providers + +By default the cluster formation module offers two hosts providers to configure +the list of seed nodes: a _settings_-based and a _file_-based hosts provider. +It can be extended to support cloud environments and other forms of hosts +providers via {plugins}/discovery.html[discovery plugins]. Hosts providers are +configured using the `discovery.zen.hosts_provider` setting, which defaults to +the _settings_-based hosts provider. Multiple hosts providers can be specified +as a list. + +[float] +[[settings-based-hosts-provider]] +===== Settings-based hosts provider + +The settings-based hosts provider uses a node setting to configure a static list +of hosts to use as seed nodes. These hosts can be specified as hostnames or IP +addresses; hosts specified as hostnames are resolved to IP addresses during each +round of discovery. Note that if you are in an environment where DNS resolutions +vary with time, you might need to adjust your <>. 
+ +The list of hosts is set using the <> static +setting. For example: + +[source,yaml] +-------------------------------------------------- +discovery.zen.ping.unicast.hosts: + - 192.168.1.10:9300 + - 192.168.1.11 <1> + - seeds.mydomain.com <2> +-------------------------------------------------- +<1> The port will default to `transport.profiles.default.port` and fall back to + `transport.port` if not specified. +<2> A hostname that resolves to multiple IP addresses will try all resolved + addresses. + +Additionally, the `discovery.zen.ping.unicast.hosts.resolve_timeout` configures +the amount of time to wait for DNS lookups on each round of discovery. This is +specified as a <> and defaults to 5s. + +Unicast discovery uses the <> module to perform the +discovery. + +[float] +[[file-based-hosts-provider]] +===== File-based hosts provider + +The file-based hosts provider configures a list of hosts via an external file. +Elasticsearch reloads this file when it changes, so that the list of seed nodes +can change dynamically without needing to restart each node. For example, this +gives a convenient mechanism for an Elasticsearch instance that is run in a +Docker container to be dynamically supplied with a list of IP addresses to +connect to when those IP addresses may not be known at node startup. + +To enable file-based discovery, configure the `file` hosts provider as follows: + +[source,txt] +---------------------------------------------------------------- +discovery.zen.hosts_provider: file +---------------------------------------------------------------- + +Then create a file at `$ES_PATH_CONF/unicast_hosts.txt` in the format described +below. Any time a change is made to the `unicast_hosts.txt` file, the new changes +will be picked up by Elasticsearch and the new hosts list will be used. + +Note that the file-based discovery plugin augments the unicast hosts list in +`elasticsearch.yml`. If there are valid unicast host entries in +`discovery.zen.ping.unicast.hosts`, they are used in addition to those +supplied in `unicast_hosts.txt`. + +The `discovery.zen.ping.unicast.hosts.resolve_timeout` setting also applies to +DNS lookups for nodes specified by address via file-based discovery. This is +specified as a <> and defaults to 5s. + +The format of the file is to specify one node entry per line. Each node entry +consists of the host (host name or IP address) and an optional transport port +number. If the port number is specified, it must come immediately after the +host (on the same line) separated by a `:`. If the port number is not +specified, a default value of 9300 is used. + +For example, here is a `unicast_hosts.txt` file for a cluster with four +nodes that participate in unicast discovery, some of which are not running on +the default port: + +[source,txt] +---------------------------------------------------------------- +10.10.10.5 +10.10.10.6:9305 +10.10.10.5:10005 +# an IPv6 address +[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:9301 +---------------------------------------------------------------- + +Host names are allowed instead of IP addresses (similar to +`discovery.zen.ping.unicast.hosts`), and IPv6 addresses must be specified in +brackets with the port coming after the brackets. + +You can also add comments to this file. All comments must appear on +their own lines, starting with `#` (i.e. comments cannot start in the middle of a +line).
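+
+As an illustration of how the two providers combine, a node configured as
+follows (the address below is only a placeholder) would use the statically
+configured host as well as any hosts listed in `unicast_hosts.txt` as seed
+nodes:
+
+[source,yaml]
+--------------------------------------------------
+# elasticsearch.yml: enable the file-based hosts provider and also supply one
+# static seed address; hosts from both sources are probed during discovery.
+discovery.zen.hosts_provider: file
+discovery.zen.ping.unicast.hosts:
+  - 192.168.1.10:9300
+--------------------------------------------------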
+ +[float] +[[ec2-hosts-provider]] +===== EC2 hosts provider + +The {plugins}/discovery-ec2.html[EC2 discovery plugin] adds a hosts provider +that uses the https://github.com/aws/aws-sdk-java[AWS API] to find a list of +seed nodes. + +[float] +[[azure-classic-hosts-provider]] +===== Azure Classic hosts provider + +The {plugins}/discovery-azure-classic.html[Azure Classic discovery plugin] adds +a hosts provider that uses the Azure Classic API to find a list of seed nodes. + +[float] +[[gce-hosts-provider]] +===== Google Compute Engine hosts provider + +The {plugins}/discovery-gce.html[GCE discovery plugin] adds a hosts provider +that uses the GCE API to find a list of seed nodes. + +[float] +==== Discovery settings + +The discovery process is controlled by the following settings. + +`discovery.find_peers_interval`:: + + Sets how long a node will wait before attempting another discovery round. + Defaults to `1s`. + +`discovery.request_peers_timeout`:: + + Sets how long a node will wait after asking its peers again before + considering the request to have failed. Defaults to `3s`. + +`discovery.probe.connect_timeout`:: + + Sets how long to wait when attempting to connect to each address. Defaults + to `3s`. + +`discovery.probe.handshake_timeout`:: + + Sets how long to wait when attempting to identify the remote node via a + handshake. Defaults to `1s`. + +`discovery.cluster_formation_warning_timeout`:: + + Sets how long a node will try to form a cluster before logging a warning + that the cluster did not form. Defaults to `10s`. + +If a cluster has not formed after `discovery.cluster_formation_warning_timeout` +has elapsed then the node will log a warning message that starts with the phrase +`master not discovered`, which describes the current state of the discovery +process. + diff --git a/docs/reference/modules/discovery/ec2.asciidoc b/docs/reference/modules/discovery/ec2.asciidoc deleted file mode 100644 index ba15f6bffa4cd..0000000000000 --- a/docs/reference/modules/discovery/ec2.asciidoc +++ /dev/null @@ -1,4 +0,0 @@ -[[modules-discovery-ec2]] -=== EC2 Discovery - -EC2 discovery is available as a plugin. See {plugins}/discovery-ec2.html[discovery-ec2] for more information. diff --git a/docs/reference/modules/discovery/fault-detection.asciidoc b/docs/reference/modules/discovery/fault-detection.asciidoc new file mode 100644 index 0000000000000..0a8ff5fa2081c --- /dev/null +++ b/docs/reference/modules/discovery/fault-detection.asciidoc @@ -0,0 +1,52 @@ +[[fault-detection-settings]] +=== Cluster fault detection settings + +An elected master periodically checks each of the nodes in the cluster in order +to ensure that they are still connected and healthy, and in turn each node in +the cluster periodically checks the health of the elected master. These checks +are known respectively as _follower checks_ and _leader checks_. + +Elasticsearch allows for these checks occasionally to fail or time out without +taking any action, and will only consider a node to be truly faulty after a +number of consecutive checks have failed. The following settings control the +behaviour of fault detection. + +`cluster.fault_detection.follower_check.interval`:: + + Sets how long the elected master waits between follower checks to each + other node in the cluster. Defaults to `1s`. + +`cluster.fault_detection.follower_check.timeout`:: + + Sets how long the elected master waits for a response to a follower check + before considering it to have failed. Defaults to `30s`.
+ +`cluster.fault_detection.follower_check.retry_count`:: + + Sets how many consecutive follower check failures must occur to each node + before the elected master considers that node to be faulty and removes it + from the cluster. Defaults to `3`. + +`cluster.fault_detection.leader_check.interval`:: + + Sets how long each node waits between checks of the elected master. + Defaults to `1s`. + +`cluster.fault_detection.leader_check.timeout`:: + + Sets how long each node waits for a response to a leader check from the + elected master before considering it to have failed. Defaults to `30s`. + +`cluster.fault_detection.leader_check.retry_count`:: + + Sets how many consecutive leader check failures must occur before a node + considers the elected master to be faulty and attempts to find or elect a + new master. Defaults to `3`. + +If the elected master detects that a node has disconnected then this is treated +as an immediate failure, bypassing the timeouts and retries listed above, and +the master attempts to remove the node from the cluster. Similarly, if a node +detects that the elected master has disconnected then this is treated as an +immediate failure, bypassing the timeouts and retries listed above, and the +follower restarts its discovery phase to try to find or elect a new master. + diff --git a/docs/reference/modules/discovery/gce.asciidoc b/docs/reference/modules/discovery/gce.asciidoc deleted file mode 100644 index ea367d52ceb75..0000000000000 --- a/docs/reference/modules/discovery/gce.asciidoc +++ /dev/null @@ -1,6 +0,0 @@ -[[modules-discovery-gce]] -=== Google Compute Engine Discovery - -Google Compute Engine (GCE) discovery allows to use the GCE APIs to perform automatic discovery (similar to multicast). -It is available as a plugin. See {plugins}/discovery-gce.html[discovery-gce] for more information. - diff --git a/docs/reference/modules/discovery/master-election.asciidoc b/docs/reference/modules/discovery/master-election.asciidoc new file mode 100644 index 0000000000000..60d09e5545b40 --- /dev/null +++ b/docs/reference/modules/discovery/master-election.asciidoc @@ -0,0 +1,40 @@ +[[master-election-settings]] +=== Master election settings + +The following settings control the scheduling of elections. + +`cluster.election.initial_timeout`:: + + Sets the upper bound on how long a node will wait initially, or after the + elected master fails, before attempting its first election. This defaults + to `100ms`. + +`cluster.election.back_off_time`:: + + Sets the amount to increase the upper bound on the wait before an election + on each election failure. Note that this is _linear_ backoff. This defaults + to `100ms`. + +`cluster.election.max_timeout`:: + + Sets the maximum upper bound on how long a node will wait before attempting + a first election, so that a network partition that lasts for a long time + does not result in excessively sparse elections. This defaults to `10s`. + +`cluster.election.duration`:: + + Sets how long each election is allowed to take before a node considers it to + have failed and schedules a retry. This defaults to `500ms`. + +[float] +==== Joining an elected master + +During master election, or when joining an existing formed cluster, a node will +send a join request to the master in order to be officially added to the +cluster. This join process can be configured with the following settings.
+ +`cluster.join.timeout`:: + + Sets how long a node will wait after sending a request to join a cluster + before it considers the request to have failed and retries. Defaults to + `60s`. diff --git a/docs/reference/modules/discovery/no-master-block.asciidoc b/docs/reference/modules/discovery/no-master-block.asciidoc new file mode 100644 index 0000000000000..3099aaf66796d --- /dev/null +++ b/docs/reference/modules/discovery/no-master-block.asciidoc @@ -0,0 +1,22 @@ +[[no-master-block]] +=== No master block settings + +For the cluster to be fully operational, it must have an active master. The +`discovery.zen.no_master_block` setting controls what operations should be +rejected when there is no active master. + +The `discovery.zen.no_master_block` setting has two valid values: + +[horizontal] +`all`:: All operations on the node--i.e. both reads and writes--will be rejected. +This also applies to cluster state read and write APIs, like the get +index settings, put mapping, and cluster state APIs. +`write`:: (default) Write operations will be rejected. Read operations will +succeed, based on the last known cluster configuration. This may result in +partial reads of stale data as this node may be isolated from the rest of the +cluster. + +The `discovery.zen.no_master_block` setting doesn't apply to nodes-based APIs +(for example cluster stats, node info, and node stats APIs). Requests to these +APIs will not be blocked and can run on any available node. + diff --git a/docs/reference/modules/discovery/publishing.asciidoc b/docs/reference/modules/discovery/publishing.asciidoc new file mode 100644 index 0000000000000..8c69290edc706 --- /dev/null +++ b/docs/reference/modules/discovery/publishing.asciidoc @@ -0,0 +1,42 @@ +[[cluster-state-publishing]] +=== Publishing the cluster state + +The master node is the only node in a cluster that can make changes to the +cluster state. The master node processes one batch of cluster state updates at +a time, computing the required changes and publishing the updated cluster state +to all the other nodes in the cluster. Each publication starts with the master +broadcasting the updated cluster state to all nodes in the cluster. Each node +responds with an acknowledgement but does not yet apply the newly-received +state. Once the master has collected acknowledgements from enough +master-eligible nodes, the new cluster state is said to be _committed_ and the +master broadcasts another message instructing nodes to apply the now-committed +state. Each node receives this message, applies the updated state, and then +sends a second acknowledgement back to the master. + +The master allows a limited amount of time for each cluster state update to be +completely published to all nodes. It is defined by the +`cluster.publish.timeout` setting, which defaults to `30s`, measured from the +time the publication started. If this time is reached before the new cluster +state is committed then the cluster state change is rejected and the master +considers itself to have failed. It stands down and starts trying to elect a +new master. + +If the new cluster state is committed before `cluster.publish.timeout` has +elapsed, the master node considers the change to have succeeded. It waits until +the timeout has elapsed or until it has received acknowledgements that each +node in the cluster has applied the updated state, and then starts processing +and publishing the next cluster state update. If some acknowledgements have not +been received (i.e.
some nodes have not yet confirmed that they have applied +the current update), these nodes are said to be _lagging_ since their cluster +states have fallen behind the master's latest state. The master waits for the +lagging nodes to catch up for a further time, `cluster.follower_lag.timeout`, +which defaults to `90s`. If a node has still not successfully applied the +cluster state update within this time then it is considered to have failed and +is removed from the cluster. + +NOTE: Elasticsearch is a peer to peer based system, in which nodes communicate +with one another directly. The high-throughput APIs (index, delete, search) do +not normally interact with the master node. The responsibility of the master +node is to maintain the global cluster state and reassign shards when nodes join or leave +the cluster. Each time the cluster state is changed, the +new state is published to all nodes in the cluster as described above. diff --git a/docs/reference/modules/discovery/quorums.asciidoc b/docs/reference/modules/discovery/quorums.asciidoc new file mode 100644 index 0000000000000..5642083b63b0b --- /dev/null +++ b/docs/reference/modules/discovery/quorums.asciidoc @@ -0,0 +1,193 @@ +[[modules-discovery-quorums]] +=== Quorum-based decision making + +Electing a master node and changing the cluster state are the two fundamental +tasks that master-eligible nodes must work together to perform. It is important +that these activities work robustly even if some nodes have failed. +Elasticsearch achieves this robustness by considering each action to have +succeeded on receipt of responses from a _quorum_, which is a subset of the +master-eligible nodes in the cluster. The advantage of requiring only a subset +of the nodes to respond is that it means some of the nodes can fail without +preventing the cluster from making progress. The quorums are carefully chosen so +the cluster does not have a "split brain" scenario where it's partitioned into +two pieces such that each piece may make decisions that are inconsistent with +those of the other piece. + +Elasticsearch allows you to add and remove master-eligible nodes to a running +cluster. In many cases you can do this simply by starting or stopping the nodes +as required. See <>. + +As nodes are added or removed Elasticsearch maintains an optimal level of fault +tolerance by updating the cluster's _voting configuration_, which is the set of +master-eligible nodes whose responses are counted when making decisions such as +electing a new master or committing a new cluster state. A decision is made only +after more than half of the nodes in the voting configuration have responded. +Usually the voting configuration is the same as the set of all the +master-eligible nodes that are currently in the cluster. However, there are some +situations in which they may be different. + +To be sure that the cluster remains available you **must not stop half or more +of the nodes in the voting configuration at the same time**. As long as more +than half of the voting nodes are available the cluster can still work normally. +This means that if there are three or four master-eligible nodes, the cluster +can tolerate one of them being unavailable. If there are two or fewer +master-eligible nodes, they must all remain available. + +After a node has joined or left the cluster the elected master must issue a +cluster-state update that adjusts the voting configuration to match, and this +can take a short time to complete. 
It is important to wait for this adjustment +to complete before removing more nodes from the cluster. + +[float] +==== Setting the initial quorum + +When a brand-new cluster starts up for the first time, it must elect its first +master node. To do this election, it needs to know the set of master-eligible +nodes whose votes should count. This initial voting configuration is known as +the _bootstrap configuration_ and is set in the +<>. + +It is important that the bootstrap configuration identifies exactly which nodes +should vote in the first election. It is not sufficient to configure each node +with an expectation of how many nodes there should be in the cluster. It is also +important to note that the bootstrap configuration must come from outside the +cluster: there is no safe way for the cluster to determine the bootstrap +configuration correctly on its own. + +If the bootstrap configuration is not set correctly, when you start a brand-new +cluster there is a risk that you will accidentally form two separate clusters +instead of one. This situation can lead to data loss: you might start using both +clusters before you notice that anything has gone wrong and it is impossible to +merge them together later. + +NOTE: To illustrate the problem with configuring each node to expect a certain +cluster size, imagine starting up a three-node cluster in which each node knows +that it is going to be part of a three-node cluster. A majority of three nodes +is two, so normally the first two nodes to discover each other form a cluster +and the third node joins them a short time later. However, imagine that four +nodes were erroneously started instead of three. In this case, there are enough +nodes to form two separate clusters. Of course if each node is started manually +then it's unlikely that too many nodes are started. If you're using an automated +orchestrator, however, it's certainly possible to get into this situation-- +particularly if the orchestrator is not resilient to failures such as network +partitions. + +The initial quorum is only required the very first time a whole cluster starts +up. New nodes joining an established cluster can safely obtain all the +information they need from the elected master. Nodes that have previously been +part of a cluster will have stored to disk all the information that is required +when they restart. + +[float] +==== Master elections + +Elasticsearch uses an election process to agree on an elected master node, both +at startup and if the existing elected master fails. Any master-eligible node +can start an election, and normally the first election that takes place will +succeed. Elections only usually fail when two nodes both happen to start their +elections at about the same time, so elections are scheduled randomly on each +node to reduce the probability of this happening. Nodes will retry elections +until a master is elected, backing off on failure, so that eventually an +election will succeed (with arbitrarily high probability). The scheduling of +master elections are controlled by the <>. + +[float] +==== Cluster maintenance, rolling restarts and migrations + +Many cluster maintenance tasks involve temporarily shutting down one or more +nodes and then starting them back up again. By default Elasticsearch can remain +available if one of its master-eligible nodes is taken offline, such as during a +<>. Furthermore, if multiple nodes are stopped +and then started again then it will automatically recover, such as during a +<>. 
There is no need to take any further +action with the APIs described here in these cases, because the set of master +nodes is not changing permanently. + +[float] +==== Automatic changes to the voting configuration + +Nodes may join or leave the cluster, and Elasticsearch reacts by automatically +making corresponding changes to the voting configuration in order to ensure that +the cluster is as resilient as possible. The default auto-reconfiguration +behaviour is expected to give the best results in most situations. The current +voting configuration is stored in the cluster state so you can inspect its +current contents as follows: + +[source,js] +-------------------------------------------------- +GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config +-------------------------------------------------- +// CONSOLE + +NOTE: The current voting configuration is not necessarily the same as the set of +all available master-eligible nodes in the cluster. Altering the voting +configuration involves taking a vote, so it takes some time to adjust the +configuration as nodes join or leave the cluster. Also, there are situations +where the most resilient configuration includes unavailable nodes, or does not +include some available nodes, and in these situations the voting configuration +differs from the set of available master-eligible nodes in the cluster. + +Larger voting configurations are usually more resilient, so Elasticsearch +normally prefers to add master-eligible nodes to the voting configuration after +they join the cluster. Similarly, if a node in the voting configuration +leaves the cluster and there is another master-eligible node in the cluster that +is not in the voting configuration then it is preferable to swap these two nodes +over. The size of the voting configuration is thus unchanged but its +resilience increases. + +It is not so straightforward to automatically remove nodes from the voting +configuration after they have left the cluster. Different strategies have +different benefits and drawbacks, so the right choice depends on how the cluster +will be used. You can control whether the voting configuration automatically shrinks by using the following setting: + +`cluster.auto_shrink_voting_configuration`:: + + Defaults to `true`, meaning that the voting configuration will automatically + shrink, shedding departed nodes, as long as it still contains at least 3 + nodes. If set to `false`, the voting configuration never automatically + shrinks; departed nodes must be removed manually using the + <>. + +NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true`, the +recommended and default setting, and there are at least three master-eligible +nodes in the cluster, then Elasticsearch remains capable of processing +cluster-state updates as long as all but one of its master-eligible nodes are +healthy. + +There are situations in which Elasticsearch might tolerate the loss of multiple +nodes, but this is not guaranteed under all sequences of failures. If this +setting is set to `false` then departed nodes must be removed from the voting +configuration manually, using the +<>, to achieve +the desired level of resilience. + +No matter how it is configured, Elasticsearch will not suffer from a "split-brain" inconsistency. +The `cluster.auto_shrink_voting_configuration` setting affects only its availability in the +event of the failure of some of its nodes, and the administrative tasks that +must be performed as nodes join and leave the cluster. 
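+
+For example, an administrator who prefers to keep departed nodes in the voting
+configuration until they are explicitly excluded might turn off automatic
+shrinking. A minimal sketch of such a change, assuming it is applied as a
+persistent dynamic cluster setting (it can also be set in `elasticsearch.yml`),
+is:
+
+[source,js]
+--------------------------------------------------
+PUT /_cluster/settings
+{
+  "persistent": {
+    "cluster.auto_shrink_voting_configuration": false
+  }
+}
+--------------------------------------------------
+// CONSOLE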
+ +[float] +==== Even numbers of master-eligible nodes + +There should normally be an odd number of master-eligible nodes in a cluster. +If there is an even number, Elasticsearch leaves one of them out of the voting +configuration to ensure that it has an odd size. This omission does not decrease +the failure-tolerance of the cluster. In fact, it improves it slightly: if the +cluster suffers from a network partition that divides it into two equally-sized +halves then one of the halves will contain a majority of the voting +configuration and will be able to keep operating. If all of the master-eligible +nodes' votes were counted, neither side would contain a strict majority of the +nodes and so the cluster would not be able to make any progress. + +For instance, if there are four master-eligible nodes in the cluster and the +voting configuration contained all of them, any quorum-based decision would +require votes from at least three of them. This situation means that the cluster +can tolerate the loss of only a single master-eligible node. If this cluster +were split into two equal halves, neither half would contain three +master-eligible nodes and the cluster would not be able to make any progress. +If the voting configuration contains only three of the four master-eligible +nodes, however, the cluster is still only fully tolerant to the loss of one +node, but quorum-based decisions require votes from two of the three voting +nodes. In the event of an even split, one half will contain two of the three +voting nodes so that half will remain available. diff --git a/docs/reference/modules/discovery/zen.asciidoc b/docs/reference/modules/discovery/zen.asciidoc deleted file mode 100644 index 98967bf7ebaf4..0000000000000 --- a/docs/reference/modules/discovery/zen.asciidoc +++ /dev/null @@ -1,226 +0,0 @@ -[[modules-discovery-zen]] -=== Zen Discovery - -Zen discovery is the built-in, default, discovery module for Elasticsearch. It -provides unicast and file-based discovery, and can be extended to support cloud -environments and other forms of discovery via plugins. - -Zen discovery is integrated with other modules, for example, all communication -between nodes is done using the <> module. - -It is separated into several sub modules, which are explained below: - -[float] -[[ping]] -==== Ping - -This is the process where a node uses the discovery mechanisms to find other -nodes. - -[float] -[[discovery-seed-nodes]] -==== Seed nodes - -Zen discovery uses a list of _seed_ nodes in order to start off the discovery -process. At startup, or when electing a new master, Elasticsearch tries to -connect to each seed node in its list, and holds a gossip-like conversation with -them to find other nodes and to build a complete picture of the cluster. By -default there are two methods for configuring the list of seed nodes: _unicast_ -and _file-based_. It is recommended that the list of seed nodes comprises the -list of master-eligible nodes in the cluster. - -[float] -[[unicast]] -===== Unicast - -Unicast discovery configures a static list of hosts for use as seed nodes. -These hosts can be specified as hostnames or IP addresses; hosts specified as -hostnames are resolved to IP addresses during each round of pinging. Note that -if you are in an environment where DNS resolutions vary with time, you might -need to adjust your <>. - -The list of hosts is set using the `discovery.zen.ping.unicast.hosts` static -setting. This is either an array of hosts or a comma-delimited string.
Each -value should be in the form of `host:port` or `host` (where `port` defaults to -the setting `transport.profiles.default.port` falling back to -`transport.port` if not set). Note that IPv6 hosts must be bracketed. The -default for this setting is `127.0.0.1, [::1]` - -Additionally, the `discovery.zen.ping.unicast.resolve_timeout` configures the -amount of time to wait for DNS lookups on each round of pinging. This is -specified as a <> and defaults to 5s. - -Unicast discovery uses the <> module to perform the -discovery. - -[float] -[[file-based-hosts-provider]] -===== File-based - -In addition to hosts provided by the static `discovery.zen.ping.unicast.hosts` -setting, it is possible to provide a list of hosts via an external file. -Elasticsearch reloads this file when it changes, so that the list of seed nodes -can change dynamically without needing to restart each node. For example, this -gives a convenient mechanism for an Elasticsearch instance that is run in a -Docker container to be dynamically supplied with a list of IP addresses to -connect to for Zen discovery when those IP addresses may not be known at node -startup. - -To enable file-based discovery, configure the `file` hosts provider as follows: - -[source,txt] ----------------------------------------------------------------- -discovery.zen.hosts_provider: file ----------------------------------------------------------------- - -Then create a file at `$ES_PATH_CONF/unicast_hosts.txt` in the format described -below. Any time a change is made to the `unicast_hosts.txt` file the new -changes will be picked up by Elasticsearch and the new hosts list will be used. - -Note that the file-based discovery plugin augments the unicast hosts list in -`elasticsearch.yml`: if there are valid unicast host entries in -`discovery.zen.ping.unicast.hosts` then they will be used in addition to those -supplied in `unicast_hosts.txt`. - -The `discovery.zen.ping.unicast.resolve_timeout` setting also applies to DNS -lookups for nodes specified by address via file-based discovery. This is -specified as a <> and defaults to 5s. - -The format of the file is to specify one node entry per line. Each node entry -consists of the host (host name or IP address) and an optional transport port -number. If the port number is specified, is must come immediately after the -host (on the same line) separated by a `:`. If the port number is not -specified, a default value of 9300 is used. - -For example, this is an example of `unicast_hosts.txt` for a cluster with four -nodes that participate in unicast discovery, some of which are not running on -the default port: - -[source,txt] ----------------------------------------------------------------- -10.10.10.5 -10.10.10.6:9305 -10.10.10.5:10005 -# an IPv6 address -[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:9301 ----------------------------------------------------------------- - -Host names are allowed instead of IP addresses (similar to -`discovery.zen.ping.unicast.hosts`), and IPv6 addresses must be specified in -brackets with the port coming after the brackets. - -It is also possible to add comments to this file. All comments must appear on -their lines starting with `#` (i.e. comments cannot start in the middle of a -line). - -[float] -[[master-election]] -==== Master Election - -As part of the ping process a master of the cluster is either elected or joined -to. This is done automatically. 
The `discovery.zen.ping_timeout` (which defaults -to `3s`) determines how long the node will wait before deciding on starting an -election or joining an existing cluster. Three pings will be sent over this -timeout interval. In case where no decision can be reached after the timeout, -the pinging process restarts. In slow or congested networks, three seconds -might not be enough for a node to become aware of the other nodes in its -environment before making an election decision. Increasing the timeout should -be done with care in that case, as it will slow down the election process. Once -a node decides to join an existing formed cluster, it will send a join request -to the master (`discovery.zen.join_timeout`) with a timeout defaulting at 20 -times the ping timeout. - -When the master node stops or has encountered a problem, the cluster nodes start -pinging again and will elect a new master. This pinging round also serves as a -protection against (partial) network failures where a node may unjustly think -that the master has failed. In this case the node will simply hear from other -nodes about the currently active master. - -If `discovery.zen.master_election.ignore_non_master_pings` is `true`, pings from -nodes that are not master eligible (nodes where `node.master` is `false`) are -ignored during master election; the default value is `false`. - -Nodes can be excluded from becoming a master by setting `node.master` to -`false`. - -The `discovery.zen.minimum_master_nodes` sets the minimum number of master -eligible nodes that need to join a newly elected master in order for an election -to complete and for the elected node to accept its mastership. The same setting -controls the minimum number of active master eligible nodes that should be a -part of any active cluster. If this requirement is not met the active master -node will step down and a new master election will begin. - -This setting must be set to a <> of your master -eligible nodes. It is recommended to avoid having only two master eligible -nodes, since a quorum of two is two. Therefore, a loss of either master eligible -node will result in an inoperable cluster. - -[float] -[[fault-detection]] -==== Fault Detection - -There are two fault detection processes running. The first is by the master, to -ping all the other nodes in the cluster and verify that they are alive. And on -the other end, each node pings to master to verify if its still alive or an -election process needs to be initiated. - -The following settings control the fault detection process using the -`discovery.zen.fd` prefix: - -[cols="<,<",options="header",] -|======================================================================= -|Setting |Description -|`ping_interval` |How often a node gets pinged. Defaults to `1s`. - -|`ping_timeout` |How long to wait for a ping response, defaults to -`30s`. - -|`ping_retries` |How many ping failures / timeouts cause a node to be -considered failed. Defaults to `3`. -|======================================================================= - -[float] -==== Cluster state updates - -The master node is the only node in a cluster that can make changes to the -cluster state. The master node processes one cluster state update at a time, -applies the required changes and publishes the updated cluster state to all the -other nodes in the cluster. Each node receives the publish message, acknowledges -it, but does *not* yet apply it. 
If the master does not receive acknowledgement -from at least `discovery.zen.minimum_master_nodes` nodes within a certain time -(controlled by the `discovery.zen.commit_timeout` setting and defaults to 30 -seconds) the cluster state change is rejected. - -Once enough nodes have responded, the cluster state is committed and a message -will be sent to all the nodes. The nodes then proceed to apply the new cluster -state to their internal state. The master node waits for all nodes to respond, -up to a timeout, before going ahead processing the next updates in the queue. -The `discovery.zen.publish_timeout` is set by default to 30 seconds and is -measured from the moment the publishing started. Both timeout settings can be -changed dynamically through the <> - -[float] -[[no-master-block]] -==== No master block - -For the cluster to be fully operational, it must have an active master and the -number of running master eligible nodes must satisfy the -`discovery.zen.minimum_master_nodes` setting if set. The -`discovery.zen.no_master_block` settings controls what operations should be -rejected when there is no active master. - -The `discovery.zen.no_master_block` setting has two valid options: - -[horizontal] -`all`:: All operations on the node--i.e. both read & writes--will be rejected. -This also applies for api cluster state read or write operations, like the get -index settings, put mapping and cluster state api. -`write`:: (default) Write operations will be rejected. Read operations will -succeed, based on the last known cluster configuration. This may result in -partial reads of stale data as this node may be isolated from the rest of the -cluster. - -The `discovery.zen.no_master_block` setting doesn't apply to nodes-based apis -(for example cluster stats, node info and node stats apis). Requests to these -apis will not be blocked and can run on any available node. diff --git a/docs/reference/modules/node.asciidoc b/docs/reference/modules/node.asciidoc index 9287e171129ff..a94f76c55de1f 100644 --- a/docs/reference/modules/node.asciidoc +++ b/docs/reference/modules/node.asciidoc @@ -19,7 +19,7 @@ purpose: <>:: A node that has `node.master` set to `true` (default), which makes it eligible -to be <>, which controls +to be <>, which controls the cluster. <>:: @@ -69,7 +69,7 @@ and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node. Any master-eligible node (all nodes by default) may be elected to become the -master node by the <>. +master node by the <>. IMPORTANT: Master nodes must have access to the `data/` directory (just like `data` nodes) as this is where the cluster state is persisted between node restarts. @@ -105,74 +105,6 @@ NOTE: These settings apply only when {xpack} is not installed. To create a dedicated master-eligible node when {xpack} is installed, see <>. endif::include-xpack[] - -[float] -[[split-brain]] -==== Avoiding split brain with `minimum_master_nodes` - -To prevent data loss, it is vital to configure the -`discovery.zen.minimum_master_nodes` setting (which defaults to `1`) so that -each master-eligible node knows the _minimum number of master-eligible nodes_ -that must be visible in order to form a cluster. - -To explain, imagine that you have a cluster consisting of two master-eligible -nodes. A network failure breaks communication between these two nodes. Each -node sees one master-eligible node... itself. With `minimum_master_nodes` set -to the default of `1`, this is sufficient to form a cluster. 
Each node elects -itself as the new master (thinking that the other master-eligible node has -died) and the result is two clusters, or a _split brain_. These two nodes -will never rejoin until one node is restarted. Any data that has been written -to the restarted node will be lost. - -Now imagine that you have a cluster with three master-eligible nodes, and -`minimum_master_nodes` set to `2`. If a network split separates one node from -the other two nodes, the side with one node cannot see enough master-eligible -nodes and will realise that it cannot elect itself as master. The side with -two nodes will elect a new master (if needed) and continue functioning -correctly. As soon as the network split is resolved, the single node will -rejoin the cluster and start serving requests again. - -This setting should be set to a _quorum_ of master-eligible nodes: - - (master_eligible_nodes / 2) + 1 - -In other words, if there are three master-eligible nodes, then minimum master -nodes should be set to `(3 / 2) + 1` or `2`: - -[source,yaml] ----------------------------- -discovery.zen.minimum_master_nodes: 2 <1> ----------------------------- -<1> Defaults to `1`. - -To be able to remain available when one of the master-eligible nodes fails, -clusters should have at least three master-eligible nodes, with -`minimum_master_nodes` set accordingly. A <>, -performed without any downtime, also requires at least three master-eligible -nodes to avoid the possibility of data loss if a network split occurs while the -upgrade is in progress. - -This setting can also be changed dynamically on a live cluster with the -<>: - -[source,js] ----------------------------- -PUT _cluster/settings -{ - "transient": { - "discovery.zen.minimum_master_nodes": 2 - } -} ----------------------------- -// CONSOLE -// TEST[skip:Test use Zen2 now so we can't test Zen1 behaviour here] - -TIP: An advantage of splitting the master and data roles between dedicated -nodes is that you can have just three master-eligible nodes and set -`minimum_master_nodes` to `2`. You never have to change this setting, no -matter how many dedicated data nodes you add to the cluster. - - [float] [[data-node]] === Data Node diff --git a/docs/reference/modules/snapshots.asciidoc b/docs/reference/modules/snapshots.asciidoc index 48ae41ded2a78..7ee545d66cf0f 100644 --- a/docs/reference/modules/snapshots.asciidoc +++ b/docs/reference/modules/snapshots.asciidoc @@ -597,9 +597,8 @@ if the new cluster doesn't contain nodes with appropriate attributes that a rest index will not be successfully restored unless these index allocation settings are changed during restore operation. The restore operation also checks that restored persistent settings are compatible with the current cluster to avoid accidentally -restoring an incompatible settings such as `discovery.zen.minimum_master_nodes` and as a result disable a smaller cluster until the -required number of master eligible nodes is added. If you need to restore a snapshot with incompatible persistent settings, try -restoring it without the global cluster state. +restoring incompatible settings. If you need to restore a snapshot with incompatible persistent settings, try restoring it without +the global cluster state. [float] === Snapshot status diff --git a/docs/reference/redirects.asciidoc b/docs/reference/redirects.asciidoc index f07d1d09747e7..fe2954b015a02 100644 --- a/docs/reference/redirects.asciidoc +++ b/docs/reference/redirects.asciidoc @@ -560,3 +560,19 @@ See <>. 
The standard token filter has been removed. +[role="exclude",id="modules-discovery-azure-classic"] + +See <>. + +[role="exclude",id="modules-discovery-ec2"] + +See <>. + +[role="exclude",id="modules-discovery-gce"] + +See <>. + +[role="exclude",id="modules-discovery-zen"] + +Zen discovery is replaced by the <>. diff --git a/docs/reference/security/securing-communications/configuring-tls-docker.asciidoc b/docs/reference/security/securing-communications/configuring-tls-docker.asciidoc index 074ad03a96873..c9d645e4fa234 100644 --- a/docs/reference/security/securing-communications/configuring-tls-docker.asciidoc +++ b/docs/reference/security/securing-communications/configuring-tls-docker.asciidoc @@ -108,7 +108,7 @@ services: image: {docker-image} environment: - node.name=es01 - - discovery.zen.minimum_master_nodes=2 + - cluster.initial_master_nodes=es01,es02 - ELASTIC_PASSWORD=$ELASTIC_PASSWORD <1> - "ES_JAVA_OPTS=-Xms512m -Xmx512m" - xpack.license.self_generated.type=trial <2> @@ -133,9 +133,9 @@ services: image: {docker-image} environment: - node.name=es02 - - discovery.zen.minimum_master_nodes=2 - - ELASTIC_PASSWORD=$ELASTIC_PASSWORD - discovery.zen.ping.unicast.hosts=es01 + - cluster.initial_master_nodes=es01,es02 + - ELASTIC_PASSWORD=$ELASTIC_PASSWORD - "ES_JAVA_OPTS=-Xms512m -Xmx512m" - xpack.license.self_generated.type=trial - xpack.security.enabled=true diff --git a/docs/reference/setup/bootstrap-checks.asciidoc b/docs/reference/setup/bootstrap-checks.asciidoc index 9cf3620636a41..34b39546324d1 100644 --- a/docs/reference/setup/bootstrap-checks.asciidoc +++ b/docs/reference/setup/bootstrap-checks.asciidoc @@ -21,6 +21,7 @@ Elasticsearch from running with incompatible settings. These checks are documented individually. [float] +[[dev-vs-prod-mode]] === Development vs. production mode By default, Elasticsearch binds to loopback addresses for <> diff --git a/docs/reference/setup/important-settings/discovery-settings.asciidoc b/docs/reference/setup/important-settings/discovery-settings.asciidoc index 0587484f50c61..9c62f2da1af25 100644 --- a/docs/reference/setup/important-settings/discovery-settings.asciidoc +++ b/docs/reference/setup/important-settings/discovery-settings.asciidoc @@ -1,22 +1,43 @@ [[discovery-settings]] -=== Discovery settings +=== Discovery and cluster formation settings -Elasticsearch uses a custom discovery implementation called "Zen Discovery" for -node-to-node clustering and master election. There are two important discovery -settings that should be configured before going to production. +There are two important discovery and cluster formation settings that should be +configured before going to production so that nodes in the cluster can discover +each other and elect a master node. [float] [[unicast.hosts]] ==== `discovery.zen.ping.unicast.hosts` Out of the box, without any network configuration, Elasticsearch will bind to -the available loopback addresses and will scan ports 9300 to 9305 to try to -connect to other nodes running on the same server. This provides an auto- +the available loopback addresses and will scan local ports 9300 to 9305 to try +to connect to other nodes running on the same server. This provides an auto- clustering experience without having to do any configuration. -When the moment comes to form a cluster with nodes on other servers, you have to -provide a seed list of other nodes in the cluster that are likely to be live and -contactable. 
This can be specified as follows:
+When the moment comes to form a cluster with nodes on other servers, you must
+use the `discovery.zen.ping.unicast.hosts` setting to provide a seed list of
+other nodes in the cluster that are master-eligible and likely to be live and
+contactable. This setting should normally contain the addresses of all the
+master-eligible nodes in the cluster.
+
+This setting accepts either an array of hosts or a comma-delimited string. Each
+value should be in the form of `host:port` or `host` (where `port` defaults to
+the setting `transport.profiles.default.port` falling back to `transport.port`
+if not set). Note that IPv6 hosts must be bracketed. The default for this
+setting is `127.0.0.1, [::1]`.
+
+[float]
+[[initial_master_nodes]]
+==== `cluster.initial_master_nodes`
+
+When you start a brand new Elasticsearch cluster for the very first time, there
+is a <> step, which
+determines the set of master-eligible nodes whose votes are counted in the very
+first election. In <>, with no discovery
+settings configured, this step is automatically performed by the nodes
+themselves. As this auto-bootstrapping is <>, when you start a brand new cluster in <>, you must explicitly list the names or IP addresses of the
+master-eligible nodes whose votes should be counted in the very first election.
+This list is set using the `cluster.initial_master_nodes` setting.
[source,yaml]
--------------------------------------------------
@@ -24,35 +45,16 @@ discovery.zen.ping.unicast.hosts:
   - 192.168.1.10:9300
   - 192.168.1.11 <1>
   - seeds.mydomain.com <2>
+cluster.initial_master_nodes:
+   - master-node-a <3>
+   - 192.168.1.12 <4>
+   - 192.168.1.13:9301 <5>
--------------------------------------------------
<1> The port will default to `transport.profiles.default.port` and fallback to
    `transport.port` if not specified.
-<2> A hostname that resolves to multiple IP addresses will try all resolved
-    addresses.
-
-[float]
-[[minimum_master_nodes]]
-==== `discovery.zen.minimum_master_nodes`
-
-To prevent data loss, it is vital to configure the
-`discovery.zen.minimum_master_nodes` setting so that each master-eligible node
-knows the _minimum number of master-eligible nodes_ that must be visible in
-order to form a cluster.
-
-Without this setting, a cluster that suffers a network failure is at risk of
-having the cluster split into two independent clusters -- a split brain -- which
-will lead to data loss. A more detailed explanation is provided in
-<>.
-
-To avoid a split brain, this setting should be set to a _quorum_ of
-master-eligible nodes:
-
-    (master_eligible_nodes / 2) + 1
-
-In other words, if there are three master-eligible nodes, then minimum master
-nodes should be set to `(3 / 2) + 1` or `2`:
-
-[source,yaml]
---------------------------------------------------
-discovery.zen.minimum_master_nodes: 2
---------------------------------------------------
+<2> If a hostname resolves to multiple IP addresses then the node will attempt to
+    discover other nodes at all resolved addresses.
+<3> Initial master nodes can be identified by their <>.
+<4> Initial master nodes can also be identified by their IP address.
+<5> If multiple master nodes share an IP address then the port must be used to
+    disambiguate them.
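+
+Once the cluster has formed for the first time, one way to confirm that
+bootstrapping completed as expected is to check which node was elected as the
+master and which nodes have joined, for example with the cat master and cat
+nodes APIs:
+
+[source,js]
+--------------------------------------------------
+GET _cat/master?v
+
+GET _cat/nodes?v
+--------------------------------------------------
+// CONSOLE
+
+The node names reported by these APIs correspond to each node's `node.name`
+setting, which are also the values to use when listing initial master nodes by
+name.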
diff --git a/docs/reference/setup/install/docker.asciidoc b/docs/reference/setup/install/docker.asciidoc index 6eba32ba33202..267ea14420921 100644 --- a/docs/reference/setup/install/docker.asciidoc +++ b/docs/reference/setup/install/docker.asciidoc @@ -142,12 +142,12 @@ endif::[] Instructions for installing it can be found on the https://docs.docker.com/compose/install/#install-using-pip[Docker Compose webpage]. -The node `elasticsearch` listens on `localhost:9200` while `elasticsearch2` -talks to `elasticsearch` over a Docker network. +The node `es01` listens on `localhost:9200` while `es02` +talks to `es01` over a Docker network. This example also uses https://docs.docker.com/engine/tutorials/dockervolumes[Docker named volumes], -called `esdata1` and `esdata2` which will be created if not already present. +called `esdata01` and `esdata02` which will be created if not already present. [[docker-prod-cluster-composefile]] `docker-compose.yml`: @@ -163,10 +163,12 @@ ifeval::["{release-state}"!="unreleased"] -------------------------------------------- version: '2.2' services: - elasticsearch: + es01: image: {docker-image} - container_name: elasticsearch + container_name: es01 environment: + - node.name=es01 + - cluster.initial_master_nodes=es01,es02 - cluster.name=docker-cluster - bootstrap.memory_lock=true - "ES_JAVA_OPTS=-Xms512m -Xmx512m" @@ -175,32 +177,34 @@ services: soft: -1 hard: -1 volumes: - - esdata1:/usr/share/elasticsearch/data + - esdata01:/usr/share/elasticsearch/data ports: - 9200:9200 networks: - esnet - elasticsearch2: + es02: image: {docker-image} - container_name: elasticsearch2 + container_name: es02 environment: + - node.name=es02 + - discovery.zen.ping.unicast.hosts=es01 + - cluster.initial_master_nodes=es01,es02 - cluster.name=docker-cluster - bootstrap.memory_lock=true - "ES_JAVA_OPTS=-Xms512m -Xmx512m" - - "discovery.zen.ping.unicast.hosts=elasticsearch" ulimits: memlock: soft: -1 hard: -1 volumes: - - esdata2:/usr/share/elasticsearch/data + - esdata02:/usr/share/elasticsearch/data networks: - esnet volumes: - esdata1: + esdata01: driver: local - esdata2: + esdata02: driver: local networks: diff --git a/docs/reference/upgrade/cluster_restart.asciidoc b/docs/reference/upgrade/cluster_restart.asciidoc index 85b6fffdb2eb3..4c229e373f505 100644 --- a/docs/reference/upgrade/cluster_restart.asciidoc +++ b/docs/reference/upgrade/cluster_restart.asciidoc @@ -59,10 +59,14 @@ If you have dedicated master nodes, start them first and wait for them to form a cluster and elect a master before proceeding with your data nodes. You can check progress by looking at the logs. -As soon as the <> -have discovered each other, they form a cluster and elect a master. At -that point, you can use <> and -<> to monitor nodes joining the cluster: +If upgrading from a 6.x cluster, you must +<> by +setting the `cluster.initial_master_nodes` setting. + +As soon as enough master-eligible nodes have discovered each other, they form a +cluster and elect a master. At that point, you can use +<> and <> to monitor nodes +joining the cluster: [source,sh] --------------------------------------------------