Formalize the concept of data tiers in Elasticsearch #60848
Pinging @elastic/es-core-features (:Core/Features/Features)
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Aug 11, 2020

This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the x-pack plugin. These roles are intended to be the base for the formalization of data tiers in Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing `data` role act as though they have all of the roles configured (such a node is a hot, warm, cold, and frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following settings on a cluster level:

- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:

- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to elastic#60848
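To make the filter settings above more concrete, here is a minimal Python sketch of how `require`, `include`, and `exclude` on `_tier` might be evaluated for a single node. It is an illustrative model only, not the actual `AllocationDecider`: the function names are hypothetical, and the semantics (require = every listed tier, include = at least one, exclude = none) are an assumption borrowed from how the existing attribute filters behave. The only rule taken directly from the commit message is that a node with the generic `data` role counts as a member of every tier.

```python
from typing import List, Optional, Set


def node_in_tier(node_roles: Set[str], tier: str) -> bool:
    """A node belongs to a tier if it has that role or the generic `data` role."""
    return tier in node_roles or "data" in node_roles


def _split(value: Optional[str]) -> List[str]:
    """Split a comma-separated setting value into tier names."""
    return [t.strip() for t in value.split(",") if t.strip()] if value else []


def tier_filters_allow(
    node_roles: Set[str],
    require: Optional[str] = None,
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> bool:
    """Assumed semantics: require=all tiers, include=any tier, exclude=no tier."""
    req, inc, exc = _split(require), _split(include), _split(exclude)
    if req and not all(node_in_tier(node_roles, t) for t in req):
        return False
    if inc and not any(node_in_tier(node_roles, t) for t in inc):
        return False
    if exc and any(node_in_tier(node_roles, t) for t in exc):
        return False
    return True
```

Under these assumptions, a node with only the `data` role passes any `require` or `include` filter, but is also rejected by any `exclude` filter, because it is treated as a member of every tier.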
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Aug 19, 2020

This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default.

This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

This now returns the following default settings:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

Once this setting has been applied at creation time, it can be treated like any other index-level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include._tier` setting
- The index is created with an `index.routing.allocation.exclude._tier` setting
- The index is created with an `index.routing.allocation.require._tier` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from existing source metadata (shrink, clone, split, etc.)

Relates to elastic#60848
dakrone added a commit that referenced this issue on Aug 27, 2020

This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default.

This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

This now returns the following default settings:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

Once this setting has been applied at creation time, it can be treated like any other index-level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from existing source metadata (shrink, clone, split, etc.)

Relates to #60848
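The list of exceptions above can be read as a small decision function. The following Python sketch models it under stated assumptions: `with_default_tier` and the `from_existing_source` flag are hypothetical names, settings are represented as a flat dictionary, and an explicit null is represented as a key whose value is `None`. It is not the plugin hook that the commit describes, only a way to check one's reading of the rules.

```python
from typing import Dict, Optional

TIER_SETTING = "index.routing.allocation.include._tier"

# Any user-supplied allocation filter under these prefixes suppresses the default.
FILTER_PREFIXES = (
    "index.routing.allocation.include.",
    "index.routing.allocation.exclude.",
    "index.routing.allocation.require.",
)


def with_default_tier(
    settings: Dict[str, Optional[str]],
    from_existing_source: bool = False,
) -> Dict[str, Optional[str]]:
    """Return a copy of `settings`, adding the hot-tier default when it applies."""
    result = dict(settings)
    if from_existing_source:
        # Shrink, clone, split, etc. keep whatever the source index had.
        return result
    if any(key.startswith(FILTER_PREFIXES) for key in result):
        # Covers include/exclude/require filters, including an explicit
        # null include._tier (the key is present with value None).
        return result
    result[TIER_SETTING] = "data_hot"
    return result
```

For example, `with_default_tier({})` adds the `data_hot` default, while passing `{TIER_SETTING: None}` leaves the settings untouched, which matches the "null value" opt-out in the list.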
This was referenced Sep 8, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Sep 10, 2020

Similar to the work in elastic#60994, where we introduced the `data_hot`, `data_warm`, etc. node roles, this introduces a new `data_content` node role to be used for the Content tier. Currently this tier is not used anywhere, but subsequent work will use it.

Relates to elastic#60848
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Sep 14, 2020

…eam inclusion

This commit changes the default allocation for a newly created index: the index is allocated to the "hot" tier if it is part of a new or existing data stream, and to the "content" tier if it is not part of a data stream.

Overriding any of the `index.routing.allocation.(include|exclude|require).*` settings continues to cause the initial allocation not to be set (no change in behavior).

Relates to elastic#60848
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Sep 15, 2020

With the differentiation between searchable snapshots on the cold phase and searchable snapshots on the frozen phase not implemented, there is no need to have a separate phase/tier for now. This commit removes the frozen phase and tier, which can be added back at a later time. This tier was never in a released version, so this is not a breaking change.

Relates to elastic#60983
Relates to elastic#60994
Relates to elastic#60848
dakrone added a commit that referenced this issue on Sep 18, 2020

…62589) (#62667)

This commit adds the `index.routing.allocation.prefer._tier` setting to the `DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a preference-based list of tiers for an index to be assigned to. For example, suppose the setting were set to:

```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```

If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index's shards to be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and `data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.

This allows us to specify an index's preference for tier(s) without causing the index to be unassigned if no nodes of a preferred tier are available.

Subsequent work will change the ILM migration to make additional use of this setting.

Relates to #60848
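The preference resolution described above amounts to "take the first tier in the list that has at least one node in the cluster". A minimal Python sketch of that idea follows. The function name and the representation of the cluster as a list of role sets are hypothetical, and treating generic `data` nodes as members of every tier is an assumption carried over from the earlier role commit, not something this commit message states.

```python
from typing import Iterable, Optional, Set


def preferred_tier(preference: str, cluster_nodes: Iterable[Set[str]]) -> Optional[str]:
    """Return the first tier in `preference` that some node in the cluster serves.

    `preference` is a comma-separated list such as
    "data_hot,data_warm,data_content"; `cluster_nodes` holds one set of
    role names per node. Returns None when no listed tier is available.
    """
    nodes = list(cluster_nodes)
    for tier in (t.strip() for t in preference.split(",")):
        if not tier:
            continue
        # Assumption: a node with the generic `data` role counts for any tier.
        if any(tier in roles or "data" in roles for roles in nodes):
            return tier
    return None
```

With the example preference from the commit message and a cluster that has only `data_warm` and `data_content` nodes, this sketch selects `data_warm`, matching the behavior described.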
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Sep 29, 2020

This commit adds telemetry for our data tier formalization. This telemetry helps determine the topology of the cluster with regard to the content, data, hot, warm, & cold tiers/roles.

An example of the telemetry looks like:

```
GET /_xpack/usage?human
{
  ...
  "data_tiers" : {
    "available" : true,
    "enabled" : true,
    "data_warm" : {
      ...
    },
    "data" : {
      ...
    },
    "data_cold" : {
      ...
    },
    "data_content" : {
      "node_count" : 1,
      "index_count" : 6,
      "total_shard_count" : 6,
      "primary_shard_count" : 6,
      "doc_count" : 71,
      "total_size" : "59.6kb",
      "total_size_bytes" : 61110,
      "primary_size" : "59.6kb",
      "primary_size_bytes" : 61110,
      "primary_shard_size_avg" : "9.9kb",
      "primary_shard_size_avg_bytes" : 10185,
      "primary_shard_size_median" : "8kb",
      "primary_shard_size_median_bytes" : 8254,
      "primary_shard_size_mad" : "7.2kb",
      "primary_shard_size_mad_bytes" : 7391
    },
    "data_hot" : {
      ...
    }
  }
}
```

The fields are as follows:

- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average shard size for primary shards in this tier
- primary_shard_size_median_bytes :: median shard size for primary shards in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of shard size for primary shards in this tier

Relates to elastic#60848
dakrone added a commit that referenced this issue on Oct 1, 2020

This commit adds telemetry for our data tier formalization. This telemetry helps determine the topology of the cluster with regard to the content, hot, warm, & cold tiers/roles.

An example of the telemetry looks like:

```
GET /_xpack/usage?human
{
  ...
  "data_tiers" : {
    "available" : true,
    "enabled" : true,
    "data_warm" : {
      ...
    },
    "data_cold" : {
      ...
    },
    "data_content" : {
      "node_count" : 1,
      "index_count" : 6,
      "total_shard_count" : 6,
      "primary_shard_count" : 6,
      "doc_count" : 71,
      "total_size" : "59.6kb",
      "total_size_bytes" : 61110,
      "primary_size" : "59.6kb",
      "primary_size_bytes" : 61110,
      "primary_shard_size_avg" : "9.9kb",
      "primary_shard_size_avg_bytes" : 10185,
      "primary_shard_size_median" : "8kb",
      "primary_shard_size_median_bytes" : 8254,
      "primary_shard_size_mad" : "7.2kb",
      "primary_shard_size_mad_bytes" : 7391
    },
    "data_hot" : {
      ...
    }
  }
}
```

The fields are as follows:

- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average shard size for primary shards in this tier
- primary_shard_size_median_bytes :: median shard size for primary shards in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of shard size for primary shards in this tier

Relates to #60848
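For readers unfamiliar with the last three fields, the following Python sketch shows how the average, median, and median absolute deviation (MAD) of primary shard sizes could be computed from a list of sizes in bytes. The function name is hypothetical and the result keys simply reuse the field names from the telemetry output; how Elasticsearch itself computes or rounds these values is not specified in the commit message, so the sketch returns unrounded numbers.

```python
from statistics import mean, median
from typing import Dict, Sequence


def primary_shard_size_stats(sizes_bytes: Sequence[int]) -> Dict[str, float]:
    """Compute average, median, and median absolute deviation of shard sizes."""
    if not sizes_bytes:
        # Assumption: an empty tier reports zeros rather than failing.
        return {
            "primary_shard_size_avg_bytes": 0,
            "primary_shard_size_median_bytes": 0,
            "primary_shard_size_mad_bytes": 0,
        }
    med = median(sizes_bytes)
    # MAD is the median of each value's absolute distance from the median.
    mad = median([abs(size - med) for size in sizes_bytes])
    return {
        "primary_shard_size_avg_bytes": mean(sizes_bytes),
        "primary_shard_size_median_bytes": med,
        "primary_shard_size_mad_bytes": mad,
    }
```

The MAD is less sensitive to a single very large shard than the average is, which makes it a useful companion to the median when shard sizes within a tier are skewed.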
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Oct 7, 2020

This adds release note highlights for the data tiers formalization feature.

Relates to elastic#60848
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Oct 12, 2020

When introducing new roles, older versions of Elasticsearch aren't aware of these roles. This can lead to a situation where an old (say 7.9.x) master node sees nodes in the cluster of version 7.10+ with "data_hot" or "data_content" roles, and thinks those nodes are not eligible to hold data (since the older node has no concept of these roles).

This adds a method to `DiscoveryNodeRole` where a role can return a compatibility role to be used for serializing to an older Elasticsearch version. The new formalized data tier roles (`data_content`, `data_hot`, `data_warm`, `data_cold`) use this mechanism to serialize as regular "data" roles when talking to an older Elasticsearch node. This will allow these nodes to still contain data during a rolling upgrade where the master is upgraded last.

Relates to elastic#60848
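The compatibility mechanism can be pictured as a mapping applied at serialization time. The Python sketch below is a rough model under stated assumptions, not the Java method added to `DiscoveryNodeRole`: the function name, the tuple representation of versions, and the exact cutoff of 7.10.0 are illustrative choices based on the "7.9.x" and "7.10+" versions mentioned in the commit message.

```python
from typing import Iterable, List, Tuple

# Tier roles named in the commit message.
TIER_ROLES = {"data_content", "data_hot", "data_warm", "data_cold"}

# Assumption: receivers older than this version do not know the tier roles.
FIRST_TIER_AWARE_VERSION: Tuple[int, int, int] = (7, 10, 0)


def roles_for_wire(roles: Iterable[str], receiver_version: Tuple[int, int, int]) -> List[str]:
    """Return the role names to send to a node running `receiver_version`.

    Tier roles are replaced by the generic `data` role for older receivers,
    and duplicates produced by that replacement are dropped.
    """
    receiver_is_old = receiver_version < FIRST_TIER_AWARE_VERSION
    out: List[str] = []
    for role in roles:
        name = "data" if receiver_is_old and role in TIER_ROLES else role
        if name not in out:
            out.append(name)
    return out
```

Under this model, a 7.9.x master sees a node with `data_hot` and `data_content` roles as a plain `data` node, so it continues to consider the node eligible to hold shards during the rolling upgrade.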
Going to close this as it has been added and will be available in 7.10+. Any future work we can open as individual issues.
dakrone added a commit to dakrone/elasticsearch that referenced this issue on Feb 5, 2021

This commit adds the `data_frozen` node role as part of the formalization of data tiers. It also adds the `"frozen"` phase to ILM, currently allowing the same actions as the existing cold phase.

The frozen phase is intended to be used for data even less frequently searched than the cold phase, and will eventually be loosely tied to data using partial searchable snapshots (as opposed to full searchable snapshots in the cold phase).

Relates to elastic#60848
We currently have the ability for users to split their deployments into tiers based on things like node attributes, and manually move data between the tiers within ILM. We'd like to take this one step further and formalize the concept of data tiers within Elasticsearch.
Tasks
- `frozen` phase to ILM (@andreidan) ILM: add frozen phase #60983
- [UI] Add UI for frozen phase [UI] Support "frozen" phase in ILM UI #61345
- [UI] Add opt-out UI for automatic data relocation
- `data_content` tier (@dakrone) Add "content" tier as new "data_content" role #62247
- [ ] Add opt-out index level setting to bypass initial hot allocation and ILM phase migration (@andreidan) Add index setting to bypass auto allocation to hot nodes #62114
- `migrate` step to correctly set list of possible tiers for each phase (@andreidan) ILM: migrate action configures the _tier_preference setting #62829

Context
So why formalize tiers into Elasticsearch (and beyond)? There are a number of advantages to doing this.
Minimum Viable Product
There are a set of things that we’d like to provide for the MVP for formalizing data tiers. This includes functionality for the tiering itself as well as uses within other parts of ES (like ILM). While the features can be expanded at a later time, this is a good starting place for the MVP.
Add tiers to Elasticsearch
The first step will be adding tiers to Elasticsearch itself. We can add the following roles to Elasticsearch:
These roles are not mutually exclusive. When a user doesn’t specify any of these roles, but does specify the “data” role (or uses the default node role which includes “data”), we will treat the node as if it has all of the `data_*` roles.

Not only do we need to make these tiers available for setting, we also need to make them accessible for allocation. We currently have a set of built-in attributes that users can specify in our allocation APIs: `_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, and `_id`. I propose that we add another: `_tier`. This new attribute could be used manually for both the cluster and index level allocation as well as within ILM. This way we could avoid having to introduce a new set of allocation deciders specifically for moving data within the different tiers; we also already have the infrastructure for include, exclude, and require for a given set of `_tier` attributes.

An example configuration for this would include the following in elasticsearch.yml:
One of the first uses of the new tiers will be ILM. Currently ILM has a lifecycle that includes the hot, warm, and cold phases and their actions. Making ILM aware of our tiers is a two step process: adding the tier as a new phase, and then making ILM perform the automatic migration.
Adding a “frozen” phase to ILM
Adding a frozen phase also includes adding a set of actions that are allowed as well as the parsing for the phase itself. The “frozen” phase will occur after the “cold” phase but before the “delete” phase. The list of allowed actions for the frozen phase in their execution order will be:
Migrating data between tiers automatically
Currently ILM doesn’t migrate any data between tiers automatically, though this is something that has tripped up users in the past (they expect it to move the data, but it doesn’t). The plan is to make ILM automatically move data to the tier corresponding to the ILM phase, unless there is an existing allocate action in the phase with an allocation set (not just a replica change).
This migration should be implemented as an injected step (similar to the way we inject the “unfollow” step in phases) that happens as the first step in a phase. That way the user can monitor it through the existing ILM explain API, and the step can be re-run when a user moves back to a phase. This injected step should fail fast if there are no nodes corresponding to the given phase available in the cluster, and then be retried the next time the ILM policy is executed.
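As a rough illustration of the injected migration step, the Python sketch below models its decision logic: skip when the phase has an allocate action that sets an allocation, fail fast when no node serves the target tier, and otherwise produce a settings update. Everything here is hypothetical: the function and exception names, the phase-to-tier mapping, the use of `index.routing.allocation.include._tier` as the setting being updated, and the treatment of generic `data` nodes as members of every tier are assumptions made for the example, not the proposed implementation.

```python
from typing import Dict, Iterable, Optional, Set

# Assumed mapping from ILM phase name to the tier role of the same name.
PHASE_TO_TIER = {
    "hot": "data_hot",
    "warm": "data_warm",
    "cold": "data_cold",
    "frozen": "data_frozen",
}


class NoNodesInTierError(RuntimeError):
    """Raised so the step fails fast and can be retried on the next ILM run."""


def migrate_step(
    phase: str,
    allocate_action: Optional[dict],
    cluster_nodes: Iterable[Set[str]],
) -> Optional[Dict[str, str]]:
    """Return the settings update for the phase, or None if migration is skipped."""
    # An allocate action that only changes replicas does not suppress migration;
    # one that sets include/exclude/require does.
    if allocate_action and any(
        allocate_action.get(key) for key in ("include", "exclude", "require")
    ):
        return None
    tier = PHASE_TO_TIER[phase]
    if not any(tier in roles or "data" in roles for roles in cluster_nodes):
        raise NoNodesInTierError(f"no nodes available for tier {tier}")
    return {"index.routing.allocation.include._tier": tier}
```

In this model a caller would catch `NoNodesInTierError`, record the failure for the ILM explain output, and call `migrate_step` again on the next policy execution.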
We should add a way to opt-out of this automatic migration, rather than requiring a user to have a custom allocation as the only way to opt out.
Allocate new indices on hot nodes
In addition to making tiers something a user manages, we want new data to automatically be allocated to “hot” nodes by default. This will not affect the out-of-the-box case where each node is of type “data”, because those are considered hot nodes.
This should be implemented as default settings for the index that set:
As the settings for a brand new index. This has the nice benefit of easily allowing a user to override these default settings in their template, or manually when creating the index. These are the same settings that will be updated by ILM when migrating between phases.