Formalize the concept of data tiers in Elasticsearch #60848

dakrone · 2020-08-06T19:51:26Z

We currently have the ability for users to split their deployments into tiers based on things like node attributes, and to manually move data between the tiers within ILM. We'd like to take this one step further and formalize the concept of data tiers within Elasticsearch.

Tasks

Context

So why formalize tiers into Elasticsearch (and beyond)? There are a number of advantages to doing this.

By formalizing this inside of Elasticsearch itself we shift from descriptive best practices to prescriptive best practices. Instead of a million ways to configure hot/warm/cold, we prescribe our preferred solution.
This allows us to be consistent in our documentation for on-prem as well as on Cloud. We don’t need to make up attributes that may differ, as we can refer to the actual role names and configuration.
This solution allows us to tell a story not only in our documentation, but also in our out-of-the-box configuration. The idea of data having a lifecycle is concrete instead of abstract based on general purpose constructs.
A data stream already encapsulates some of the lifecycle of data, in that we prevent certain actions on the write index and allow them only on non-write indices in the stream. This would only be strengthened by having tiers available as a first class feature.
A better out of the box experience for users using time-series data
A user now has less to configure in their ILM policy and templates, as data can shift tiers automatically.
Since we have a distinction between tiers, we have the freedom to be more aggressive with our default ILM policies. For example, we can start to include policies that automatically freeze indices on a frozen tier, or use searchable snapshots by default, because tiers are now a first class idea.
Autoscaling can be tier-aware. Rather than having to scale based on a node attribute and not knowing whether data is even respecting that attribute by default (since we don’t respect attribute-based allocation by default), autoscaling can differentiate between the different tiers, scaling only a specific part up or down as needed.

Minimum Viable Product

There is a set of things that we’d like to provide for the MVP for formalizing data tiers. This includes functionality for the tiering itself as well as uses within other parts of ES (like ILM). While the features can be expanded at a later time, this is a good starting place for the MVP.

Add tiers to Elasticsearch

The first step will be adding tiers to Elasticsearch itself. We can add the following roles to Elasticsearch:

data_hot
data_warm
data_cold
data_frozen

These roles are not mutually exclusive. When a user doesn’t specify any of these roles, but does specify the “data” role (or uses the default node role which includes “data”), we will treat the node as if it has all of the data_* roles.
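The role fallback described above can be sketched as follows. This is only an illustration of the proposal, not the actual Elasticsearch implementation: `effective_tier_roles` is a hypothetical helper, and treating a node that has both `data` and a `data_*` role as a member of every tier is an assumption.

```python
# Tier-specific data roles proposed in this issue.
TIER_ROLES = ("data_hot", "data_warm", "data_cold", "data_frozen")


def effective_tier_roles(node_roles):
    """Return the tier roles a node is treated as having (hypothetical helper)."""
    roles = set(node_roles)
    if "data" in roles:
        # The generic "data" role is treated as if every data_* role were set.
        return set(TIER_ROLES)
    # Otherwise only the explicitly configured tier roles apply.
    return roles & set(TIER_ROLES)


print(effective_tier_roles(["master", "data_hot", "ingest"]))
print(sorted(effective_tier_roles(["data"])))
```

A node with neither `data` nor any `data_*` role yields an empty set, meaning it would hold no shards for any tier.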

Not only do we need to make these tiers available for setting, we also need to make them accessible for allocation. We currently have a set of built-in attributes that users can specify in our allocation APIs: _name, _host_ip, _publish_ip, _ip, _host, and _id. I propose that we add another: _tier. This new attribute could be used manually for both cluster-level and index-level allocation, as well as within ILM. This way we could avoid having to introduce a new set of allocation deciders specifically for moving data between the different tiers, and we already have the infrastructure for include, exclude, and require for a given set of _tier attributes.

An example configuration for this would include the following in elasticsearch.yml:

node.roles: ["master", "data_hot", "ingest"]
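To make the include, exclude, and require idea concrete for `_tier`, here is a minimal sketch. It assumes the usual allocation-filter semantics (include matches if any listed value is present, require needs all of them, exclude rejects any of them); the function name `tier_filter_allows` is hypothetical and the real decider may behave differently.

```python
def _split(value):
    """Split a comma-separated filter value into a set of tier names."""
    if not value:
        return set()
    return {part.strip() for part in value.split(",") if part.strip()}


def tier_filter_allows(node_tiers, include=None, exclude=None, require=None):
    """Decide whether a node's tiers satisfy the given _tier filters (sketch)."""
    node_tiers = set(node_tiers)
    inc, exc, req = _split(include), _split(exclude), _split(require)
    if req and not req <= node_tiers:
        return False  # require: the node must have every listed tier
    if inc and not (inc & node_tiers):
        return False  # include: the node must have at least one listed tier
    if exc and (exc & node_tiers):
        return False  # exclude: the node must have none of the listed tiers
    return True


# The node from the elasticsearch.yml example above only has data_hot.
print(tier_filter_allows({"data_hot"}, include="data_hot,data_warm"))
```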

One of the first uses of the new tiers will be ILM. Currently ILM has a lifecycle that includes the hot, warm, and cold phases and their actions. Making ILM aware of our tiers is a two-step process: adding the tier as a new phase, and then making ILM perform the automatic migration.

Adding a “frozen” phase to ILM

Adding a frozen phase also includes adding a set of actions that are allowed as well as the parsing for the phase itself. The “frozen” phase will occur after the “cold” phase but before the “delete” phase. The list of allowed actions for the frozen phase in their execution order will be:

set_priority
unfollow
allocate
freeze
searchable_snapshot
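The phase position and action ordering above can be expressed as a small validation sketch. The names `PHASE_ORDER`, `FROZEN_ACTION_ORDER`, and `order_frozen_actions` are made up for illustration and do not mirror the ILM code.

```python
# Phase order with the proposed "frozen" phase between "cold" and "delete".
PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]

# Allowed frozen-phase actions, in the execution order listed above.
FROZEN_ACTION_ORDER = [
    "set_priority",
    "unfollow",
    "allocate",
    "freeze",
    "searchable_snapshot",
]


def order_frozen_actions(configured):
    """Validate the configured actions and return them in execution order."""
    unknown = set(configured) - set(FROZEN_ACTION_ORDER)
    if unknown:
        raise ValueError(f"actions not allowed in the frozen phase: {sorted(unknown)}")
    return [action for action in FROZEN_ACTION_ORDER if action in configured]


print(order_frozen_actions({"freeze", "set_priority"}))
```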

Migrating data between tiers automatically

Currently ILM doesn’t migrate any data between tiers automatically, though this is something that has tripped up users in the past (they expect it to move the data, but it doesn’t). The plan is to make ILM automatically move data to the tier corresponding to the ILM phase, unless there is an existing allocate action in the phase with an allocation set (not just a replica change).

This migration should be implemented as an injected step (similar to the way we inject the “unfollow” step in phases) that happens as the first step in a phase. That way the user can monitor it through the existing ILM explain API, and it can be re-run when a user moves back to a phase. This injected step should fail fast if there are no nodes corresponding to the given phase available in the cluster, and then be retried the next time the ILM policy is executed.

We should add a way to opt-out of this automatic migration, rather than requiring a user to have a custom allocation as the only way to opt out.
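The migration rules above could look roughly like this. It is a sketch under assumptions: the step name `migrate`, the `auto_migrate` opt-out flag, and the helper names are all hypothetical, and the setting key uses the `_tier` attribute proposed earlier in this issue.

```python
TIERED_PHASES = ("hot", "warm", "cold", "frozen")


class NoTierNodesError(Exception):
    """Raised when no node in the cluster carries the target tier role."""


def has_allocation_rules(allocate_action):
    """True if an allocate action sets include/exclude/require, not just replicas."""
    if not allocate_action:
        return False
    return any(allocate_action.get(key) for key in ("include", "exclude", "require"))


def build_phase_steps(phase, actions, auto_migrate=True):
    """Return the step names for a phase, injecting migration first when it applies."""
    steps = list(actions)  # action names in their configured order
    if (
        phase in TIERED_PHASES
        and auto_migrate
        and not has_allocation_rules(actions.get("allocate"))
    ):
        steps.insert(0, "migrate")
    return steps


def run_migrate_step(phase, cluster_tiers):
    """Fail fast if the tier has no nodes; otherwise return the setting to apply."""
    target = f"data_{phase}"
    if target not in cluster_tiers:
        raise NoTierNodesError(f"no nodes with role {target}; retry on the next ILM run")
    return {"index.routing.allocation.include._tier": target}


print(build_phase_steps("warm", {"allocate": {"number_of_replicas": 1}}))
```

An allocate action that only changes the replica count still gets the injected step, while one with its own include, exclude, or require rules suppresses it.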

Allocate new indices on hot nodes

In addition to making tiers something a user manages, we want new data to automatically be allocated to “hot” nodes by default. This will not affect the out-of-the-box case where each node is of type “data”, because those are considered hot nodes.

This should be implemented as default settings for the index that set:

{
  "index.routing.allocation.include._tier": "data_hot"
}

These would be the settings for a brand new index. This has the nice benefit of easily allowing a user to override these default settings in their template, or manually when creating the index. These are the same settings that will be updated by ILM when migrating between phases.

elasticmachine · 2020-08-06T19:51:28Z

Pinging @elastic/es-core-features (:Core/Features/Features)

This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the x-pack plugin. These roles are intended to be the base for the formalization of data tiers in Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing `data` role act as though they have all of the roles configured (such a node is a hot, warm, cold, and frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following settings on a cluster level:

- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:

- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to elastic#60848

This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default.

This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include._tier` setting
- The index is created with an `index.routing.allocation.exclude._tier` setting
- The index is created with an `index.routing.allocation.require._tier` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to elastic#60848

This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default.

This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to #60848
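The skip conditions in this commit message (the `<anything>` variant) can be sketched as a small function. This is an illustration only: `with_default_tier` and its `created_from_source` flag are hypothetical names, and the real change injects the setting through a plugin hook on the index settings builder.

```python
TIER_SETTING = "index.routing.allocation.include._tier"
ALLOCATION_PREFIXES = (
    "index.routing.allocation.include.",
    "index.routing.allocation.exclude.",
    "index.routing.allocation.require.",
)


def with_default_tier(requested_settings, created_from_source=False):
    """Return the settings for a new index, adding the hot-tier default if allowed."""
    settings = dict(requested_settings)
    if created_from_source:
        # Shrink, clone, split, etc. keep whatever the source index had.
        return settings
    if any(key.startswith(ALLOCATION_PREFIXES) for key in settings):
        # Any explicit include/exclude/require setting wins; an explicit null
        # for include._tier also lands here because the key is present.
        return settings
    settings[TIER_SETTING] = "data_hot"
    return settings


print(with_default_tier({"index.number_of_shards": "1"}))
```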

Similar to the work in elastic#60994, where we introduced the `data_hot`, `data_warm`, etc. node roles, this introduces a new `data_content` node role to be used for the Content tier. Currently this tier is not used anywhere, but subsequent work will use this tier. Relates to elastic#60848

…eam inclusion

This commit changes the default allocation of newly created indices. Rather than always being allocated to the "hot" tier, a newly created index is allocated to the "hot" tier if it is part of a new or existing data stream, and to the "content" tier if it is not part of a data stream. Overriding any of the `index.routing.allocation.(include|exclude|require).*` settings continues to cause the initial allocation not to be set (no change in behavior).

Relates to elastic#60848
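A short sketch of that choice follows. The function name `initial_tier` and the `in_data_stream` flag are hypothetical; the regular expression is taken from the setting pattern quoted in the commit message.

```python
import re

# Pattern for the allocation settings that suppress the default tier.
OVERRIDE_PATTERN = re.compile(r"index\.routing\.allocation\.(include|exclude|require)\..+")


def initial_tier(settings, in_data_stream):
    """Pick the default tier for a new index, or None if the user overrode allocation."""
    if any(OVERRIDE_PATTERN.fullmatch(key) for key in settings):
        return None  # explicit allocation settings: no default is injected
    return "data_hot" if in_data_stream else "data_content"


print(initial_tier({}, in_data_stream=True))
print(initial_tier({}, in_data_stream=False))
```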

With the differentiation between searchable snapshots on the cold phase and searchable snapshots on the frozen phase not implemented, there is no need to have a separate phase/tier for now. This commit removes the frozen phase and tier, which can be added back at a later time. (this tier was never in a released version, so this is not a breaking change) Relates to elastic#60983 Relates to elastic#60994 Relates to elastic#60848

…62589) (#62667)

This commit adds the `index.routing.allocation.prefer._tier` setting to the `DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a preference-based list of tiers for an index to be assigned to. For example, if the setting were set to:

```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```

If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index's shards to be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and `data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.

This allows us to specify an index's preference for tier(s) without causing the index to be unassigned if no nodes of a preferred tier are available.

Subsequent work will change the ILM migration to make additional use of this setting.

Relates to #60848
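The preference resolution described here amounts to picking the first listed tier that has at least one node in the cluster. A minimal sketch, with `preferred_tier` as a hypothetical name; what the decider does when none of the listed tiers exists is not stated in this message, so the sketch simply returns `None` in that case.

```python
def preferred_tier(preference, cluster_tiers):
    """Return the first tier in the preference list that exists in the cluster."""
    for tier in (part.strip() for part in preference.split(",")):
        if tier and tier in cluster_tiers:
            return tier
    return None  # no listed tier is present in the cluster


preference = "data_hot,data_warm,data_content"
print(preferred_tier(preference, {"data_hot", "data_warm"}))
print(preferred_tier(preference, {"data_warm", "data_content"}))
```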

This commit adds telemetry for our data tier formalization. This telemetry helps determine the topology of the cluster with regard to the content, data, hot, warm, & cold tiers/roles.

An example of the telemetry looks like:

```
GET /_xpack/usage?human
{
  ...
  "data_tiers" : {
    "available" : true,
    "enabled" : true,
    "data_warm" : {
      ...
    },
    "data" : {
      ...
    },
    "data_cold" : {
      ...
    },
    "data_content" : {
      "node_count" : 1,
      "index_count" : 6,
      "total_shard_count" : 6,
      "primary_shard_count" : 6,
      "doc_count" : 71,
      "total_size" : "59.6kb",
      "total_size_bytes" : 61110,
      "primary_size" : "59.6kb",
      "primary_size_bytes" : 61110,
      "primary_shard_size_avg" : "9.9kb",
      "primary_shard_size_avg_bytes" : 10185,
      "primary_shard_size_median" : "8kb",
      "primary_shard_size_median_bytes" : 8254,
      "primary_shard_size_mad" : "7.2kb",
      "primary_shard_size_mad_bytes" : 7391
    },
    "data_hot" : {
      ...
    }
  }
}
```

The fields are as follows:

- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average shard size for primary shard in this tier
- primary_shard_size_median_bytes :: median shard size for primary shard in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of shard size for primary shard in this tier

Relates to elastic#60848

This commit adds telemetry for our data tier formalization. This telemetry helps determine the topology of the cluster with regard to the content, hot, warm, & cold tiers/roles.

An example of the telemetry looks like:

```
GET /_xpack/usage?human
{
  ...
  "data_tiers" : {
    "available" : true,
    "enabled" : true,
    "data_warm" : {
      ...
    },
    "data_cold" : {
      ...
    },
    "data_content" : {
      "node_count" : 1,
      "index_count" : 6,
      "total_shard_count" : 6,
      "primary_shard_count" : 6,
      "doc_count" : 71,
      "total_size" : "59.6kb",
      "total_size_bytes" : 61110,
      "primary_size" : "59.6kb",
      "primary_size_bytes" : 61110,
      "primary_shard_size_avg" : "9.9kb",
      "primary_shard_size_avg_bytes" : 10185,
      "primary_shard_size_median" : "8kb",
      "primary_shard_size_median_bytes" : 8254,
      "primary_shard_size_mad" : "7.2kb",
      "primary_shard_size_mad_bytes" : 7391
    },
    "data_hot" : {
      ...
    }
  }
}
```

The fields are as follows:

- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average shard size for primary shard in this tier
- primary_shard_size_median_bytes :: median shard size for primary shard in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of shard size for primary shard in this tier

Relates to #60848
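For readers unfamiliar with the three shard-size statistics, here is a minimal sketch of how the average, median, and median absolute deviation could be computed from a list of primary shard sizes. The function name and the truncation to whole bytes are assumptions for illustration; the input sizes are made up and this is not the telemetry code itself.

```python
from statistics import mean, median


def primary_shard_size_stats(sizes_bytes):
    """Compute avg, median, and median absolute deviation of shard sizes in bytes."""
    if not sizes_bytes:
        return {"avg_bytes": 0, "median_bytes": 0, "mad_bytes": 0}
    med = median(sizes_bytes)
    # MAD: the median of each shard's absolute distance from the median size.
    mad = median([abs(size - med) for size in sizes_bytes])
    return {
        "avg_bytes": int(mean(sizes_bytes)),
        "median_bytes": int(med),
        "mad_bytes": int(mad),
    }


print(primary_shard_size_stats([100, 200, 300, 400, 1000]))
```

Unlike the average, the median and MAD are barely moved by a single oversized shard, which is why they help describe a tier's typical shard size.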

This adds release note highlights for the data tiers formalization feature. Relates to elastic#60848

When introducing new roles, older versions of Elasticsearch aren't aware of these roles. This can lead to a situation where an old (say 7.9.x) master node sees nodes in the cluster of version 7.10+ with "data_hot" or "data_content" roles, and thinks those nodes are not eligible to hold data (since the older node has no concept of these roles).

This adds a method to `DiscoveryNodeRole` where a role can return a compatibility role to be used for serializing to an older Elasticsearch version. The new formalized data tier roles (`data_content`, `data_hot`, `data_warm`, `data_cold`) use this mechanism to serialize as regular "data" roles when talking to an older Elasticsearch node. This will allow these nodes to still contain data during a rolling upgrade where the master is upgraded last.

Relates to elastic#60848
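The compatibility-role idea can be sketched as follows. The `Role` class, its fields, and `serialize_roles` are hypothetical stand-ins rather than the `DiscoveryNodeRole` API, and versions are modelled as simple `(major, minor)` tuples.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Role:
    name: str
    introduced_in: Tuple[int, int] = (0, 0)  # first version that knows this role
    compat_name: Optional[str] = None        # role name to send to older nodes

    def name_for(self, remote_version):
        """Return the role name to serialize for a node of the given version."""
        if remote_version < self.introduced_in and self.compat_name:
            return self.compat_name
        return self.name


def serialize_roles(roles, remote_version):
    """Serialize role names for a remote node, de-duplicating fallbacks."""
    return sorted({role.name_for(remote_version) for role in roles})


DATA_HOT = Role("data_hot", (7, 10), "data")
MASTER = Role("master")

print(serialize_roles([DATA_HOT, MASTER], (7, 9)))   # talking to an old master
print(serialize_roles([DATA_HOT, MASTER], (7, 10)))  # talking to an upgraded node
```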

dakrone · 2020-10-13T15:18:08Z

Going to close this as it has been added and will be available in 7.10+. Any future work we can open as individual issues.

This commit adds the `data_frozen` node role as part of the formalization of data tiers. It also adds the `"frozen"` phase to ILM, currently allowing the same actions as the existing cold phase. The frozen phase is intended to be used for data even less frequently searched than the cold phase, and will eventually be loosely tied to data using partial searchable snapshots (as opposed to full searchable snapshots in the cold phase). Relates to elastic#60848

dakrone added >feature Meta :Core/Features/Features labels Aug 6, 2020

dakrone self-assigned this Aug 6, 2020

elasticmachine added the Team:Data Management label Aug 6, 2020

andreidan mentioned this issue Aug 11, 2020

ILM: add frozen phase #60983

Merged

dakrone mentioned this issue Aug 11, 2020

Add data tiers (hot, warm, cold, frozen) as custom node roles #60994

Merged

nerophon mentioned this issue Aug 12, 2020

Hot-warm setup confusing, docs could explain better #61019

Closed

dakrone assigned andreidan Aug 13, 2020

dakrone mentioned this issue Aug 19, 2020

Allocate newly created indices on data_hot tier nodes #61342

Merged

This was referenced Sep 8, 2020

ILM migrate data between tiers #61377

Merged

Add index setting to bypass auto allocation to hot nodes #62114

Closed

dakrone mentioned this issue Sep 10, 2020

Add "content" tier as new "data_content" role #62247

Merged

dakrone mentioned this issue Sep 14, 2020

Allocate new indices on "hot" or "content" tier depending on data stream inclusion #62338

Merged

andreidan mentioned this issue Sep 23, 2020

ILM: migrate action configures the _tier_preference setting #62829

Merged

dakrone mentioned this issue Sep 29, 2020

ILM and defining node types #56876

Closed

dakrone mentioned this issue Sep 29, 2020

Add telemetry for data tiers #63031

Merged

andreidan mentioned this issue Sep 30, 2020

DOCS: general overview of data tiers and roles #63086

Merged

dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 7, 2020

Add release note highlights for data tiers

e5b84a2

This adds release note highlights for the data tiers formalization feature. Relates to elastic#60848

dakrone mentioned this issue Oct 7, 2020

Add release note highlights for data tiers #63427

Merged

dakrone added a commit that referenced this issue Oct 8, 2020

Add release note highlights for data tiers (#63427)

27f1326

This adds release note highlights for the data tiers formalization feature. Relates to #60848

dakrone mentioned this issue Oct 12, 2020

Add DiscoveryNodeRole compatibility role for bwc tier serialization #63581

Merged

dakrone closed this as completed Oct 13, 2020

ghost mentioned this issue Oct 31, 2020

Add data-tier concepts into the monitoring app elastic/kibana#82216

Open

stevejgordon mentioned this issue Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

61 tasks

stevejgordon mentioned this issue Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed

dakrone mentioned this issue Feb 5, 2021

Add the frozen tier node role and ILM phase #68605

Merged

stevejgordon mentioned this issue Feb 22, 2021

7.12.0 Meta Ticket elastic/elasticsearch-net#5337

Closed

34 tasks
