Formalize the concept of data tiers in Elasticsearch #60848

Closed
15 of 18 tasks
dakrone opened this issue Aug 6, 2020 · 2 comments
Labels
>feature Meta Team:Data Management Meta label for data/management team

Comments


dakrone commented Aug 6, 2020

We currently have the ability for users to split their deployments into tiers based on things like node attributes, and to manually move data between the tiers within ILM. We'd like to take this one step further and formalize the concept of data tiers within Elasticsearch.

Tasks


Context

So why formalize tiers into Elasticsearch (and beyond)? There are a number of advantages to doing this.

  • By formalizing this inside of Elasticsearch itself, we shift from descriptive best practices to prescriptive best practices. Instead of a million ways to configure hot/warm/cold, we prescribe our preferred solution.
  • This allows us to be consistent in our documentation for on-prem as well as Cloud: we don’t need to make up attributes that may differ, because we can refer to the actual role names and configuration.
  • This solution allows us to tell a story not only in our documentation, but also in our out-of-the-box configuration. The idea of data having a lifecycle becomes concrete, rather than an abstraction assembled from general-purpose constructs.
  • A data stream already encapsulates part of the lifecycle of data: we prevent certain actions on the write index and allow them only on non-write indices in the stream. Having tiers available as a first-class feature would only strengthen this.
  • A better out-of-the-box experience for users with time-series data.
  • A user now has less to configure in their ILM policy and templates, as data can shift tiers automatically.
  • Since we have a distinction between tiers, we have the freedom to be more aggressive with our default ILM policies. For example, we can start to include policies that automatically freeze indices on a frozen tier, or use searchable snapshots by default, because tiers are now a first-class idea.
  • Autoscaling can be tier-aware. Rather than scaling based on a node attribute without knowing whether data even respects that attribute (attribute-based allocation is not applied by default), autoscaling can differentiate between tiers and scale only a specific part up or down as needed.

Minimum Viable Product

There is a set of things we’d like to provide in the MVP for formalizing data tiers. This includes the tiering functionality itself as well as its use in other parts of ES (like ILM). While the features can be expanded later, this is a good starting place for the MVP.

Add tiers to Elasticsearch

The first step will be adding tiers to Elasticsearch itself. We can add the following roles to Elasticsearch:

  • data_hot
  • data_warm
  • data_cold
  • data_frozen

These roles are not mutually exclusive. When a user doesn’t specify any of these roles, but does specify the “data” role (or uses the default node role which includes “data”), we will treat the node as if it has all of the data_* roles.
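The role rules above can be sketched as follows, assuming node roles are plain strings. `effective_tiers` and `TIER_ROLES` are hypothetical names for illustration, not Elasticsearch APIs.

```python
# Tier roles proposed in this issue.
TIER_ROLES = {"data_hot", "data_warm", "data_cold", "data_frozen"}


def effective_tiers(node_roles):
    """Return the data tiers a node participates in (hypothetical helper).

    A node with the generic "data" role is treated as if it had every
    data_* tier role; otherwise only its explicit tier roles count.
    """
    roles = set(node_roles)
    if "data" in roles:
        return set(TIER_ROLES)
    return roles & TIER_ROLES


# A dedicated hot node, as in the elasticsearch.yml example in this issue.
print(sorted(effective_tiers(["master", "data_hot", "ingest"])))
# ['data_hot']
# A node with the generic "data" role belongs to every tier.
print(sorted(effective_tiers(["master", "data", "ingest"])))
# ['data_cold', 'data_frozen', 'data_hot', 'data_warm']
```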

Not only do we need to make these tiers available as node roles, we also need to make them usable for allocation. We currently have a set of built-in attributes that users can specify in our allocation APIs: _name, _host_ip, _publish_ip, _ip, _host, and _id. I propose that we add another: _tier. This new attribute could be used manually for both cluster-level and index-level allocation, as well as within ILM. This way we avoid introducing a new set of allocation deciders specifically for moving data between tiers, since we already have the infrastructure for include, exclude, and require on a given set of _tier attributes.
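A rough model of how `_tier` filters could behave if they follow the existing allocation filter semantics (include matches any listed value, require needs all of them, exclude rejects any match). The helper names are hypothetical, and counting `data` nodes as members of every tier follows the role rules above.

```python
def _tiers(value):
    # Split a comma-separated filter value such as "data_hot,data_warm".
    return [t.strip() for t in (value or "").split(",") if t.strip()]


def _in_tier(node_roles, tier):
    # Nodes with the generic "data" role count as members of every tier.
    return "data" in node_roles or tier in node_roles


def tier_filter_allows(node_roles, require=None, include=None, exclude=None):
    """Hypothetical model of include/exclude/require filtering on `_tier`."""
    roles = set(node_roles)
    required, included, excluded = _tiers(require), _tiers(include), _tiers(exclude)
    if required and not all(_in_tier(roles, t) for t in required):
        return False
    if included and not any(_in_tier(roles, t) for t in included):
        return False
    if excluded and any(_in_tier(roles, t) for t in excluded):
        return False
    return True


print(tier_filter_allows({"data_warm"}, include="data_hot,data_warm"))  # True
print(tier_filter_allows({"data_warm"}, require="data_hot"))  # False
print(tier_filter_allows({"data"}, exclude="data_cold"))  # False
```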

An example configuration for this would include the following in elasticsearch.yml:

node.roles: ["master", "data_hot", "ingest"]

One of the first uses of the new tiers will be ILM. Currently ILM has a lifecycle that includes the hot, warm, and cold phases and their actions. Making ILM aware of our tiers is a two-step process: adding the frozen tier as a new phase, and then making ILM perform the automatic migration.

Adding a “frozen” phase to ILM

Adding a frozen phase means defining the set of actions allowed in it, as well as parsing support for the phase itself. The “frozen” phase will occur after the “cold” phase but before the “delete” phase. The allowed actions for the frozen phase, in execution order, will be:

  • set_priority
  • unfollow
  • allocate
  • freeze
  • searchable_snapshot
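A small sketch of that execution ordering, assuming a policy's frozen phase is given as a set of action names. `order_frozen_actions` is a hypothetical helper, not part of ILM.

```python
# Allowed actions for the proposed "frozen" phase, in execution order.
FROZEN_PHASE_ACTIONS = ["set_priority", "unfollow", "allocate", "freeze", "searchable_snapshot"]


def order_frozen_actions(configured):
    """Validate frozen-phase actions and return them in execution order.

    Actions outside the allowed list are rejected with a ValueError.
    """
    configured = set(configured)
    unknown = configured - set(FROZEN_PHASE_ACTIONS)
    if unknown:
        raise ValueError(f"invalid actions for the frozen phase: {sorted(unknown)}")
    return [action for action in FROZEN_PHASE_ACTIONS if action in configured]


print(order_frozen_actions({"searchable_snapshot", "set_priority", "allocate"}))
# ['set_priority', 'allocate', 'searchable_snapshot']
```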

Migrating data between tiers automatically

Currently ILM doesn’t migrate any data between tiers automatically, which has tripped up users in the past (they expect it to move the data, but it doesn’t). The plan is to make ILM automatically move data to the tier corresponding to the ILM phase, unless the phase already contains an allocate action that sets allocation filters (not just a replica count change).

This migration should be implemented as an injected step (similar to the way we inject the “unfollow” step in phases) that runs as the first step in a phase. That way the user can monitor it through the existing ILM explain API, and it is re-run when a user moves back to a phase. This injected step should fail fast if no nodes corresponding to the given phase are available in the cluster, and then be retried the next time the ILM policy is executed.
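A sketch of the injected step and its fail-fast check, under the assumption that each phase maps one-to-one onto a tier role. The step name `migrate` and both helper functions are illustrative only.

```python
# Assumed mapping from ILM phase to the tier role it targets.
PHASE_TO_TIER = {"hot": "data_hot", "warm": "data_warm", "cold": "data_cold", "frozen": "data_frozen"}


def phase_steps(phase, action_steps):
    """Prepend an injected migration step to a phase's steps (sketch).

    Phases without a tier, such as "delete", get no injected step.
    """
    if phase not in PHASE_TO_TIER:
        return list(action_steps)
    return ["migrate"] + list(action_steps)


def migration_target_available(phase, cluster_node_roles):
    """Fail-fast check: True if some node can hold the phase's tier.

    `cluster_node_roles` holds one set of roles per node. When this returns
    False, the step fails and is retried on the next ILM run.
    """
    tier = PHASE_TO_TIER[phase]
    return any("data" in roles or tier in roles for roles in cluster_node_roles)


print(phase_steps("warm", ["shrink", "forcemerge"]))  # ['migrate', 'shrink', 'forcemerge']
print(migration_target_available("cold", [{"master"}, {"data_hot"}]))  # False
```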

We should add a way to opt-out of this automatic migration, rather than requiring a user to have a custom allocation as the only way to opt out.
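The migration condition and the proposed opt-out might look like this. The `opted_out` flag is a placeholder, since this issue does not specify the opt-out mechanism, and `should_migrate` is a hypothetical name. The allocate configuration uses ILM's `include`/`exclude`/`require`/`number_of_replicas` keys.

```python
FILTER_KEYS = ("include", "exclude", "require")


def should_migrate(phase_actions, opted_out=False):
    """Decide whether ILM should inject the automatic tier migration.

    `phase_actions` maps action names to their configuration, as in the
    "actions" object of an ILM phase.
    """
    if opted_out:
        return False
    allocate = phase_actions.get("allocate") or {}
    # An allocate action that only sets number_of_replicas does not count.
    return not any(allocate.get(key) for key in FILTER_KEYS)


print(should_migrate({"allocate": {"number_of_replicas": 1}}))  # True
print(should_migrate({"allocate": {"require": {"box_type": "warm"}}}))  # False
```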

Allocate new indices on hot nodes

In addition to making tiers something a user manages, we want new data to be allocated to “hot” nodes by default. This will not affect the out-of-the-box case where each node has the “data” role, because such nodes are considered hot nodes.

This should be implemented as default settings for a brand new index:

{
  "index.routing.allocation.include._tier": "data_hot"
}

This has the nice benefit of allowing a user to easily override these defaults in their template, or manually when creating the index. These are the same settings that ILM will update when migrating between phases.

@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Features)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Aug 6, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue Aug 11, 2020
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role act as though they have all of the roles configured (they are hot, warm, cold, and
frozen nodes).

This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to elastic#60848
dakrone added a commit that referenced this issue Aug 12, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue Aug 12, 2020
…c#60994)

dakrone added a commit that referenced this issue Aug 12, 2020
…60994) (#61045)

dakrone added a commit to dakrone/elasticsearch that referenced this issue Aug 19, 2020
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users who separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will now have their data allocated on hot tier nodes by
default.

Rather than simply changing the default value of `index.routing.allocation.include._tier` from null
to "data_hot", this adds the ability for a plugin to inject a setting into the builder for a newly
created index. This has the benefit of making the setting visible in the index settings when the
index is retrieved, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

This now returns the following default settings:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

Once set, this setting can be treated like any other index-level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include._tier` setting
- The index is created with an `index.routing.allocation.exclude._tier` setting
- The index is created with an `index.routing.allocation.require._tier` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index is created from existing source metadata (shrink, clone, split, etc.)

Relates to elastic#60848
dakrone added a commit that referenced this issue Aug 27, 2020
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users who separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will now have their data allocated on hot tier nodes by
default.

Rather than simply changing the default value of `index.routing.allocation.include._tier` from null
to "data_hot", this adds the ability for a plugin to inject a setting into the builder for a newly
created index. This has the benefit of making the setting visible in the index settings when the
index is retrieved, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

This now returns the following default settings:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

Once set, this setting can be treated like any other index-level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index is created from existing source metadata (shrink, clone, split, etc.)
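The conditions above can be sketched as a function over the flat settings supplied at creation time, where `None` models an explicit null. `default_tier_settings` is a hypothetical name; per this commit, the actual change injects the setting through a plugin hook on the index builder.

```python
# Allocation filter prefixes that suppress the injected default.
ALLOCATION_FILTER_PREFIXES = (
    "index.routing.allocation.include.",
    "index.routing.allocation.exclude.",
    "index.routing.allocation.require.",
)
TIER_SETTING = "index.routing.allocation.include._tier"


def default_tier_settings(requested, from_source_metadata=False):
    """Return the settings to inject into a new index (hypothetical helper).

    `requested` is the flat settings map given at creation; a key mapped to
    None stands for an explicit null, which also suppresses the default.
    """
    if from_source_metadata:  # shrink, clone, split, ...
        return {}
    if any(key.startswith(ALLOCATION_FILTER_PREFIXES) for key in requested):
        return {}
    return {TIER_SETTING: "data_hot"}


print(default_tier_settings({"index.number_of_shards": "1"}))
# {'index.routing.allocation.include._tier': 'data_hot'}
print(default_tier_settings({"index.routing.allocation.require.box": "a"}))
# {}
```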

Relates to #60848
dakrone added a commit to dakrone/elasticsearch that referenced this issue Aug 27, 2020
dakrone added a commit that referenced this issue Aug 27, 2020
…61650)

dakrone added a commit to dakrone/elasticsearch that referenced this issue Sep 10, 2020
Similar to the work in elastic#60994, where we introduced the `data_hot`, `data_warm`, etc. node roles, this
introduces a new `data_content` node role to be used for the Content tier.

This tier is not used anywhere yet, but subsequent work will use it.

Relates to elastic#60848
dakrone added a commit that referenced this issue Sep 14, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue Sep 14, 2020
dakrone added a commit that referenced this issue Sep 14, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue Sep 14, 2020
…eam inclusion

This commit changes the default initial allocation of a newly created index: it is allocated to the
"hot" tier if it is part of a new or existing data stream, and to the "content" tier if it is not
part of a data stream.

Overriding any of the `index.routing.allocation.(include|exclude|require).*` settings continues to
cause the initial allocation not to be set (no change in behavior).
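Building on the rule above, a minimal sketch of the initial tier choice. `initial_tier_preference` and the `in_data_stream` flag are hypothetical, and the prefix check mirrors the override rule in this commit message.

```python
ALLOCATION_FILTER_PREFIXES = (
    "index.routing.allocation.include.",
    "index.routing.allocation.exclude.",
    "index.routing.allocation.require.",
)


def initial_tier_preference(requested_settings, in_data_stream):
    """Tier to inject for a new index, or None if the user set any filter."""
    if any(key.startswith(ALLOCATION_FILTER_PREFIXES) for key in requested_settings):
        return None
    # Data stream backing indices start hot; everything else is "content".
    return "data_hot" if in_data_stream else "data_content"


print(initial_tier_preference({}, in_data_stream=True))   # data_hot
print(initial_tier_preference({}, in_data_stream=False))  # data_content
```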

Relates to elastic#60848
dakrone added a commit to dakrone/elasticsearch that referenced this issue Sep 15, 2020
Because searchable snapshots in the cold phase and searchable snapshots in the frozen phase are not
yet differentiated, there is no need to have a separate phase/tier for now. This commit removes the
frozen phase and tier, which can be added back at a later time.

(this tier was never in a released version, so this is not a breaking change)

Relates to elastic#60983
Relates to elastic#60994
Relates to elastic#60848
dakrone added a commit that referenced this issue Sep 18, 2020
…62589) (#62667)

This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify an ordered
list of preferred tiers for an index to be assigned to. For example, given the setting:

```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```

If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index's
shards to be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are
`data_warm` and `data_content` nodes, then the shards will be allowed to be allocated on `data_warm` nodes.

This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.
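A sketch of the preference resolution described above, assuming each node is represented by its set of role names and that generic `data` nodes count for every tier. The function names are hypothetical; the actual logic lives in `DataTierAllocationDecider`.

```python
def _tiers(value):
    return [t.strip() for t in value.split(",") if t.strip()]


def _in_tier(roles, tier):
    # Generic "data" nodes count as members of every tier.
    return "data" in roles or tier in roles


def preferred_tier(prefer_setting, cluster_node_roles):
    """First tier in the preference list that has at least one node, else None."""
    for tier in _tiers(prefer_setting):
        if any(_in_tier(roles, tier) for roles in cluster_node_roles):
            return tier
    return None


def can_allocate_on(node_roles, prefer_setting, cluster_node_roles):
    """Whether a shard with this preference may be allocated on the given node."""
    tier = preferred_tier(prefer_setting, cluster_node_roles)
    return tier is not None and _in_tier(node_roles, tier)


cluster = [{"data_warm"}, {"data_content"}]
prefer = "data_hot,data_warm,data_content"
print(preferred_tier(prefer, cluster))  # data_warm
print(can_allocate_on({"data_content"}, prefer, cluster))  # False
```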

Subsequent work will change the ILM migration to make additional use of this setting.

Relates to #60848
dakrone added a commit to dakrone/elasticsearch that referenced this issue Sep 29, 2020
This commit adds telemetry for our data tier formalization. This telemetry helps determine the
topology of the cluster with regard to the content, data, hot, warm, & cold tiers/roles.

An example of the telemetry looks like:

```
GET /_xpack/usage?human
{
  ...
  "data_tiers" : {
    "available" : true,
    "enabled" : true,
    "data_warm" : {
      ...
    },
    "data" : {
      ...
    },
    "data_cold" : {
      ...
    },
    "data_content" : {
      "node_count" : 1,
      "index_count" : 6,
      "total_shard_count" : 6,
      "primary_shard_count" : 6,
      "doc_count" : 71,
      "total_size" : "59.6kb",
      "total_size_bytes" : 61110,
      "primary_size" : "59.6kb",
      "primary_size_bytes" : 61110,
      "primary_shard_size_avg" : "9.9kb",
      "primary_shard_size_avg_bytes" : 10185,
      "primary_shard_size_median" : "8kb",
      "primary_shard_size_median_bytes" : 8254,
      "primary_shard_size_mad" : "7.2kb",
      "primary_shard_size_mad_bytes" : 7391
    },
    "data_hot" : {
       ...
    }
  }
}
```

The fields are as follows:

- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average size of the primary shards in this tier
- primary_shard_size_median_bytes :: median size of the primary shards in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of the primary shard sizes in this tier

Relates to elastic#60848
dakrone added a commit that referenced this issue Oct 1, 2020
This commit adds telemetry for our data tier formalization. This telemetry helps determine the
topology of the cluster with regard to the content, hot, warm, & cold tiers/roles.

An example of the telemetry looks like:

```
GET /_xpack/usage?human
{
  ...
  "data_tiers" : {
    "available" : true,
    "enabled" : true,
    "data_warm" : {
      ...
    },
    "data_cold" : {
      ...
    },
    "data_content" : {
      "node_count" : 1,
      "index_count" : 6,
      "total_shard_count" : 6,
      "primary_shard_count" : 6,
      "doc_count" : 71,
      "total_size" : "59.6kb",
      "total_size_bytes" : 61110,
      "primary_size" : "59.6kb",
      "primary_size_bytes" : 61110,
      "primary_shard_size_avg" : "9.9kb",
      "primary_shard_size_avg_bytes" : 10185,
      "primary_shard_size_median" : "8kb",
      "primary_shard_size_median_bytes" : 8254,
      "primary_shard_size_mad" : "7.2kb",
      "primary_shard_size_mad_bytes" : 7391
    },
    "data_hot" : {
       ...
    }
  }
}
```

The fields are as follows:

- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average size of the primary shards in this tier
- primary_shard_size_median_bytes :: median size of the primary shards in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of the primary shard sizes in this tier
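The three size statistics can be computed as below, assuming exact values over a plain list of primary shard sizes in bytes; the actual telemetry code may compute them differently. `primary_shard_size_stats` is a hypothetical name.

```python
from statistics import mean, median


def primary_shard_size_stats(primary_sizes_bytes):
    """Average, median, and median absolute deviation (MAD) of shard sizes."""
    sizes = list(primary_sizes_bytes)
    med = median(sizes)
    # MAD: the median of each size's absolute distance from the median.
    mad = median(abs(size - med) for size in sizes)
    return {
        "primary_shard_size_avg_bytes": mean(sizes),
        "primary_shard_size_median_bytes": med,
        "primary_shard_size_mad_bytes": mad,
    }


stats = primary_shard_size_stats([100, 200, 300, 400, 1000])
# avg 400, median 300, MAD 100
```

The MAD is less sensitive to a single very large shard than the standard deviation, which is why it pairs naturally with the median here.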

Relates to #60848
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 1, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 7, 2020
This adds release note highlights for the data tiers formalization feature.

Relates to elastic#60848
dakrone added a commit that referenced this issue Oct 8, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 12, 2020
When introducing new roles, older versions of Elasticsearch aren't aware of these roles. This can
lead to a situation where an old (say 7.9.x) master node sees nodes in the cluster of version 7.10+
with "data_hot" or "data_content" roles, and thinks those nodes are not eligible to hold data (since
the older node has no concept of these roles).

This adds a method to `DiscoveryNodeRole` where a role can return a compatibility role to be used
for serializing to an older Elasticsearch version. The new formalized data tier
roles (`data_content`, `data_hot`, `data_warm`, `data_cold`) use this mechanism to serialize as
regular "data" roles when talking to an older Elasticsearch node. This allows these nodes to
still hold data during a rolling upgrade where the master is upgraded last.
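A sketch of the compatibility mapping, assuming versions are compared as `(major, minor, patch)` tuples and that 7.10.0 is the first version aware of the tier roles (the commit message only says "7.10+"). `roles_for_wire` is a hypothetical name, not the `DiscoveryNodeRole` API.

```python
# Tier roles that older nodes do not understand.
TIER_ROLES = {"data_content", "data_hot", "data_warm", "data_cold"}
# Assumed first version that understands the tier roles.
FIRST_TIER_AWARE_VERSION = (7, 10, 0)


def roles_for_wire(roles, receiver_version):
    """Roles to advertise to a node running `receiver_version`.

    Older nodes get the generic "data" role in place of each tier role.
    """
    if receiver_version >= FIRST_TIER_AWARE_VERSION:
        return set(roles)
    return {"data" if role in TIER_ROLES else role for role in roles}


print(sorted(roles_for_wire({"master", "data_hot"}, (7, 9, 3))))   # ['data', 'master']
print(sorted(roles_for_wire({"master", "data_hot"}, (7, 10, 0))))  # ['data_hot', 'master']
```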

Relates to elastic#60848
dakrone added a commit that referenced this issue Oct 13, 2020
…63581)

dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 13, 2020
…lastic#63581)

dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 13, 2020
…lastic#63581)

dakrone added a commit that referenced this issue Oct 13, 2020
…tion (#63581) (#63612)


dakrone commented Oct 13, 2020

Going to close this, as it has been added and will be available in 7.10+. Any future work can be opened as individual issues.

@dakrone dakrone closed this as completed Oct 13, 2020
dakrone added a commit to dakrone/elasticsearch that referenced this issue Feb 5, 2021
This commit adds the `data_frozen` node role as part of the formalization of data tiers. It also
adds the `"frozen"` phase to ILM, currently allowing the same actions as the existing cold phase.

The frozen phase is intended to be used for data that is searched even less frequently than data in
the cold phase, and will eventually be loosely tied to data using partial searchable snapshots (as
opposed to full searchable snapshots in the cold phase).

Relates to elastic#60848
dakrone added a commit that referenced this issue Feb 5, 2021
dakrone added a commit to dakrone/elasticsearch that referenced this issue Feb 5, 2021
dakrone added a commit that referenced this issue Feb 5, 2021