Allocate newly created indices on data_hot tier nodes #61342

dakrone · 2020-08-19T16:50:25Z

This commit adds the ability to allocate newly created indices on nodes in the "hot" tier by
default.

This does not break existing behavior, as nodes with the data role are considered to be part of
the hot tier. Users who separate their deployments by using the data_hot (and data_warm,
data_cold, data_frozen) roles will now have their data allocated on the hot tier nodes by
default.

This change is a little more complicated than changing the default value for
index.routing.allocation.include._tier from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings

This now returns the following default settings:

{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}

Once this setting has been applied at index creation, it can be treated like any other index-level setting.

This new setting is not set on a new index if any of the following is true:

- The index is created with an index.routing.allocation.include.<anything> setting
- The index is created with an index.routing.allocation.exclude.<anything> setting
- The index is created with an index.routing.allocation.require.<anything> setting
- The index is created with a null index.routing.allocation.include._tier value
- The index is created from existing source index metadata (shrink, clone, split, etc.)
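
The conditions above can be sketched as a single predicate. This is a minimal illustration, not the code from this PR: the class `DefaultTierRule`, the method `shouldApplyDefaultTier`, and the use of a plain `Map` in place of Elasticsearch's `Settings` are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for the decision described in the PR text.
// A plain Map models the settings supplied by the request and templates.
class DefaultTierRule {
    static final String PREFIX = "index.routing.allocation.";

    /** Returns true when the default hot-tier setting should be added. */
    static boolean shouldApplyDefaultTier(Map<String, String> requested,
                                          boolean createdFromSourceIndex) {
        if (createdFromSourceIndex) {
            return false; // shrink, clone, split, etc.
        }
        for (String key : requested.keySet()) {
            // Any include/exclude/require filter counts, even one whose value is
            // null, which covers an explicit null include._tier.
            if (key.startsWith(PREFIX + "include.")
                    || key.startsWith(PREFIX + "exclude.")
                    || key.startsWith(PREFIX + "require.")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> none = Map.of();
        Map<String, String> pinned = Map.of(PREFIX + "require._name", "mynode");
        Map<String, String> nullTier =
                Collections.singletonMap(PREFIX + "include._tier", null);

        System.out.println("no filters:        " + shouldApplyDefaultTier(none, false));
        System.out.println("require._name set: " + shouldApplyDefaultTier(pinned, false));
        System.out.println("null include._tier: " + shouldApplyDefaultTier(nullTier, false));
        System.out.println("from source index: " + shouldApplyDefaultTier(none, true));
    }
}
```

Checking key presence rather than values is what lets a null include._tier act as an opt-out in this sketch.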

Relates to #60848

elasticmachine · 2020-08-19T16:50:27Z

Pinging @elastic/es-core-features (:Core/Features/Features)

jasontedor

I wonder if the pluggable mechanism should not be to hook into when an index is being created, but merely for a plugin to return a list of default index settings that are explicitly applied to an index upon index creation unless overridden by the create index request (template, source index from a shrink/split/clone). This at least reduces the scope of pluggability.
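
One way to picture the narrower mechanism suggested here is a merge in which plugin-supplied defaults are applied first and anything from the template or create index request wins. The sketch below is only an illustration of that ordering: `DefaultSettingsMerge`, `resolve`, and the use of `Map` instead of Elasticsearch's `Settings` are hypothetical names chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "plugin defaults unless overridden by the request".
class DefaultSettingsMerge {
    /** Applies plugin-provided defaults first, then template/request overrides. */
    static Map<String, String> resolve(List<Map<String, String>> pluginDefaults,
                                       Map<String, String> templateAndRequest) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map<String, String> defaults : pluginDefaults) {
            resolved.putAll(defaults);
        }
        // Later puts replace earlier ones, so explicit settings take precedence.
        resolved.putAll(templateAndRequest);
        return resolved;
    }

    public static void main(String[] args) {
        String tierKey = "index.routing.allocation.include._tier";
        List<Map<String, String>> defaults = List.of(Map.of(tierKey, "data_hot"));

        System.out.println(resolve(defaults, Map.of()));
        System.out.println(resolve(defaults, Map.of(tierKey, "data_warm")));
    }
}
```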

dakrone · 2020-08-19T22:11:11Z

@jasontedor Ryan and I had a conversation and agreed that this sounds like a good solution and a reasonable addition for plugin extensibility. I'm going to work on an implementation that changes it to this behavior.

dakrone · 2020-08-24T22:59:49Z

@elasticmachine run elasticsearch-ci/packaging-sample-windows

(build timed out downloading gradle)

dakrone · 2020-08-24T23:14:24Z

@elasticmachine update branch

dakrone · 2020-08-24T23:59:48Z

Okay, I've updated this to use the discussed plugin method, and this is ready for review.

@rjernst please take a look at the plugin integration point if you're interested

It is also worth mentioning that @andreidan and I discussed adding an index-level setting that will opt out of this behavior in #61377 (comment), but since the setting does not yet exist (it will be added either in Andrei's PR or in subsequent work), I cannot make use of it for some of our built-in indices (yet!).

jasontedor · 2020-08-25T00:22:43Z

Why do we need a setting for that? Automatic allocation should only occur if no explicit allocation is defined? So if a user wants to opt out, it’s sufficient to define explicit allocation?

dakrone · 2020-08-25T00:56:16Z

The idea would be that it would be a single unified setting to opt out of not only the initial hot allocation by default, but also the automatic tier migration that ILM will do.

jakelandis

I like the direction; just a few comments inline with the code...

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexService.java

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/DataTier.java

jakelandis · 2020-08-26T19:49:13Z

docs/reference/api-conventions.asciidoc

@@ -391,6 +391,7 @@ Returns:
      "index.creation_date": "1474389951325",


probably want to update https://www.elastic.co/guide/en/elasticsearch/reference/7.9/shard-allocation-filtering.html with or soon after this PR. (also https://www.elastic.co/guide/en/elasticsearch/reference/7.9/modules-node.html#data-node could use an update, but is outside the scope of this PR)

Definitely. I am leaving documentation for a subsequent PR, though, in case these end up having a name change. I will make sure they are handled in the documentation PR.

server/src/main/java/org/elasticsearch/index/shard/ExplicitIndexSettingProvider.java

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/DataTier.java

...plugin/core/src/test/java/org/elasticsearch/xpack/cluster/routing/allocation/DataTierIT.java

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexService.java

dakrone · 2020-08-26T20:37:44Z

Thanks @jakelandis, I think I addressed your comments so far

rjernst

I left a couple comments on the plugin portion

server/src/main/java/org/elasticsearch/index/shard/ExplicitIndexSettingProvider.java

rjernst · 2020-08-26T21:05:10Z

server/src/main/java/org/elasticsearch/index/shard/ExplicitIndexSettingProvider.java

+    /**
+     * Returns explicitly set default index {@link Settings} for the given index.
+     */
+    default Settings getExplicitIndexSettings(String indexName, Settings templateAndRequestSettings) {


Can we do without the passed in settings? I had imagined the caller would do the merging/defaulting logic, so it's unclear why we would need the user passed in settings.

We need the settings to be able to not set mutually exclusive settings, for example, if a user created an index with:

PUT /myindex
{
  "settings": {
    "index.routing.allocation.require._name": "mynode"
  }
}

We must not set index.routing.allocation.include._tier automatically, because otherwise we're constraining a new index to a tier when the user specifically wanted it constrained to a single node (we check all the index-level filtering settings). These settings are the only way we can make explicit default index settings reactive to other index-level settings (such as adding an opt-out index-level setting in the future).
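
The difference between the two shapes under discussion can be sketched as follows. This is a hedged illustration only: `BlindProvider`, `AwareProvider`, and `create` are hypothetical names, `Map` stands in for Elasticsearch's `Settings`, and the merge step is a simplification of whatever the caller actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Contrasts a provider that cannot see the request with one that can.
class ProviderContrast {
    static final String ALLOC = "index.routing.allocation.";
    static final String TIER = ALLOC + "include._tier";

    /** Hypothetical shape without the passed-in settings. */
    interface BlindProvider {
        Map<String, String> defaults(String indexName);
    }

    /** Hypothetical shape mirroring the two-argument method under review. */
    interface AwareProvider {
        Map<String, String> defaults(String indexName,
                                     Map<String, String> templateAndRequestSettings);
    }

    static final BlindProvider BLIND = name -> Map.of(TIER, "data_hot");

    static final AwareProvider AWARE = (name, requested) -> {
        for (String key : requested.keySet()) {
            if (key.startsWith(ALLOC + "include.")
                    || key.startsWith(ALLOC + "exclude.")
                    || key.startsWith(ALLOC + "require.")) {
                return Map.<String, String>of(); // user already chose an allocation
            }
        }
        return Map.of(TIER, "data_hot");
    };

    /** Simplified caller-side merge: defaults first, request settings on top. */
    static Map<String, String> create(Map<String, String> defaults,
                                      Map<String, String> requested) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(requested);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> requested = Map.of(ALLOC + "require._name", "mynode");

        // Blind: the index ends up pinned to a node AND restricted to a tier.
        System.out.println(create(BLIND.defaults("myindex"), requested));
        // Aware: only the user's require._name filter remains.
        System.out.println(create(AWARE.defaults("myindex", requested), requested));
    }
}
```

With the blind shape, a plain key-by-key merge cannot tell that require._name and include._tier conflict, because they are different keys; the provider needs to see the request to stay out of the way.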

server/src/main/java/org/elasticsearch/plugins/IndexSettingsProviderPlugin.java

server/src/main/java/org/elasticsearch/index/shard/ExplicitIndexSettingProvider.java

jakelandis

LGTM. I don't have any strong opinion on the naming, but thanks for the comments you added; they helped a lot in understanding the purpose.

dakrone · 2020-08-27T16:25:33Z

@elasticmachine run elasticsearch-ci/1

(timed out downloading something from archive.ubuntu.com)

dakrone · 2020-08-27T17:06:14Z

@elasticmachine update branch

rjernst

LGTM

server/src/main/java/org/elasticsearch/plugins/Plugin.java

dakrone added :Core/Features/Features v8.0.0 v7.10.0 labels Aug 19, 2020

dakrone requested a review from andreidan August 19, 2020 16:50

elasticmachine added the Team:Data Management label Aug 19, 2020

dakrone mentioned this pull request Aug 19, 2020

Formalize the concept of data tiers in Elasticsearch #60848

Closed

18 tasks

Fix docs

3b60e93

jasontedor reviewed Aug 19, 2020

dakrone added 2 commits August 20, 2020 14:04

Switch from setting listener to explicit index setting provider

5005a73

Unset tier include for JDBC tests

b4183a7

andreidan mentioned this pull request Aug 21, 2020

ILM migrate data between tiers #61377

Merged

dakrone added 3 commits August 24, 2020 09:21

Use Settings instead of a Map<String, String>

dfe922b

Check for all index level allocation settings when putting explicit setting in place

08e4ab5

Fix some tests

f1c44a2

Fix ILM test

53a5ed7

Merge branch 'master' into dt-default-deploy-on-hot

ed6c3dc

dakrone requested a review from jakelandis August 26, 2020 15:48

jakelandis reviewed Aug 26, 2020

dakrone added 4 commits August 26, 2020 14:27

Merge remote-tracking branch 'origin/master' into dt-default-deploy-on-hot

7c001cf

Fix typos and listener -> provider after refactor

5039087

Pass exceptions through, preventing index creation if provider fails

d659748

Add debug log message when skipping hot tier allocation

055c642

dakrone added 2 commits August 26, 2020 14:34

Rename node starting helpers in DataTierIT

7b29826

Add debug log for explicit setting cancelled out by template/request

7a70b0c

dakrone requested a review from jakelandis August 26, 2020 20:37

rjernst reviewed Aug 26, 2020

jakelandis reviewed Aug 27, 2020

server/src/main/java/org/elasticsearch/index/shard/ExplicitIndexSettingProvider.java

jakelandis approved these changes Aug 27, 2020

Rename Explicit -> Additional, collapse to Plugin instead of new plugin

17bde03

Merge branch 'master' into dt-default-deploy-on-hot

e95e922

rjernst approved these changes Aug 27, 2020

server/src/main/java/org/elasticsearch/plugins/Plugin.java

getAdditionalSettingProviders -> getAdditionalIndexSettingProviders

2aa7b4f

dakrone merged commit 28cec56 into elastic:master Aug 27, 2020

dakrone deleted the dt-default-deploy-on-hot branch August 27, 2020 18:51

dakrone added the backport pending label Aug 27, 2020

dakrone mentioned this pull request Aug 27, 2020

[7.x] Allocate newly created indices on data_hot tier nodes (#61342) #61650

Merged

dakrone removed the backport pending label Sep 8, 2020

andreidan added the >feature label Oct 8, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
