Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate newly created indices on data_hot tier nodes #61342

Merged
merged 18 commits into from
Aug 27, 2020

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Aug 19, 2020

This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the data role are considered to be part of
the hot tier. Users that separate their deployments by using the data_hot (and data_warm,
data_cold, data_frozen) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
index.routing.allocation.include._tier from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings

Returns the default settings now of:

{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is not set on a new index if any of the following is true:

  • The index is created with an index.routing.allocation.include.<anything> setting
  • The index is created with an index.routing.allocation.exclude.<anything> setting
  • The index is created with an index.routing.allocation.require.<anything> setting
  • The index is created with a null index.routing.allocation.include._tier value
  • The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to #60848

This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include._tier` setting
- The index is created with an `index.routing.allocation.exclude._tier` setting
- The index is created with an `index.routing.allocation.require._tier` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to elastic#60848
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Features)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Aug 19, 2020
Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the pluggable mechanism should not be to hook into when an index is being created, but merely for a plugin to return a list of default index settings that are explicitly applied to an index upon index creation unless overridden by the create index request (template, source index from a shrink/split/clone). This at least reduces the scope of pluggability.

@dakrone
Copy link
Member Author

dakrone commented Aug 19, 2020

@jasontedor Ryan and I had a conversation and agreed that this sounds like a good solution and a reasonable addition for plugin extensibility. I'm going to work on an implementation that will change it to this behavior

@dakrone
Copy link
Member Author

dakrone commented Aug 24, 2020

@elasticmachine run elasticsearch-ci/packaging-sample-windows

(build timed out downloading gradle)

@dakrone
Copy link
Member Author

dakrone commented Aug 24, 2020

@elasticmachine update branch

@dakrone
Copy link
Member Author

dakrone commented Aug 24, 2020

Okay, I've updated this to use the discussed plugin method, and this is ready for review.

@rjernst please take a look at the plugin integration point if you're interested

It is also worth mentioning that @andreidan and I discussed adding an index-level setting that will opt-out of this behavior in #61377 (comment) but since the setting does not yet exist (to be added either in Andrei's PR on subsequent work), I cannot make use of it for some of our built-in indices (yet!).

@jasontedor
Copy link
Member

Why do we need a setting for that? Automatic allocation should only occur if no explicit allocation is defined? So if a user wants to opt out, it’s sufficient to define explicit allocation?

@dakrone
Copy link
Member Author

dakrone commented Aug 25, 2020 via email

@dakrone dakrone requested a review from jakelandis August 26, 2020 15:48
Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction and just a few comments inline with code...

@@ -391,6 +391,7 @@ Returns:
"index.creation_date": "1474389951325",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want to update https://www.elastic.co/guide/en/elasticsearch/reference/7.9/shard-allocation-filtering.html with or soon after this PR. (also https://www.elastic.co/guide/en/elasticsearch/reference/7.9/modules-node.html#data-node could use an update, but is outside the scope of this PR)

Copy link
Member Author

@dakrone dakrone Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, I am leaving documentation for a subsequent PR though, in the chance that these end up having a name change. I will make sure they are handled in the documentation PR.

@dakrone
Copy link
Member Author

dakrone commented Aug 26, 2020

Thanks @jakelandis, I think I addressed your comments so far

@dakrone dakrone requested a review from jakelandis August 26, 2020 20:37
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple comments on the plugin portion

/**
* Returns explicitly set default index {@link Settings} for the given index.
*/
default Settings getExplicitIndexSettings(String indexName, Settings templateAndRequestSettings) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do without the passed in settings? I had imagined the caller would do the merging/defaulting logic, so it's unclear why we would need the user passed in settings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the settings to be able to not set mutually exclusive settings, for example, if a user created an index with:

POST /myindex
{
  "settings": {
    "index.routing.allocation.require._name": "mynode"
  }
}

We need to not set index.routing.allocation.include._tier automatically, because otherwise we're constraining a new index to a tier when the user specifically wanted it constrained to single node (we check the all index level filtering settings). These settings are the only way we can make explicit default index settings reactive to other index level settings (such as adding an opt-out index level setting in the future)

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT. I don't have any strong opinion on the naming .. but thanks for comments you added, it helped alot to understand the purpose.

@dakrone
Copy link
Member Author

dakrone commented Aug 27, 2020

@elasticmachine run elasticsearch-ci/1

(timed out downloading something from archive.ubuntu.com)

@dakrone
Copy link
Member Author

dakrone commented Aug 27, 2020

@elasticmachine update branch

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

server/src/main/java/org/elasticsearch/plugins/Plugin.java Outdated Show resolved Hide resolved
@dakrone dakrone merged commit 28cec56 into elastic:master Aug 27, 2020
@dakrone dakrone deleted the dt-default-deploy-on-hot branch August 27, 2020 18:51
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 27, 2020
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to elastic#60848
dakrone added a commit that referenced this pull request Aug 27, 2020
…61650)

This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to #60848
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants