Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable _all by default, disallow configuring _all on 6.0+ indices #22144

Merged
merged 1 commit into from
Jan 12, 2017

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Dec 13, 2016

This change disables the _all meta field by default.

Now that we have the "all-fields" method of query execution, we can save both
indexing time and disk space by disabling it.

_all can no longer be configured for indices created after 6.0.

Relates to #20925 and #21341
Resolves #19784

@dakrone dakrone added :Search Foundations/Mapping Index mappings, including merging and defining field types >breaking v6.0.0-alpha1 labels Dec 13, 2016
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left some minor things that need clarifications.
Other than that this change looks good. I am surprised that it did not require more changes in the tests but I guess that the all_fields mode covers most of the use case.

@@ -270,9 +272,6 @@ public void testRandom() throws Exception {
booleanOptionList.add(new Tuple<>("store_term_vectors", tv_stored = randomBoolean()));
}
if (randomBoolean()) {
booleanOptionList.add(new Tuple<>("enabled", enabled = randomBoolean()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it was random before why do you need to force enabled to true ?

@@ -2435,6 +2435,7 @@ public void testPostingsHighlighterCommonTermsQuery() throws IOException {

private static XContentBuilder type1PostingsffsetsMapping() throws IOException {
return XContentFactory.jsonBuilder().startObject().startObject("type1")
.startObject("_all").field("enabled", true).endObject()
.startObject("properties")
.startObject("field1").field("type", "text").field("index_options", "offsets").endObject()
.startObject("field2").field("type", "text").field("index_options", "offsets").endObject()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain very briefly why it fails the test with _all disabled ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testPostingsHighlighter has a test that queries on the _all field and highlights other fields. Without this, it would fail.

I can remove the particular clause if you think that would be a better solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks for the explanation. BTW are we planning to add support for all_fields in the multi match query ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we planning to add support for all_fields in the multi_match query ?

I don't know, do you think we should? Someone could just as easily use "fields": ["*"] with lenient: true since they already have to specify fields in the request? I'm totally open to adding it though.

assertAcked(prepareCreate("test").addMapping("type1", "nested1", "type=nested").addMapping("type2", "nested1", "type=nested"));
assertAcked(prepareCreate("test")
.addMapping("type1", "nested1", "type=nested", "_all", "enabled=true")
.addMapping("type2", "nested1", "type=nested", "_all", "enabled=true"));
ensureGreen();

// check on no data, see it works
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an actual check, see a few lines down for:

// check that _all is working on nested docs

Which requires _all for the test to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks

@@ -252,8 +252,7 @@ public void testSimpleQueryStringFlags() throws ExecutionException, InterruptedE
searchResponse = client()
.prepareSearch()
.setQuery(
simpleQueryStringQuery("baz | egg*").defaultOperator(Operator.AND).flags(SimpleQueryStringFlag.WHITESPACE,
SimpleQueryStringFlag.PREFIX)).get();
simpleQueryStringQuery("blah | egg*").flags(SimpleQueryStringFlag.WHITESPACE, SimpleQueryStringFlag.PREFIX)).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to remove this .defaultOperator(Operator.AND). It looks like a test use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this to be a bit more transparent of a test. It's not a very good check right now.

==== The `_all` meta field is now disabled by default

On new mappings, the `_all` meta field that contains a copy of the text from
each field is not disabled by default. The `query_string` and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not enabled by default ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, "not" should be "now". I'll fix.

check if `_all` is enabled/disabled and switch to executing the query across all
fields if `_all` is disabled. `_all` may be re-enabled by setting `enabled`:
`true` in the mapping configuration for `_all`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a sentence that says that a reindex is required when _all is re-enabled ?

"query": {
"query_string": {
"query": "post_date:foo",
"lenient": false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

false is still the default value, so why is this needed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_string defaults to all-fields mode if _all is disabled, which turns on lenient automatically, so this is needed in order to make the query non-valid.

"query_string": {
"query": "post_date:foo",
"lenient": false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above :)

@dakrone
Copy link
Member Author

dakrone commented Dec 13, 2016

I pushed some commits, thanks for looking @jimczi!

@jpountz
Copy link
Contributor

jpountz commented Dec 14, 2016

I don't like having a feature that has such a large impact on the code base disabled by default, as it means the maintenance cost remains the same (=high), while the benefit to our users decreases significantly, as most of them will not opt in. Two possible paths that I can think of are either to remove it entirely (ie. forbid 6.x indices from enabling it and removing in 7), or keep _all as a simple text field that is used as a name convention for a catch-all field that other fields can copy to.

@jimczi
Copy link
Contributor

jimczi commented Dec 14, 2016

I don't like having a feature that has such a large impact on the code base disabled by default

The impact is pretty contained and should not change the OOB significantly. Though I agree that keeping the feature is costly so I think that forbids in 6 and removing in 7 is a good plan.

or keep _all as a simple text field that is used as a name convention for a catch-all field that other fields can copy to.

Not sure it would be useful if we don't copy to it by default and if we always use the all_fields mode in the query_string. It's really an OOB issue which seems to be resolved by the all_fields mode.

@dakrone
Copy link
Member Author

dakrone commented Dec 14, 2016

Alright, sounds like the consensus is to disallow in 6.0+ and we can eventually remove it in 7.0, I'll update the PR and title.

@dakrone dakrone changed the title Disable _all by default Disable _all by default, disallow configuring _all on 6.0+ indices Jan 8, 2017
@dakrone
Copy link
Member Author

dakrone commented Jan 8, 2017

@jpountz @jimczi okay, I've changed this to disallow configuring _all for indices created 6.0+ in addition to it being disabled by default. Sorry for the large PR, but it was mostly removing tests that are no longer needed.

@dakrone
Copy link
Member Author

dakrone commented Jan 10, 2017

retest this please

each field is now disabled by default. The `query_string` and
`simple_query_string` queries that previously used `_all` to search will now
check if `_all` is enabled/disabled and switch to executing the query across all
fields if `_all` is disabled. `_all` may be re-enabled by setting `enabled`:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch to executing the query across all fields if _all is disabled

I hope not all fields, but only the fields that I specified the include_in_all: True option? Can you please clarify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't specify anything in the fields parameter, then simple_query_string and query_string will automatically try to execute the search across all searchable fields (see: #20925). If you want to limit it, then you can specify the "fields" parameter in the query.

Additionally, you could use copy_to and create your own all-type field as a composite of whatever fields you want, and then set that as the default_field for the index.

Copy link

@vincentfretin vincentfretin Jan 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your prompt response and the link to #20925. Currently in elasticsearch 5.0 I use include_in_all: False on everything except on text fields, so what I'll have to do with this change is indeed to create my own field with copy_to, or specify all the text fields I want in fields when I do the query.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dakrone
This looks good. I can imagine how painful it was to understand why all this tests were relying on _all and how to remove it without loosing the test case.
Though I think we should at least keep some test around regarding the AllFieldType and maybe the copyToAll functionality. It will surely be removed but in the mean time we may add things that breaks it.

IMO we should completely disallow enabling _all even if the index was created < 6.x. We don't want to create more _all users ;)

assertWarnings("Deprecated field [template] used, replaced by [index_patterns]");
}

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can maybe keep this test around and just remove _all from the templates ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, though then it won't be the real templates from Logstash and Beats, but I guess it is still testing the other features.

`true` in the mapping configuration for `_all`. If `_all` is enabled on an
existing index data that was previously indexed will need to be reindexed in
order to populate the `_all` field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is complicated and maybe not needed. If we don't accept new mapping with _all enabled maybe we should also not allow to enable _all on an index that was created in older versions ? So we would completely disallow enabling _all but continue to handle indices created in older versions with _all already indexed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do disallow this for 5.0 indices, see UpdateMappingOnClusterIT which tests this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good then you can remove this sentence ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, good catch!

@dakrone
Copy link
Member Author

dakrone commented Jan 11, 2017

@jimczi I re-added some tests based on your feedback, thanks for taking a look!

regarding the AllFieldType and maybe the copyToAll functionality

Can you describe what you mean here? Do you mean just that fields are copied to the _all field automatically?

@jimczi
Copy link
Contributor

jimczi commented Jan 11, 2017

Can you describe what you mean here? Do you mean just that fields are copied to the _all field automatically?

Yes and the AllFieldTypeTest. Why is it removed ?

@dakrone
Copy link
Member Author

dakrone commented Jan 11, 2017

the AllFieldTypeTest. Why is it removed ?

My accidental over-zealousness, I've re-added it :)

As far as testing that fields are copied, the _all field is still used for the query_string and simple_query_string tests, which tests that the fields are copied to the _all field (since it queries on them)

@dakrone
Copy link
Member Author

dakrone commented Jan 11, 2017

retest this please

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This change disables the _all meta field by default.

Now that we have the "all-fields" method of query execution, we can save both
indexing time and disk space by disabling it.

_all can no longer be configured for indices created after 6.0.

Relates to elastic#20925 and elastic#21341
Resolves elastic#19784
exekias added a commit to exekias/beats that referenced this pull request Mar 22, 2017
ES 6.0 deprecates `_all`, see elastic/elasticsearch#22144.

`*.template-es6x.json` has been added in case Elasticsearch 6.X is
detected
tsg pushed a commit to elastic/beats that referenced this pull request Mar 22, 2017
ES 6.0 deprecates `_all`, see elastic/elasticsearch#22144.

`*.template-es6x.json` has been added in case Elasticsearch 6.X is
detected
@lukas-vlcek
Copy link
Contributor

If I want to index documents that have _all field in the top level would it be possible to configure _all field in mapping? AFAIK it is currently not possible to index documents having _all field but in the future, when the historical function of _all field is removed, wouldn't it make sense to threat this field as any other document field?

In other words, will it be possible to index documents like the following one OOB?

{
  "foo": "bar",
  "_all": "dummy"
}

@dakrone
Copy link
Member Author

dakrone commented Apr 3, 2017

In other words, will it be possible to index documents like the following one OOB?

We strongly recommend not naming user-created fields started with an _, as those are reserved for internal use.

You could, however, name it my_all (or something else) and use the copy_to feature on all other mapped fields to create your own version of an _all field.

dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 15, 2017
In 6.0 `_all` is not configurable and disabled by default.

Relates to elastic#22144
dakrone added a commit that referenced this pull request Aug 16, 2017
* Add deprecation logging when _all is enabled

In 6.0 `_all` is not configurable and disabled by default.

Relates to #22144

* Log deprecation regardless of whether _all is enabled or disabled

* Add expected warnings in REST tests

* Assert warnings in unit tests

* Add warning check to various documentation tests

* Add assertWarnings if randomBoolean() is true for enabled
dakrone added a commit that referenced this pull request Aug 16, 2017
* Add deprecation logging when _all is enabled

In 6.0 `_all` is not configurable and disabled by default.

Relates to #22144

* Log deprecation regardless of whether _all is enabled or disabled

* Add expected warnings in REST tests

* Assert warnings in unit tests

* Add warning check to various documentation tests

* Add assertWarnings if randomBoolean() is true for enabled
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Search Foundations/Mapping Index mappings, including merging and defining field types v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants