[DOCS] Adds release highlights for search for 6.4 #32095
Pinging @elastic/es-search-aggs |
|
||
=== Search | ||
|
||
* Cross Cluster Search will no longer use dedicated master nodes as gatway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This causes problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926) |
s/gatway/gateway
Also, should it be "This may cause problems"?
|
||
=== Analysis | ||
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) |
Lucene fieldto allow
-> typo
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) | ||
* Korean analysis tools - A new module has been added which provides analysis tools for the Korean language. The new `nori` analyzercan be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397) | ||
* Add multiplexing token filter - This new token filter allows you to run tokens through multiple different tokenfilters and stack the results. For example, you can now easily index the original form of a token, its lowercase form and a stemmed form all at the same position, allowing you to search for stemmed and unstemmed tokens in the same field. (https://github.com/elastic/elasticsearch/pull/31208) |
👍
=== Rank Eval API | ||
|
||
* Expected Reciprocal Rank metric for Rank Eval API - The Expected Reciprocal Rank has been added to the available metrics int he Rank Eval API. ERR is an extension of the classical reciprocal rank which in order to determine the usefulness of a document at position K in the results, it uses the degree of relevance of the document at posiitons less than K as well. (https://github.com/elastic/elasticsearch/pull/31891) | ||
|
All good from my side about this
Ah, just found a typo: s/int he/in the/
Left some minor comments, lgtm otherwise
|
||
=== Analysis | ||
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) |
s/fieldto/field to/
I'd say "with the trade-off of consuming more disk space in the index" rather than "a bit more" since we don't try to limit the shingles and they are expensive in terms of disk space.
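To make the disk-space point concrete: a 2-shingle is a pair of adjacent tokens, so a field with n tokens yields n - 1 additional terms. The sketch below is only an illustration; the `two_shingles` helper and the whitespace tokenization are assumptions for this example, not how Lucene builds the separate field.

```python
def two_shingles(text):
    """Return the adjacent token pairs (2-shingles) of a whitespace-tokenized string."""
    tokens = text.lower().split()
    # zip pairs each token with its successor, giving len(tokens) - 1 shingles.
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


shingles = two_shingles("the quick brown fox")
print(shingles)
```

A phrase query for "quick brown" can then be answered by looking up a single shingle term instead of checking the positions of two separate terms, which is where the speed-up would come from.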
=== Analysis | ||
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) | ||
* Korean analysis tools - A new module has been added which provides analysis tools for the Korean language. The new `nori` analyzercan be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397) |
nit: s/A new module/A new plugin/
nit: s/analyzercan/analyzer can/
|
||
=== Search | ||
|
||
* Cross Cluster Search will no longer use dedicated master nodes as gatway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This causes problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926) |
nit: s/as gatway/as gateway/
|
||
=== Analysis | ||
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) |
field to
I would also remove "a bit" as the overhead may be significant
=== Analysis | ||
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) | ||
* Korean analysis tools - A new module has been added which provides analysis tools for the Korean language. The new `nori` analyzercan be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397) |
s/analyzercan/analyzer can/
|
||
* Cross Cluster Search will no longer use dedicated master nodes as gateway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This may cause problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926) | ||
* Format option for doc_value fields - `doc_value` fields in the Search API can now specify a `format` field to control the format of the value in the response. (https://github.com/elastic/elasticsearch/pull/29639) | ||
* Second level of field collapse (https://github.com/elastic/elasticsearch/pull/31808) |
I think we should either expand on this with a paragraph or remove from the highlights. Perhaps something like: Support second level of field collapse, which allows users to retrieve the top item for two fields, such as retrieving top scored tweets by country, and for each country, top scored tweets for each user. This can be an alternative to using nested terms aggregations along with top hits on the inner hits.
probably @eskibars or @zuketo can write something better, but, hoping we can expand on this one a bit if we leave it in.
oops, I must have missed this when I was expanding all the points from my initial list. I'll address this tomorrow and expand upon it
@pcsanwald I pushed a commit which expands on the field collapse bullet that I missed before, could you take another look?
LGTM!
|
||
=== Aggregations | ||
|
||
* Auto-interval Date Histogram - A new `auto_date_histogram` aggregaiton has been added which instead of taking an `interval` takes a `buckets` option which defines the maximum number of buckets it should return. The aggregation internally determines the best interval to use to get as close to the `bucket` option as possible without exceeding it. (https://github.com/elastic/elasticsearch/pull/28993) |
Little typo here on "aggregaiton"
It would be nice to add a link to the documentation (e.g. https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-autodatehistogram-aggregation.html), though I don't see that page in the 6.x version yet.
The auto-interval date histogram has not quite been backported to 6.x yet. @pcsanwald is working on backporting it, and I'll add this link when that is done.
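For anyone reading the bullet without the docs page to hand, the idea of "get as close to the `buckets` option as possible without exceeding it" can be sketched roughly as below. This is a simplified illustration: the candidate interval list and the `bucket_count` and `pick_interval` helpers are assumptions made for the example, and the real aggregation's rounding logic may well differ.

```python
# Candidate bucket widths in seconds (illustrative only; not the list Elasticsearch uses).
CANDIDATE_INTERVALS = [1, 60, 3600, 86400]


def bucket_count(start, end, interval):
    """Number of interval-aligned buckets needed to cover the range [start, end]."""
    return end // interval - start // interval + 1


def pick_interval(start, end, max_buckets):
    """Return the finest candidate interval that needs at most max_buckets buckets."""
    for interval in CANDIDATE_INTERVALS:
        if bucket_count(start, end, interval) <= max_buckets:
            return interval
    # Nothing fits: fall back to the coarsest candidate.
    return CANDIDATE_INTERVALS[-1]


# Two hours of data with at most 10 buckets: per-second and per-minute
# bucketing would need too many buckets, so the hourly interval is chosen.
print(pick_interval(0, 7200, max_buckets=10))
```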
@@ -7,3 +7,27 @@ | |||
coming[6.4..0] |
I think this should be 6.4.0 not 6.4..0
|
||
=== Analysis | ||
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene field to allow faster, more efficient, phrase searches on that field with the trade-off of consuming more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) |
You might be sensing a pattern here, but I think for more information, folks could also be directed to this link: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/text.html
=== Analysis | ||
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene field to allow faster, more efficient, phrase searches on that field with the trade-off of consuming more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) | ||
* Korean analysis tools - A new plugin has been added which provides analysis tools for the Korean language. The new `nori` analyzer can be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397) |
I recommend adding a link to https://www.elastic.co/guide/en/elasticsearch/plugins/6.x/analysis-nori.html
|
||
* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene field to allow faster, more efficient, phrase searches on that field with the trade-off of consuming more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450) | ||
* Korean analysis tools - A new plugin has been added which provides analysis tools for the Korean language. The new `nori` analyzer can be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397) | ||
* Add multiplexing token filter - This new token filter allows you to run tokens through multiple different tokenfilters and stack the results. For example, you can now easily index the original form of a token, its lowercase form and a stemmed form all at the same position, allowing you to search for stemmed and unstemmed tokens in the same field. (https://github.com/elastic/elasticsearch/pull/31208) |
I think the link for this one should be: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/analysis-multiplexer-tokenfilter.html
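As a rough picture of what "stack the results" means in the multiplexer bullet, here is a small simulation. It is a sketch under assumptions: the `multiplex` and `toy_stem` helpers are hypothetical, the stemmer is deliberately naive, and dropping duplicate outputs at the same position is assumed behaviour rather than something stated in the bullet.

```python
def toy_stem(token):
    """Deliberately naive stemmer for illustration: strip one trailing 's'."""
    return token[:-1] if token.endswith("s") else token


def multiplex(tokens, filters):
    """Emit (position, token) pairs: the original token plus each filter's output,
    all stacked at the same position, with duplicates at a position dropped."""
    stacked = []
    for position, token in enumerate(tokens):
        variants = []
        for candidate in [token] + [apply(token) for apply in filters]:
            if candidate not in variants:
                variants.append(candidate)
        stacked.extend((position, variant) for variant in variants)
    return stacked


filters = [str.lower, lambda token: toy_stem(token.lower())]
print(multiplex(["Cats", "Run"], filters))
```

Because the original, lowercased and stemmed forms share a position, a query for either the stemmed or the unstemmed form can match in the same field.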
|
||
=== Mappings | ||
|
||
* `_ignored` meta field - A new meta field has been added to documents. The `_ignored` field will contain the field names of any fields that were ignored at index time due to the `ignore_malformed` option. This means that malformed documents can be more easily discovered by using `exists` or `term` queries on this new meta field. (https://github.com/elastic/elasticsearch/pull/29658) |
=== Rank Eval API | ||
|
||
* Expected Reciprocal Rank metric for Rank Eval API - The Expected Reciprocal Rank has been added to the available metrics in the Rank Eval API. ERR is an extension of the classical reciprocal rank which in order to determine the usefulness of a document at position K in the results, it uses the degree of relevance of the document at posiitons less than K as well. (https://github.com/elastic/elasticsearch/pull/31891) | ||
|
Will this be documented? Couldn't see it in https://www.elastic.co/guide/en/elasticsearch/reference/6.x/search-rank-eval.html
@cbuescher could you raise a PR to add documentation for the ERR metric please?
Sure, I opened #32314
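Until the documentation lands, the sentence about using the relevance of documents at positions less than K can be illustrated as follows. This sketch follows the commonly cited definition of Expected Reciprocal Rank, where a grade g maps to a satisfaction probability (2^g - 1) / 2^max_grade; whether the Rank Eval implementation matches it in every detail is an assumption, and the function name and parameters are made up for the example.

```python
def expected_reciprocal_rank(grades, max_grade):
    """ERR for a ranked list of integer relevance grades (best rank first)."""
    err = 0.0
    # Probability that the user was not satisfied by any earlier document.
    p_reached = 1.0
    for rank, grade in enumerate(grades, start=1):
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        err += p_reached * p_satisfied / rank
        p_reached *= 1.0 - p_satisfied
    return err


# A relevant document at rank 2 contributes less when rank 1 was also relevant,
# because the user is less likely to have kept reading.
print(expected_reciprocal_rank([1, 1], max_grade=1))
print(expected_reciprocal_rank([0, 1], max_grade=1))
```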
|
||
=== Search | ||
|
||
* Cross Cluster Search will no longer use dedicated master nodes as gateway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This may cause problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926) |
Will this info be added to https://www.elastic.co/guide/en/elasticsearch/reference/6.x/modules-cross-cluster-search.html?
@javanna do we need to add anything to the documentation for this change?
The current docs don't go into how nodes are selected or which ones are eligible, hence I didn't add this explanation when making the change. We can probably explain more of the internals in the docs, but that should be a separate issue/PR.
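Since the docs do not describe the selection, here is a hedged sketch of the rule as the highlight words it: a node is skipped when it is master eligible and has no other role, and the attribute named by `search.remote.node.attr` still applies if set. The node dictionaries, the role names and the `is_gateway_candidate` helper are assumptions for illustration, and the version compatibility check is left out.

```python
def is_gateway_candidate(node, required_attr=None):
    """Decide whether a remote node may act as a cross cluster search gateway.

    node is a dict with a "roles" collection and an optional "attributes" dict
    (a hypothetical shape chosen for this example).
    """
    roles = set(node["roles"])
    # A dedicated master is master eligible but holds neither data nor ingest roles.
    dedicated_master = "master" in roles and not ({"data", "ingest"} & roles)
    if dedicated_master:
        return False
    if required_attr is not None:
        # Mirrors the idea of search.remote.node.attr: only nodes that set the
        # named attribute to "true" remain eligible.
        return node.get("attributes", {}).get(required_attr) == "true"
    return True


nodes = [
    {"name": "m1", "roles": ["master"]},
    {"name": "d1", "roles": ["data"], "attributes": {"gateway": "true"}},
    {"name": "md1", "roles": ["master", "data"]},
]
print([n["name"] for n in nodes if is_gateway_candidate(n)])
```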
=== Search | ||
|
||
* Cross Cluster Search will no longer use dedicated master nodes as gateway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This may cause problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926) | ||
* Format option for doc_value fields - `doc_value` fields in the Search API can now specify a `format` field to control the format of the value in the response. (https://github.com/elastic/elasticsearch/pull/29639) |
I'd recommend adding a link to https://www.elastic.co/guide/en/elasticsearch/reference/6.x/search-request-docvalue-fields.html
|
||
* Cross Cluster Search will no longer use dedicated master nodes as gateway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This may cause problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926) | ||
* Format option for doc_value fields - `doc_value` fields in the Search API can now specify a `format` field to control the format of the value in the response. (https://github.com/elastic/elasticsearch/pull/29639) | ||
* Support second level of field collapse - This extends the field collapsing feature to allow the top item for two fields to be retrieved. For example retrieving top scored tweets by country, and for each country, top scored tweets for each user. This can be an alternative to using multiple levels of terms aggregations along with top hits.(https://github.com/elastic/elasticsearch/pull/31808) |
I think the link for this one would be https://www.elastic.co/guide/en/elasticsearch/reference/6.x/search-request-collapse.html
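The tweets example in the bullet can be pictured with a small in-memory simulation. This is only a sketch of the idea of collapsing on two fields; the `collapse` and `two_level_collapse` helpers and the hit dictionaries are assumptions for the example and say nothing about how the search request is written or executed.

```python
def collapse(hits, field):
    """Keep the highest-scoring hit for each distinct value of field."""
    best = {}
    for hit in hits:
        key = hit[field]
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    return best


def two_level_collapse(hits, outer, inner):
    """Top hit per outer value and, within each outer group, top hit per inner value."""
    result = {}
    for key, top in collapse(hits, outer).items():
        group = [hit for hit in hits if hit[outer] == key]
        result[key] = {"top": top, "inner": collapse(group, inner)}
    return result


tweets = [
    {"country": "US", "user": "a", "score": 3.0},
    {"country": "US", "user": "a", "score": 5.0},
    {"country": "US", "user": "b", "score": 4.0},
    {"country": "FR", "user": "c", "score": 2.0},
]
by_country = two_level_collapse(tweets, outer="country", inner="user")
print(by_country["US"]["top"], sorted(by_country["US"]["inner"]))
```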
@lcawl thanks for the review. I agree it would be good to add links to the relevant documentation pages for each feature. Though I am wondering how this should be done? For example we could:
Which would you prefer? or alternatively do you have another suggestion? I also wonder about the PR links. At the moment I have them at the end of each item in brackets mostly so reviewers can see the change the item relates to. I'm not sure if we want to have the PR links in this document or whether the PR links are only relevant in the changelog document (which lists all changes not just highlights)? If we do want the PR linked here how do you think we should do it? Maybe as a |
In my opinion, a combination of the second and third options (i.e. "For more information" and linked text) is good. That's along the lines of what they did in the Kibana Release Highlights (e.g. https://www.elastic.co/guide/en/kibana/current/release-highlights-6.3.0.html) and I think they look good. My inclination would be to leave the PR links in the Release Notes only, but @debadair and @Sue-Gallagher might want to weigh in too. |
Some of the comments require follow up
@lcawl I've pushed a commit that removes the PR links and adds most of the doc links. #32095 (comment), #32095 (comment) and #32095 (comment) require some follow up before we can add links which will hopefully come in the next few days. |
LGTM
This is because it will not make 6.4
Just waiting for #32184 to be merged and then I'll merge this |
* 6.x:
  Security: revert to old way of merging automata (#32254)
  Fix a test bug in RangeQueryBuilderTests introduced in the field aliases backport.
  Introduce Application Privileges with support for Kibana RBAC (#32309)
  Undo a debugging change that snuck in during the field aliases merge.
  [test] port linux package packaging tests (#31943)
  Painless: Update More Methods to New Naming Scheme (#32305)
  Tribe: Add error with secure settings copied to tribe (#32298)
  Add V_6_3_3 version constant
  Add ERR to ranking evaluation documentation (#32314)
  [DOCS] Added link to 6.3.2 RNs
  [DOCS] Updates 6.3.2 release notes with PRs from ml-cpp repo (#32334)
  [Kerberos] Add Kerberos authentication support (#32263)
  [ML] Extract persistent task methods from MlMetadata (#32319)
  Backport - Add Snapshots Status API to High Level Rest Client (#32295)
  Make release notes ignore the `>test-failure` label. (#31309)
  [DOCS] Adds release highlights for search for 6.4 (#32095)
  Allow Integ Tests to run in a FIPS-140 JVM (#32316)
  Add support for field aliases to 6.x. (#32184)
  Register ERR metric with NamedXContentRegistry (#32320)
  fixes broken build for third-party-tests (#32315) Relates #31918 / Closes infra/issues/6085
  [DOCS] Rollup Caps API incorrectly mentions GET Jobs API (#32280)
  Rest HL client: Add put watch action (#32026) (#32191)
  Add WeightedAvg metric aggregation (#31037)
  Consistent encoder names (#29492)
  Switch monitoring to new style Requests (#32255)
  specify subdirs of lib, bin, modules in package (#32253)
  Rename ranking evaluation `quality_level` to `metric_score` (#32168)
  Add new permission for JDK11 to load JAAS libraries (#32132)
  Switch x-pack:core to new style Requests (#32252)
  Watcher: Store username on watch execution (#31873)
  Silence SSL reload test that fails on JDK 11
  Painless: Clean up add methods in PainlessLookup (#32258)
  CCE when re-throwing "shard not available" exception in TransportShardMultiGetAction (#32185)
  Fail shard if IndexShard#storeStats runs into an IOException (#32241)
  Fix `range` queries on `_type` field for singe type indices (#31756) (#32161)
  AwaitsFix RecoveryIT#testHistoryUUIDIsGenerated
  Add new fields to monitoring template for Beats state (#32085) (#32273)
  [TEST] improve REST high-level client naming conventions check (#32244)
  Check that client methods match API defined in the REST spec (#31825)
I'm not sure how we want to lay out this document, so my focus so far is on content rather than style and layout. Hopefully @debadair and/or @lcawl can either advise on layout here or tweak the layout later.