[DOCS] Adds release highlights for search for 6.4 #32095

Merged: colings86 merged 9 commits into 6.x from search-highlights-6-4 on Jul 24, 2018

Contributor

@colings86 colings86 commented Jul 16, 2018

I'm not sure how we want to lay out this document, so my focus so far is on content rather than style and layout. Hopefully @debadair and/or @lcawl can either advise on layout here or tweak the layout later.

@colings86 colings86 added the >docs, review, :Search/Search, and v6.4.0 labels Jul 16, 2018
@colings86 colings86 self-assigned this Jul 16, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs


=== Search

* Cross Cluster Search will no longer use dedicated master nodes as gatway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This causes problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926)
Member

s/gatway/gateway

Also: this "may" cause problems?


=== Analysis

* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
Contributor

Lucene fieldto allow -> typo


* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
* Korean analysis tools - A new module has been added which provides analysis tools for the Korean language. The new `nori` analyzercan be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397)
* Add multiplexing token filter - This new token filter allows you to run tokens through multiple different tokenfilters and stack the results. For example, you can now easily index the original form of a token, its lowercase form and a stemmed form all at the same position, allowing you to search for stemmed and unstemmed tokens in the same field. (https://github.com/elastic/elasticsearch/pull/31208)
Contributor

👍
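
For readers who want to see what the multiplexing bullet describes, here is a minimal sketch of index settings that define a `multiplexer` token filter. It assumes the filter takes a `filters` list in which each entry is a comma-separated chain, plus a `preserve_original` flag; the names `my_multiplexer` and `my_analyzer` and the helper function are hypothetical and chosen only for illustration.

```python
import json


def multiplexer_settings(chains, preserve_original=True):
    """Build index settings with a `multiplexer` token filter.

    Each entry in `chains` is one filter chain, written as a
    comma-separated string. The outputs of all chains are stacked
    at the same token position.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # "my_multiplexer" is a hypothetical filter name.
                    "my_multiplexer": {
                        "type": "multiplexer",
                        "filters": list(chains),
                        "preserve_original": preserve_original,
                    }
                },
                "analyzer": {
                    # "my_analyzer" is a hypothetical analyzer name.
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["my_multiplexer"],
                    }
                },
            }
        }
    }


# Original token, its lowercase form, and a lowercased + stemmed form.
body = multiplexer_settings(["lowercase", "lowercase, porter_stem"])
print(json.dumps(body, indent=2))
```

The resulting body would be sent when creating an index; with `preserve_original` left on, the unmodified token is kept alongside the two filtered forms.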

=== Rank Eval API

* Expected Reciprocal Rank metric for Rank Eval API - The Expected Reciprocal Rank has been added to the available metrics int he Rank Eval API. ERR is an extension of the classical reciprocal rank which in order to determine the usefulness of a document at position K in the results, it uses the degree of relevance of the document at posiitons less than K as well. (https://github.com/elastic/elasticsearch/pull/31891)

Member

All good from my side about this

Member

Ah, just found a typo: s/int he/in the/

Contributor

@jimczi jimczi left a comment

Left some minor comments, lgtm otherwise


=== Analysis

* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
Contributor

s/fieldto/field to/

Contributor

I'd say "with the trade-off of consuming more disk space in the index" rather than "a bit more" since we don't try to limit the shingles and they are expensive in terms of disk space.

=== Analysis

* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
* Korean analysis tools - A new module has been added which provides analysis tools for the Korean language. The new `nori` analyzercan be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397)
Contributor

nit: s/A new module/A new plugin/
nit: s/analyzercan/analyzer can/


=== Search

* Cross Cluster Search will no longer use dedicated master nodes as gatway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This causes problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926)
Contributor

nit: s/as gatway/as gateway/


=== Analysis

* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
Contributor

field to
I would also remove "a bit" as the overhead may be significant

=== Analysis

* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene fieldto allow faster, more efficient, phrase searches on that field with the trade-off of consuming a bit more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
* Korean analysis tools - A new module has been added which provides analysis tools for the Korean language. The new `nori` analyzercan be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397)
Contributor

s/analyzercan/analyzer can/
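
As an illustration of the "custom analyzers" part of the Korean analysis bullet, the sketch below builds index settings for a custom analyzer from the plugin's components. It assumes the components are registered as `nori_tokenizer`, `nori_part_of_speech` and `nori_readingform`; the analyzer name `my_korean` and the helper function are hypothetical.

```python
import json


def nori_custom_analyzer_settings(analyzer_name="my_korean"):
    """Build index settings for a custom Korean analyzer.

    Combines the tokenizer, the part-of-speech token filter and the
    Hanja reading form token filter mentioned in the highlight.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "nori_tokenizer",
                        "filter": [
                            "nori_part_of_speech",  # drops tokens by POS tag
                            "nori_readingform",     # rewrites Hanja to Hangul
                            "lowercase",
                        ],
                    }
                }
            }
        }
    }


body = nori_custom_analyzer_settings()
print(json.dumps(body, indent=2, ensure_ascii=False))
```

When no customization is needed, the out-of-the-box `nori` analyzer can simply be referenced by name in a field mapping instead.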


* Cross Cluster Search will no longer use dedicated master nodes as gateway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This may cause problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926)
* Format option for doc_value fields - `doc_value` fields in the Search API can now specify a `format` field to control the format of the value in the response. (https://github.com/elastic/elasticsearch/pull/29639)
* Second level of field collapse (https://github.com/elastic/elasticsearch/pull/31808)
Contributor

I think we should either expand on this with a paragraph or remove from the highlights. Perhaps something like: Support second level of field collapse, which allows users to retrieve the top item for two fields, such as retrieving top scored tweets by country, and for each country, top scored tweets for each user. This can be an alternative to using nested terms aggregations along with top hits on the inner hits.

probably @eskibars or @zuketo can write something better, but, hoping we can expand on this one a bit if we leave it in.

Contributor Author

oops, I must have missed this when I was expanding all the points from my initial list. I'll address this tomorrow and expand upon it

@colings86
Contributor Author

@pcsanwald I pushed a commit which expands on the field collapse bullet that I missed before, could you take another look?

Contributor

@pcsanwald pcsanwald left a comment

LGTM!

@colings86
Contributor Author

@debadair @lcawl do you want to review this as well before I push or will you tweak at a later point?


=== Aggregations

* Auto-interval Date Histogram - A new `auto_date_histogram` aggregaiton has been added which instead of taking an `interval` takes a `buckets` option which defines the maximum number of buckets it should return. The aggregation internally determines the best interval to use to get as close to the `bucket` option as possible without exceeding it. (https://github.com/elastic/elasticsearch/pull/28993)
Contributor

Little typo here on "aggregaiton"

Contributor

It would be nice to add a link to the documentation (e.g. https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-autodatehistogram-aggregation.html), though I don't see that page in the 6.x version yet.

Contributor Author

@colings86 colings86 Jul 18, 2018

The auto-interval date histogram has not quite been backported to 6.x yet. @pcsanwald is working on backporting it and I'll add this link when that is done.
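
To make the aggregation bullet concrete, here is a minimal sketch of a search request body using `auto_date_histogram` with a `buckets` target instead of an `interval`. The field name `date`, the aggregation name `over_time` and the helper function are hypothetical; the sketch only builds the request body and does not talk to a cluster.

```python
import json


def auto_date_histogram_request(field, buckets):
    """Build a search body with an `auto_date_histogram` aggregation.

    `buckets` is the maximum number of buckets to return; the
    aggregation picks the interval itself.
    """
    if buckets < 1:
        raise ValueError("buckets must be a positive integer")
    return {
        "size": 0,  # only the aggregation result is of interest here
        "aggs": {
            # "over_time" is a hypothetical aggregation name.
            "over_time": {
                "auto_date_histogram": {"field": field, "buckets": buckets}
            }
        },
    }


body = auto_date_histogram_request("date", 10)
print(json.dumps(body, indent=2))
```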

@@ -7,3 +7,27 @@
coming[6.4..0]
Contributor

I think this should be 6.4.0 not 6.4..0


=== Analysis

* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene field to allow faster, more efficient, phrase searches on that field with the trade-off of consuming more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
Contributor

You might be sensing a pattern here, but I think for more information, folks could also be directed to this link: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/text.html
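
For context on why the review asked to drop "a bit" from the disk space wording, the sketch below shows what 2-shingles are and how the option is switched on for a field. The `two_shingles` helper is a hypothetical illustration of the concept, not the indexing code itself.

```python
def two_shingles(tokens):
    """Return every pair of adjacent tokens joined by a space.

    A field with n tokens yields n - 1 shingles, so the extra
    Lucene field is roughly as long as the original one.
    """
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


# Mapping fragment for a single `text` field with the option enabled.
text_field_mapping = {"type": "text", "index_phrases": True}

print(two_shingles(["the", "quick", "brown", "fox"]))
```

Because nearly every token starts a new shingle, the number of distinct indexed terms can grow considerably, which is the trade-off the highlight mentions.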

=== Analysis

* Option to index phrases on text fields - A new `index_phrases` option has been added to `text` fields. When enabled this option will index 2-shingles of the field in a separate Lucene field to allow faster, more efficient, phrase searches on that field with the trade-off of consuming more disk space in the index. (https://github.com/elastic/elasticsearch/pull/30450)
* Korean analysis tools - A new plugin has been added which provides analysis tools for the Korean language. The new `nori` analyzer can be used to analyze Korean text "out of the box" and custom analyzers can use a tokenizer, part of speech token filter and a Hanja reading form token filter. (https://github.com/elastic/elasticsearch/pull/30397)
* Add multiplexing token filter - This new token filter allows you to run tokens through multiple different tokenfilters and stack the results. For example, you can now easily index the original form of a token, its lowercase form and a stemmed form all at the same position, allowing you to search for stemmed and unstemmed tokens in the same field. (https://github.com/elastic/elasticsearch/pull/31208)


=== Mappings

* `_ignored` meta field - A new meta field has been added to documents. The `_ignored` field will contain the field names of any fields that were ignored at index time due to the `ignore_malformed` option. This means that malformed documents can be more easily discovered by using `exists` or `term` queries on this new meta field. (https://github.com/elastic/elasticsearch/pull/29658)
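
The two query styles mentioned in the `_ignored` bullet can be sketched as request bodies. The field name `@timestamp` and the helper functions are hypothetical examples.

```python
import json


def any_ignored_query():
    """Match documents where at least one field was ignored."""
    return {"query": {"exists": {"field": "_ignored"}}}


def ignored_field_query(field_name):
    """Match documents where a specific field was ignored."""
    return {"query": {"term": {"_ignored": field_name}}}


print(json.dumps(any_ignored_query()))
print(json.dumps(ignored_field_query("@timestamp")))
```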

=== Rank Eval API

* Expected Reciprocal Rank metric for Rank Eval API - The Expected Reciprocal Rank has been added to the available metrics in the Rank Eval API. ERR is an extension of the classical reciprocal rank which, in order to determine the usefulness of a document at position K in the results, also uses the degree of relevance of the documents at positions less than K. (https://github.com/elastic/elasticsearch/pull/31891)

Contributor Author

@cbuescher could you raise a PR to add documentation for the ERR metric please?

Member

Sure, I opened #32314
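
Until the ERR documentation lands, a small sketch may help explain the metric described in the bullet. It follows the commonly published definition of Expected Reciprocal Rank, where a grade g is mapped to a satisfaction probability (2^g - 1) / 2^max_grade; this is an assumption about the formula and not a copy of the Rank Eval implementation. The parameter name `max_grade` is hypothetical.

```python
def expected_reciprocal_rank(grades, max_grade):
    """Compute ERR for relevance grades listed in ranked order.

    The contribution of the document at rank r is discounted by the
    probability that the user was not already satisfied by any of
    the documents ranked above it.
    """
    err = 0.0
    p_not_satisfied = 1.0  # probability the user reaches this rank
    for rank, grade in enumerate(grades, start=1):
        p_relevant = (2 ** grade - 1) / (2 ** max_grade)
        err += p_not_satisfied * p_relevant / rank
        p_not_satisfied *= 1.0 - p_relevant
    return err


# A highly relevant document scores better at rank 1 than at rank 2.
print(expected_reciprocal_rank([3, 0], max_grade=3))
print(expected_reciprocal_rank([0, 3], max_grade=3))
```

The dependence on earlier positions is what distinguishes ERR from plain reciprocal rank: a relevant document placed below another relevant document contributes less than it would on its own.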


=== Search

* Cross Cluster Search will no longer use dedicated master nodes as gateway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This may cause problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926)
Contributor Author

@javanna do we need to add anything to the documentation for this change?

Member

The current docs don't go into how and which nodes are selected, hence I didn't add this explanation when making the change. We can probably explain more of the internals in the docs, but that should be a separate issue/PR.
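
Since the docs do not describe the selection, here is a rough sketch of the rule as the highlight states it. It is a simplified model and not the actual Elasticsearch code: it assumes a "dedicated master" is a master-eligible node with neither the data nor the ingest role, it assumes the `search.remote.node.attr` setting names an attribute that must be `"true"` on a node, and it leaves out the version compatibility check. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified view of a remote cluster node."""
    name: str
    master_eligible: bool
    data: bool
    ingest: bool
    attrs: dict = field(default_factory=dict)


def is_dedicated_master(node):
    """Assumed definition: master-eligible with no data or ingest role."""
    return node.master_eligible and not (node.data or node.ingest)


def gateway_candidates(nodes, node_attr=None):
    """Return nodes that may act as gateway nodes under the 6.4 rule.

    `node_attr` models the attribute named by `search.remote.node.attr`.
    """
    candidates = []
    for node in nodes:
        if is_dedicated_master(node):
            continue  # new in 6.4: dedicated masters are skipped
        if node_attr is not None and node.attrs.get(node_attr) != "true":
            continue  # attribute filter, as before 6.4
        candidates.append(node)
    return candidates


cluster = [
    Node("master-1", master_eligible=True, data=False, ingest=False),
    Node("data-1", master_eligible=False, data=True, ingest=False),
    Node("mixed-1", master_eligible=True, data=True, ingest=True,
         attrs={"gateway": "true"}),
]
print([n.name for n in gateway_candidates(cluster)])
print([n.name for n in gateway_candidates(cluster, node_attr="gateway")])
```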

=== Search

* Cross Cluster Search will no longer use dedicated master nodes as gateway nodes - Previously the gateway node on a remote cluster used by Cross Cluster search was selected based only on the node's version and node attributes set in the `search.remote.node.attr` setting. This meant that unless carefully configured any node in the cluster could potentially be used as a gateway node for a cross cluster search. This may cause problems when running with dedicated master nodes as it is undesirable for master eligible nodes to be used for any search activity. Starting from 6.4.0 cross cluster search will no longer consider dedicated master eligible nodes as potential gateway nodes providing a better out of the box default for running cross cluster searches. (https://github.com/elastic/elasticsearch/pull/30926)
* Format option for doc_value fields - `doc_value` fields in the Search API can now specify a `format` field to control the format of the value in the response. (https://github.com/elastic/elasticsearch/pull/29639)
* Support second level of field collapse - This extends the field collapsing feature to allow the top item for two fields to be retrieved. For example, retrieving top scored tweets by country, and for each country, top scored tweets for each user. This can be an alternative to using multiple levels of terms aggregations along with top hits. (https://github.com/elastic/elasticsearch/pull/31808)
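
The tweets example in the field collapse bullet can be sketched as a search request body. It assumes the second level is expressed as a `collapse` nested inside the `inner_hits` of the first; the field names `country`, `user` and `message`, the inner hits name `by_user`, and the helper function are hypothetical.

```python
import json


def two_level_collapse_request(query_text, outer_field, inner_field, size=3):
    """Build a search body that collapses on two fields.

    Top hits are grouped by `outer_field`; within each group the
    inner hits are collapsed again on `inner_field`.
    """
    return {
        "query": {"match": {"message": query_text}},
        "collapse": {
            "field": outer_field,
            "inner_hits": {
                "name": "by_user",  # hypothetical inner hits name
                "collapse": {"field": inner_field},
                "size": size,
            },
        },
    }


body = two_level_collapse_request("elasticsearch", "country", "user")
print(json.dumps(body, indent=2))
```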

@colings86
Contributor Author

colings86 commented Jul 18, 2018

@lcawl thanks for the review. I agree it would be good to add links to the relevant documentation pages for each feature. Though I am wondering how this should be done? For example we could:

  • Make the title of the change (e.g. "Option to index phrases on text fields") a link to the documentation
  • Add a sentence to the end of each item saying "For more information please see <>"
  • Try to work the link into the text somewhere, for example in the index phrases item I could add the link to the mention of the "text fields"

Which would you prefer? Alternatively, do you have another suggestion?

I also wonder about the PR links. At the moment I have them at the end of each item in brackets mostly so reviewers can see the change the item relates to. I'm not sure if we want to have the PR links in this document or whether the PR links are only relevant in the changelog document (which lists all changes not just highlights)? If we do want the PR linked here how do you think we should do it? Maybe as a #12345 link next to the item title?

@lcawl
Contributor

lcawl commented Jul 18, 2018

In my opinion, a combination of the second and third options (i.e. "For more information" and linked text) is good. That's along the lines of what they did in the Kibana Release Highlights (e.g. https://www.elastic.co/guide/en/kibana/current/release-highlights-6.3.0.html) and I think they look good.

My inclination would be to leave the PR links in the Release Notes only, but @debadair and @Sue-Gallagher might want to weigh in too.

Some of the comments require follow up
@colings86
Contributor Author

@lcawl I've pushed a commit that removes the PR links and adds most of the doc links. #32095 (comment), #32095 (comment) and #32095 (comment) require some follow up before we can add links which will hopefully come in the next few days.

Contributor

@lcawl lcawl left a comment

LGTM

@colings86
Contributor Author

Just waiting for #32184 to be merged and then I'll merge this

@colings86 colings86 merged commit d9f1eb4 into 6.x Jul 24, 2018
@colings86 colings86 deleted the search-highlights-6-4 branch July 24, 2018 15:35
dnhatn added a commit that referenced this pull request Jul 25, 2018
* 6.x:
  Security: revert to old way of merging automata (#32254)
  Fix a test bug in RangeQueryBuilderTests introduced in the field aliases backport.
  Introduce Application Privileges with support for Kibana RBAC (#32309)
  Undo a debugging change that snuck in during the field aliases merge.
  [test] port linux package packaging tests (#31943)
  Painless: Update More Methods to New Naming Scheme (#32305)
  Tribe: Add error with secure settings copied to tribe (#32298)
  Add V_6_3_3 version constant
  Add ERR to ranking evaluation documentation (#32314)
  [DOCS] Added link to 6.3.2 RNs
  [DOCS] Updates 6.3.2 release notes with PRs from ml-cpp repo (#32334)
  [Kerberos] Add Kerberos authentication support (#32263)
  [ML] Extract persistent task methods from MlMetadata (#32319)
  Backport - Add Snapshots Status API to High Level Rest Client (#32295)
  Make release notes ignore the `>test-failure` label. (#31309)
  [DOCS] Adds release highlights for search for 6.4 (#32095)
  Allow Integ Tests to run in a FIPS-140 JVM (#32316)
  Add support for field aliases to 6.x. (#32184)
  Register ERR metric with NamedXContentRegistry (#32320)
  fixes broken build for third-party-tests (#32315) Relates #31918 / Closes infra/issues/6085
  [DOCS] Rollup Caps API incorrectly mentions GET Jobs API (#32280)
  Rest HL client: Add put watch action (#32026) (#32191)
  Add WeightedAvg metric aggregation (#31037)
  Consistent encoder names (#29492)
  Switch monitoring to new style Requests (#32255)
  specify subdirs of lib, bin, modules in package (#32253)
  Rename ranking evaluation `quality_level` to `metric_score` (#32168)
  Add new permission for JDK11 to load JAAS libraries (#32132)
  Switch x-pack:core to new style Requests (#32252)
  Watcher: Store username on watch execution (#31873)
  Silence SSL reload test that fails on JDK 11
  Painless: Clean up add methods in PainlessLookup (#32258)
  CCE when re-throwing "shard not available" exception in TransportShardMultiGetAction (#32185)
  Fail shard if IndexShard#storeStats runs into an IOException (#32241)
  Fix `range` queries on `_type` field for singe type indices (#31756) (#32161)
  AwaitsFix RecoveryIT#testHistoryUUIDIsGenerated
  Add new fields to monitoring template for Beats state (#32085) (#32273)
  [TEST] improve REST high-level client naming conventions check (#32244)
  Check that client methods match API defined in the REST spec (#31825)
9 participants