Merge branch 'main' into synthetic-source/single-element-arrays
kkrik-es committed Oct 8, 2024
2 parents e626324 + 10f6f25 commit f95c5dc
Showing 189 changed files with 6,188 additions and 2,656 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/111336.yaml
@@ -0,0 +1,5 @@
pr: 111336
summary: Use the same chunking configurations for models in the Elasticsearch service
area: Machine Learning
type: enhancement
issues: []
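
For context, a minimal sketch of creating an endpoint in the Elasticsearch service with an explicit chunking configuration; the endpoint name is hypothetical and the exact `chunking_settings` fields shown are an assumption for illustration:

[source,console]
----
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".elser_model_2"
  },
  // the same chunking configuration now applies across Elasticsearch-service models
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
----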
5 changes: 5 additions & 0 deletions docs/changelog/112933.yaml
@@ -0,0 +1,5 @@
pr: 112933
summary: "Allow incubating Panama Vector in simdvec, and add vectorized `ipByteBin`"
area: Search
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/113812.yaml
@@ -0,0 +1,5 @@
pr: 113812
summary: Add Streaming Inference spec
area: Machine Learning
type: enhancement
issues: []
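
As a hedged sketch of what the spec covers, streaming completions go through a dedicated `_stream` route; the endpoint ID below is hypothetical and the exact path is an assumption based on this entry:

[source,console]
----
// Streams results incrementally rather than as a single response
POST _inference/completion/my-completion-endpoint/_stream
{
  "input": "What is Elastic?"
}
----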
5 changes: 5 additions & 0 deletions docs/changelog/114002.yaml
@@ -0,0 +1,5 @@
pr: 114002
summary: Add a `mustache.max_output_size_bytes` setting to limit the length of results from mustache scripts
area: Infra/Scripting
type: enhancement
issues: []
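
A minimal sketch of the new limit as a node setting in `elasticsearch.yml`; the value is illustrative, and treating it as a static per-node setting is an assumption here:

[source,yaml]
----
# Hypothetical example: cap rendered search-template output at 1 MB
mustache.max_output_size_bytes: 1048576
----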
5 changes: 5 additions & 0 deletions docs/changelog/114080.yaml
@@ -0,0 +1,5 @@
pr: 114080
summary: Stream Cohere Completion
area: Machine Learning
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/114177.yaml
@@ -0,0 +1,5 @@
pr: 114177
summary: "Make `randomInstantBetween` always return a value in the range [`minInstant`, `maxInstant`]"
area: Infra/Metrics
type: bug
issues: []
17 changes: 17 additions & 0 deletions docs/changelog/114231.yaml
@@ -0,0 +1,17 @@
pr: 114231
summary: Remove cluster state from `/_cluster/reroute` response
area: Allocation
type: breaking
issues:
- 88978
breaking:
title: Remove cluster state from `/_cluster/reroute` response
area: REST API
details: >-
The `POST /_cluster/reroute` API no longer returns the cluster state in its
response. The `?metric` query parameter to this API now has no effect and
its use will be forbidden in a future version.
impact: >-
Cease usage of the `?metric` query parameter when calling the
`POST /_cluster/reroute` API.
notable: false
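
A short example of the new usage; the index and node names are hypothetical. Per this entry, the response no longer includes the cluster state:

[source,console]
----
// No ?metric parameter: the cluster state is no longer returned
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-index",
        "shard": 0,
        "from_node": "node-1",
        "to_node": "node-2"
      }
    }
  ]
}
----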
4 changes: 2 additions & 2 deletions docs/reference/cluster/reroute.asciidoc
@@ -10,7 +10,7 @@ Changes the allocation of shards in a cluster.
[[cluster-reroute-api-request]]
==== {api-request-title}

`POST /_cluster/reroute?metric=none`
`POST /_cluster/reroute`

[[cluster-reroute-api-prereqs]]
==== {api-prereq-title}
@@ -193,7 +193,7 @@ This is a short example of a simple reroute API call:

[source,console]
--------------------------------------------------
POST /_cluster/reroute?metric=none
POST /_cluster/reroute
{
"commands": [
{
2 changes: 1 addition & 1 deletion docs/reference/commands/shard-tool.asciidoc
@@ -95,7 +95,7 @@ Changing allocation id V8QXk-QXSZinZMT-NvEq4w to tjm9Ve6uTBewVFAlfUMWjA
You should run the following command to allocate this shard:
POST /_cluster/reroute?metric=none
POST /_cluster/reroute
{
"commands" : [
{
@@ -116,7 +116,7 @@ PUT _connector/my-connector
"name": "My Connector",
"description": "My Connector to sync data to Elastic index from Google Drive",
"service_type": "google_drive",
"language": "english"
"language": "en"
}
----

40 changes: 24 additions & 16 deletions docs/reference/connector/docs/connectors-zoom.asciidoc
@@ -63,18 +63,22 @@ To connect to Zoom you need to https://developers.zoom.us/docs/internal-apps/s2s
6. Click on the "Create" button to create the app registration.
7. After the registration is complete, you will be redirected to the app's overview page. Take note of the "App Credentials" value, as you'll need it later.
8. Navigate to the "Scopes" section and click on the "Add Scopes" button.
9. The following scopes need to be added to the app.
9. The following granular scopes need to be added to the app.
+
[source,bash]
----
user:read:admin
meeting:read:admin
chat_channel:read:admin
recording:read:admin
chat_message:read:admin
report:read:admin
user:read:list_users:admin
meeting:read:list_meetings:admin
meeting:read:list_past_participants:admin
cloud_recording:read:list_user_recordings:admin
team_chat:read:list_user_channels:admin
team_chat:read:list_user_messages:admin
----
[NOTE]
====
The connector requires a minimum scope of `user:read:list_users:admin` to ingest data into Elasticsearch.
====
+
10. Click on the "Done" button to add the selected scopes to your app.
11. Navigate to the "Activation" section and input the necessary information to activate the app.
@@ -220,18 +224,22 @@ To connect to Zoom you need to https://developers.zoom.us/docs/internal-apps/s2s
6. Click on the "Create" button to create the app registration.
7. After the registration is complete, you will be redirected to the app's overview page. Take note of the "App Credentials" value, as you'll need it later.
8. Navigate to the "Scopes" section and click on the "Add Scopes" button.
9. The following scopes need to be added to the app.
9. The following granular scopes need to be added to the app.
+
[source,bash]
----
user:read:admin
meeting:read:admin
chat_channel:read:admin
recording:read:admin
chat_message:read:admin
report:read:admin
user:read:list_users:admin
meeting:read:list_meetings:admin
meeting:read:list_past_participants:admin
cloud_recording:read:list_user_recordings:admin
team_chat:read:list_user_channels:admin
team_chat:read:list_user_messages:admin
----
[NOTE]
====
The connector requires a minimum scope of `user:read:list_users:admin` to ingest data into Elasticsearch.
====
+
10. Click on the "Done" button to add the selected scopes to your app.
11. Navigate to the "Activation" section and input the necessary information to activate the app.
4 changes: 2 additions & 2 deletions docs/reference/intro.asciidoc
@@ -204,7 +204,7 @@ For general content, you have the following options for adding data to {es} indices:
If you're building a website or app, then you can call Elasticsearch APIs using an https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} client] in the programming language of your choice. If you use the Python client, then check out the `elasticsearch-labs` repo for various https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/search/python-examples[example notebooks].
* {kibana-ref}/connect-to-elasticsearch.html#upload-data-kibana[File upload]: Use the {kib} file uploader to index single files for one-off testing and exploration. The GUI guides you through setting up your index and field mappings.
* https://github.com/elastic/crawler[Web crawler]: Extract and index web page content into {es} documents.
* {enterprise-search-ref}/connectors.html[Connectors]: Sync data from various third-party data sources to create searchable, read-only replicas in {es}.
* <<es-connectors,Connectors>>: Sync data from various third-party data sources to create searchable, read-only replicas in {es}.

[discrete]
[[es-ingestion-overview-timestamped]]
@@ -492,4 +492,4 @@ and restrictions. You can review the following guides to learn how to tune your
* <<use-elasticsearch-for-time-series-data,Tune for time series data>>

Many {es} options come with different performance considerations and trade-offs. The best way to determine the
optimal configuration for your use case is through https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing[testing with your own data and queries].
optimal configuration for your use case is through https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing[testing with your own data and queries].
2 changes: 0 additions & 2 deletions docs/reference/mapping/runtime.asciidoc
@@ -821,8 +821,6 @@ address.
[[lookup-runtime-fields]]
==== Retrieve fields from related indices

experimental[]

The <<search-fields,`fields`>> parameter on the `_search` API can also be used to retrieve fields from
the related indices via runtime fields with a type of `lookup`.

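
A hedged sketch of the lookup pattern this section documents; the index and field names are hypothetical:

[source,console]
----
POST my-logs/_search
{
  "runtime_mappings": {
    "client": {
      // join against another index at search time
      "type": "lookup",
      "target_index": "clients",
      "input_field": "client_ip",
      "target_field": "ip",
      "fetch_fields": ["name", "country"]
    }
  },
  "fields": ["client"]
}
----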
89 changes: 89 additions & 0 deletions docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
@@ -225,6 +225,17 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizatio
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
`deberta_v2`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate-deberta-v2]
=======
`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
@@ -301,6 +312,17 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizatio
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
`deberta_v2`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate-deberta-v2]
=======
`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
@@ -397,6 +419,21 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizatio
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
`deberta_v2`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
=======
`span`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate-deberta-v2]
=======
`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
@@ -517,6 +554,21 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizatio
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
`deberta_v2`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
=======
`span`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate-deberta-v2]
=======
`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
@@ -608,6 +660,17 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizatio
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
`deberta_v2`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate-deberta-v2]
=======
`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
@@ -687,6 +750,21 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizatio
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`with_special_tokens`::::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
=======
`deberta_v2`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
=======
`span`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`with_special_tokens`::::
(Optional, boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-with-special-tokens]
@@ -790,6 +868,17 @@ include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizatio
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
`deberta_v2`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-deberta-v2]
+
.Properties of deberta_v2
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate-deberta-v2]
=======
`roberta`::::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
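
As an illustration of the new options, a hedged inference call that overrides `deberta_v2` truncation at request time; the model ID is hypothetical and `balanced` is assumed to be a valid `truncate` value:

[source,console]
----
POST _ml/trained_models/my-deberta-model/_infer
{
  "docs": [
    { "text_field": "The quick brown fox jumps over the lazy dog" }
  ],
  "inference_config": {
    "text_classification": {
      "tokenization": {
        "deberta_v2": {
          "truncate": "balanced"
        }
      }
    }
  }
}
----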
14 changes: 7 additions & 7 deletions docs/reference/reranking/index.asciidoc
@@ -1,12 +1,12 @@
[[re-ranking-overview]]
= Re-ranking

Many search systems are built on two-stage retrieval pipelines.
Many search systems are built on multi-stage retrieval pipelines.

The first stage uses cheap, fast algorithms to find a broad set of possible matches.
Earlier stages use cheap, fast algorithms to find a broad set of possible matches.

The second stage uses a more powerful model, often machine learning-based, to reorder the documents.
This second step is called re-ranking.
Later stages use more powerful models, often machine learning-based, to reorder the documents.
This step is called re-ranking.
Because the resource-intensive model is only applied to the smaller set of pre-filtered results, this approach returns more relevant results while still optimizing for search performance and computational costs.

{es} supports various ranking and re-ranking techniques to optimize search relevance and performance.
@@ -18,7 +18,7 @@ Because the resource-intensive model is only applied to the smaller set of pre-filtered results, this approach returns more relevant results while still optimizing for search performance and computational costs.

[float]
[[re-ranking-first-stage-pipeline]]
=== First stage: initial retrieval
=== Initial retrieval

[float]
[[re-ranking-ranking-overview-bm25]]
@@ -45,7 +45,7 @@ Hybrid search techniques combine results from full-text and vector search pipelines.

[float]
[[re-ranking-overview-second-stage]]
=== Second stage: Re-ranking
=== Re-ranking

When using the following advanced re-ranking pipelines, first-stage retrieval mechanisms effectively generate a set of candidates.
These candidates are funneled into the re-ranker to perform more computationally expensive re-ranking tasks.
@@ -67,4 +67,4 @@ Learning To Rank involves training a machine learning model to build a ranking function.
LTR is best suited for when you have ample training data and need highly customized relevance tuning.

include::semantic-reranking.asciidoc[]
include::learning-to-rank.asciidoc[]
include::learning-to-rank.asciidoc[]
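
For the re-ranking stage described above, a hedged sketch using a `text_similarity_reranker` retriever; the index and inference endpoint ID are hypothetical:

[source,console]
----
POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      // earlier stage: cheap, fast BM25 retrieval
      "retriever": {
        "standard": {
          "query": { "match": { "text": "how to tune search relevance" } }
        }
      },
      // later stage: model-based re-ranking of the top candidates
      "field": "text",
      "inference_id": "my-rerank-endpoint",
      "inference_text": "how to tune search relevance",
      "rank_window_size": 100
    }
  }
}
----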
9 changes: 5 additions & 4 deletions docs/reference/rest-api/common-parms.asciidoc
@@ -1298,10 +1298,11 @@ tag::wait_for_active_shards[]
`wait_for_active_shards`::
+
--
(Optional, string) The number of shard copies that must be active before
proceeding with the operation. Set to `all` or any positive integer up
to the total number of shards in the index (`number_of_replicas+1`).
Default: 1, the primary shard.
(Optional, string) The number of copies of each shard that must be active
before proceeding with the operation. Set to `all` or any non-negative integer
up to the total number of copies of each shard in the index
(`number_of_replicas+1`). Defaults to `1`, meaning to wait just for each
primary shard to be active.

See <<index-wait-for-active-shards>>.
--
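
A short example of the parameter as clarified above; with one replica configured, `2` waits for the primary and one copy. The index name is hypothetical:

[source,console]
----
// Wait until 2 copies of each affected shard are active before indexing
PUT my-index/_doc/1?wait_for_active_shards=2
{
  "message": "hello"
}
----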