Merge branch 'main' into ps240724-update-get-snapshot-status-api-doc
elasticmachine authored Jul 25, 2024
2 parents 08873c5 + 05060f8 commit 7cefe9a
Showing 199 changed files with 4,313 additions and 674 deletions.
2 changes: 1 addition & 1 deletion README.asciidoc
@@ -1,6 +1,6 @@
= Elasticsearch

-Elasticsearch is a distributed search and analytics engine optimized for speed and relevance on production-scale workloads. Elasticsearch is the foundation of Elastic's open Stack platform. Search in near real-time over massive datasets, perform vector searches, integrate with generative AI applications, and much more.
+Elasticsearch is a distributed search and analytics engine, scalable data store and vector database optimized for speed and relevance on production-scale workloads. Elasticsearch is the foundation of Elastic's open Stack platform. Search in near real-time over massive datasets, perform vector searches, integrate with generative AI applications, and much more.

Use cases enabled by Elasticsearch include:

29 changes: 29 additions & 0 deletions docs/changelog/109583.yaml
@@ -0,0 +1,29 @@
pr: 109583
summary: "ESQL: INLINESTATS"
area: ES|QL
type: feature
issues:
 - 107589
highlight:
  title: "ESQL: INLINESTATS"
  body: |-
    This adds the `INLINESTATS` command to ESQL which performs a STATS and
    then enriches the results into the output stream. So, this query:

    [source,esql]
    ----
    FROM test
    | INLINESTATS m=MAX(a * b) BY b
    | WHERE m == a * b
    | SORT a DESC, b DESC
    | LIMIT 3
    ----

    Produces output like:

    |   a |   b |     m |
    | --- | --- | ----- |
    |  99 | 999 | 98901 |
    |  99 | 998 | 98802 |
    |  99 | 997 | 98703 |
  notable: true
5 changes: 5 additions & 0 deletions docs/changelog/110974.yaml
@@ -0,0 +1,5 @@
pr: 110974
summary: Add custom rule parameters to force time shift
area: Machine Learning
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/111274.yaml
@@ -0,0 +1,5 @@
pr: 111274
summary: Include account name in Azure settings exceptions
area: Snapshot/Restore
type: enhancement
issues: []
6 changes: 6 additions & 0 deletions docs/reference/esql/esql-commands.asciidoc
@@ -37,6 +37,9 @@ image::images/esql/processing-command.svg[A processing command changing an input table]
* <<esql-enrich>>
* <<esql-eval>>
* <<esql-grok>>
ifeval::["{release-state}"=="unreleased"]
* experimental:[] <<esql-inlinestats-by>>
endif::[]
* <<esql-keep>>
* <<esql-limit>>
ifeval::["{release-state}"=="unreleased"]
@@ -59,6 +62,9 @@ include::processing-commands/drop.asciidoc[]
include::processing-commands/enrich.asciidoc[]
include::processing-commands/eval.asciidoc[]
include::processing-commands/grok.asciidoc[]
ifeval::["{release-state}"=="unreleased"]
include::processing-commands/inlinestats.asciidoc[]
endif::[]
include::processing-commands/keep.asciidoc[]
include::processing-commands/limit.asciidoc[]
ifeval::["{release-state}"=="unreleased"]
102 changes: 102 additions & 0 deletions docs/reference/esql/processing-commands/inlinestats.asciidoc
@@ -0,0 +1,102 @@
[discrete]
[[esql-inlinestats-by]]
=== `INLINESTATS ... BY`

experimental::["INLINESTATS is highly experimental and only available in SNAPSHOT versions."]

The `INLINESTATS` command calculates an aggregate result and adds new columns
with the result to the stream of input data.

**Syntax**

[source,esql]
----
INLINESTATS [column1 =] expression1[, ..., [columnN =] expressionN]
[BY grouping_expression1[, ..., grouping_expressionN]]
----

*Parameters*

`columnX`::
The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (`expressionX`). If multiple columns
have the same name, all but the rightmost column with this name will be ignored.

`expressionX`::
An expression that computes an aggregated value. If its name coincides with one
of the computed columns, that column will be ignored.

`grouping_expressionX`::
An expression that outputs the values to group by.

NOTE: Individual `null` values are skipped when computing aggregations.

*Description*

The `INLINESTATS` command calculates an aggregate result and merges that result
back into the stream of input data. Without the optional `BY` clause this will
produce a single result which is appended to each row. With a `BY` clause this
will produce one result per grouping and merge the result into the stream based on
matching group keys.
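
For instance, a minimal sketch (the `employees` index and field names here are
illustrative):

[source,esql]
----
FROM employees
| KEEP emp_no, languages, salary
| INLINESTATS max_salary = MAX(salary) BY languages
----

Each row keeps its original columns and gains a `max_salary` column holding
the maximum salary within that row's `languages` group.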

All of the <<esql-agg-functions,aggregation functions>> are supported.

*Examples*

Find the employees that speak the most languages (it's a tie!):

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=max-languages]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=max-languages-result]
|===

Find the longest-tenured employee whose last name starts with each letter of the alphabet:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=longest-tenured-by-first]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=longest-tenured-by-first-result]
|===

Find the northernmost and southernmost airports:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=extreme-airports]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=extreme-airports-result]
|===

NOTE: Our test data doesn't have many "small" airports.

If a `BY` field is multivalued then `INLINESTATS` will put the row in *each*
bucket like <<esql-stats-by>>:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=mv-group]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=mv-group-result]
|===

To treat each group key as its own row, use <<esql-mv_expand>> before `INLINESTATS`:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=mv-expand]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=mv-expand-result]
|===
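
As a sketch of that pattern, assuming a hypothetical multivalued `tags` field:

[source,esql]
----
FROM employees
| MV_EXPAND tags
| INLINESTATS max_salary = MAX(salary) BY tags
----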
2 changes: 1 addition & 1 deletion docs/reference/esql/processing-commands/lookup.asciidoc
@@ -2,7 +2,7 @@
[[esql-lookup]]
=== `LOOKUP`

experimental::["LOOKUP is a highly experimental and only available in SNAPSHOT versions."]
experimental::["LOOKUP is highly experimental and only available in SNAPSHOT versions."]

`LOOKUP` matches values from the input against a `table` provided in the request,
adding the other fields from the `table` to the output.
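
A minimal sketch of the shape (the `languages_lookup` table and
`language_code` match field are hypothetical, with the table itself provided
in the request):

[source,esql]
----
FROM employees
| LOOKUP languages_lookup ON language_code
----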
30 changes: 27 additions & 3 deletions docs/reference/how-to/size-your-shards.asciidoc
@@ -152,9 +152,10 @@ same data. However, very large shards can also cause slower searches and will
take longer to recover after a failure.

There is no hard limit on the physical size of a shard, and each shard can in
-theory contain up to just over two billion documents. However, experience shows
-that shards between 10GB and 50GB typically work well for many use cases, as
-long as the per-shard document count is kept below 200 million.
+theory contain up to <<troubleshooting-max-docs-limit,just over two billion
+documents>>. However, experience shows that shards between 10GB and 50GB
+typically work well for many use cases, as long as the per-shard document count
+is kept below 200 million.

You may be able to use larger shards depending on your network and use case,
and smaller shards may be appropriate for
@@ -184,6 +185,29 @@ index prirep shard store
// TESTRESPONSE[s/\.ds-my-data-stream-2099\.05\.06-000001/my-index-000001/]
// TESTRESPONSE[s/50gb/.*/]

If an index's shards are experiencing degraded performance from surpassing the
recommended 50GB size, you may consider fixing the shards' sizing. An index's
shard count is fixed at creation time, so the data must be copied into a new
index with corrected settings. This requires first ensuring sufficient disk
space to copy the data. Afterwards, you can copy the index's data with
corrected settings via one of the following options (see the sketch after this
list):

* running <<indices-split-index,Split Index>> to increase the number of primary
shards

* creating a destination index with corrected settings and then running
<<docs-reindex,Reindex>>
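
For example, a sketch of the split option (hypothetical index names; the
source index must first be made read-only, and the target shard count must be
a multiple of the source's):

[source,console]
----
POST /my-index-000001/_split/my-index-000002
{
  "settings": {
    "index.number_of_shards": 4
  }
}
----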

Note that performing a <<restore-snapshot-api,Restore Snapshot>> and/or
<<indices-clone-index,Clone Index>> would be insufficient to resolve the
shards' sizing, because both preserve the source index's primary shard count.

Once a source index's data is copied into its destination index, the source
index can be <<indices-delete-index,removed>>. You may then consider creating
an <<indices-add-alias,alias>> with the source index's name that points to the
destination index for continuity, as sketched below.
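
For example, a sketch that atomically removes the hypothetical source index
and points its name at the destination:

[source,console]
----
POST _aliases
{
  "actions": [
    { "remove_index": { "index": "my-index-000001" } },
    { "add": { "index": "my-index-000002", "alias": "my-index-000001" } }
  ]
}
----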


[discrete]
[[shard-count-recommendation]]
==== Master-eligible nodes should have at least 1GB of heap per 3000 indices
@@ -52,6 +52,12 @@ detects anomalies where the geographic location of a credit card transaction is
unusual for a particular customer’s credit card. An anomaly might indicate
fraud.

A "typical" value indicates a centroid of a cluster of previously observed
locations that is closest to the "actual" location at that time. For example,
there may be one centroid near the person's home that is associated with the
cluster of local grocery stores and restaurants, and another centroid near the
person's work associated with the cluster of lunch and coffee places.
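
For example, a sketch of a detector that uses this function (field names are
illustrative):

[source,js]
----
{
  "function": "lat_long",
  "field_name": "transaction_coordinates",
  "by_field_name": "credit_card_number"
}
----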

IMPORTANT: The `field_name` that you supply must be a single string that
contains two comma-separated numbers of the form `latitude,longitude`, a
`geo_point` field, a `geo_shape` field that contains point values, or a
6 changes: 3 additions & 3 deletions docs/reference/modules/gateway.asciidoc
@@ -4,11 +4,11 @@
The local gateway stores the cluster state and shard data across full
cluster restarts.

-The following _static_ settings, which must be set on every master node,
+The following _static_ settings, which must be set on every <<master-node,master-eligible node>>,
control how long a freshly elected master should wait before it tries to
-recover the cluster state and the cluster's data.
+recover the <<cluster-state,cluster state>> and the cluster's data.

-NOTE: These settings only take effect on a full cluster restart.
+NOTE: These settings only take effect during a <<restart-cluster-full,full cluster restart>>.

`gateway.expected_data_nodes`::
(<<static-cluster-setting,Static>>)
2 changes: 1 addition & 1 deletion docs/reference/setup/restart-cluster.asciidoc
@@ -11,7 +11,7 @@ time, so the service remains uninterrupted.
[WARNING]
====
Nodes exceeding the low watermark threshold will be slow to restart. Reduce the disk
-usage below the <<cluster-routing-watermark-low,low watermark>> before to restarting nodes.
+usage below the <<cluster-routing-watermark-low,low watermark>> before restarting nodes.
====

[discrete]
4 changes: 4 additions & 0 deletions docs/reference/upgrade/disable-shard-alloc.asciidoc
@@ -17,3 +17,7 @@ PUT _cluster/settings
}
--------------------------------------------------
// TEST[skip:indexes don't assign]

You can also consider <<modules-gateway,gateway settings>> when restarting
large clusters to reduce initial strain while nodes proceed through
<<modules-discovery,discovery>>, as sketched below.
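
For example, a sketch of such settings in `elasticsearch.yml` (illustrative
values; these are static settings set on every master-eligible node):

[source,yaml]
----
gateway.expected_data_nodes: 3
gateway.recover_after_time: 10m
gateway.recover_after_data_nodes: 2
----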