Merge branch 'main' into ps240724-update-get-snapshot-status-api-doc
elasticmachine authored Jul 25, 2024
2 parents 08873c5 + 05060f8 commit 7cefe9a
Showing 199 changed files with 4,313 additions and 674 deletions.
2 changes: 1 addition & 1 deletion README.asciidoc
@@ -1,6 +1,6 @@
= Elasticsearch

-Elasticsearch is a distributed search and analytics engine optimized for speed and relevance on production-scale workloads. Elasticsearch is the foundation of Elastic's open Stack platform. Search in near real-time over massive datasets, perform vector searches, integrate with generative AI applications, and much more.
+Elasticsearch is a distributed search and analytics engine, scalable data store and vector database optimized for speed and relevance on production-scale workloads. Elasticsearch is the foundation of Elastic's open Stack platform. Search in near real-time over massive datasets, perform vector searches, integrate with generative AI applications, and much more.

Use cases enabled by Elasticsearch include:

29 changes: 29 additions & 0 deletions docs/changelog/109583.yaml
@@ -0,0 +1,29 @@
pr: 109583
summary: "ESQL: INLINESTATS"
area: ES|QL
type: feature
issues:
 - 107589
highlight:
  title: "ESQL: INLINESTATS"
  body: |-
    This adds the `INLINESTATS` command to ESQL which performs a STATS and
    then enriches the results into the output stream. So, this query:

    [source,esql]
    ----
    FROM test
    | INLINESTATS m=MAX(a * b) BY b
    | WHERE m == a * b
    | SORT a DESC, b DESC
    | LIMIT 3
    ----

    Produces output like:

    |   a |   b |     m |
    | --- | --- | ----- |
    |  99 | 999 | 98901 |
    |  99 | 998 | 98802 |
    |  99 | 997 | 98703 |
  notable: true
5 changes: 5 additions & 0 deletions docs/changelog/110974.yaml
@@ -0,0 +1,5 @@
pr: 110974
summary: Add custom rule parameters to force time shift
area: Machine Learning
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/111274.yaml
@@ -0,0 +1,5 @@
pr: 111274
summary: Include account name in Azure settings exceptions
area: Snapshot/Restore
type: enhancement
issues: []
6 changes: 6 additions & 0 deletions docs/reference/esql/esql-commands.asciidoc
@@ -37,6 +37,9 @@ image::images/esql/processing-command.svg[A processing command changing an input table]
* <<esql-enrich>>
* <<esql-eval>>
* <<esql-grok>>
ifeval::["{release-state}"=="unreleased"]
* experimental:[] <<esql-inlinestats-by>>
endif::[]
* <<esql-keep>>
* <<esql-limit>>
ifeval::["{release-state}"=="unreleased"]
@@ -59,6 +62,9 @@ include::processing-commands/drop.asciidoc[]
include::processing-commands/enrich.asciidoc[]
include::processing-commands/eval.asciidoc[]
include::processing-commands/grok.asciidoc[]
ifeval::["{release-state}"=="unreleased"]
include::processing-commands/inlinestats.asciidoc[]
endif::[]
include::processing-commands/keep.asciidoc[]
include::processing-commands/limit.asciidoc[]
ifeval::["{release-state}"=="unreleased"]
102 changes: 102 additions & 0 deletions docs/reference/esql/processing-commands/inlinestats.asciidoc
@@ -0,0 +1,102 @@
[discrete]
[[esql-inlinestats-by]]
=== `INLINESTATS ... BY`

experimental::["INLINESTATS is highly experimental and only available in SNAPSHOT versions."]

The `INLINESTATS` command calculates an aggregate result and adds new columns
with the result to the stream of input data.

**Syntax**

[source,esql]
----
INLINESTATS [column1 =] expression1[, ..., [columnN =] expressionN]
[BY grouping_expression1[, ..., grouping_expressionN]]
----

*Parameters*

`columnX`::
The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (`expressionX`). If multiple columns
have the same name, all but the rightmost column with this name will be ignored.

`expressionX`::
An expression that computes an aggregated value. If its name coincides with one
of the computed columns, that column will be ignored.

`grouping_expressionX`::
An expression that outputs the values to group by.

NOTE: Individual `null` values are skipped when computing aggregations.

*Description*

The `INLINESTATS` command calculates an aggregate result and merges that result
back into the stream of input data. Without the optional `BY` clause this will
produce a single result which is appended to each row. With a `BY` clause this
will produce one result per grouping and merge the result into the stream based on
matching group keys.
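
For instance, a minimal sketch (the `employees` index and field names here are
illustrative):

[source,esql]
----
FROM employees
| KEEP emp_no, languages, salary
| INLINESTATS max_salary = MAX(salary) BY languages
----

Each row keeps its original columns and gains a `max_salary` column holding
the maximum salary within that row's `languages` group.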

All of the <<esql-agg-functions,aggregation functions>> are supported.

*Examples*

Find the employees that speak the most languages (it's a tie!):

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=max-languages]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=max-languages-result]
|===

Find the longest-tenured employee whose last name starts with each letter of the alphabet:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=longest-tenured-by-first]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=longest-tenured-by-first-result]
|===

Find the northernmost and southernmost airports:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=extreme-airports]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=extreme-airports-result]
|===

NOTE: Our test data doesn't have many "small" airports.

If a `BY` field is multivalued then `INLINESTATS` will put the row in *each*
bucket like <<esql-stats-by>>:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=mv-group]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=mv-group-result]
|===

To treat each group key as its own row, use <<esql-mv_expand>> before `INLINESTATS`:

[source.merge.styled,esql]
----
include::{esql-specs}/inlinestats.csv-spec[tag=mv-expand]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/inlinestats.csv-spec[tag=mv-expand-result]
|===
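
As a sketch of that pattern, assuming a hypothetical multivalued `tags` field:

[source,esql]
----
FROM employees
| MV_EXPAND tags
| INLINESTATS max_salary = MAX(salary) BY tags
----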
2 changes: 1 addition & 1 deletion docs/reference/esql/processing-commands/lookup.asciidoc
@@ -2,7 +2,7 @@
[[esql-lookup]]
=== `LOOKUP`

experimental::["LOOKUP is a highly experimental and only available in SNAPSHOT versions."]
experimental::["LOOKUP is highly experimental and only available in SNAPSHOT versions."]

`LOOKUP` matches values from the input against a `table` provided in the request,
adding the other fields from the `table` to the output.
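
A minimal sketch of the shape (the `languages_lookup` table and
`language_code` match field are hypothetical, with the table itself provided
in the request):

[source,esql]
----
FROM employees
| LOOKUP languages_lookup ON language_code
----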
30 changes: 27 additions & 3 deletions docs/reference/how-to/size-your-shards.asciidoc
@@ -152,9 +152,10 @@ same data. However, very large shards can also cause slower searches and will
take longer to recover after a failure.

There is no hard limit on the physical size of a shard, and each shard can in
-theory contain up to just over two billion documents. However, experience shows
-that shards between 10GB and 50GB typically work well for many use cases, as
-long as the per-shard document count is kept below 200 million.
+theory contain up to <<troubleshooting-max-docs-limit,just over two billion
+documents>>. However, experience shows that shards between 10GB and 50GB
+typically work well for many use cases, as long as the per-shard document count
+is kept below 200 million.

You may be able to use larger shards depending on your network and use case,
and smaller shards may be appropriate for
@@ -184,6 +185,29 @@ index prirep shard store
// TESTRESPONSE[s/\.ds-my-data-stream-2099\.05\.06-000001/my-index-000001/]
// TESTRESPONSE[s/50gb/.*/]

If an index's shards are experiencing degraded performance from surpassing the
recommended 50GB size, you may consider fixing the shards' sizing. An index's
shard count is fixed at creation time, so the data must be copied into a new
index with corrected settings. This requires first ensuring sufficient disk
space to copy the data. Afterwards, you can copy the index's data with
corrected settings via one of the following options (see the sketch after this
list):

* running <<indices-split-index,Split Index>> to increase the number of primary
shards

* creating a destination index with corrected settings and then running
<<docs-reindex,Reindex>>
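
For example, a sketch of the split option (hypothetical index names; the
source index must first be made read-only, and the target shard count must be
a multiple of the source's):

[source,console]
----
POST /my-index-000001/_split/my-index-000002
{
  "settings": {
    "index.number_of_shards": 4
  }
}
----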

Note that performing a <<restore-snapshot-api,Restore Snapshot>> and/or
<<indices-clone-index,Clone Index>> would be insufficient to resolve the
shards' sizing, because both preserve the source index's primary shard count.

Once a source index's data is copied into its destination index, the source
index can be <<indices-delete-index,removed>>. You may then consider creating
an <<indices-add-alias,alias>> with the source index's name that points to the
destination index for continuity, as sketched below.
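
For example, a sketch that atomically removes the hypothetical source index
and points its name at the destination:

[source,console]
----
POST _aliases
{
  "actions": [
    { "remove_index": { "index": "my-index-000001" } },
    { "add": { "index": "my-index-000002", "alias": "my-index-000001" } }
  ]
}
----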


[discrete]
[[shard-count-recommendation]]
==== Master-eligible nodes should have at least 1GB of heap per 3000 indices
@@ -52,6 +52,12 @@ detects anomalies where the geographic location of a credit card transaction is
unusual for a particular customer’s credit card. An anomaly might indicate
fraud.

A "typical" value indicates a centroid of a cluster of previously observed
locations that is closest to the "actual" location at that time. For example,
there may be one centroid near the person's home that is associated with the
cluster of local grocery stores and restaurants, and another centroid near the
person's work associated with the cluster of lunch and coffee places.
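
For example, a sketch of a detector that uses this function (field names are
illustrative):

[source,js]
----
{
  "function": "lat_long",
  "field_name": "transaction_coordinates",
  "by_field_name": "credit_card_number"
}
----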

IMPORTANT: The `field_name` that you supply must be a single string that
contains two comma-separated numbers of the form `latitude,longitude`, a
`geo_point` field, a `geo_shape` field that contains point values, or a
6 changes: 3 additions & 3 deletions docs/reference/modules/gateway.asciidoc
@@ -4,11 +4,11 @@
The local gateway stores the cluster state and shard data across full
cluster restarts.

-The following _static_ settings, which must be set on every master node,
+The following _static_ settings, which must be set on every <<master-node,master-eligible node>>,
control how long a freshly elected master should wait before it tries to
-recover the cluster state and the cluster's data.
+recover the <<cluster-state,cluster state>> and the cluster's data.

-NOTE: These settings only take effect on a full cluster restart.
+NOTE: These settings only take effect during a <<restart-cluster-full,full cluster restart>>.

`gateway.expected_data_nodes`::
(<<static-cluster-setting,Static>>)
2 changes: 1 addition & 1 deletion docs/reference/setup/restart-cluster.asciidoc
@@ -11,7 +11,7 @@ time, so the service remains uninterrupted.
[WARNING]
====
Nodes exceeding the low watermark threshold will be slow to restart. Reduce the disk
-usage below the <<cluster-routing-watermark-low,low watermark>> before to restarting nodes.
+usage below the <<cluster-routing-watermark-low,low watermark>> before restarting nodes.
====

[discrete]
4 changes: 4 additions & 0 deletions docs/reference/upgrade/disable-shard-alloc.asciidoc
@@ -17,3 +17,7 @@ PUT _cluster/settings
}
--------------------------------------------------
// TEST[skip:indexes don't assign]

You can also consider <<modules-gateway,gateway settings>> when restarting
large clusters to reduce initial strain while nodes proceed through
<<modules-discovery,discovery>>, as sketched below.
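
For example, a sketch of such settings in `elasticsearch.yml` (illustrative
values; these are static settings set on every master-eligible node):

[source,yaml]
----
gateway.expected_data_nodes: 3
gateway.recover_after_time: 10m
gateway.recover_after_data_nodes: 2
----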