GatewayIndexStateIT.testRecoverBrokenIndexMetadata fails on master #40867

Closed
colings86 opened this issue Apr 4, 2019 · 7 comments
Assignees
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >test-failure Triaged test failures from CI

Comments

@colings86
Contributor

Build URL: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/2952/console

Reproduce Command (reproduces locally with this seed):

./gradlew :server:integTest -Dtests.seed=EECC1AE0C30C4E5F -Dtests.class=org.elasticsearch.gateway.GatewayIndexStateIT -Dtests.method="testRecoverBrokenIndexMetadata" -Dtests.security.manager=true -Dtests.locale=de -Dtests.timezone=Europe/Athens -Dcompiler.java=12 -Druntime.java=8

Stack trace:

ERROR   31.6s J5 | GatewayIndexStateIT.testRecoverBrokenIndexMetadata <<< FAILURES!
   > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=6072, name=elasticsearch[node_t0][generic][T#4], state=RUNNABLE, group=TGRP-GatewayIndexStateIT]
   > 	at __randomizedtesting.SeedInfo.seed([EECC1AE0C30C4E5F:1675F3837244CBA3]:0)
   > Caused by: java.lang.AssertionError: max seq. no. [-1] does not match [0]
   > 	at __randomizedtesting.SeedInfo.seed([EECC1AE0C30C4E5F]:0)
   > 	at org.elasticsearch.index.engine.ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:142)
   > 	at org.elasticsearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:116)
   > 	at org.elasticsearch.index.engine.NoOpEngine.<init>(NoOpEngine.java:46)
   > 	at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1447)
   > 	at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1397)
   > 	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:433)
   > 	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95)
   > 	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:308)
   > 	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93)
   > 	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1686)
   > 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2339)
   > 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:677)
   > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   > 	at java.lang.Thread.run(Thread.java:748)

Will mute this test shortly.

@colings86 colings86 added >test-failure Triaged test failures from CI :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Apr 4, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@colings86
Contributor Author

Muted on master in cd8dd04

@dnhatn dnhatn added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Apr 4, 2019
@tlrx tlrx mentioned this issue Apr 4, 2019
50 tasks
@dnhatn
Member

dnhatn commented Apr 4, 2019

This test failure relates to #40423, where we don't build analyzers when opening the IndexService for a closed index. However, the actual cause is that shards of the closed index are not flushed before closing, because the verification step does not go through when primaries are unassigned.

What if the primaries are not all allocated? The verification step cannot go through or shards might not be properly flushed. Should we close the close index API? Should we allocate an empty routing table that cannot be allocated at all?

This is already tracked in the replicate closed indices meta-issue (#33888). I linked this test failure to that bullet and will leave the test muted.
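
To make the failing assertion concrete, here is a minimal stand-alone sketch of the invariant. It is not the actual `ReadOnlyEngine` code: the class `SeqNoInvariant` and its method are hypothetical, and reading -1 as the max sequence number found in the last commit and 0 as the global checkpoint is an assumption based on the message in the stack trace.

```java
// Hypothetical stand-alone model of the invariant; not Elasticsearch code.
final class SeqNoInvariant {
    /** Value used here for "no operation has been committed yet" (assumed). */
    static final long NO_OPS_PERFORMED = -1L;

    /**
     * Returns null when the invariant holds, otherwise a message shaped like
     * the one in the stack trace.
     */
    static String check(long maxSeqNoInLastCommit, long globalCheckpoint) {
        if (maxSeqNoInLastCommit == globalCheckpoint) {
            return null;
        }
        return "max seq. no. [" + maxSeqNoInLastCommit + "] does not match [" + globalCheckpoint + "]";
    }

    public static void main(String[] args) {
        // Assumed scenario: one operation (seq. no. 0) was processed but never
        // flushed, so the last commit still reports "no operations".
        System.out.println(check(NO_OPS_PERFORMED, 0L));
        // Once the shard has been flushed, both values agree and the check passes.
        System.out.println(check(0L, 0L));
    }
}
```

Under that reading, a flush before the index is closed is what would bring the two values back in line.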

@bizybot
Contributor

bizybot commented Apr 8, 2019

One of my PR builds failed but for a different test: testRecoverMissingAnalyzer

Reproducible locally with command:

./gradlew :server:integTest \
-Dtests.seed=F3F656161D025A0 \
-Dtests.class=org.elasticsearch.gateway.GatewayIndexStateIT \
-Dtests.method="testRecoverMissingAnalyzer" \
-Dtests.security.manager=true \
-Dtests.locale=id-ID \
-Dtests.timezone=Pacific/Galapagos \
-Dcompiler.java=12 \
-Druntime.java=11

Stack trace:

   > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=83, name=elasticsearch[node_t1][generic][T#2], state=RUNNABLE, group=TGRP-GatewayIndexStateIT]
   >    at __randomizedtesting.SeedInfo.seed([F3F656161D025A0:E796CB546B33FF52]:0)
   > Caused by: java.lang.AssertionError: max seq. no. [-1] does not match [0]
   >    at __randomizedtesting.SeedInfo.seed([F3F656161D025A0]:0)
   >    at org.elasticsearch.index.engine.ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:143)
   >    at org.elasticsearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:117)
   >    at org.elasticsearch.index.engine.NoOpEngine.<init>(NoOpEngine.java:46)
   >    at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1447)
   >    at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1397)
   >    at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:433)
   >    at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95)
   >    at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:308)
   >    at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93)
   >    at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1686)
   >    at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2339)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:677)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)

@dnhatn
Member

dnhatn commented Apr 9, 2019

I muted testRecoverMissingAnalyzer in aaec11f.

dnhatn added a commit that referenced this issue Apr 9, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
@ywelsch
Contributor

ywelsch commented Jun 7, 2019

@dnhatn is this test fixed now by flushing at the end of peer recovery? If that's not sufficient, should we just explicitly flush in the test before we restart the node?

@dnhatn dnhatn closed this as completed in 16e6f5d Jun 9, 2019
dnhatn added a commit that referenced this issue Jun 9, 2019
These tests should be okay as we flush at the end of peer recovery.

Closes #40867
@dnhatn
Member

dnhatn commented Jun 9, 2019

@ywelsch I fixed the assertion in ReadOnlyEngine and verified these tests. We are good here.

dnhatn added a commit that referenced this issue Jun 9, 2019
These tests should be okay as we flush at the end of peer recovery.

Closes #40867
Bukhtawar added a commit to Bukhtawar/elasticsearch that referenced this issue Jul 5, 2019
…lls below the low watermark. Relates to elastic#39334

Tracking indicesToMarkIneligibleForAutoRelease instead of a Map and addressing other minor comments

Unmute FullClusterRestartIT#testClosedIndices

Fixed in #39566
Closes #39576

Add debug log for retention leases (#42557)

We need more information to understand why CcrRetentionLeaseIT is
failing. This commit adds some debug log to retention leases and enables
them in CcrRetentionLeaseIT.

Improve how internal representation of pipelines are updated (#42257)

If a single pipeline is updated then the internal representation of
all pipelines was updated. With this change, only the internal representation
of the pipelines that have been modified will be updated.

Prior to this change the IngestMetadata of the previous and current cluster
was used to determine whether the internal representation of pipelines
should be updated. If applying the previous cluster state change failed then
subsequent cluster state changes that have no changes to IngestMetadata
will not attempt to update the internal representation of the pipelines.

This commit changes how the IngestService updates the internal representation:
it keeps track of the underlying configuration and compares it against the new
IngestMetadata to detect whether a pipeline configuration has changed. Only if
it has is the internal pipeline representation updated.
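
As a rough illustration of the idea (not the IngestService implementation), the sketch below keeps the last configuration that was actually applied per pipeline and reports only the pipelines whose configuration differs. The class `PipelineTracker`, its methods, and the use of plain strings for configurations are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the IngestService implementation.
final class PipelineTracker {
    // Last configuration that was successfully applied, keyed by pipeline id.
    private final Map<String, String> appliedConfigs = new HashMap<>();

    /** Returns the ids whose configuration is new or differs from what was last applied. */
    Set<String> changedPipelines(Map<String, String> newConfigs) {
        Set<String> changed = new HashSet<>();
        for (Map.Entry<String, String> entry : newConfigs.entrySet()) {
            if (!entry.getValue().equals(appliedConfigs.get(entry.getKey()))) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    /** Records that the given pipeline was rebuilt from this configuration. */
    void markApplied(String id, String config) {
        appliedConfigs.put(id, config);
    }
}
```

Because the comparison is against what was applied rather than against the previous cluster state, a pipeline whose update failed earlier keeps showing up as changed until `markApplied` is called for it. Removed pipelines are left out of this sketch.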

Fix RareClusterStateIT (#42430)

* It looks like we might be cancelling a previous publication instead of
the one triggered by the given request with a very low likelihood.
   * Fixed by adding a wait for no in-progress publications
   * Also added debug logging that would've identified this problem
* Closes #36813

Update script-fields.asciidoc (#42490)

Fixed typo in docker.asciidoc (#42455)

Remove unused mapStringsOrdered method (#42513)

Remove unused mapStringsOrdered method

Dry up BlobStoreRepository#basePath Implementations (#42578)

* This method is just a getter in every implementation => moved the field and concrete getter to the base class to simplify implementations

Add Infrastructure to Run 3rd Party Repository Tests (#42586)

* Add Infrastructure to Run 3rd Party Repository Tests

* Add infrastructure to run third party repository tests using our standard JUnit infrastructure
* This is a prerequisite of #42189

Add test to ensure we can execute update requests in mixed cluster

Relates #42596

Allocate to data-only nodes in ReopenWhileClosingIT (#42560)

If all primary shards are allocated on the master node, then the
verifying before close step will never interact with mock transport
service. This change prefers to allocate shards on data-only nodes.

Closes #39757

Reset mock transport service in CcrRetentionLeaseIT (#42600)

testRetentionLeaseIsAddedIfItDisappearsWhileFollowing does not reset the
mock transport service after test. Surviving transport interceptors from
that test can sneakily remove retention leases and make other tests fail.

Closes #39331
Closes #39509
Closes #41428
Closes #41679
Closes #41737
Closes #41756

Fixed ignoring name parameter for percolator queries (#42598)

Closes #40405

[Ml Data Frame] Return bad_request on preview when config is invalid (#42447)

Mute AsyncTwoPhaseIndexerTests#testStateMachine() (#42609)

Relates #42084

[ML DataFrame] Use date histogram fixed_interval syntax
and remove test skip

Mute NodeTests (#42614)

Relates #42577

Fix Incorrect Time Math in MockTransport (#42595)

* Fix Incorrect Time Math in MockTransport

* The timeunit here must be nanos for the current time (we even convert it accordingly in the logging)
* Also, changed the log message when dumping stack traces a little to make it easier to grep for (otherwise it's the same as the message on unregister)

Remove PRE_60_NODE_CHECKPOINT (#42527)

This commit removes the obsolete `PRE_60_NODE_CHECKPOINT` constant for dealing
with 5.x nodes' lack of sequence number support.

Backported as #42531

Reset state recovery after successful recovery (#42576)

The problem this commit addresses is that state recovery is not reset on a node that then becomes
master with a cluster state that has a state not recovered flag in it. The situation that was observed
in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we
have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains),
when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state
(with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3, and
as it was previously leader in term 1 and successfully completed state recovery, it never retries
state recovery in term 3.

Closes #39172

[DOCS] Escape cross-ref link comma for Asciidoctor (#42402)

[DOCS] Fix API Quick Reference rollup attribute for Asciidoctor (#42403)

[ML] adding delayed_data_check_config to datafeed update docs (#42095)

* [ML] adding delayed_data_check_config to datafeed update docs

* [DOCS] Edits delayed data configuration details

Avoid loading retention leases while writing them (#42620)

Resolves #41430.

Validate routing commands using updated routing state (#42066)

When multiple commands are called in sequence, fetch shards
from mutable, up-to-date routing nodes to ensure each command's
changes are visible to subsequent commands.

This addresses an issue uncovered during work on #41050.

remove 6.4.x version constants (#42127)

relates refactoring initiative #41164.

[ML Data Frame] Set DF task state when stopping (#42516)

Set the state to stopped prior to persisting

[DOCS] Reorg monitoring configuration for re-use (#42547)

Remove suppressions for "unchecked" for hamcrest varargs methods (#41528)

In hamcrest 2.1 warnings for unchecked varargs were fixed by hamcrest using @SafeVarargs for those matchers where this warning occurred.
This PR aims to remove these annotations where Matchers.contains, Matchers.containsInAnyOrder or Matchers.hasItems is used.

Remove support for chained multi-fields. (#42333)

Follow-up to #41926, where we deprecated support for multi-fields within
multi-fields.

Addresses #41267.

Lazily compute Java 8 home in reindex configuration (#42630)

In the reindex from old tests we require Java 8. Today when configuring
the reindex from old tests, we eagerly evaluate Java 8 home, which
means that we require JAVA8_HOME to be set even if the reindex from old
test tasks are not in the task graph. This is an onerous requirement if,
for example, all that you want to do is build a distribution. This
commit addresses this by making evaluation of Java 8 home lazy, so that
it is only done and required if the reindex from old test tasks would be
executed.

Remove "nodes/0" folder prefix from data path (#42489)

With the removal of node.max_local_storage_nodes, there is no need anymore to keep the data in
subfolders indexed by a node ordinal. This commit makes it so that ES 8.0 will store data directly in
$DATA_DIR instead of $DATA_DIR/nodes/$nodeOrdinal.

Upon startup, Elasticsearch will check to see if there is data in the old location, and automatically
move it to the new location. This automatic migration only works if $nodeOrdinal is 0, i.e., multiple
node instances have not previously run on the same data path, which would have required
node.max_local_storage_nodes to be explicitly configured.

[DOCS] Set explicit anchors for Asciidoctor (#42521)

unmute 'Test url escaping with url mustache function' and bump logging (#42400)

check position before and after latch (#42623)

check position before and after latch

[DOCS] Fix X-Pack tag for Asciidoctor (#42443)

fix javadoc of SearchRequestBuilder#setTrackTotalHits (#42219)

[ML Data Frame] Mute stop start test

Relates to https://github.com/elastic/elasticsearch/issues/42650

Add 7.1.2 version constant. (#42643)

Relates to #42635

Adjust use of Deprecated Netty API (#42613)

* With the recent upgrade to Netty 4.1.36 this method became deprecated and I made the advised change to fix the deprecation

Fix a callout in the field alias docs.

Add explicit build flag for experimenting with test execution cacheability (#42649)

* Add build flag for ignoring random test seed as task input

* Fix checkstyle violations

Use correct global checkpoint sync interval (#42642)

A disruption test case needs to use a lower checkpoint sync interval,
since it verifies sequence numbers after the test and waits at most 10 seconds
for them to stabilize.

Closes #42637

Removes types from SearchRequest and QueryShardContext (#42112)

[ML-DataFrame] rewrite start and stop to answer with acknowledged (#42589)

rewrite start and stop to answer with acknowledged

fixes #42450

Added param ignore_throttled=false when indicesOptions.ignoreThrottled() is false (#42393)

and fixed test RequestConvertersTests and added ignore_throttled on all request

[DOCS] Set explicit anchors for TLS/SSL settings (#42524)

Testclusters: convert ccr tests (#42313)

un-mute ActivateWatchTests, bump up logging, and remove explicit sleeps (#42396)

un-mute Watcher rolling upgrade tests and bump up logging (#42377)

Fixes watcher test to remove typed api call

Muting WatcherRestIT webhook url escaping test

See #41172

[DOCS] Adds more monitoring tagged regions

Add warning scores are floats (#42667)

Allow aggregations using expressions to use _score (#42652)

_score was removed from use in aggregations using expressions
unintentionally when script contexts were added. This allows _score to once
again be used.

Refactor HLRC RequestConverters parameters to be more explicit (#42128)

The existing `RequestConverters.Params` is confusing, because it wraps
an underlying request object and mutations of the `Params` object
actually mutate the `Request` that was used in the construction of the
`Params`.

This leads to a situation where we create a `RequestConverter.Params`
object, mutate it, and then it appears nothing happens to it - it
appears to be unused. What happens behind the scenes is that the Request
object is mutated when methods on `Params` are invoked. This results in
unclear, confusing code where mutating one object changes another with
no obvious connection.

This commit refactors `RequestConverters.Params` to be a simple helper
class to produce a `Map` which must be passed explicitly to a Request
object. This makes it apparent that the `Params` are actually used, and
makes their effect on the `Request` object explicit and easier to
understand.

Co-authored-by: Ojas Gulati <ojasgulati100@gmail.com>

Propagate version in reindex from remote search (#42412)

This is related to #31908. In order to use the external version in a
reindex from remote request, the search request must be configured to
request the version (as it is not returned by default). This commit
modifies the search request to request the version. Additionally, it
modifies our current reindex from remote tests to randomly use the
external version_type.

Fix inverted condition so we never cache rest integ tests

Remove unused import

Geo: Refactor libs/geo parsers (#42549)

Refactors the WKT and GeoJSON parsers from a utility class into
instantiable objects. This is a preliminary step in
preparation for moving out coordinate validators from Geometry
constructors. This should allow us to make validators pluggable.

Detect when security index is closed (#42191)

If the security index is closed, it should be treated as unavailable
for security purposes.

Prior to 8.0 (or in a mixed cluster) a closed security index has
no routing data, which would cause a NPE in the cluster change
handler, and the index state would not be updated correctly.
This commit fixes that problem.

Fix testTokenExpiry flaky test (#42585)

Test was using ClockMock#rewind passing the amount of nanoseconds
in order to "strip" nanos from the time value. This was intentional
as the expiration time of the UserToken doesn't have nanosecond
precision.
However, ClockMock#rewind doesn't support nanos either, so when it's
called with a TimeValue, it rewinds the clock by the TimeValue's
millis instead. This was causing the clock to go enough millis
before token expiration time and the test was passing. Once every
few hundred times though, the TimeValue by which we attempted to
rewind the clock only had nanos and no millis, so rewind moved the
clock back just a few millis, but still after expiration time.

This change moves the clock explicitly to the same instant as expiration,
using clock.setTime and disregarding nanos.
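
The truncation described above can be reproduced with a small stand-alone model. `MillisClock` below is a hypothetical millisecond-precision clock, not the real `ClockMock`, and `java.time.Duration` stands in for `TimeValue`; the only behaviour it is meant to show is that a rewind expressed in nanoseconds loses everything below one millisecond.

```java
import java.time.Duration;

// Hypothetical millisecond-precision test clock; not the real ClockMock.
final class MillisClock {
    private long epochMillis;

    MillisClock(long epochMillis) {
        this.epochMillis = epochMillis;
    }

    /** Rewinds by the whole milliseconds in the duration; sub-millisecond nanos are dropped. */
    void rewind(Duration amount) {
        epochMillis -= amount.toMillis();
    }

    /** Moves the clock to an explicit instant, which avoids the truncation entirely. */
    void setTime(long newEpochMillis) {
        epochMillis = newEpochMillis;
    }

    long millis() {
        return epochMillis;
    }

    public static void main(String[] args) {
        MillisClock clock = new MillisClock(10_000L);
        // Less than one millisecond: the rewind is silently a no-op.
        clock.rewind(Duration.ofNanos(999_999L));
        System.out.println(clock.millis());
        // Setting the instant explicitly does not depend on nanosecond handling.
        clock.setTime(9_000L);
        System.out.println(clock.millis());
    }
}
```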

Revert "un-mute Watcher rolling upgrade tests and bump up logging (#42377)"

This reverts commit 697c793dcbabf1df0351d75a3705047ac4435dca.

Log leader and handshake failures by default (#42342)

Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log
anything above `DEBUG` level. However there are some situations where it is
appropriate for them to log at a higher level:

- if the low-level handshake succeeds but the high-level one fails then this
  indicates a config error that the user should resolve, and the exception
  will help them to do so.

- if leader checks fail repeatedly then we restart discovery, and the exception
  will help to determine what went wrong.

Resolves #42153

Deprecate CommonTermsQuery and cutoff_frequency (#42619)

* Deprecate CommonTermsQuery and cutoff_frequency

Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.

Relates to #27096

Fix Class Load Order in Netty4Plugin (#42591)

* Don't force the logger in the Netty4Plugin class already, at this point log4j might not be fully initialized.
   * The call was redundant anyway since we do the same thing in the Netty4Transport and Netty4HttpServerTransport classes already and there we do it properly after setting up log4j by initializing the loggers
* Relates #42532

[DOCS] Rewrite 'wildcard' query (#42670)

[DOCS] path_hierarchy tokenizer examples (#39630)

Closes #17138

Fix error with mapping in docs

Fix refresh remote JWKS logic (#42662)

This change ensures that:

- We only attempt to refresh the remote JWKS when there is a
signature-related error (BadJWSException instead of the
generic BadJOSEException).
- We do call OpenIDConnectAuthenticator#getUserClaims upon
successful refresh.
- We test this in OpenIdConnectAuthenticatorTests.

Without this fix, when using the OpenID Connect realm with a remote
JWKSet configured in `op.jwks_path`, the refresh would be triggered
for most configuration errors (e.g. a wrong value for `op.issuer`),
and Kibana wouldn't get a response and would time out since
`getUserClaims` wouldn't be called because
`ReloadableJWKSource#reloadAsync` wouldn't call `onResponse` on the
future.

[ML] [Data Frame] add support for weighted_avg agg (#42646)

Remove unused Gradle plugin (#42684)

Remove usage of deprecated compare gradle builds plugin (#42687)

* Remove usage of deprecated compare gradle builds plugin

* Remove system property only used by build comparison

Prevent merging nodes' data paths (#42665)

Today Elasticsearch does not prevent you from reconfiguring a node's
`path.data` to point to data paths that previously belonged to more than one
node. There's no good reason to be able to do this, and the consequences can be
quietly disastrous. Furthermore, #42489 might result in a user trying to split
up a previously-shared collection of data paths by hand and there's definitely
scope for mixing the paths up across nodes when doing this.

This change adds a check during startup to ensure that each data path belongs
to the same node.

Clarify the settings around limiting nested mappings. (#42686)

* Previously, we mentioned multiple times that each nested object was indexed as its own document. This is repetitive, and is also a bit confusing in the context of `index.mapping.nested_fields.limit`, as that applies to the number of distinct `nested` types in the mappings, not the number of nested objects. We now just describe the issue once at the beginning of the section, to illustrate why `nested` types can be expensive.
* Reference the ongoing example to clarify the meaning of the two settings.

Addresses #28363.

Make hashed token ids url safe (#42651)

This commit changes the way token ids are hashed so that the output is
url safe without requiring encoding. This follows the pattern that we
use for document ids that are autogenerated, see UUIDs and the
associated classes for additional details.
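
For readers unfamiliar with URL-safe encoding, here is a minimal sketch of the general technique. The choice of SHA-256 and of the JDK's `Base64.getUrlEncoder()` is an assumption made for illustration and may differ from what the commit actually does; `UrlSafeTokenId` is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustrative only: the hash function and encoder are assumptions, not the committed code.
final class UrlSafeTokenId {
    static String hash(String tokenId) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hashed = digest.digest(tokenId.getBytes(StandardCharsets.UTF_8));
            // The URL-safe alphabet uses '-' and '_' instead of '+' and '/',
            // and omitting padding avoids '=' in the output.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(hashed);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hash("example-token-id"));
    }
}
```

The output contains only letters, digits, `-` and `_`, so it can be placed in a URL path or query string without further encoding.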

[DOCS] Disable Metricbeat system module (#42601)

Remove SecurityClient from x-pack (#42471)

This commit removes the SecurityClient class from x-pack. This client
class is a relic of the transport client, which is in the process of
being removed. Some tests were changed to use the high level rest
client and others use a client directly without the security client
wrapping it.

Remove Log4j 1.2 API as a dependency (#42702)

We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.

Remove client jar support from build (#42640)

The client jars were a way for modules and plugins to produce an
additional jar that contained classes for use by the transport client.
This commit removes that configuration as the transport client is being
removed.

relates #42638

mute failing search template test (#42730)

tracking issue #42664.

Remove groovy client docs (#42731)

The groovy client api was a wrapper around the transport client.
However, it has not been published since 2.4, as it had many issues with
the java security manager. This commit removes the docs from master for
the groovy client.

relates #42638

Fix docs typo in the certutil CSR mode (#42593)

Changes the mention of `cert` to `csr`.

Co-Authored-By: Alex Pang <pangyikhei+github@gmail.com>

Remove transport client docs (#42483)

This commit removes the transport client documentation.

remove v6.5.x and v6.6.x version constants (#42130)

related to refactoring initiative #41164.

Log the status of security on license change (#42488)

Whether security is enabled/disabled is dependent on the combination
of the node settings and the cluster license.

This commit adds a license state listener that logs when the license
change causes security to switch state (or to be initialised).

This is primarily useful for diagnosing cluster formation issues.

Remove leftover transport module docs (#42734)

This commit removes docs for alternate transport implementations which
were removed years ago. These were missed because they have redirects
masking their existence.

Add option to ObjectParser to consume unknown fields (#42491)

ObjectParser has two ways of dealing with unknown fields: ignore them entirely,
or throw an error. Sometimes it can be useful instead to gather up these unknown
fields and record them separately, for example as arbitrary entries in a map.

This commit adds the ability to specify an unknown field consumer on an ObjectParser,
called with the field name and parsed value of each unknown field encountered during
parsing. The public API of ObjectParser is largely unchanged, with a single new
constructor method and interface definition.
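
The shape of such a hook can be sketched with a miniature parser over an already-decoded map. Everything here (`MiniParser`, `UnknownFieldConsumer`, `declareField`) is hypothetical and only mirrors the description above; it is not the `ObjectParser` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical miniature parser; names and signatures are not the ObjectParser API.
final class MiniParser<T> {
    /** Called with the target, the field name and the parsed value of each unknown field. */
    interface UnknownFieldConsumer<T> {
        void accept(T target, String field, Object value);
    }

    private final Map<String, BiConsumer<T, Object>> knownFields = new HashMap<>();
    private final UnknownFieldConsumer<T> unknownFieldConsumer;

    MiniParser(UnknownFieldConsumer<T> unknownFieldConsumer) {
        this.unknownFieldConsumer = unknownFieldConsumer;
    }

    void declareField(String name, BiConsumer<T, Object> setter) {
        knownFields.put(name, setter);
    }

    T parse(Map<String, Object> source, T target) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            BiConsumer<T, Object> setter = knownFields.get(entry.getKey());
            if (setter != null) {
                setter.accept(target, entry.getValue());
            } else {
                // Instead of ignoring the field or throwing, hand it to the consumer.
                unknownFieldConsumer.accept(target, entry.getKey(), entry.getValue());
            }
        }
        return target;
    }
}
```

A caller could, for example, pass a consumer that stores each unknown field as an arbitrary entry in a map on the target object.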

Return NO_INTERVALS rather than null from empty TokenStream (#42750)

IntervalBuilder#analyzeText will currently return null if it is passed an
empty TokenStream, which can lead to a confusing NullPointerException
later on during querying. This commit changes the code to return
NO_INTERVALS instead.

Fixes #42587

[ML] [Data Frame] nesting group_by fields like other aggs (#42718)

[ML Data Frame] Refactor stop logic (#42644)

* Revert "invalid test"

This reverts commit 9dd8b52c13c716918ff97e6527aaf43aefc4695d.

* Testing

* mend

* Revert "[ML Data Frame] Mute Data Frame tests"

This reverts commit 5d837fa312b0e41a77a65462667a2d92d1114567.

* Call onStop and onAbort outside atomic update

* Don’t update CS

* Tidying up

* Remove invalid test that asserted logic that has been removed

* Add stopped event

* Revert "Add stopped event"

This reverts commit 02ba992f4818bebd838e1c7678bd2e1cc090bfab.

* Adding check for STOPPED in saveState

Re-enable token bwc tests (#42726)

This commit re-enables token bwc tests that run as part of the rolling
upgrade tests. These tests were muted while #42651 was being
backported.

[ML] Add Kibana application privilege to data frame admin/user roles (#42757)

Data frame transforms are restricted by different roles to ML, but
share the ML UI.  To prevent the ML UI being hidden for users who
only have the data frame admin or user role, it is necessary to add
the ML Kibana application privilege to the backend data frame roles.

[DOCS] Remove unneeded `ifdef::asciidoctor[]` conditionals (#42758)

Several `ifdef::asciidoctor` conditionals were added so that AsciiDoc
and Asciidoctor doc builds rendered consistently.

With https://github.com/elastic/docs/pull/827, Elasticsearch Reference
documentation migrated completely to Asciidoctor. We no longer need to
support AsciiDoc so we can remove these conditionals.

Resolves #41722

Remove CommonTermsQuery and cutoff_frequency param (#42654)

Remove `common` query and `cutoff_frequency` parameter of
`match` and `multi_match` queries. Both have already been
deprecated for the next 7.x version.

Closes: #37096

Clarify that inner_hits must be used to access nested fields. (#42724)

This PR updates the docs for `docvalue_fields` and `stored_fields` to clarify
that nested fields must be accessed through `inner_hits`. It also tweaks the
nested fields documentation to make this point more visible.

Addresses #23766.

Remove locale-dependent string checking

We were checking if an exception was caused by a specific reason "Not a
directory". Alas, this reason is locale-dependent and can fail on
systems that are not set to en_US.UTF-8. This commit addresses this by
deriving what the locale-dependent error message would be and using that
for comparison with the actual exception thrown.

Closes #41689

[DOCS] Remove unneeded options from `[source,sql]` code blocks (#42759)

In AsciiDoc, `subs="attributes,callouts,macros"` options were required
to render `include-tagged::` in a code block.

With elastic/docs#827, Elasticsearch Reference documentation migrated
from AsciiDoc to Asciidoctor.

In Asciidoctor, the `subs="attributes,callouts,macros"` options are no
longer needed to render `include-tagged::` in a code block. This commit
removes those unneeded options.

Resolves #41589

address SmokeTestWatcherWithSecurityIT#testSearchInputWithInsufficientPrivileges (#42764)

This commit adds busy wait and increases the interval for
SmokeTestWatcherWithSecurityIT#testSearchInputWithInsufficientPrivileges.

Watcher will not allow the same watch to be executed concurrently. If it
finds that case, it will update the watch history with a "not_executed_already_queued"
status. Given a slow machine, and 1 second interval this is possible.

To address this, this commit increases the interval so the watch can fire at most 2
times with a greater interval between the executions and adds a busy wait for the
expected state.

While this does not guarantee a fix, it should greatly reduce the chances of this
test erroring.

Remove XPackClient from x-pack (#42729)

This commit removes the XPackClient class from x-pack. This class is a
relic of the TransportClient and simply a wrapper around it. Calls are
replaced with direct usage of a client. Additionally, the
XPackRestHandler class has been removed as it only served to provide
the XPackClient to implementing rest handlers.

Remove MonitoringClient from x-pack (#42770)

This commit removes the monitoring client from x-pack. This class is a
relic of the TransportClient and was only used in a test.

Use an anonymous inner class instead of lambda for UP-TO-DATE support

remove v6.8.x version constant and the backcompat code that uses it (#42146)

Remove Support for VERSION_CHECKPOINTS Translogs (#42782)

* Closes #42699

Remove some leftover refs to minimum_master_nodes (#42700)

Today `InternalTestCluster` has a few vestigial mentions of the
`minimum_master_nodes` setting. This commit removes them and simplifies some of
the surrounding logic.

Create client-only AnalyzeRequest/AnalyzeResponse classes (#42197)

This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.

This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.

[ML] Better detection of binary input in find_file_structure (#42707)

This change helps to prevent the situation where a binary
file uploaded to the find_file_structure endpoint is
detected as being text in the UTF-16 character set, and
then causes a large amount of CPU to be spent analysing
the bogus text structure.

The approach is to check the distribution of zero bytes
between odd and even file positions, on the grounds that
UTF-16BE or UTF-16LE would have a very skewed distribution.
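
As a rough illustration of the check described above (not the actual find_file_structure code), the sketch below counts zero bytes at even and odd offsets. The class name, method names, and the 90% skew threshold are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

final class ZeroByteDistribution {

    /** Returns {zero bytes at even offsets, zero bytes at odd offsets}. */
    static int[] countZeroBytes(byte[] sample) {
        int[] counts = new int[2];
        for (int i = 0; i < sample.length; i++) {
            if (sample[i] == 0) {
                counts[i % 2]++;
            }
        }
        return counts;
    }

    /**
     * Hypothetical heuristic: UTF-16 text that is mostly Latin characters puts
     * nearly all of its zero bytes on one side (even offsets for BE, odd for LE).
     * The 90% threshold is an illustrative assumption, not a value from the commit.
     */
    static boolean zerosAreSkewed(byte[] sample) {
        int[] counts = countZeroBytes(sample);
        int total = counts[0] + counts[1];
        if (total == 0) {
            return false; // no zero bytes at all, so nothing suggests UTF-16
        }
        int larger = Math.max(counts[0], counts[1]);
        return larger * 10 >= total * 9;
    }

    public static void main(String[] args) {
        byte[] text = "hello world".getBytes(StandardCharsets.UTF_16LE);
        byte[] binary = new byte[64]; // zeros spread evenly over odd and even offsets
        System.out.println(zerosAreSkewed(text));
        System.out.println(zerosAreSkewed(binary));
    }
}
```

A sample whose zero bytes are spread evenly is more likely binary than UTF-16, so a caller could reject it before any expensive structure analysis.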

[Docs] Add example to reimplement stempel analyzer (#42676)

Adding an example of how to re-implement the Polish stempel analyzer
in case a user wants to modify or extend it. In order for the analyzer to be
able to use Polish stopwords, also registering a polish_stop filter for the
stempel plugin.

Closes #13150

Clarify heap setting in Docker docs (#42754)

Add note in the Docker docs that even when container memory is limited,
we still require specifying -Xms/-Xmx using one of the supported
methods.

[ML] Add a limit on line merging in find_file_structure (#42501)

When analysing a semi-structured text file the
find_file_structure endpoint merges lines to form
multi-line messages using the assumption that the
first line in each message contains the timestamp.
However, if the timestamp is misdetected then this
can lead to excessive numbers of lines being merged
to form massive messages.

This commit adds a line_merge_size_limit setting
(default 10000 characters) that halts the analysis
if a message bigger than this is created.  This
prevents significant CPU time being spent subsequently
trying to determine the internal structure of the
huge bogus messages.

[DOCS] Adds redirect for deprecated `common` terms query (#42767)

Make Connection Future Err. Handling more Resilient (#42781)

* There were a number of possible (runtime-) exceptions that could be raised in the adjusted code and prevent resolving the listener
* Relates #42350

Read the default pipeline for bulk upsert through an alias (#41963)

This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.

Bulk upserts are modeled differently from normal index requests, in that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name, which is not part of the
(inner) index request. This commit adds a secondary check against the update
request (outer) if the index request (inner) does not find an alias.

RollupStart endpoint should return OK if job already started (#41502)

If a job is started or indexing, RollupStart should always return
a success (200 OK) response since the job is, in fact, started

SQL: [Docs] Fix links syntax (#42806)

Fix a couple of wrong links because of the order of the anchor
and the usage of backquotes.

More improvements to cluster coordination docs (#42799)

This commit addresses a few more frequently-asked questions:

* clarifies that bootstrapping doesn't happen even after a full cluster
  restart.

* removes the example that uses IP addresses, to try and further encourage the
  use of node names for bootstrapping.

* clarifies that auto-bootstrapping might form different clusters on different
  hosts, and gives a process for starting again if this wasn't what you wanted.

* adds the "do not stop half-or-more of the master-eligible nodes" slogan that
  was notably absent.

* reformats one of the console examples to a narrower width

Remove "template" field in IndexTemplateMetaData (#42099)

Remove "template" field from XContent parsing in IndexTemplateMetaData

Fix error with test conventions on tasks that require Docker (#42719)

[ML] [Data Frame] adding and modifying auditor messages (#42722)

* [ML] [Data Frame] adding and modifying auditor messages

* Update DataFrameTransformTask.java

Make high level rest client a fat jar (#42771)

The original intention of the high level rest client was to provide a
single jar. We tried this long ago, but had issues with intellij not
correctly resolving internal tests that relied on the HLRC. This
commit tweaks our use of the shadow plugin so we now produce a correct
fat jar (minus the LLRC and server jars, which we can address later),
with the module "client" dependencies included, as well as the
correct pom file omitting those dependencies.

relates #42638

Add Basic Date Docs to Painless (#42544)

[Docs] Add note for date patterns used for index search. (#42810)

Add an explanatory NOTE section to draw attention to the difference
between small and capital letters used for the index date patterns.
e.g.: HH vs hh, MM vs mm.

Closes: #22322

[Docs] Fix reference to `boost` and `slop` params (#42803)

For `multi_match` query: link `boost` param to the generic reference
for query usage and `slop` to the `match_phrase` query where its usage
is documented.

Fixes: #40091

Remove unnecessary usage of Gradle dependency substitution rules (#42773)

Don't require TLS for single node clusters (#42826)

This commit removes the TLS cluster join validator.

This validator existed to prevent v6.x nodes (which mandated
TLS) from joining an existing cluster of v5.x nodes (which did
not mandate TLS) unless the 6.x node (and by implication the
5.x nodes) was configured to use TLS.

Since 7.x nodes cannot talk to 5.x nodes, this validator is no longer
needed.

Removing the validator solves a problem where single node clusters
that were bound to local interfaces were incorrectly requiring TLS
when they recovered cluster state and joined their own cluster.

OIDC Guide additions (#42555)

- Call out the fact that the SSL Configuration is important and
offer a minimal example of configuring a custom CA for trust.
- Add information about the `op.issuer` that was missing and add
information about the `rp.post_logout_redirect` in the example
since `op.endsession_endpoint` was already mentioned there and
these two should be together
- Explain that `op.jwkset_path` can be a URL.

[ML] [Data Frame] Adding supported aggs in docs (#42728)

* [ML] [Data Frame] Adding supported aggs in docs

* [DOCS] Moves pivot to definitions list

[ML][Data Frame] forcing that no ptask => STOPPED state (#42800)

* [ML][Data Frame] forcing that no ptask => STOPPED state

* Addressing side-effect, early exit for stop when stopped

[Docs] Add to preference parameter docs (#42797)

Adding notes to the existing docs about how using `preference` might increase
request cache utilization but also add warning about the downsides.

Closes #24278

[DOCS] Fix broken bucket script agg link

Refactor control flow in TransportAnalyzeAction (#42801)

The control flow in TransportAnalyzeAction is currently spread across two large
methods, and is quite difficult to follow. This commit tidies things up a bit, to make
it clearer when we use pre-defined analyzers and when we use custom built ones.

[DOCS] Fix typo in bucket script aggregation link

Fix testNoMasterActionsWriteMasterBlock (#42798)

This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called that does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.

Closes #39688

Change shard allocation filter property and api (#42602)

The current example does not work and is a bit confusing. This change tries
to match it with the sample from the watcher blog.

NullPointerException when creating a watch with Jira action (#41922) (#42081)

A NullPointerException was thrown when secured_url did not use a proper scheme in the Jira action.
This commit handles the exception and displays a proper message.

Eclipse libs projects setup fix (#42852)

Fallout from #42773 for eclipse users.

Replicate aliases in cross-cluster replication (#41815)

This commit adds functionality so that aliases that are manipulated on
leader indices are replicated by the shard follow tasks to the follower
indices. Note that we ignore write indices. This is due to the fact that
follower indices do not receive direct writes so the concept is not
useful.

Fix version parsing in various tests (#42871)

This commit fixes the version parsing in various tests. The issue here is that
the parsing was relying on java.version. However, java.version can contain
additional characters such as -ea for early access builds. See JEP 223:

Name                            Syntax
------------------------------  --------------
java.version                    $VNUM(\-$PRE)?
java.runtime.version            $VSTR
java.vm.version                 $VSTR
java.specification.version      $VNUM
java.vm.specification.version   $VNUM

Instead, we want java.specification.version.
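
To make the difference concrete, here is a minimal sketch (not the code from the commit) of reading the major version from `java.specification.version`. The class and method names are hypothetical; the handling of the legacy `1.x` form covers Java 8 and earlier.

```java
final class JavaMajorVersion {

    /**
     * Parses a java.specification.version value such as "1.8", "11" or "13".
     * Unlike java.version, this property never carries a "-ea" style suffix.
     */
    static int parseMajor(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the specification version was written as "1.x".
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println(spec + " -> major " + parseMajor(spec));
    }
}
```

By contrast, passing a `java.version` value such as `13-ea` straight to `Integer.parseInt` throws a `NumberFormatException`, which is the kind of failure the commit describes.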

Adjust BWC version on aliases replication

This commit adjusts the BWC version on aliases replication after the
change has been backported to 7.x (currently versioned as 7.3.0).

Enable testing against JDK 13 EA builds (#40829)

This commit adds JDK 13 to the CI rotation for testing. For now, we will
be testing against JDK 13 EA builds.

Avoid clobbering shared testcluster JAR files when installing modules (#42879)

Permit API Keys on Basic License (#42787)

Kibana alerting is going to be built using API Keys, and should be
permitted on a basic license.

This commit moves API Keys (but not Tokens) to the Basic license

Relates: kibana#36836

Deduplicate alias and concrete fields in query field expansion (#42328)

The full-text query parsers accept field patterns that are expanded using the mapping.
Alias fields are also detected during the expansion but they are not deduplicated with the
concrete fields that are found from other patterns (or the same). This change ensures
that we deduplicate the target fields of the full-text query parsers in order to avoid
adding the same clause multiple times. Boolean queries are already able to deduplicate
clauses during rewrite but since we also use DisjunctionMaxQuery it is preferable to detect
these duplicates early on.

Enable Parallel Deletes in Azure Repository (#42783)

* Parallel deletes via private thread pool

More logging in testRerouteOccursOnDiskPassingHighWatermark (#42864)

This test is failing because recoveries of these empty shards are not
completing in a reasonable time, but the reason for this is still obscure. This
commit adds yet more logging.

Relates #40174, #42424

Removes type from TermVectors APIs (#42198)

Use reader attributes to control term dict memory usage (#42838)

This change makes use of the reader attributes added in LUCENE-8671
to ensure that `_id` fields are always on-heap for best update performance
and term dicts are generally off-heap on Read-Only engines.

Closes #38390

Fix Stuck IO Thread Logging Time Precision (#42882)

* The precision of the timestamps we get from the cached time thread is only 200ms by default resulting in a number of needless ~200ms slow network thread execution logs
  * Fixed by making the warn threshold a function of the precision of the cached time thread found in the settings

Enable console audit logs for docker (#42671)

Enable audit logs in docker by creating console appenders for audit loggers.
Also rename field @timestamp to timestamp and add field `type` with value audit.

The docker build now contains two log4j configurations, one for the oss version and one for the default version. The build now allows overriding the default configuration.

Also changed the format of a timestamp from ISO8601 to include time zone as per this discussion https://github.com/elastic/elasticsearch/pull/36833#discussion_r244225243

closes #42666

[ML] Change dots in CSV column names to underscores (#42839)

Dots in the column names cause an error in the ingest
pipeline, as dots are special characters in ingest pipeline.
This PR changes dots into underscores in CSV field names
suggested by the ML find_file_structure endpoint _unless_
the field names are specifically overridden.  The reason for
allowing them in overrides is that fields that are not
mentioned in the ingest pipeline can contain dots.  But it's
more consistent that the default behaviour is to replace
them all.

Fixes elastic/kibana#26800

Disable building on JDK 13 in CI

This commit disables building on JDK 13 in CI. The reason for this is
because Gradle is not yet ready to run on JDK 13. We could re-introduce
infrastructure to enable Gradle to run on a different JDK than the build
JDK, but rather than introducing such complexity we will instead wait
for Gradle to be ready to run on JDK 13.

Add Ability to List Child Containers to BlobContainer (#42653)

* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189

Fix Azure Plugin Compilation Issue

Fix Infinite Loops in ExceptionsHelper#unwrap (#42716)

* Fix Infinite Loops in ExceptionsHelper#unwrap

* Keep track of all seen exceptions and break out on loops
* Closes #42340
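
The idea of tracking seen exceptions can be sketched as follows. This is a simplified, hypothetical helper and not the real `ExceptionsHelper#unwrap` signature; it only shows how an identity-based set stops the walk when a cause chain loops back on itself.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class CauseChain {

    /**
     * Returns the first throwable in the cause chain that is an instance of
     * the given type, or null if there is none. The walk ends as soon as a
     * throwable is seen twice, so a cyclic cause chain cannot loop forever.
     */
    static Throwable unwrap(Throwable t, Class<? extends Throwable> type) {
        // Identity semantics: two distinct exceptions must never be treated as the same one.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        while (t != null && seen.add(t)) {
            if (type.isInstance(t)) {
                return t;
            }
            t = t.getCause();
        }
        return null;
    }

    public static void main(String[] args) {
        Throwable a = new RuntimeException("a");
        Throwable b = new IllegalStateException("b", a);
        a.initCause(b); // a -> b -> a: a cyclic cause chain
        System.out.println(unwrap(a, IllegalStateException.class));
        System.out.println(unwrap(a, java.io.IOException.class));
    }
}
```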

Add custom metadata to snapshots (#41281)

Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.

Omit JDK sources archive from bundled JDK (#42821)

Clean Up Painless Datetime Docs (#42869)

This change abstracts the specific types away from the different
representations of datetime as a datetime representation in code can be all
kinds of different things. This defines the three most common types of
datetimes as numeric, string, and complex while outlining the type most
typically used for these as long, String, and ZonedDateTime, respectively.
Documentation uses the definitions while examples use the types. This makes
the documentation easier to consume especially for people from a non-Java
background.

Optimize Snapshot Finalization (#42723)

* Optimize Snapshot Finalization

* Delete index-N blobs and segment blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs

Make sibling pipeline agg ctor's protected (#42808)

SiblingPipelineAggregator is a public interface,
but the ctor was package-private.  It should be protected so that
plugin authors can extend and implement their own sibling pipeline agg.

[DOCS] Adds discovery.type (#42823)

Co-Authored-By: David Turner <david.turner@elastic.co>

[Docs] Clarify caveats for phonetic filters replace option (#42807)

The `replace` option in the phonetic token filter can have surprising side
effects, such as those described in #26921. This PR adds a note to be mindful
about such scenarios and offers alternatives to using the `replace` option.

Closes #26921

Skip installation of pre-bundled integ-test modules (#42900)

Mute failing test

Remove alpha/beta/rc from version constants (#42778)

Prerelease qualifiers were moved outside of Version logic within
Elasticsearch for 7.0.0, where they are now just an external modifier on
the filename. However, they still existed inside code to support 6.x
constants. Now that those constants have been removed in master, the
prerelease logic can be removed.

Skip shadow jar logic for javadoc and sources jars (#42904)

For shadow jars, we place the original jar in a build/libs
directory. This is to avoid clobbering the original jar when building
the shadow jar. However, we need to skip this logic for javadoc and
sources jars otherwise they would never be copied to the
build/distributions directory during assembly.

Use jar task name constant in BuildPlugin

Rather than comparing to a raw string, this commit uses a built-in
constant to refer to the jar task name.

Relates #42904

Remove the transport client (#42538)

This commit removes the transport client and all remaining uses in the code.

Correct versions limits for snapshot metadata field (#42911)

Now that the snapshot metadata field has been backported, the version
restrictions used in tests and for serialization need to be corrected.

[ML-DataFrame] increase the scheduler interval to 10s (#42845)

increases the scheduler interval to fire less frequently, namely changing it from 1s to 10s. The scheduler interval is used for retrying after an error condition.

[ML-DataFrame] reduce log spam: do not trigger indexer if state is indexing or stopping (#42849)

reduce log spam: do not trigger indexer if state is indexing or stopping

[ML] Add earliest and latest timestamps to field stats (#42890)

This change adds the earliest and latest timestamps into
the field stats for fields of type "date" in the output of
the ML find_file_structure endpoint.  This will enable the
cards for date fields in the file data visualizer in the UI
to be made to look more similar to the cards for date
fields in the index data visualizer in the UI.

[ML] Close sample stream in find_file_structure endpoint (#42896)

A static code analysis revealed that we are not closing
the input stream in the find_file_structure endpoint.
This actually makes no difference in practice, as the
particular InputStream implementation in this case is
org.elasticsearch.common.bytes.BytesReferenceStreamInput
and its close() method is a no-op.  However, it is good
practice to close the stream anyway.

Mute testEnableDisableBehaviour (#42929)

[ML] [Data Frame] Adding pending task wait to the hlrc cleanup (#42907)

Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741)

* Add a merge policy that prunes soft-deleted postings

This change adds a merge policy that drops all postings for documents that
are marked as deleted. This is usually unnecessary unless soft-deletes are used
with a retention policy since otherwise a merge would remove deleted documents anyway.
Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting search and update performance.
Note, using this merge policy will remove all search capabilities for soft-deleted documents.

* fix checkstyle

* fix assertion

* fix imports

* fix compilation

* add predicate to select fields to prune

* only prune ID field

* beef up test

* roll back retention query

* foo

* remove redundant modifier

* fix assumption about empty Terms

* remove null check

* Add test for the engine to check if we prune the IDs of retained docs away

Mute failing testPerformActionAttrsRequestFails (#42933)

[ML][Data Frame] pull state and states for indexer from index (#42856)

* [ML][Data Frame] pull state and states for indexer from index

* Update DataFrameTransformTask.java

Revert "Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741)"

This reverts commit 186b52c5738688b72543d9353539468e719fafce
github messed up the commit message due to a retry.
A followup commit will add this change again with a corrected
commit message.

Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741)

This change adds a merge policy that drops all _id postings for documents that
are marked as soft-deleted but retained across merges. This is usually unnecessary
unless soft-deletes are used with a retention policy since otherwise a merge would
remove deleted documents anyway.

Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting update performance.
Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents.

configure auto expand for dataframe indexes (#42924)

creates the dataframe destination index with auto expand for replicas (0-1)

Fix NPE when rejecting bulk updates (#42923)

Single updates use a different internal code path than updates that are wrapped in a bulk request.
While working on a refactoring to bring both closer together I've noticed that bulk updates were
failing some of the tests that single updates passed. In particular, bulk updates cause
NullPointerExceptions to be thrown and listeners not being properly notified when being rejected
from the thread pool.

Fix testPendingTasks (#42922)

Fixes a race in the test which can be reliably reproduced by adding Thread.sleep(100) to the end of
IndicesService.processPendingDeletes

Closes #18747

Fix `InternalEngineTests#testPruneAwayDeletedButRetainedIds`

The test failed because we had only a single document in the index
that got deleted such that some assertions that expected at least
one live doc failed.

Relates to: #40741

[TEST] Remove unnecessary log line

[DOCS] Rewrite terms query (#42889)

Reindex max_docs parameter name (#41894)

Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.

The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.

Finally, all 3 endpoints now support max_docs in both body and URL.

Relates #24344

[DOCS] Move 'Scripting' section to top-level navigation. (#42939)

shrink may full copy when using multi data paths (#42913)

Additional scenario for full segment copy if hard link
cannot work across disks.

Fix concurrent search and index delete (#42621)

Changed order of listener invocation so that we notify before
registering search context and notify after unregistering same.

This ensures that count up/down like what we do in ShardSearchStats
works. Otherwise, we risk notifying onFreeScrollContext before notifying
onNewScrollContext (same for onFreeContext/onNewContext, but we
currently have no assertions failing in those).

Closes #28053

Wire query cache into sorting nested-filter computation (#42906)

Don't use Lucene's default query cache when filtering in sort.

Closes #42813

Make PR template reference supported architectures (#42919)

This commit changes the GitHub PR template to refer to supported "OS
and architecture" (rather than use OS) since we only accept PRs for
x86_64 (and not Linux ARM, s390, etc)

Relax timeout in NodeConnectionsServiceTests (#42934)

Today we assert that the connection thread is blocked by the time the test gets
to the barrier, but in fact this is not a valid assertion. The following
`Thread.sleep()` will cause the test to fail reasonably often.

```diff
diff --git a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
index 193cde3180d..0e57211cec4 100644
--- a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
@@ -364,6 +364,7 @@ public class NodeConnectionsServiceTests extends ESTestCase {
             final CheckedRunnable<Exception> connectionBlock = nodeConnectionBlocks.get(node);
             if (connectionBlock != null) {
                 try {
+                    Thread.sleep(50);
                     connectionBlock.run();
                 } catch (Exception e) {
                     throw new AssertionError(e);
```

This change relaxes the test to allow some time for the connection thread to
hit the barrier.

Fixes #40170

Improve translog corruption detection (#42744)

Today we test for translog corruption by incrementing a byte by 1 somewhere in
a file, and verify that this leads to a `TranslogCorruptionException`.
However, we rely on _all_ corruptions leading to this exception in the
`RemoveCorruptedShardDataCommand`: this command fails if a translog file
corruption leads to a different kind of exception, and `EOFException` and
`NegativeArraySizeException` are both possible. This commit strengthens the
translog corruption detection tests by simulating the following:

- a bit is flipped
- all bits are cleared or set
- a random value is written
- the file is truncated

It also makes sure that we return a `TranslogCorruptionException` in all such
cases.
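
The kinds of corruption listed above can be sketched on an in-memory byte array. This is only an illustration: the helper names are hypothetical, it assumes that "all bits are cleared or set" refers to a single byte, and the real tests operate on translog files rather than arrays.

```java
import java.util.Arrays;

final class CorruptionSketch {

    /** Flips one bit of one byte and returns the modified copy. */
    static byte[] flipBit(byte[] data, int index, int bit) {
        byte[] copy = data.clone();
        copy[index] ^= (byte) (1 << bit);
        return copy;
    }

    /** Clears (0x00) or sets (0xFF) every bit of one byte (assumed granularity). */
    static byte[] fillByte(byte[] data, int index, boolean set) {
        byte[] copy = data.clone();
        copy[index] = set ? (byte) 0xFF : (byte) 0x00;
        return copy;
    }

    /** Drops everything after the first newLength bytes. */
    static byte[] truncate(byte[] data, int newLength) {
        return Arrays.copyOf(data, newLength);
    }

    public static void main(String[] args) {
        byte[] original = {0x05, 0x10, 0x7F};
        System.out.println(Arrays.toString(flipBit(original, 0, 1)));
        System.out.println(Arrays.toString(fillByte(original, 1, true)));
        System.out.println(Arrays.toString(truncate(original, 2)));
    }
}
```

Writing a random value follows the same shape as `fillByte`, with the replacement byte taken from a random source instead of a constant.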

Fixes #42661

Fix FsRepositoryTests.testSnapshotAndRestore (#42925)

* The commit generation can be 3 or 2 here -> fixed by checking the actual generation on the second commit instead of hard coding 2
* Closes #42905

Only ignore IOException when fsyncing on dirs (#42972)

Today in the method IOUtils#fsync we ignore IOExceptions when fsyncing a
directory. However, the catch block here is too broad, for example it
would be ignoring IOExceptions when we try to open a non-existent
file. This commit addresses that by scoping the ignored exceptions only
to the invocation of FileChannel#force.
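
A minimal sketch of the narrower catch block, assuming a helper shaped like the one described (the class name and exact structure are illustrative, not copied from IOUtils):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FsyncSketch {

    /** Fsyncs a file or directory, tolerating failures only from force() on directories. */
    static void fsync(Path path, boolean isDir) throws IOException {
        // Opening the channel is outside the lenient block, so a missing
        // file or directory still surfaces as an IOException to the caller.
        try (FileChannel channel = FileChannel.open(
                path, isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            try {
                channel.force(true);
            } catch (IOException e) {
                if (isDir == false) {
                    throw e;
                }
                // Some platforms cannot fsync a directory; only that failure is ignored.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fsync-sketch", ".tmp");
        fsync(file, false);
        Files.delete(file);
    }
}
```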

Remove Comma in Example (#41873)

The comma is there in error as there are no other parameters after 'value'

[ML][Data frame] make sure that fields exist when creating progress (#42943)

[TEST] Adding a BWC test for ML categorization config (#42981)

This test coverage was previously missing.

Remove WatcherClient from x-pack (#42815)

This commit removes the WatcherClient and WatcherRestHandler from the
codebase. The WatcherClient was a convenience wrapper around the
transport client, which is being removed so the client no longer serves
a purpose. The WatcherRestHandler is no longer needed as its primary
purpose was to provide a WatcherClient to the implementing handlers.

Remove the CcrClient (#42816)

This commit removes the CcrClient class, which is a wrapper around the
transport client. The transport client is being removed so the client
is no longer needed.

Remove the ILMClient (#42817)

This commit removes the ILMClient class, which is a wrapper around the
transport client. This class is not used in the codebase and the
transport client is being removed.

[DOCS] Add explicit `articles_case` parameter to Elision Token Filter example (#42987)

Update default shard count per index in readme (#42388)

The default shard count has been reduced from 5 to 1. This commit
updates the readme to reflect that changed default.

[ML][Data Frame] allow null values for aggs with sparse data (#42966)

* [ML][Data Frame] allow null values for aggs with sparse data

* Making classes static, memory allocation optimization

Drop dead code for socket permissions for transport (#42990)

This code has not been needed since the removal of tribe nodes, it was
left behind when those were dropped (note that regular transport
permissions are handled through transport profiles, even if they are not
explicitly in use).

Fix possible NPE in put mapping validators (#43000)

When applying put mapping validators, we apply all the validators in the
collection. If a failure occurs, we collect that as a top-level
exception, and suppress any additional failures into the top-level
exception. However, if a request passes the validator after a top-level
exception has been collected, we would try to suppress a null exception
into the top-level exception. This is a violation of the
Throwable#addSuppressed API. This commit addresses this, and adds a test
to cover the logic of collecting the failures when validating a put
mapping request.
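
The failure-collecting logic can be sketched like this. The types are hypothetical (each validator is modelled as a function that returns an exception on failure or null on success); the point is the null guard before `addSuppressed`, which throws a `NullPointerException` when given null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ValidatorChain {

    /**
     * Runs every validator. The first failure becomes the top-level exception
     * and later failures are attached to it as suppressed exceptions.
     * Returns null when every validator passes.
     */
    static <R> Exception validate(R request, List<Function<R, Exception>> validators) {
        Exception first = null;
        for (Function<R, Exception> validator : validators) {
            Exception failure = validator.apply(request);
            if (failure == null) {
                continue; // a passing validator must never reach addSuppressed
            }
            if (first == null) {
                first = failure;
            } else {
                first.addSuppressed(failure);
            }
        }
        return first;
    }

    public static void main(String[] args) {
        List<Function<String, Exception>> validators = new ArrayList<>();
        validators.add(r -> new IllegalArgumentException("first failure"));
        validators.add(r -> null); // passes after a failure was already collected
        validators.add(r -> new IllegalStateException("second failure"));
        Exception e = validate("request", validators);
        System.out.println(e.getMessage() + ", suppressed: " + e.getSuppressed().length);
    }
}
```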

Fix put mapping request validators random test

This commit fixes a test bug in the request validators random test. In
particular, an assertion was not properly nested in a guard that would
ensure that there was at least one failure.

Relates #43000

Fix IOUtils#fsync on Windows fsyncing directories (#43008)

Fsyncing directories on Windows is not possible. We always suppressed
this by allowing that an AccessDeniedException is thrown when attempting
to open the directory for reading. Yet, this suppression also allowed
other IOExceptions to be suppressed, and that was a bug (e.g., the
directory not existing, a filesystem error, or other reasons that we might
get an access denied there, like genuine permissions issues). This
leniency was previously removed yet it exposed that we were suppressing
this case on Windows. Rather than relying on exceptions for flow control
and continuing to suppress there, we simply return early if attempting
to fsync a directory on Windows (we will not put this burden on the
caller).

Mute testLookupSeqNoByIdInLucene

Tracked at #42979

Mute AutodetectMemoryLimitIT#testTooManyPartitions

Relates #43013

Fix assertion in ReadOnlyEngine (#43010)

We should execute the assertion before throwing an exception;
otherwise, it's a noop.

Unmuted testRecoverBrokenIndexMetadata

These tests should be okay as we flush at the end of peer recovery.

Closes #40867

Refactor put mapping request validation for reuse (#43005)

This commit refactors put mapping request validation for reuse. The
concrete case that we are after here is the ability to apply effectively
the same framework to indices aliases requests. This commit refactors
the put mapping request validation framework to allow for that.

Do not allow modify aliases on followers (#43017)

Now that aliases are replicated by a follower from its leader, this
commit prevents directly modifying aliases on follower indices.

Adjust IndicesAliasesRequest origin BWC version

The work to add the origin field to the IndicesAliasesRequest has been
backported to 7.x. Since this version is currently 7.3.0, this commit
adjusts the version in master accordingly.

Add note to CCR docs regarding alias replication

This commit adds a note to the docs regarding the automatic replication
of aliases by a follower index from its leader index.

Add note to CCR docs about mapping/alias updates

This commit adds a note to the docs clarifying that it is not possible
to manually update the mapping nor the aliases of a follower index.

Unmute PermissionsIT test and enable debug logging for it (#42876)

This unmutes `testWhenUserLimitedByOnlyAliasOfIndexCanWriteToIndexWhichWasRolledoverByILMPolicy`
and enables DEBUG logging. The failure in this test case comes from a query
running rather than from ILM itself, so more information is needed.

Relates to #41440

Since SQL is GA, remove the sql language plugin from this list (#41533)

SQL: cover the Integer type when extracting values from _source (#42859)

* Take into consideration a wider range of Numbers when extracting the
values from source, more specifically - BigInteger and BigDecimal.

Allow routing commands with ?retry_failed=true (#42658)

We respect allocation deciders, including the `MaxRetryAllocationDecider`, when
executing reroute commands. If you specify `?retry_failed=true` then the retry
counter is reset, but today this does not happen until after trying to execute
the reroute commands. This means that if an allocation has repeatedly failed,
but you want to take control and assign a shard to a particular node to work
around the repeated failures, you cannot execute the routing command in the
same call to `POST /_cluster/reroute` as the one that resets the failure
counter.

This commit fixes this by resetting the failure counter first, meaning that you
can now explicitly allocate a repeatedly-failed shard like this:

```
POST /_cluster/reroute?retry_failed=true
{
  "commands": [
    {
      "allocate_replica": {
        "index": "blahblah",
        "shard": 2,
        "node": "node-4"
      }
    }
  ]
}
```

Fixes #39546
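
The ordering problem described above can be modelled in a few lines. This is a minimal sketch, not the Elasticsearch implementation: `Shard`, `can_allocate`, `reroute` and the `reset_first` switch are hypothetical names, and `MAX_RETRIES = 5` is an assumed limit standing in for the retry decider's threshold.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RETRIES = 5  # assumed limit for this sketch


@dataclass
class Shard:
    failed_attempts: int = 0
    node: Optional[str] = None


def can_allocate(shard):
    """Stand-in for MaxRetryAllocationDecider: refuse once retries are exhausted."""
    return shard.failed_attempts < MAX_RETRIES


def reroute(shard, target_node, retry_failed, reset_first):
    """Apply one allocation command, resetting the counter before or after it."""
    if retry_failed and reset_first:
        shard.failed_attempts = 0  # new behaviour: reset, then run commands
    if not can_allocate(shard):
        raise RuntimeError("allocation refused: too many failed attempts")
    shard.node = target_node
    if retry_failed and not reset_first:
        shard.failed_attempts = 0  # old behaviour: only reached if the command ran
```

With `reset_first=False` a shard that has already failed `MAX_RETRIES` times is refused even though `retry_failed` is set; with `reset_first=True` the same call succeeds.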

Fix auto fuzziness in query_string query (#42897)

Setting `auto` after the fuzzy operator (e.g. `"query": "foo~auto"`) in the `query_string` query
does not take the length of the term into account when computing the distance and always uses
a max distance of 1. This change fixes this discrepancy by ensuring that the term is passed when
the fuzziness is computed.
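
For reference, the documented `AUTO` rule depends on term length: terms of 1-2 characters must match exactly, terms of 3-5 characters allow one edit, and longer terms allow two. A minimal sketch of that rule follows; the function name `auto_max_edits` is hypothetical, and the `low`/`high` defaults mirror the documented `AUTO:3,6`.

```python
def auto_max_edits(term, low=3, high=6):
    """Maximum edit distance that AUTO fuzziness allows for a term."""
    length = len(term)
    if length < low:
        return 0  # very short terms must match exactly
    if length < high:
        return 1  # medium-length terms allow one edit
    return 2      # longer terms allow two edits
```

Under this rule `foo~auto` resolves to a distance of 1 either way, while a longer term such as `elasticsearch~auto` should get 2, which is the case a fixed distance of 1 gets wrong.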

Don't run build-tools integ tests on FIPS (#42986)

These run Gradle and FIPS isn't supported

Closes #41721

Fix typo in create-index.asciidoc (#41806)

Update regexp-syntax.asciidoc (#43021)

Corrects a typo.

Update search-settings.asciidoc (#43016)

Grammar and spelling fixes

[ML] Re-enable integration test (#41712)

Move construction of custom analyzers into AnalysisRegistry (#42940)

Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.

This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.

Improve documentation for smart_cn analyzer (#42822)

Correct the description of generate_word_parts (#43026)

Clean up configuration when docker isn't available (#42745)

We initially added `requireDocker` as a way for tasks to say that they
absolutely must have it, like the build docker image tasks.
Projects using the test fixtures plugin are not in this boat, as the
intent with these is that they will be skipped if docker and docker-compose
are not available.

Before this change we were lenient: the docker image build would succeed
but produce nothing. The implementation was also confusing, as it was not
immediately obvious this was the case due to all the indirection in the
code.

The reason we have this leniency is that when we added the docker image
build, docker was a fairly new requirement for us, and we didn't have
it deployed in CI widely enough nor had CI configured to prefer workers
with docker when possible. We are in a much better position now.
The other reason was other stack teams running `./gradlew assemble`
in their respective CI and the possibility of breaking them if docker is
not installed. We have been advocating for building specific distros for
some time now and I will also send out an additional notice.

The PR also removes the use of `requireDocker` from tests that actually
use test fixtures and are ok without it, and fixes a bug in test
fixtures that would cause incorrect configuration and allow some tasks
to run when docker was not available and they shouldn't have.

Closes #42680 and #42829, see also #42719

Better Exception in NetworkUtilsTests (#42109)

* We are still running into an exception here every so often
   * Adjusted exception to contain interface name
* Relates to #41549

Fix GCS Blob Repository 3rd Party Tests (#43030)

* We have to strip the trailing slash from child names here like we do for AWS
* closes #43029

[DOCS] Change `// TESTRESPONSE[_cat]` to `// TESTRESPONSE[non_json]` (#43006)

[ML] Get resources action should be lenient when sort field is unmapped (#42991)

Get resources action sorts on the resource id. When there are no resources at
all, then it is possible the index does not contain a mapping for the resource
id field. In that case, the search api fails by default.

This commit adjusts the search request to ignore unmapped fields.

Closes elastic/kibana#37870

Mute AzureDiscoveryClusterFormationTests (#43049)

Relates #43048

Fix IpFilteringIntegrationTests (#43019)

* Increase timeout to 5s since we saw 500ms+ GC pauses on CI
* closes #40689

Increase waiting time when check retention locks (#42994)

WriteActionsTests#testBulk and WriteActionsTests#testIndex sometimes
fail with a pending retention lock. We might leak retention locks when
switching to async recovery. However, it's more likely that ongoing
recoveries prevent the retention lock from releasing.

This change increases the waiting time when we check for no pending
retention lock and also ensures no ongoing recovery in
WriteActionsTests.

Closes #41054

[ML][Data Frame] Removes slice specification from DBQ. See #42996 (#43036)

Rename processor test fix (#43035)

If the source field name is a prefix of the target field name, the
source field still exists after rename processor has run. Adjusted test
case to handle that case.
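
The prefix case is easy to see with a toy model of a document as nested dictionaries addressed by dotted paths. This is only a sketch: `has_field` and `rename` are hypothetical helpers, not the ingest processor's code.

```python
def has_field(doc, path):
    """Return True if the dotted path resolves inside the nested dict."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def rename(doc, source, target):
    """Move the value at `source` to `target`, creating parent objects as needed."""
    *parents, leaf = source.split(".")
    node = doc
    for key in parents:
        node = node[key]
    value = node.pop(leaf)  # remove the source field

    *parents, leaf = target.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})  # may recreate an object named like the source
    node[leaf] = value
```

Renaming `foo` to `foo.bar` leaves `{"foo": {"bar": ...}}`, so a check that `foo` no longer exists fails even though the rename worked.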

Default distro run creates elastic-admin user (#43004)

When using gradle run by itself, this uses the default distro with a
basic license and enables security. There is a setup command to create
an elastic-admin user but only when the license is a trial license. Now
that security is available with the basic license, we should always run
this command when using the default distribution.

Fixing handling of auto slices in bulk scroll requests (#43050)

* Fixing handling of auto slices in bulk scroll requests

* adjusting assertions for tests

Unmute IndexFollowingIT#testFollowIndex

Fixed in #41987

Fix NPE in CcrRetentionLeaseIT (#43059)

The retention lease stats are null if the processing shard copy is being
closed. In this case, we should check against null and then retry to
avoid failing a test.

Closes #41237

[ML] Changes slice specification to auto. See #42996 (#43039)

[ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds (#42969)

* [ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds

* only supporting doc_values for geo_point fields

* moving validation into GeoPointField ctor

Upgrade AWS SDK to Latest Version (#42708)

* Just staying up to date on the SDK version
* Use `AbstractAmazonEC2` to shorten code

Better test diag output on OOM (#42989)

If linearizability checking failed with an OOM (or another exception), we did
not get the serialized history written into the log, making it difficult
to debug in cases where the problem is hard to reproduce. Fixed to
always attempt dumping the serialized history.

Related to #42244
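
The "always attempt dumping" behaviour is essentially a `finally` block. The sketch below illustrates the pattern only; `check_and_dump`, the `checker` callback and the list used as a log are hypothetical, and JSON stands in for whatever serialization the test actually uses.

```python
import json


def check_and_dump(history, checker, log):
    """Run the checker and record the history whenever it does not succeed."""
    success = False
    try:
        success = checker(history)
        return success
    finally:
        if not success:
            # Reached for a failed check and for any exception, MemoryError included.
            try:
                log.append(json.dumps(history))
            except Exception as exc:  # dumping is best effort
                log.append(f"could not serialize history: {exc!r}")
```

The original exception still propagates to the caller; the dump is added on the way out.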

Refresh remote JWKs on all errors (#42850)

It turns out that key rotation on the OP can manifest as both
a BadJWSException and a BadJOSEException in nimbus-jose-jwt. As
such we cannot depend on matching only BadJWSExceptions to
determine if we should poll the remote JWKs for an update.

This has the side effect that a remote JWKs source will also be polled
exactly one additional time for errors that have to do with
configuration, or for errors that might be caused by unsynchronized
clocks, forged JWTs, etc. (These will throw a BadJWTException,
which also extends BadJOSEException.)
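
The refresh-once-on-any-error behaviour can be sketched as follows. The exception classes are stand-ins named after the nimbus-jose-jwt ones, and `RemoteJwkSource`, `validate` and the `check` callback are hypothetical; none of this is the actual realm code.

```python
class BadJoseError(Exception):
    """Stand-in for BadJOSEException."""


class BadJwsError(BadJoseError):
    """Stand-in for BadJWSException (signature problems)."""


class BadJwtError(BadJoseError):
    """Stand-in for BadJWTException (claim problems such as expiry)."""


class RemoteJwkSource:
    def __init__(self, fetch):
        self._fetch = fetch
        self.keys = fetch()
        self.fetch_count = 1

    def reload(self):
        self.keys = self._fetch()
        self.fetch_count += 1


def validate(token, source, check):
    """Validate with cached keys; on any JOSE error reload once and retry."""
    try:
        return check(token, source.keys)
    except BadJoseError:  # previously only the JWS subclass triggered a reload
        source.reload()
        return check(token, source.keys)
```

A token signed with a rotated key passes on the retry, while a forged or expired token fails again after costing exactly one extra fetch.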

Split search in two when made against throttled and non throttled searches (#42510)

When a search on some indices takes a long time, it may cause problems to other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. Search contexts staying open for a long time is only an issue for indices that are being written to, in practice.

This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before. Note that the returned num_reduce_phases reflects the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.

Closes #40900
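
The split conditions and the reduction count can be written down as a small sketch. All function names here are hypothetical, and requiring both groups to be non-empty before splitting is an assumption of the sketch rather than something the commit message states.

```python
def should_split(size, scroll, search_type):
    """Conditions from the description: size > 0, no scroll, query_then_fetch."""
    return size > 0 and scroll is None and search_type == "query_then_fetch"


def plan_sub_searches(indices, read_only, size=10, scroll=None,
                      search_type="query_then_fetch"):
    """Return the groups of indices that would be searched separately."""
    ro = [i for i in indices if i in read_only]
    rw = [i for i in indices if i not in read_only]
    # Assumption: splitting only happens when both groups are non-empty.
    if ro and rw and should_split(size, scroll, search_type):
        return [ro, rw]
    return [list(indices)]


def reduce_phases_when_split(groups):
    """One non-final reduction per sub-search plus the final merge."""
    return len(groups) + 1
```

For a request over one read-only and one ordinary index this yields two groups and three reductions; with `size=0` or a scroll it stays a single search.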

[DOCS] Clarify phrase suggester docs smoothing parameter (#42947)

Closes #28512

remove path from rest-api-spec (#41452)

SQL: Clarify that the connections the jdbc driver creates are not pooled (#42992)

Restructure the SQL Language section to have proper sub-sections (#43007)

Rest docs page update
- have the section be on separate pages
- add an Overview page
- add other formats examples

Increase test logging for testSyncedFlushSkipOutOfSyncReplicas

Relates to #43086

Rename TESTRESPONSE[_cat] to TESTRESPONSE[non_json] (#43087)

Documents the new deprecations op…