Stabilize RareClusterStateIT #38671

original-brownbear · 2019-02-10T13:11:55Z

Use the actual master node, not just a master-eligible node, when trying to cancel publication. Cancelling only works on the master, and for unlucky seeds we never try the master within the 10s that the busy assert runs.
Closes [CI] Failure in org.elasticsearch.indices.state.RareClusterStateIT.testDelayedMappingPropagationOnReplica #36813
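
To illustrate the description above, here is a minimal, self-contained sketch of the difference between picking any master-eligible node and picking the elected master. It is not the test's code: `Node`, `randomMasterEligible`, `electedMaster` and `tryCancelPublication` are hypothetical names invented for the illustration, and the only behaviour modelled is that cancelling a publication has an effect on the elected master alone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class MasterSelectionSketch {
    /** Hypothetical stand-in for a cluster node; not an Elasticsearch type. */
    static final class Node {
        final String name;
        final boolean masterEligible;
        final boolean electedMaster;

        Node(String name, boolean masterEligible, boolean electedMaster) {
            this.name = name;
            this.masterEligible = masterEligible;
            this.electedMaster = electedMaster;
        }
    }

    /** Flaky approach: any master-eligible node may come back, elected or not. */
    static Node randomMasterEligible(List<Node> nodes, Random random) {
        List<Node> eligible = nodes.stream()
            .filter(n -> n.masterEligible)
            .collect(Collectors.toList());
        return eligible.get(random.nextInt(eligible.size()));
    }

    /** Deterministic approach: always return the elected master. */
    static Node electedMaster(List<Node> nodes) {
        return nodes.stream()
            .filter(n -> n.electedMaster)
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("no elected master"));
    }

    /** Assumption for this sketch: cancellation only succeeds on the elected master. */
    static boolean tryCancelPublication(Node node) {
        return node.electedMaster;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("node-0", true, false),
            new Node("node-1", true, true),
            new Node("node-2", true, false));

        // The random pick may or may not be the elected master, depending on the seed.
        Node picked = randomMasterEligible(nodes, new Random());
        System.out.println("random pick " + picked.name + " cancels: " + tryCancelPublication(picked));

        // Selecting the elected master directly removes the dependence on the seed.
        Node master = electedMaster(nodes);
        System.out.println("elected master " + master.name + " cancels: " + tryCancelPublication(master));
    }
}
```

With three master-eligible nodes, a random pick hits the elected master only about one time in three, so a retry loop built on it can burn several attempts before it succeeds.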

elasticmachine · 2019-02-10T13:11:57Z

Pinging @elastic/es-distributed

original-brownbear · 2019-02-10T13:59:15Z

Jenkins test this

original-brownbear · 2019-02-10T16:33:09Z

Jenkins run elasticsearch-ci/1
Jenkins run elasticsearch-ci/2

ywelsch · 2019-02-12T09:03:01Z

do we know why it takes 15 seconds? What is it waiting on?

original-brownbear · 2019-02-12T09:09:24Z

@ywelsch it's waiting on the next cluster state from the dynamic mapping update here https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java#L139.

I tried fixing this by creating the future before initiating the mapping update, but that randomly led to even longer wait times (probably because some other new state without the mapping update hit the observer): we then try writing again, it fails again, and we have to start over. As far as I understand it, the deeper problem seems to be the indiscriminate waiting for just the next state in https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java#L139, combined with the fact that we add the observer only after we initiate the state update, so it might just miss the update.
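
The two failure modes described in this comment can be sketched in isolation. The following is a single-threaded toy model, not the code in `TransportShardBulkAction`: `StateHolder`, `onNextState` and `onState` are hypothetical names, and the whole cluster state is reduced to one mapping version number. It shows that a listener registered after the update misses it, that a "next state" listener fires on an unrelated state, and that checking the current state against a predicate avoids both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class StateObserverSketch {
    /** Hypothetical minimal state holder; the "cluster state" is just a mapping version. */
    static final class StateHolder {
        private int mappingVersion = 0;
        private final List<Runnable> nextStateListeners = new ArrayList<>();

        /** Applies a new state and fires every listener waiting for "the next state". */
        void publish(int newMappingVersion) {
            mappingVersion = newMappingVersion;
            List<Runnable> toNotify = new ArrayList<>(nextStateListeners);
            nextStateListeners.clear();
            toNotify.forEach(Runnable::run);
        }

        /** Indiscriminate wait: fires on the next published state, whatever it contains. */
        void onNextState(Runnable listener) {
            nextStateListeners.add(listener);
        }

        /** Predicate wait: checks the current state first, then re-arms until satisfied. */
        void onState(IntPredicate predicate, Runnable listener) {
            if (predicate.test(mappingVersion)) {
                listener.run();
            } else {
                onNextState(() -> onState(predicate, listener));
            }
        }
    }

    public static void main(String[] args) {
        // Case 1: the update lands before the listener is added, so it is missed.
        StateHolder late = new StateHolder();
        late.publish(1);
        late.onNextState(() -> System.out.println("late listener fired"));
        System.out.println("case 1 done: nothing fired, listener still waiting");

        // Case 2: an unrelated state (mapping version unchanged) wakes the listener early.
        StateHolder early = new StateHolder();
        early.onNextState(() -> System.out.println("woken at mapping version " + early.mappingVersion));
        early.publish(0);

        // Case 3: the predicate wait sees the update that has already been applied.
        late.onState(v -> v >= 1, () -> System.out.println("predicate listener fired"));
    }
}
```

The predicate variant trades a little bookkeeping for independence from the order in which the update and the registration happen.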

original-brownbear · 2019-02-13T19:32:07Z

@ywelsch urgh :) this turned out to be really trivial. The waits were exactly 15s because, for the failing seed from the issue, randomly selecting master nodes would pick them in the order not-master, actual-master, not-master, not-master, ..., actual-master within 15 seconds.
So mostly, if you ran that in a loop, the first retry on the busy assert would pass, but if you got unlucky you'd have to wait 15s to get the next try that hits actual-master :)
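
A rough way to see why a run of unlucky picks turns into a long wait is to model the retry loop. The sketch below assumes, purely for illustration, a busy-retry loop whose sleep doubles after every failed attempt. `sleptBeforeSuccess` and its parameters are hypothetical names, and the numbers are not meant to reproduce the exact 15s seen in the failing seed.

```java
class BusyRetrySketch {
    /**
     * Returns the total milliseconds slept before the first attempt that hits the
     * elected master, or -1 if no attempt in the sequence does. The sleep doubles
     * after each failed attempt (an assumption of this sketch).
     */
    static long sleptBeforeSuccess(boolean[] attemptHitsMaster, long initialSleepMillis) {
        long slept = 0;
        long nextSleep = initialSleepMillis;
        for (boolean hit : attemptHitsMaster) {
            if (hit) {
                return slept;
            }
            slept += nextSleep;
            nextSleep *= 2;
        }
        return -1;
    }

    public static void main(String[] args) {
        // One miss, then a hit: only the first, shortest sleep is paid.
        System.out.println(sleptBeforeSuccess(new boolean[] {false, true}, 1));

        // Ten misses in a row, then a hit: the doubling sleeps add up quickly.
        boolean[] unlucky = new boolean[11];
        unlucky[10] = true;
        System.out.println(sleptBeforeSuccess(unlucky, 1));
    }
}
```

Under that assumption a single miss costs 1 ms of waiting, while ten consecutive misses cost 1023 ms, so each additional unlucky pick roughly doubles the delay.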

* elastic/master:
  Remove immediate operation retry after mapping update (elastic#38873)
  Remove mentioning of types from bulk API docs (elastic#38896)
  SQL: change JDBC setup URL in the documentation (elastic#38564)
  Skip BWC tests in checkPart1 and checkPart2 (elastic#38730)
  Enable silent FollowersCheckerTest (elastic#38851)
  Update TESTING.asciidoc with platform specific instructions (elastic#38802)
  Use consistent view of realms for authentication (elastic#38815)
  Stabilize RareClusterState (elastic#38671)
  Increase Timeout in UnicastZenPingTests (elastic#38893)
  Do not recommend installing vagrant-winrm elastic#38887
  _cat/indices with Security, hide names when wildcard (elastic#38824)
  SQL: fall back to using the field name for column label (elastic#38842)
  Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (elastic#38709)
  Remove joda time mentions in documentation (elastic#38720)
  Add enabled status for token and api key service (elastic#38687)

original-brownbear added >bug :Distributed Indexing/Distributed v8.0.0 v7.2.0 labels Feb 10, 2019

original-brownbear added >test WIP labels Feb 10, 2019

original-brownbear removed the WIP label Feb 10, 2019

original-brownbear added the WIP label Feb 10, 2019

original-brownbear changed the title ~~Fix Race in Wait for Dynamic Mapping Update~~ Increase Timeout in RareClusterStateIT Feb 10, 2019

original-brownbear removed WIP >bug labels Feb 10, 2019

original-brownbear requested review from andrershov, ywelsch and DaveCTurner February 11, 2019 08:41

original-brownbear added the WIP label Feb 12, 2019

original-brownbear removed request for andrershov, ywelsch and DaveCTurner February 12, 2019 09:40

Stabilize RareClusterState

2adb7a2

* Use actual master node, not just a master eligible node when trying to cancel publication. This only works on the master and for unlucky seeds we never try the master within the 10s that the busy assert runs.
* Closes elastic#36813

original-brownbear force-pushed the 36813 branch from 1ab030b to 2adb7a2 Compare February 13, 2019 19:28

original-brownbear removed the WIP label Feb 13, 2019

original-brownbear requested a review from ywelsch February 13, 2019 19:28

original-brownbear changed the title ~~Increase Timeout in RareClusterStateIT~~ Stabilize RareClusterStateIT Feb 13, 2019

original-brownbear requested a review from andrershov February 13, 2019 22:20

ywelsch approved these changes Feb 14, 2019

original-brownbear added the backport pending label Feb 14, 2019

original-brownbear merged commit c8224e3 into elastic:master Feb 14, 2019

original-brownbear deleted the 36813 branch February 14, 2019 14:15

original-brownbear removed the backport pending label Feb 28, 2019

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
