Stabilize RareClusterStateIT #38671
original-brownbear commented Feb 10, 2019
- Use the actual master node, not just a master-eligible node, when trying to cancel publication. Cancelling only works on the master, and for unlucky seeds we never try the master within the 10s that the busy assert runs.
- Closes [CI] Failure in org.elasticsearch.indices.state.RareClusterStateIT.testDelayedMappingPropagationOnReplica #36813
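
A toy illustration of why the busy assert could time out. `ToyCluster` and its methods are hypothetical names made up for this sketch, not the Elasticsearch test framework. It assumes only what the description states: cancelling has an effect solely on the elected master, and the test makes a bounded number of attempts.

```java
import java.util.List;

// Hypothetical model for illustration only; none of these names are Elasticsearch APIs.
final class ToyCluster {
    private final String electedMaster;

    ToyCluster(String electedMaster) {
        this.electedMaster = electedMaster;
    }

    String electedMaster() {
        return electedMaster;
    }

    // Models "this only works on the master": a no-op on any other node.
    boolean cancelPublication(String nodeName) {
        return nodeName.equals(electedMaster);
    }

    // Models a bounded busy assert: one node per attempt, in the given order.
    static boolean cancelWithinAttempts(ToyCluster cluster, List<String> attemptOrder) {
        for (String nodeName : attemptOrder) {
            if (cluster.cancelPublication(nodeName)) {
                return true;
            }
        }
        return false;
    }
}
```

If the attempt order for a given seed never includes the elected master before the attempts run out, `cancelWithinAttempts` returns false. Passing `cluster.electedMaster()` directly succeeds on the first attempt.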
Pinging @elastic/es-distributed
Jenkins test this
Jenkins run elasticsearch-ci/1
Do we know why it takes 15 seconds? What is it waiting on?
@ywelsch it's waiting on the next cluster state from the dynamic mapping update here https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java#L139. I tried fixing this by creating the future before initiating the mapping update, but that randomly led to even longer wait times (probably because some other new state without the mapping update hit the observer). We then try writing again, it fails again, and we have to start over. As far as I understand it, the deeper problem seems to be the indiscriminate waiting for just the next state in https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/bulk/TransportShardBulkAction.java#L139, combined with the fact that we add the observer only after we initiate the state update, so it might just miss the update.
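
The two hazards described in this comment can be sketched with a small single-threaded model. This is an assumption-laden toy and not Elasticsearch's observer code: `ToyClusterStates`, `State`, `Waiter`, `waitFor`, and `publish` are all hypothetical names. It is also not the change made in this PR, only an illustration of the race.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model; it does not mirror the real cluster state observer.
final class ToyClusterStates {
    static final class State {
        final long version;
        final boolean hasMapping;

        State(long version, boolean hasMapping) {
            this.version = version;
            this.hasMapping = hasMapping;
        }
    }

    static final class Waiter {
        private final Predicate<State> condition;
        private State matched; // stays null until a matching state is published

        Waiter(Predicate<State> condition) {
            this.condition = condition;
        }

        boolean isDone() {
            return matched != null;
        }

        State matched() {
            return matched;
        }
    }

    private State current = new State(0, false);
    private final List<Waiter> waiters = new ArrayList<>();

    // Registers a waiter; it only sees states published after this call.
    Waiter waitFor(Predicate<State> condition) {
        Waiter waiter = new Waiter(condition);
        waiters.add(waiter);
        return waiter;
    }

    // Publishes the next state; once the mapping is present it stays present.
    void publish(boolean addsMapping) {
        current = new State(current.version + 1, current.hasMapping || addsMapping);
        for (Waiter waiter : waiters) {
            if (!waiter.isDone() && waiter.condition.test(current)) {
                waiter.matched = current;
            }
        }
    }
}
```

In this model, a waiter registered after `publish(true)` never completes unless some later state happens to arrive. A waiter registered up front with `state -> true` completes on an unrelated `publish(false)` and hands back a state without the mapping. A waiter registered up front with `state -> state.hasMapping` completes only once the mapping update is visible.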
* Use the actual master node, not just a master-eligible node, when trying to cancel publication. Cancelling only works on the master, and for unlucky seeds we never try the master within the 10s that the busy assert runs.
* Closes elastic#36813
1ab030b to 2adb7a2
@ywelsch urgh :) this turned out to be really trivial. The waits were exactly 15s here because, for the tested/failing seed from the issue, it would use master nodes in the order
* elastic/master:
  Remove immediate operation retry after mapping update (elastic#38873)
  Remove mentioning of types from bulk API docs (elastic#38896)
  SQL: change JDBC setup URL in the documentation (elastic#38564)
  Skip BWC tests in checkPart1 and checkPart2 (elastic#38730)
  Enable silent FollowersCheckerTest (elastic#38851)
  Update TESTING.asciidoc with platform specific instructions (elastic#38802)
  Use consistent view of realms for authentication (elastic#38815)
  Stabilize RareClusterState (elastic#38671)
  Increase Timeout in UnicastZenPingTests (elastic#38893)
  Do not recommend installing vagrant-winrm elastic#38887
  _cat/indices with Security, hide names when wildcard (elastic#38824)
  SQL: fall back to using the field name for column label (elastic#38842)
  Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test (elastic#38709)
  Remove joda time mentions in documentation (elastic#38720)
  Add enabled status for token and api key service (elastic#38687)