Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid throwing NPE when no master found #42419

Conversation

DaveCTurner
Copy link
Contributor

Today in SpecificMasterNodesIT we assert the name of the master and throw a
NPE if there is no master. This doesn't work within an assertBusy() because
the NPE triggers an immediate failure rather than the desired retry. This
commit addresses this by first asserting that the master is non-null.

Fixes #38331
Relates #38432, #37802

Today in `SpecificMasterNodesIT` we assert the name of the master and throw a
NPE if there is no master. This doesn't work within an `assertBusy()` because
the NPE triggers an immediate failure rather than the desired retry. This
commit addresses this by first asserting that the master is non-null.

Fixes elastic#38331
Relates elastic#38432
@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.0.0 v7.2.0 v7.3.0 v7.1.1 labels May 23, 2019
@DaveCTurner DaveCTurner requested a review from ywelsch May 23, 2019 07:01
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's trappy that a cluster state with no master is returned and would like to see that fixed in TransportClusterStateAction.
We use a TransportMasterNodeAction to make this request end up on the master, and do all kind of checks to make sure this node is still master, for it to fail in the final stage due to a race between master service and cluster (applier) service.
In TransportClusterStateAction.masterOperation we should use the cluster state that is passed to masterOperation as the cluster state to return.

In case where we wait for metadataversion, we should also make sure we don't lose the master (otherwise throw NotMasterException which will bubble it back to TransportMasterNodeAction and its retries facilities)...

@DaveCTurner
Copy link
Contributor Author

Ok, closing in favour of #42454.

@DaveCTurner DaveCTurner deleted the 2019-05-23-specific-master-nodes-npe branch May 23, 2019 17:02
@jpountz
Copy link
Contributor

jpountz commented May 24, 2019

@DaveCTurner Please remember to remove labels when closing unmerged PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test Issues or PRs that are addressing/adding tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes fails
4 participants