Add infrastructure to manage network connections outside of Transport/TransportService #22194
Conversation
UnicastZenPing today establishes real connections to nodes during its ping phase that can be used by other parts of the system. Yet, this is potentially dangerous and undesirable unless the nodes have been fully verified and should be connected to in the case of a cluster state update or if we join a newly elected master. For zen pings, this change adds dedicated connections that are only kept alive while pings are running.
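A minimal sketch of the lifecycle described above, using hypothetical names (`PingRound`, `TempConnection`) rather than the actual Elasticsearch classes: connections opened for a ping round are tracked only by that round and torn down when it closes, so no other part of the system can keep using them.

```java
import java.util.ArrayList;
import java.util.List;

public class PingRound implements AutoCloseable {
    /** Stand-in for Transport.Connection: a closeable handle to one node. */
    static final class TempConnection implements AutoCloseable {
        final String address;
        boolean open = true;
        TempConnection(String address) { this.address = address; }
        @Override public void close() { open = false; }
    }

    private final List<TempConnection> connections = new ArrayList<>();

    /** Open a throwaway connection; it is only tracked by this round. */
    TempConnection connect(String address) {
        TempConnection c = new TempConnection(address);
        connections.add(c);
        return c;
    }

    /** Closing the round tears down every temporary connection. */
    @Override public void close() {
        for (TempConnection c : connections) {
            c.close();
        }
    }

    public static void main(String[] args) {
        TempConnection pingConnection = null;
        try (PingRound round = new PingRound()) {
            pingConnection = round.connect("10.0.0.1:9300");
            System.out.println("during round: open=" + pingConnection.open);
        }
        // Once the round ends, no live ping connection is left behind.
        System.out.println("after round: open=" + pingConnection.open);
    }
}
```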
Looks great to me (given the current structure of the class). I left one question. Another idea I had is to strengthen the UnicastZenTests by using a transport that explodes if one calls the connectToNode methods?
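The exploding-transport idea above might look roughly like the following; `AssertingTransport` and both method names are illustrative placeholders, not the real test harness. The point is only that any attempt to establish a full connection during pinging fails immediately, while opening a temporary connection stays allowed.

```java
public class NoConnectTransport {
    /** Hypothetical test stub: full connections are forbidden, temporary ones are fine. */
    static class AssertingTransport {
        void connectToNode(String node) {
            throw new AssertionError("pinging must not open full connections, got connect to " + node);
        }
        String openTemporaryConnection(String node) {
            return "temp-connection-to-" + node; // the allowed path
        }
    }

    public static void main(String[] args) {
        AssertingTransport transport = new AssertingTransport();
        System.out.println(transport.openTemporaryConnection("node-1"));
        try {
            transport.connectToNode("node-1");
        } catch (AssertionError expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```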
@@ -508,7 +520,9 @@ public void run() {
        }
    });
} else {
    sendPingRequestToNode(sendPingsHandler.id(), timeout, pingRequest, latch, node, nodeToSend);
    final DiscoveryNode finalNodeToSend = nodeToSend;
    sendPingRequestToNode(() -> transportService.getConnection(finalNodeToSend),
Question - why not just get the connection from the transport service right here? Does the supplier buy us something? Maybe we can even try to get it first and create a new temporary connection only if that fails:
Transport.Connection existingConnection = null;
if (transportService.nodeConnected(nodeToSend)) {
    try {
        existingConnection = transportService.getConnection(nodeToSend);
    } catch (NodeNotConnectedException e) {
        // it's ok, we'll create a temp connection
    }
}
if (existingConnection != null) {
    sendPingRequestToNode(existingConnection, sendPingsHandler.id(), timeout, pingRequest, latch, node, nodeToSend);
} else {
    // fall through to opening a temporary connection
}
Yes, it buys us not having to duplicate the exception handling.
@elasticmachine test this please
logger.trace("[{}] connecting (light) to {}", sendPingsHandler.id(), finalNodeToSend);
transportService.connectToNodeAndHandshake(finalNodeToSend, timeout.getMillis());
Transport.Connection connection;
if (nodeFoundByAddress && transportService.nodeConnected(finalNodeToSend)) {
How can this return true? This code is only executed if, further up, !transportService.nodeConnected(nodeToSend) holds (or we created a temporary disco node, which has a fresh id and is never connected).
I also wonder if we should use a temporary disco node when nodeFoundByAddress == true and transportService.nodeConnected(nodeToSend) == false.
Last, regarding the existing comment on line 454
// if we find on the disco nodes a matching node by address, we are going to restore the connection
// anyhow down the line if its not connected...
Now that we are not reestablishing the full connection to a node anymore that's in the CS, I wonder what this will do when the node gets disconnected but not removed from the cluster state. When pinging completes (with a temporary connection) and the node stays in the cluster state that is published by the master, then the master will not reestablish the connection in ClusterService (as the node has not been removed on the CS and connectToNodes is only called on the newly added nodes).
Let's talk about this
@ywelsch these are valid points. Seeing the PR already made me realize it's time to rewrite this class, as it has grown too tangled. I had hoped I could do it after @s1monw pushed this PR in, but I'm not sure it's worth it anymore to push through and fix all of this.
@s1monw, how about you push just the transport layer changes, and I will use them in a follow-up PR to clean up this class?
Re node connections - you are right that this is not good and not optimal. Since the node was never removed from the cluster state, it was also never removed from NodeConnectionsService (I think). IMO this is just too complex (and there are nodeFD and node joins that play a role here too, on the master). I suggest we change NodeConnectionsService to always work on ClusterState.nodes() directly rather than trying to be overly heroic and use the deltas. This will give a hard validation of every CS, and the checks should be very cheap. We can do it as another follow-up - potentially before we do the unicast zen ping work.
How does that sound?
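The reconciliation idea suggested above could be sketched as follows; `reconcile` is a hypothetical helper working on plain node-id sets, not the actual NodeConnectionsService code. Instead of connecting only the delta of newly added nodes, the connected set is recomputed against the full node list of every cluster state, which also repairs any connection that was dropped out of band.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class ReconcileConnections {
    /** Target set of connections derived from the full cluster state, not from deltas. */
    static Set<String> reconcile(Set<String> connected, Set<String> clusterStateNodes) {
        Set<String> result = new HashSet<>(connected);
        result.retainAll(clusterStateNodes);  // disconnect nodes no longer in the state
        result.addAll(clusterStateNodes);     // (re)connect every node that is in the state
        return result;
    }

    public static void main(String[] args) {
        // "b" was disconnected out of band but is still in the state; "c" left the cluster.
        Set<String> connected = new HashSet<>(Set.of("a", "c"));
        Set<String> state = new HashSet<>(Set.of("a", "b"));
        System.out.println(new TreeSet<>(reconcile(connected, state)));
    }
}
```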
I will just keep the transport layer changes here...
@bleskes I rolled back the changes to
Thx. Still LGTM :)
…/TransportService (#22194) Some expert users like UnicastZenPing today establish real connections to nodes during their ping phase that can be used by other parts of the system. Yet, this is potentially dangerous and undesirable unless the nodes have been fully verified and should be connected to in the case of a cluster state update or if we join a newly elected master. For use cases like this, this change adds the infrastructure to manually handle connections that are not publicly available on the node, i.e. should not be managed by `Transport`/`TransportService`.
The `UnicastZenPing` shows its age and is the result of many small changes. The current state of affairs is confusing and hard to reason about. This PR cleans it up (while following the same original intentions). Highlights of the changes:
1) Clear 3-round flow - no interleaving of scheduling.
2) The previous implementation made a best-effort attempt to wait for ongoing pings to be sent and completed. The pings were guaranteed to complete because each used the total ping duration as a timeout. This made it hard to reason about the total ping duration and the flow of the code. All of this is removed now, and a ping should just complete within the given duration or not be counted (note that it was very handy for testing, but I moved the needed sync logic to the test).
3) Because of (2) the ping scheduling changed a bit, to give the last round a chance to complete. We now ping at the beginning, 1/3 and 2/3 of the duration.
4) To offset (3) a bit, incoming ping requests are now added to ongoing ping collections.
5) UnicastZenPing never establishes full-blown connections (but does reuse them if they are there). Relates to #22120
6) Discovery host providers are only used once per pinging round. Closes #21739
7) Usage of the ability to open a connection without connecting to a node (#22194) and shorter connection timeouts help with connections piling up. Closes #19370
8) Beefed up the tests and sped them up.
9) Removed the light profile from production code.
#22194 gave us the ability to open low-level temporary connections to a remote node based on its address. With this use case out of the way, actual full-blown connections should validate the node on the other side, making sure we speak to who we think we speak to. This helps in cases where multiple nodes are started on the same host and a quick node restart causes them to swap addresses, which in turn can cause confusion down the road.
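A minimal sketch of the node validation described above, with hypothetical names; the real handshake exchanges more than a node id, but the core check is comparing the identity reported by the remote side against the node we intended to connect to.

```java
public class ValidateOnConnect {
    /** Reject the connection if the handshake reports a different node than expected. */
    static void validate(String expectedNodeId, String handshakeNodeId) {
        if (!expectedNodeId.equals(handshakeNodeId)) {
            throw new IllegalStateException(
                "handshake failed: expected node [" + expectedNodeId
                    + "] but found [" + handshakeNodeId + "]");
        }
    }

    public static void main(String[] args) {
        validate("node-A", "node-A"); // same node: connection accepted
        try {
            // After a restart, node-B may now be listening on node-A's old address.
            validate("node-A", "node-B");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```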