Avoid zen pinging threads to pile up #19719
Conversation
Oh, I did not come up with a good test for this, so if anyone has an idea :)
@bleskes or @jasontedor, could one of you have a look at this? That would be great, thanks!
logger.trace("disconnected from [{}] due to explicit disconnect call", node);
transportServiceAdapter.raiseNodeDisconnected(node);
// this might be called multiple times, so do a lightweight check outside of the lock
NodeChannels nodeChannels = connectedNodes.get(node);
We cannot check outside of the lock. The problem is that we need to make sure that we have a strict linearization of connection operations. If you call disconnect and it succeeds, you know you are disconnected and no ongoing work from before will end up reconnecting you (note how we only add the connected node to the map after a successful connection). I think the right solution here is to add timeouts to the connections done from the pings? Maybe an easy way is to have a different connection timeout for "light" connections than we do for normal ones.
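For illustration, here is a minimal sketch of what a separate, shorter timeout for "light" ping connections could look like. The class, the method, and the 3-second value are assumptions for this example, not the actual Elasticsearch API or settings.

import java.util.concurrent.TimeUnit;

// Hypothetical sketch: give ping ("light") connections their own, much shorter
// connect timeout instead of the global default (assumed here to be 30s).
public class LightConnectionProfile {

    // Assumed values; the real setting names and defaults may differ.
    private static final long DEFAULT_CONNECT_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(30);
    private static final long LIGHT_CONNECT_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(3);

    /**
     * Returns the connect timeout to use for a connection attempt. Light (ping)
     * connections fail fast, so a later disconnect never has to wait behind a
     * 30s connect attempt that is still in flight.
     */
    public long connectTimeoutMillis(boolean lightConnection) {
        return lightConnection ? LIGHT_CONNECT_TIMEOUT_MS : DEFAULT_CONNECT_TIMEOUT_MS;
    }
}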
Thx @tlrx. I left a comment. Regarding testing - you can maybe use MockTransportService and add a connection rule that waits some time before throwing an exception. Another alternative is to implement a MockTcpTransport that behaves as you want. Maybe @jasontedor has a better idea.
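A rough sketch of the kind of "connection rule" suggested above: a connect attempt that blocks for a while and then fails, simulating an unreachable unicast host. SlowFailTransport and its connectToNode hook are hypothetical stand-ins for this example, not the real MockTransportService API.

import java.util.concurrent.TimeUnit;

// Hypothetical test transport: every connection attempt blocks for a while
// and then fails, like a host that never answers.
public class SlowFailTransport {

    private final long delayMillis;

    public SlowFailTransport(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Pretends to connect, then fails, simulating a connect timeout. */
    public void connectToNode(String node) {
        try {
            TimeUnit.MILLISECONDS.sleep(delayMillis); // simulate the slow connect
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        throw new IllegalStateException(
                "connect to [" + node + "] failed after " + delayMillis + "ms");
    }
}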
@bleskes Thanks for your comments.
I thought about this again and I agree the "fix" I proposed is not the right thing to do. Like I said in the description of this PR, the threads are piling up again and again because we try to disconnect from a node even if we never succeeded in connecting to it, and that does not make sense.
Adding timeouts may make pings and disconnects fail sooner, but it won't fix the origin of the issue: we try to disconnect from nodes we never connected to. It seems like a waste of resources to me. I changed the fix to only disconnect from nodes we successfully connected to (in light mode) and added a test (see the sketch below). Please let me know what you think about this change.
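To make the changed fix concrete, here is a minimal sketch (with hypothetical names, not the actual UnicastZenPing code) of tracking which nodes a ping round actually connected to, so that only those are scheduled for disconnect.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remember which nodes the light ping actually connected
// to, and only schedule disconnects for those, never for unreachable hosts.
public class PingRound {

    private final Set<String> successfullyConnected = ConcurrentHashMap.newKeySet();

    /** Called when a light connection to a unicast host succeeds. */
    public void onConnected(String node) {
        successfullyConnected.add(node);
    }

    /** Called when the ping round finishes, whatever its result. */
    public void close(TransportLike transport) {
        // Only nodes we really connected to are disconnected; for all other
        // hosts there is nothing to tear down, so no thread ever blocks.
        for (String node : successfullyConnected) {
            transport.disconnectFromNode(node);
        }
    }

    /** Minimal stand-in for the transport used in this sketch. */
    interface TransportLike {
        void disconnectFromNode(String node);
    }
}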
@bleskes waiting for your review
…ect timeouts
Timeouts are global today across all connections. This commit allows specifying a connection timeout per node, so that depending on the context connections can be established with different timeouts. Relates to elastic#19719
Superseded by #22277
The Unicast Zen Ping service pings all known nodes every 3 seconds using a light connecting method. Nodes that are defined in the configuration as unicast hosts and not yet "found by address" (meaning that a successful connection has never been established to them) are added to a list of nodes to disconnect from once the ping terminates, whatever the result of the ping. The round of pings is executed until a master is elected, but if no master can be elected (because of minimum master nodes, or in the case of a tribe client node with an offline remote cluster) the pings are executed over and over.

The thing is that nodes are pinged every 3s but the connection timeout is configured by default to 30s. This leads to a situation where many threads are created and added to the generic thread pool in order to disconnect from the node, but the disconnect method TcpTransport.disconnectFromNode(DiscoveryNode node) blindly tries to acquire a lock on the node even if it will be impossible to disconnect from it (because the node is not reachable). So disconnecting threads stack up at a rate of one every 3 seconds until the generic thread pool is full.

Adding a check in TcpTransport.disconnectFromNode(DiscoveryNode node) similar to the check done in disconnectFromNode(DiscoveryNode node, Channel channel, String reason) avoids threads blocking for nothing (a rough sketch of such a check follows below).

We could also use a connection timeout of 3s when pinging nodes; this would make connections fail faster and keep the number of blocking threads lower, but it would not resolve the main issue of threads blocking for nothing.

These settings can be used to reproduce the issue (check the number of threads in the generic thread pool):

tribe.t1.cluster.name: "offline"
tribe.t1.discovery.zen.ping.unicast.hosts:
  - '10.10.10.10'

or

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
  - '10.10.10.10'
closes #19370
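As a rough illustration of the check described above (the names are assumptions based on the snippet quoted earlier in this thread, not the exact TcpTransport code; note also the review comment above about why an unlocked check alone is not a safe complete fix):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed guard: bail out before taking the
// per-node lock when there is no connection to tear down.
public class DisconnectGuard {

    private final Map<String, Object> connectedNodes = new ConcurrentHashMap<>();

    public void disconnectFromNode(String node) {
        // Lightweight check outside of the lock: if we never managed to
        // connect, there is nothing to disconnect and no reason to block.
        Object channels = connectedNodes.get(node);
        if (channels == null) {
            return;
        }
        synchronized (this) { // stand-in for the real per-node connection lock
            // Re-check under the lock, then close channels, fire listeners, etc.
            if (connectedNodes.remove(node) != null) {
                // close channels here
            }
        }
    }
}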
I think we have the same issue in 2.x, in NettyTransport.