
Avoid zen ping threads piling up #19719

Closed
wants to merge 3 commits
Conversation

tlrx
Member

@tlrx tlrx commented Aug 1, 2016

The Unicast Zen Ping service pings all known nodes every 3 seconds using a light connection method. Nodes that are defined in the configuration as unicast hosts but not yet "found by address" (meaning that a successful connection has never been established to them) are added to a list of nodes to disconnect from once the ping terminates, whatever the result of the ping. Rounds of pings are executed until a master is elected, but if no master can be elected (because of minimum master nodes, or in the case of a tribe client node with an offline remote cluster) the pings are executed over and over.

The problem is that nodes are pinged every 3s while the connection timeout defaults to 30s. This leads to a situation where many tasks are added to the generic thread pool in order to disconnect from the node, but the disconnect method TcpTransport.disconnectFromNode(DiscoveryNode node) blindly tries to acquire a lock on the node even when disconnecting is impossible (because the node is not reachable). So disconnecting threads stack up at a rate of 1 every 3s until the generic thread pool is full.

Adding a check in TcpTransport.disconnectFromNode(DiscoveryNode node), similar to the check done in disconnectFromNode(DiscoveryNode node, Channel channel, String reason), prevents threads from blocking for nothing.

We could also use a connection timeout of 3s when pinging nodes: it would help connections fail faster and would keep the number of blocked threads lower, but it would not resolve the main issue of threads blocking for nothing.
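The proposed guard can be sketched as follows. This is a simplified toy model, not the actual Elasticsearch code: only the `connectedNodes` map and the `NodeChannels` name come from the patch under review, everything else is illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model of the proposed change: before taking the per-node
// connection lock, bail out early when no channels were ever registered
// for the node, i.e. we never successfully connected to it.
class DisconnectSketch {
    static final class NodeChannels {}                 // stand-in for the real channel holder
    final ConcurrentMap<String, NodeChannels> connectedNodes = new ConcurrentHashMap<>();
    int lockAcquisitions = 0;                          // counts how often we reach the lock

    boolean disconnectFromNode(String node) {
        // lightweight check outside of the lock: nothing to tear down
        if (connectedNodes.get(node) == null) {
            return false;
        }
        synchronized (this) {                          // stand-in for the per-node lock
            lockAcquisitions++;
            return connectedNodes.remove(node) != null;
        }
    }
}
```

With this guard, repeated disconnect attempts against a never-connected node return immediately instead of queuing behind the lock.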

These settings can be used to reproduce the issue (check the number of threads in the generic thread pool):

tribe.t1.cluster.name: "offline"
tribe.t1.discovery.zen.ping.unicast.hosts:
- '10.10.10.10'

or

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
- '10.10.10.10'

closes #19370

I think we have the same issue in 2.x, in NettyTransport.

@tlrx
Member Author

tlrx commented Aug 1, 2016

Oh, I did not come up with a good test for this, so let me know if anyone has an idea :)

@tlrx
Member Author

tlrx commented Aug 1, 2016

@bleskes or @jasontedor, could one of you have a look at this? That would be great, thanks!

logger.trace("disconnected from [{}] due to explicit disconnect call", node);
transportServiceAdapter.raiseNodeDisconnected(node);
// this might be called multiple times, so do a lightweight check outside of the lock
NodeChannels nodeChannels = connectedNodes.get(node);
Contributor


we can not check outside of the lock. The problem is that we need to make sure that we have a strict linearization of connection operations. If you call disconnect and it succeeds, you know you are disconnected and no ongoing work from before will end up reconnecting you (note how we only add the connected node to the map after a successful connection). I think the right solution here is to add timeouts to the connections done from the pings? Maybe an easy way is to have a different connection timeout for "light" connections than we do for normal ones.
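The linearization argument above can be sketched with a toy model (not Elasticsearch code; the per-node lock and map names are illustrative): the node is registered in the map only after a successful connect, and both connect and disconnect run under the same lock, so once a disconnect returns the caller knows no earlier in-flight connect can re-register the node behind its back. A check-then-act outside this lock would race with a concurrent connect.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the strict linearization of connection operations.
class LinearizedTransport {
    private final Map<String, Object> connectedNodes = new ConcurrentHashMap<>();
    private final Object nodeLock = new Object();      // stand-in for a per-node lock

    void connectToNode(String node) {
        synchronized (nodeLock) {
            Object channels = new Object();            // pretend the TCP connect succeeded
            connectedNodes.put(node, channels);        // registered only after success
        }
    }

    boolean disconnectFromNode(String node) {
        synchronized (nodeLock) {                      // checking outside this lock races with connectToNode
            return connectedNodes.remove(node) != null;
        }
    }
}
```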

@bleskes
Contributor

bleskes commented Aug 3, 2016

Thx @tlrx. I left a comment. Regarding testing: you can maybe use MockTransportService and add a connection rule that waits some time before throwing an exception. Another alternative is to implement a MockTcpTransport that behaves as you want. Maybe @jasontedor has a better idea.

tlrx added 3 commits August 24, 2016 11:03
@tlrx
Member Author

tlrx commented Aug 24, 2016

@bleskes Thanks for your comments.

we can not check outside of the lock. The problem is that we need to make sure that we have a strict linearization of connection operations.

I thought about this again and I agree the "fix" I proposed is not the right thing to do. Like I said in the description of this PR, the threads are piling up again and again because we try to disconnect from a node even though we never succeeded in connecting to it, and that does not make sense.

I think the right solution here is to add timeouts to the connections done from the pings? Maybe an easy way is to have a different connection timeout for "light" connections than we do for normal ones.

That may fail pings and disconnects sooner, but it won't fix the origin of the issue: we try to disconnect from nodes we never connected to. It seems like a waste of resources to me.

I changed the fix to only disconnect from nodes we successfully connected to (in light mode) and added a test. Please let me know what you think about this change.
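The revised approach might look like the following sketch, assuming a ping round that records which light connections actually succeeded; the class and method names are invented for illustration and are not the PR's code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the revised fix: during a light ping round, remember
// only the nodes we actually managed to connect to, and tear down just those
// when the round finishes.
class PingRoundSketch {
    private final Set<String> lightConnected = new HashSet<>();

    // returns true when the (simulated) light connect succeeds
    boolean pingNode(String node, boolean reachable) {
        if (reachable) {
            lightConnected.add(node);   // record the successful light connection
        }
        return reachable;
    }

    // at the end of the round, only nodes we connected to are disconnected
    List<String> nodesToDisconnect() {
        return List.copyOf(lightConnected);
    }
}
```

An unreachable unicast host thus never produces a disconnect task, so no thread blocks on its lock.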

@clintongormley
Contributor

@bleskes waiting for your review

@bleskes
Contributor

bleskes commented Oct 19, 2016

@tlrx and I discussed this. While this is a good change, there is still an issue where slow pinging (due to connection timeouts) can cause thread queues to fill up. @tlrx is evaluating the scope of the issue.

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Nov 29, 2016
…ect timeouts

Timeouts are global today across all connections. This commit allows specifying a connection timeout per node, so that depending on the context connections can be established with different timeouts.

Relates to elastic#19719
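The per-node-timeout idea from that commit message could be modeled as below. The `ConnectionProfileSketch` name and factory methods are assumptions made for illustration, not the actual API introduced by the commit; the 3s/30s values come from the ping interval and default connect timeout discussed in this PR.

```java
import java.time.Duration;

// Sketch of per-context connection timeouts: light (ping) connections get a
// short timeout while regular connections keep the default.
class ConnectionProfileSketch {
    final Duration connectTimeout;

    private ConnectionProfileSketch(Duration timeout) {
        this.connectTimeout = timeout;
    }

    static ConnectionProfileSketch light() {           // e.g. for zen pings
        return new ConnectionProfileSketch(Duration.ofSeconds(3));
    }

    static ConnectionProfileSketch regular() {         // default transport connections
        return new ConnectionProfileSketch(Duration.ofSeconds(30));
    }
}
```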
s1monw added a commit that referenced this pull request Dec 1, 2016
…ect timeouts (#21847)

Timeouts are global today across all connections. This commit allows specifying a connection timeout per node, so that depending on the context connections can be established with different timeouts.

Relates to #19719
@tlrx
Member Author

tlrx commented Jan 5, 2017

Superseded by #22277

@tlrx tlrx closed this Jan 5, 2017
@tlrx tlrx deleted the fix-19370 branch January 27, 2017 09:18
Successfully merging this pull request may close these issues.

Thread leak in TribeNode when a cluster is offline