
Avoid zen ping threads piling up #19719

Closed
wants to merge 3 commits
Conversation

tlrx
Member

@tlrx tlrx commented Aug 1, 2016

The Unicast Zen Ping service pings all known nodes every 3 seconds using a light connection method. Nodes that are defined in the configuration as unicast hosts but not yet "found by address" (meaning that a successful connection has never been established to them) are added to a list of nodes to disconnect from once the ping terminates, whatever the result of the ping. Rounds of pings are executed until a master is elected, but if no master can be elected (because of minimum master nodes, or in the case of a tribe client node with an offline remote cluster) the pings are executed over and over.

The problem is that nodes are pinged every 3s while the connection timeout defaults to 30s. This leads to a situation where many tasks are added to the generic thread pool in order to disconnect from the node, but the disconnect method TcpTransport.disconnectFromNode(DiscoveryNode node) blindly tries to acquire a lock on the node even when disconnecting is impossible (because the node is not reachable). So disconnecting threads stack up at a rate of 1 every 3s until the generic thread pool is full.

Adding a check in TcpTransport.disconnectFromNode(DiscoveryNode node), similar to the check done in disconnectFromNode(DiscoveryNode node, Channel channel, String reason), prevents threads from blocking for nothing.

We could also use a connection timeout of 3s when pinging nodes: it would help connections fail faster and would keep the number of blocked threads lower, but it would not resolve the main issue of threads blocking for nothing.
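The proposed guard can be sketched as follows. This is a simplified toy model, not the actual Elasticsearch code: only the `connectedNodes` map and the `NodeChannels` name come from the patch under review, everything else is illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model of the proposed change: before taking the per-node
// connection lock, bail out early when no channels were ever registered
// for the node, i.e. we never successfully connected to it.
class DisconnectSketch {
    static final class NodeChannels {}                 // stand-in for the real channel holder
    final ConcurrentMap<String, NodeChannels> connectedNodes = new ConcurrentHashMap<>();
    int lockAcquisitions = 0;                          // counts how often we reach the lock

    boolean disconnectFromNode(String node) {
        // lightweight check outside of the lock: nothing to tear down
        if (connectedNodes.get(node) == null) {
            return false;
        }
        synchronized (this) {                          // stand-in for the per-node lock
            lockAcquisitions++;
            return connectedNodes.remove(node) != null;
        }
    }
}
```

With this guard, repeated disconnect attempts against a never-connected node return immediately instead of queuing behind the lock.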

These settings can be used to reproduce the issue (check the number of threads in the generic thread pool):

tribe.t1.cluster.name: "offline"
tribe.t1.discovery.zen.ping.unicast.hosts:
- '10.10.10.10'

or

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
- '10.10.10.10'

closes #19370

I think we have the same issue in 2.x, in NettyTransport.

@tlrx
Member Author

tlrx commented Aug 1, 2016

Oh, I did not come up with a good test for this, so let me know if anyone has an idea :)

@tlrx
Member Author

tlrx commented Aug 1, 2016

@bleskes or @jasontedor, could one of you have a look at this? That would be great, thanks!

logger.trace("disconnected from [{}] due to explicit disconnect call", node);
transportServiceAdapter.raiseNodeDisconnected(node);
// this might be called multiple times, so do a lightweight check outside of the lock
NodeChannels nodeChannels = connectedNodes.get(node);
Contributor


we can not check outside of the lock. The problem is that we need to make sure that we have a strict linearization of connection operations. If you call disconnect and it succeeds, you know you are disconnected and no ongoing work from before will end up reconnecting you (note how we only add the connected node to the map after a successful connection). I think the right solution here is to add timeouts to the connections done from the pings? Maybe an easy way is to have a different connection timeout for "light" connections than we do for normal ones.
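The linearization argument above can be sketched with a toy model (not Elasticsearch code; the per-node lock and map names are illustrative): the node is registered in the map only after a successful connect, and both connect and disconnect run under the same lock, so once a disconnect returns the caller knows no earlier in-flight connect can re-register the node behind its back. A check-then-act outside this lock would race with a concurrent connect.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the strict linearization of connection operations.
class LinearizedTransport {
    private final Map<String, Object> connectedNodes = new ConcurrentHashMap<>();
    private final Object nodeLock = new Object();      // stand-in for a per-node lock

    void connectToNode(String node) {
        synchronized (nodeLock) {
            Object channels = new Object();            // pretend the TCP connect succeeded
            connectedNodes.put(node, channels);        // registered only after success
        }
    }

    boolean disconnectFromNode(String node) {
        synchronized (nodeLock) {                      // checking outside this lock races with connectToNode
            return connectedNodes.remove(node) != null;
        }
    }
}
```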

@bleskes
Contributor

bleskes commented Aug 3, 2016

Thx @tlrx. I left a comment. Regarding testing: you can maybe use MockTransportService and add a connection rule that waits some time before throwing an exception. Another alternative is to implement a MockTcpTransport that behaves as you want. Maybe @jasontedor has a better idea.

tlrx added 3 commits August 24, 2016 11:03
@tlrx
Member Author

tlrx commented Aug 24, 2016

@bleskes Thanks for your comments.

we can not check outside of the lock. The problem is that we need to make sure that we have a strict linearization of connection operations.

I thought about this again and I agree the "fix" I proposed is not the right thing to do. Like I said in the description of this PR, the threads are piling up again and again because we try to disconnect from a node even though we never succeeded in connecting to it, and that does not make sense.

I think the right solution here is to add timeouts to the connections done from the pings? Maybe an easy way is to have a different connection timeout for "light" connections than we do for normal ones.

That may fail pings and disconnects sooner, but it won't fix the origin of the issue: we try to disconnect from nodes we never connected to. It seems like a waste of resources to me.

I changed the fix to only disconnect from nodes we successfully connected to (in light mode) and added a test. Please let me know what you think about this change.
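The revised approach might look like the following sketch, assuming a ping round that records which light connections actually succeeded; the class and method names are invented for illustration and are not the PR's code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the revised fix: during a light ping round, remember
// only the nodes we actually managed to connect to, and tear down just those
// when the round finishes.
class PingRoundSketch {
    private final Set<String> lightConnected = new HashSet<>();

    // returns true when the (simulated) light connect succeeds
    boolean pingNode(String node, boolean reachable) {
        if (reachable) {
            lightConnected.add(node);   // record the successful light connection
        }
        return reachable;
    }

    // at the end of the round, only nodes we connected to are disconnected
    List<String> nodesToDisconnect() {
        return List.copyOf(lightConnected);
    }
}
```

An unreachable unicast host thus never produces a disconnect task, so no thread blocks on its lock.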

@clintongormley
Contributor

@bleskes waiting for your review

@bleskes
Contributor

bleskes commented Oct 19, 2016

@tlrx and I discussed this. While this is a good change, there is still an issue where slow pinging (due to connection timeouts) can cause thread queues to fill up. @tlrx is evaluating the scope of the issue.

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Nov 29, 2016
…ect timeouts

Timeouts are global today across all connections. This commit allows specifying a connection timeout per node, so that depending on the context connections can be established with different timeouts.

Relates to elastic#19719
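The per-node-timeout idea from that commit message could be modeled as below. The `ConnectionProfileSketch` name and factory methods are assumptions made for illustration, not the actual API introduced by the commit; the 3s/30s values come from the ping interval and default connect timeout discussed in this PR.

```java
import java.time.Duration;

// Sketch of per-context connection timeouts: light (ping) connections get a
// short timeout while regular connections keep the default.
class ConnectionProfileSketch {
    final Duration connectTimeout;

    private ConnectionProfileSketch(Duration timeout) {
        this.connectTimeout = timeout;
    }

    static ConnectionProfileSketch light() {           // e.g. for zen pings
        return new ConnectionProfileSketch(Duration.ofSeconds(3));
    }

    static ConnectionProfileSketch regular() {         // default transport connections
        return new ConnectionProfileSketch(Duration.ofSeconds(30));
    }
}
```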
s1monw added a commit that referenced this pull request Dec 1, 2016
…ect timeouts (#21847)

Timeouts are global today across all connections. This commit allows specifying a connection timeout per node, so that depending on the context connections can be established with different timeouts.

Relates to #19719
@tlrx
Member Author

tlrx commented Jan 5, 2017

Superseded by #22277

@tlrx tlrx closed this Jan 5, 2017
@tlrx tlrx deleted the fix-19370 branch January 27, 2017 09:18
Successfully merging this pull request may close these issues.

Thread leak in TribeNode when a cluster is offline