Massive leak of active handles #141
Comments
It may be connections/sockets that are closed but still referenced somewhere and not gc'd. |
I've run some local tests, and once you destroy a socket it will no longer appear in the active handles, even if you keep a strong reference to it. I've extended …
|
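For reference, a minimal way to check that claim with plain Node (just an illustrative sketch; `process._getActiveHandles()` is the same undocumented API that wtfnode reads):

```js
const net = require('net');

// Count the net.Socket instances currently in the active handles list.
const countSockets = () =>
  process._getActiveHandles().filter((h) => h instanceof net.Socket).length;

const server = net.createServer().listen(0, () => {
  const socket = net.connect(server.address().port, '127.0.0.1', () => {
    const before = countSockets();
    socket.destroy();
    socket.once('close', () => {
      // `socket` is still strongly referenced here, yet after destroy() it no
      // longer shows up among the active handles.
      console.log({ before, after: countSockets() });
      server.close();
    });
  });
});
```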
@dapplion what version of Node is this running on? It's interesting that this is fairly stable and then starts quickly leaking handles. You said this is running on CI? I'm wondering if there is a network event happening that's causing TCP to freak out. Is the timing (time of day) at all predictable? |
@jacobheun It's running on Node.js v12.13.1. I haven't seen a correlation with time. Note though that it's a very slow leak, so I don't have a ton of data. |
I am trying to run a jsipfs daemon with the DHT random walk enabled to force some connections and see if I can replicate. Also using wtfnode. Will report back later today. |
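For anyone reproducing this, wtfnode's programmatic API is a one-liner. A common pattern (assuming a long-running daemon you can send signals to) is to dump the open handles on SIGUSR2:

```js
// wtf.dump() prints the currently open sockets, servers, timers and child
// processes to the console; SIGUSR2 is just a convenient trigger for a
// daemon that is already running.
const wtf = require('wtfnode');

process.on('SIGUSR2', () => wtf.dump());
```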
I have kept a node running DHT queries over the afternoon using … Sometimes I get something like:
and on rare occasions I actually get 2-3 occurrences:
I ran a … and I couldn't get into any weird states by playing with … |
I have not been able to replicate the leak yet. I'm currently using a minimally configured libp2p node running Gossipsub, with a low cap (10) on max connections. The strategies I have been using so far are:
Memory and handles have been stable so far. |
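A rough way to watch for the leak while such a node runs (a sketch only; it samples the same undocumented handle list wtfnode uses):

```js
// Sample the active handles once a minute and log a count per constructor
// name. A Socket count that keeps growing while the peer count stays flat is
// the signature described in this issue.
setInterval(() => {
  const counts = {};
  for (const handle of process._getActiveHandles()) {
    const name = handle.constructor ? handle.constructor.name : 'unknown';
    counts[name] = (counts[name] || 0) + 1;
  }
  console.log(new Date().toISOString(), counts);
}, 60_000).unref();
```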
I've kept running various nodes at different versions to pinpoint when the leak starts happening, and the results are very inconclusive. The leak happens arbitrarily, with no clear correlation to network conditions, resources, or version. I'll keep working on it. |
Perhaps there's a race condition that never reaches the close event. For example, if two closes happen along with a timeout, the first will destroy the socket and the second will not, since it is already destroyed, so its close event listener will be kept around. I think we should just have a … |
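A sketch of the kind of guard that hypothesis points at (a hypothetical helper, not the actual js-libp2p-tcp code): make the timeout and close paths idempotent so a second close can't leave a listener on an already-destroyed socket.

```js
function closeWithTimeout(socket, timeoutMs) {
  // If the socket was already destroyed by an earlier close, do nothing.
  if (socket.destroyed) return Promise.resolve();

  return new Promise((resolve) => {
    const timer = setTimeout(() => socket.destroy(), timeoutMs);

    // 'close' fires exactly once, whether the socket ended cleanly or was
    // destroyed by the timeout, so the timer and the listener are always
    // cleaned up.
    socket.once('close', () => {
      clearTimeout(timer);
      resolve();
    });

    socket.end();
  });
}
```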
If the socket is already destroyed, how can they keep accumulating? The metrics clearly show that the number of active, non-destroyed sockets increases. If destroyed multiple times they may leak handlers and events, but not sockets. |
I have seen the metrics for handlers, not for leaking sockets. So, you mean there are TCP sockets hanging? |
Yes! That's what I've been referring to from the start. Sorry if I haven't been clear. The wtfnode reports above … |
Bump, issue still active in Lodestar nodes ChainSafe/lodestar#3526 |
Looks like the retained socket is the one that was aborted when libp2p DialRequest has multiple addresses; it keeps only one and aborts all the remaining ones. See Line 71 in 1faa587:
|
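A sketch of the cleanup that observation suggests (a hypothetical helper, not the actual DialRequest code; it assumes an AbortSignal from Node 15+ or the abort-controller polyfill): when a dial race is aborted, the losing sockets need to be destroyed explicitly or they stay behind as active handles.

```js
const net = require('net');

function dialWithAbort(port, host, signal) {
  const socket = net.connect(port, host);

  const onAbort = () => {
    // Without this, an aborted dial can leave a connecting/connected socket
    // that nothing references anymore, which matches the retained sockets
    // seen in the wtfnode dumps above.
    socket.destroy();
  };

  signal.addEventListener('abort', onAbort, { once: true });
  socket.once('close', () => signal.removeEventListener('abort', onAbort));

  return socket;
}
```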
This has been verified as fixed, please re-open if it re-occurs. |
Testing Lodestar in a powerful CI environment, we've observed a massive leak of active handles originating at our p2p port (9000). Lodestar uses only the TCP transport, so I'm opening the issue here.
The peer count, however, is stable at 25, and running wtfnode in that situation returns ~600 - 25 lines with:
From the wtfnode source and the Node.js net docs, that indicates sockets in an intermediate, not-connected state. My guess is that the TCP transport does not handle some edge cases and sockets are left hanging.
Have you experienced something like this in your application? This is a big issue for Lodestar, since we are very resource constrained.
CC @wemeetagain
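For anyone hitting the same thing, one quick heuristic for counting these suspect sockets (a sketch using the same undocumented handle list wtfnode reads; a net.Socket with no remoteAddress is not connected):

```js
const net = require('net');

// Count active net.Socket handles that never reached (or have lost) a remote
// peer; these are the intermediate, not-connected sockets described above.
function countHangingSockets() {
  return process
    ._getActiveHandles()
    .filter((h) => h instanceof net.Socket && h.remoteAddress === undefined)
    .length;
}

console.log('hanging sockets:', countHangingSockets());
```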