Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop in web3.js subscription handling #26665

Closed
yamen opened this issue Jul 18, 2022 · 7 comments · Fixed by #26734
Closed

Infinite loop in web3.js subscription handling #26665

yamen opened this issue Jul 18, 2022 · 7 comments · Fixed by #26734

Comments

@yamen
Copy link

yamen commented Jul 18, 2022

I have a situation where I have a handful of subscriptions, and my WebSocket temporarily disconnects. This will sometimes cause an infinite loop with 300k errors in the logs of the type:

slotSubscribe error for argument [] WebSocket is not open: readyState 0 (CONNECTING)

Before an eventually out of memory error and process crash.

From looking at the code, there seems to be a potential edge case race condition. Namely:

  • _updateSubscriptions() is called when there are no active subscriptions and an active connection.
  • This flags _rpcWebSocketConnected = false and starts a 500ms timer to actually disconnect the WebSocket:

this._rpcWebSocketConnected = false;
this._rpcWebSocketIdleTimeout = setTimeout(() => {
this._rpcWebSocketIdleTimeout = null;
try {
this._rpcWebSocket.close();
} catch (err) {
// swallow error if socket has already been closed.
if (err instanceof Error) {
console.log(
`Error when closing socket connection: ${err.message}`,
);
}
}
}, 500);

  • Before the timer expires, the WebSocket disconnects (network issue), and new subscriptions are made. _updateSubscriptions() is called again. Now there are active subscriptions and no active connection.
  • However, as there is an active disconnect timer, the connection is flagged as active _rpcWebSocketConnected = true:

if (this._rpcWebSocketIdleTimeout !== null) {
clearTimeout(this._rpcWebSocketIdleTimeout);
this._rpcWebSocketIdleTimeout = null;
this._rpcWebSocketConnected = true;
}

  • This leads to an infinite loop in _updateSubscriptions() as the normal guards to stop this from happening have been bypassed:
    • _rpcWebSocketConnected = true even though we're not really connected, which stops the function from returning early
    • _rpcWebSocketGeneration doesn't get updated as there isn't a new close event since entering the function

Now the WebSocket calls fail (hence the error), but the failure leads to an infinite recursion of trying again and again on loop.

I think this is what's happening, very hard to debug. Using a different flag for a 'pending disconnect' scenario will likely fix this.

@steveluscher
Copy link
Contributor

Before the timer expires, the WebSocket disconnects (network issue), and new subscriptions are made. _updateSubscriptions() is called again. Now there are active subscriptions and no active connection.

If _updateSubscriptions() is called because a new subscription is made, the disconnect timer gets cancelled (see clearTimeout(this._rpcWebSocketIdleTimeout)).

I'm almost positive that this is because of corruption in rpc-websockets – the library that we depend on to track the state of the websocket connection.

Tracked in #25578.

@steveluscher
Copy link
Contributor

Oh, I see what you're saying. Yeah, that's a leak. Not sure if it's the one that's causing your problem, but a leak.

@steveluscher steveluscher reopened this Jul 22, 2022
@yamen
Copy link
Author

yamen commented Jul 22, 2022 via email

@steveluscher
Copy link
Contributor

Can you try out #26734, @yamen?

@steveluscher
Copy link
Contributor

I landed this change. If you have time to update to the version after 1.50.0 and retest your app, @yamen, I'd love to hear if this fixed things for you.

@yamen
Copy link
Author

yamen commented Jul 25, 2022

Thanks Steve I will check in a week or so, temporary workaround is stable for now so will wait for regular dependency update chore.

@laterbreh
Copy link

laterbreh commented Aug 13, 2022

We are currently on 1.52 (https://github.com/solana-labs/solana-web3.js/releases/tag/v1.52.0) and we are still experiencing this issue.
(wallet address ommited)
accountSubscribe error for argument [ 'XXXXXXXXXXXXXXXXXXXX', { encoding: 'base64', commitment: 'finalized' }

We never get more than 3 to 5 subscriptions. We actively track and unsubscribe from the wallets every X minutes. When this occurs, we hit 99% cpu usage and our gateway locks up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants