Implement retry count and error state for stateful subscription. #27833

Closed
wants to merge 4 commits into from

@Jac0xb Jac0xb commented Sep 16, 2022

Problem

Running into the infinite recursion bug described in solana-labs/solana-web3.js#1106, in an environment with Node v16 and web3.js 1.56.2. It causes the Lambda to stall and time out.

Error example:

 signatureSubscribe error for argument [
'{transactionSignature}',
{ commitment: 'confirmed' }
] socket not ready

Summary of Changes

Implement an error state type for stateful subscriptions that tracks retries. When the max retry count is hit, the error is logged and the subscription is deleted.

Fixes solana-labs/solana-web3.js#1106

@mergify mergify bot added the community Community contribution label Sep 16, 2022
@mergify mergify bot requested a review from a team September 16, 2022 02:10
@Jac0xb (Author) commented Sep 16, 2022

@steveluscher I think this is the easiest path forward on the issue.

@xrzhuang

very nice

@steveluscher (Contributor) left a comment

Hey! Thanks for starting this work.

There's a lot to love about this PR, but also a ton of work left, if you're up for it. The addition of new states that a subscription can be in explodes the complexity of the state machine quadratically, and all of those cases have to have coverage in code and in tests. Actually, we probably need to add two states: errored and retry-exhausted.

Some notes to get things started:

  • Missed a spot at connection.ts:5183; another juncture to catch errors and perform retries
  • Need coverage in connection-subscriptions.test.ts that covers every possible state transition
    • websocket closes implicitly while in an error state
    • websocket is explicitly closed while in an error state
    • websocket reopens with subscriptions still in the error state
    • probably way more
  • The cases that I've seen so far report extremely rapid error rates. In that context, retrying 3 times is probably essentially the same as failing right away. What you might consider instead is a system of exponential backoff, where the delay between retries grows longer with each attempt.
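To make the exponential backoff suggestion concrete, here is a minimal sketch. It is not code from this PR or from web3.js: `BackoffOptions`, `backoffDelayMs`, and `retryWithBackoff` are hypothetical names, and the doubling factor and delay cap are assumptions chosen for illustration.

```typescript
// Hypothetical helper, not part of web3.js. All names here are illustrative.
interface BackoffOptions {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  maxRetries: number; // retries allowed after the initial attempt
}

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
// capped at maxDelayMs.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

// Runs `operation`, retrying with growing delays until it succeeds or the
// retry budget is spent. The last error is rethrown so the caller can decide
// how to recover.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (e) {
      lastError = e;
      if (attempt === opts.maxRetries) {
        break; // budget spent; do not sleep again
      }
      await sleep(backoffDelayMs(attempt, opts));
    }
  }
  throw lastError;
}
```

With `baseDelayMs: 100` and `maxDelayMs: 1000`, successive retries would wait 100, 200, 400, 800, and then 1000 ms. Adding random jitter to each delay is a common refinement when many clients may reconnect at once.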

All of that said, every error report anyone's ever sent to me includes the message ‘socket not open.’ That tells me that our socket open/close tracking is most of the problem here (#25578). I'm sort of inclined to see that fixed first, before implementing retries. If there's a Real Bug with connection tracking, then all retries will do is to mask that problem, without actually making anyone's experience better.

web3.js/src/connection.ts (outdated)
Comment on lines +5124 to +5130
console.error(
`${method} error, max retries reached`,
args,
e instanceof Error ? e.message : `${e}`,
);

delete this._subscriptionsByHash[hash];

What we don't have here yet is a recovery strategy. It's enough to drop the subscription to stop the recursive setup attempts, but then at that point you have an application that thinks it has an open subscription, but doesn't.

What's the plan here?

  • Create an API to notify apps their subscriptions are dead?
  • Revive errored subscriptions at some logical juncture in the future?
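One way the first option could look, sketched under heavy assumptions: the two extra states (errored and retry-exhausted) are modeled as a discriminated union, and apps register a listener that fires when a subscription is given up on. `SubscriptionTracker`, `onSubscriptionDead`, and every other name below are hypothetical; none of them exist in web3.js or in this PR.

```typescript
// Hypothetical sketch. None of these names are web3.js APIs.
type SubscriptionState =
  | { kind: 'pending' }
  | { kind: 'subscribed'; serverSubscriptionId: number }
  | { kind: 'errored'; failures: number; lastError: string }
  | { kind: 'retry-exhausted'; lastError: string };

type DeadSubscriptionListener = (hash: string, lastError: string) => void;

class SubscriptionTracker {
  private readonly maxRetries: number;
  private readonly states = new Map<string, SubscriptionState>();
  private readonly listeners = new Set<DeadSubscriptionListener>();

  constructor(maxRetries: number) {
    this.maxRetries = maxRetries;
  }

  // Apps call this to learn when a subscription has been given up on.
  // Returns a function that removes the listener.
  onSubscriptionDead(listener: DeadSubscriptionListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  track(hash: string): void {
    this.states.set(hash, { kind: 'pending' });
  }

  getState(hash: string): SubscriptionState | undefined {
    return this.states.get(hash);
  }

  // Records one failed setup attempt. The initial attempt plus `maxRetries`
  // retries may fail before the subscription is marked retry-exhausted.
  recordFailure(hash: string, error: unknown): SubscriptionState | undefined {
    const current = this.states.get(hash);
    if (current === undefined || current.kind === 'retry-exhausted') {
      return current; // untracked or already dead; nothing to update
    }
    const lastError = error instanceof Error ? error.message : `${error}`;
    const failures = current.kind === 'errored' ? current.failures + 1 : 1;
    const next: SubscriptionState =
      failures > this.maxRetries
        ? { kind: 'retry-exhausted', lastError }
        : { kind: 'errored', failures, lastError };
    this.states.set(hash, next);
    if (next.kind === 'retry-exhausted') {
      // Keep the entry so the app can inspect it or ask for a revival,
      // rather than silently deleting it.
      this.listeners.forEach(listener => listener(hash, lastError));
    }
    return next;
  }
}
```

Keeping the retry-exhausted entry in the map, instead of deleting it, would also leave room for the second option: reviving the subscription when the socket reopens.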

@steveluscher steveluscher added the javascript Pull requests that update Javascript code label Sep 17, 2022
@steveluscher

Quite possibly related: #27859.

Co-authored-by: Steven Luscher <steveluscher@users.noreply.github.com>
@mergify mergify bot dismissed steveluscher’s stale review September 22, 2022 17:45

Pull request has been modified.

@steveluscher steveluscher added the web3.js Related to the JavaScript client label Dec 5, 2022
@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Dec 29, 2022
@github-actions github-actions bot closed this Jan 9, 2023
Labels: community, javascript, stale, web3.js
Successfully merging this pull request may close these issues.

[web3.js] Infinite recursion caused by _updateSubscriptions
3 participants