Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@slack/socket-mode - Handle event.type === disconnect && reason === 'too_many_websockets' #1654

Closed
1 of 7 tasks
jhankim opened this issue Aug 30, 2023 · 25 comments
Closed
1 of 7 tasks
Labels
auto-triage-skip needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info pkg:socket-mode applies to `@slack/socket-mode` question M-T: User needs support to use the project

Comments

@jhankim
Copy link

jhankim commented Aug 30, 2023

👋 from Slack Customer Success.

Current state of @slack/socket-mode package does not account for event.type === disconnect && reason === 'too_many_websockets', causing unhandled error to bubble up when Slack decides to disconnect existing connection during connecting:handshaking state

Here's a log output - the reconnect happens initially after the app receives event.type === 'disconnect' && reason === 'warning'

Packages:

Select all that apply:

  • @slack/web-api
  • @slack/rtm-api
  • @slack/webhooks
  • @slack/oauth
  • @slack/socket-mode
  • @slack/types
  • I don't know

Requirements

Feature request: have the app catch reason === 'too_many_websockets'

Please read the Contributing guidelines and Code of Conduct before creating this issue or pull request. By submitting, you are agreeing to those rules.

@filmaj filmaj added needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info pkg:socket-mode applies to `@slack/socket-mode` labels Aug 30, 2023
@filmaj
Copy link
Contributor

filmaj commented Aug 30, 2023

Hello! Thanks for reporting.

What would the expectation be in this scenario? What should @slack/socket-mode do when it encounters this error?

@filmaj filmaj removed the untriaged label Aug 30, 2023
@jhankim
Copy link
Author

jhankim commented Aug 30, 2023

Forgot the log... 😅

2023-08-29T02:17:33.583983751Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "disconnect",
    "reason": "warning",
    "debug_info": {
        "host": "applink-13"
    }
}
2023-08-29T02:17:33.585261952Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received "disconnect" (warning) message - creating the second connection
2023-08-29T02:17:33.585795905Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Cancelled the job waiting for ping from Slack
2023-08-29T02:17:33.586029749Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Terminated the heart beat job
2023-08-29T02:17:33.586265381Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: reconnecting
2023-08-29T02:17:33.586691298Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Reconnecting to Slack ...
2023-08-29T02:17:33.587639859Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting
2023-08-29T02:17:33.587987683Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Going to establish a new connection to Slack ...
2023-08-29T02:17:33.588221708Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticating
2023-08-29T02:17:33.588430306Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Going to retrieve a new WSS URL ...
2023-08-29T02:17:33.588655237Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') start
2023-08-29T02:17:33.590704154Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request url: https://slack.com/api/apps.connections.open
2023-08-29T02:17:33.59093626Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request body: 
{}
2023-08-29T02:17:33.591177851Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request headers: 
{}
2023-08-29T02:17:33.642127756Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http response received
2023-08-29T02:17:33.643022349Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request result: 
{
    "ok": true,
    "url": "wss://wss-primary.slack.com/link/?ticket=xxxx&app_id=xxxx",
    "response_metadata": {
        "scopes": [
            "connections:write"
        ],
        "acceptedScopes": [
            "connections:write"
        ]
    }
}
2023-08-29T02:17:33.643565667Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') end
2023-08-29T02:17:33.643806343Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticated
2023-08-29T02:17:33.666075231Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:handshaking
2023-08-29T02:17:33.672679037Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "disconnect",
    "reason": "too_many_websockets",
    "debug_info": {
        "host": "applink-6"
    }
}
2023-08-29T02:17:33.673997382Z stdout F �[31merror�[39m:[app]:2023-08-29 02:17:33: An unhandled error occurred while Bolt processed (type: undefined, error: TypeError: Cannot read properties of undefined (reading 'event'))
2023-08-29T02:17:33.674576814Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Error details: TypeError: Cannot read properties of undefined (reading 'event'), retry num: undefined, retry reason: undefined
2023-08-29T02:17:33.676869527Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: reconnecting
2023-08-29T02:17:33.676971669Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Reconnecting to Slack ...
2023-08-29T02:17:33.67719043Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting
2023-08-29T02:17:33.67730345Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Going to establish a new connection to Slack ...
2023-08-29T02:17:33.677425795Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticating
2023-08-29T02:17:33.67753144Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Going to retrieve a new WSS URL ...
2023-08-29T02:17:33.677637366Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') start
2023-08-29T02:17:33.678225039Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request url: https://slack.com/api/apps.connections.open
2023-08-29T02:17:33.678302499Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request body: 
{}
2023-08-29T02:17:33.678434963Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request headers: 
{}
2023-08-29T02:17:33.706399893Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http response received
2023-08-29T02:17:33.706740691Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request result: 
{
    "ok": true,
    "url": "wss://wss-primary.slack.com/link/?ticket=xxxx&app_id=xxxx",
    "response_metadata": {
        "scopes": [
            "connections:write"
        ],
        "acceptedScopes": [
            "connections:write"
        ]
    }
}
2023-08-29T02:17:33.706905114Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') end
2023-08-29T02:17:33.707562051Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticated
2023-08-29T02:17:33.726924532Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:handshaking
2023-08-29T02:17:33.735835811Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "disconnect",
    "reason": "too_many_websockets",
    "debug_info": {
        "host": "applink-3"
    }
}
2023-08-29T02:17:33.736517499Z stdout F �[31merror�[39m:[app]:2023-08-29 02:17:33: An unhandled error occurred while Bolt processed (type: undefined, error: TypeError: Cannot read properties of undefined (reading 'event'))
2023-08-29T02:17:33.73664402Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Error details: TypeError: Cannot read properties of undefined (reading 'event'), retry num: undefined, retry reason: undefined
2023-08-29T02:17:33.737626433Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: reconnecting
2023-08-29T02:17:33.737761541Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Reconnecting to Slack ...
2023-08-29T02:17:33.738044197Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting
2023-08-29T02:17:33.738061619Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Going to establish a new connection to Slack ...
2023-08-29T02:17:33.738211885Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticating
2023-08-29T02:17:33.738315455Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Going to retrieve a new WSS URL ...
2023-08-29T02:17:33.738410664Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') start
2023-08-29T02:17:33.73889481Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request url: https://slack.com/api/apps.connections.open
2023-08-29T02:17:33.738986954Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request body: 
{}
2023-08-29T02:17:33.739225569Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request headers: 
{}
2023-08-29T02:17:33.765901421Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http response received
2023-08-29T02:17:33.766275841Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: http request result: 
{
    "ok": true,
    "url": "wss://wss-primary.slack.com/link/?ticket=xxxx&app_id=xxxx",
    "response_metadata": {
        "scopes": [
            "connections:write"
        ],
        "acceptedScopes": [
            "connections:write"
        ]
    }
}
2023-08-29T02:17:33.766585039Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: apiCall('apps.connections.open') end
2023-08-29T02:17:33.766896018Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:authenticated
2023-08-29T02:17:33.788489913Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connecting:handshaking
2023-08-29T02:17:33.797143119Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Received a message on the WebSocket: 
{
    "type": "hello",
    "num_connections": 10,
    "debug_info": {
        "host": "applink-8",
        "build_number": 38,
        "approximate_connection_time": 18060
    },
    "connection_info": {
        "app_id": "XXXX"
    }
}
2023-08-29T02:17:33.79851267Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connected
2023-08-29T02:17:33.798891461Z stdout F �[32minfo�[39m:[app]:2023-08-29 02:17:33: Now connected to Slack
2023-08-29T02:17:33.799282016Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connected:preparing
2023-08-29T02:17:33.799645249Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Switching to the secondary connection ...
2023-08-29T02:17:33.799881795Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Switched to the secondary connection
2023-08-29T02:17:33.800372396Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Terminated the old connection
2023-08-29T02:17:33.800880064Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Started running a new heart beat job
2023-08-29T02:17:33.801246493Z stdout F �[34mdebug�[39m:[app]:2023-08-29 02:17:33: Transitioning to state: connected:ready

@jhankim
Copy link
Author

jhankim commented Aug 30, 2023

The expected behavior would be for the app to catch the error when it receives:

{
  type: 'disconnect',
  reason: 'too_many_websockets'
  ...
}

@filmaj
Copy link
Contributor

filmaj commented Aug 30, 2023

OK, so catch the error and I guess not attempt to re-connect?

@jhankim
Copy link
Author

jhankim commented Aug 30, 2023

@filmaj I think the reconnect behavior is still necessary given the app would enter into a disconnected state forever. It's worth noting that in my app logs, I only see reason: 'too_many_websockets' only as a result of the app trying to reconnect after it receives reason: 'warning'.

@filmaj
Copy link
Contributor

filmaj commented Aug 30, 2023

I believe we only allow 10 concurrent websocket connections per app, so if Slack is returning "too many websockets", it makes sense to me to reduce the amount of reconnect attempts - otherwise the warning/error will continue to be raised, and we would essentially be stampeding on the apps.connection.open API.

That's what I'm trying to tease out asking these questions and having this discussion: what should the proper behaviour be? 🤔

@jhankim
Copy link
Author

jhankim commented Aug 30, 2023

Ah I see - I wasn't thinking too clearly on the implications of max limit imposed by the platform. I agree with your suggestion, the app should catch the error, then enter disconnected state. At this point the app owner can decide further logic (reboot the app via healthcheck, etc)

@github-actions
Copy link

github-actions bot commented Oct 2, 2023

👋 It looks like this issue has been open for 30 days with no activity. We'll mark this as stale for now, and wait 10 days for an update or for further comment before closing this issue out. If you think this issue needs to be prioritized, please comment to get the thread going again! Maintainers also review issues marked as stale on a regular basis and comment or adjust status if the issue needs to be reprioritized.

@github-actions
Copy link

As this issue has been inactive for more than one month, we will be closing it. Thank you to all the participants! If you would like to raise a related issue, please create a new issue which includes your specific details and references this issue number.

@dbudim
Copy link

dbudim commented Oct 25, 2023

any updates? have the same problem

@filmaj
Copy link
Contributor

filmaj commented Oct 25, 2023

Apologies for no updates here and that I disappeared... let me look into this again after all this time. 🙇

@dbudim
Copy link

dbudim commented Oct 25, 2023

Thank you)
Problem is that we have a bot and once per week it goes down due this issue. We replace token and repeat...

@filmaj
Copy link
Contributor

filmaj commented Oct 25, 2023

@dbudim hmm, by changing your token you avoid the issue? That doesn't make sense to me.. which token are you replacing? Your bot token or your application token? Do you have more than one instance of the app running?

@dbudim
Copy link

dbudim commented Oct 25, 2023

@dbudim hmm, by changing your token you avoid the issue? That doesn't make sense to me.. which token are you replacing? Your bot token or your application token? Do you have more than one instance of the app running?

Single instance of app, replacing application token and problem is gone for a week😅

@filmaj
Copy link
Contributor

filmaj commented Oct 25, 2023

@dbudim do the logs look similar to what the original poster posted? As in, first you see a "type": "disconnect","reason": "warning", followed by "type": "disconnect","reason": "too_many_websockets", ?

@dbudim
Copy link

dbudim commented Oct 25, 2023

WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [INFO ] [SocketModeClient:261] - New session is open
[WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [DEBUG] [SocketModeClient:384] - Socket Mode Request:

{
  "type": "disconnect",
  "reason": "too_many_websockets",
  "debug_info": {
    "host": "applink-6"
  }
}

[WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [INFO ] [SocketModeClient:272] - onClose listener is called (code: 1006, reason: )
[WebSocketConnectReadThread-677874] 2023-10-25 09:45:50 [DEBUG] [DetailedLoggingListener:84] -

Yes, same situation

java SDK:

<dependency>
       <groupId>com.slack.api</groupId>
      <artifactId>bolt</artifactId>
      <version>1.34.0</version>
</dependency>
<dependency>
       <groupId>com.slack.api</groupId>
       <artifactId>bolt-socket-mode</artifactId>
       <version>1.34.0</version>
</dependency>

@filmaj
Copy link
Contributor

filmaj commented Oct 25, 2023

Oh, you are using the Java SDK, good to know.

@filmaj
Copy link
Contributor

filmaj commented Oct 25, 2023

I've posted your issue to the java-slack-sdk repo: slackapi/java-slack-sdk#1223

Best to follow that issue along since the code for the node socket-mode library is different from the java bits.

@filmaj
Copy link
Contributor

filmaj commented Jan 25, 2024

@seratch since I am looking at doing a major new release of the socket mode package, do you have any thoughts on this issue? If Slack raises "too many websockets" - what should this program do? I feel like continuing to attempt to reconnect would be creating unnecessary, likely-to-fail connection attempts. Do you have opinions on what the package should do?

@filmaj filmaj added the question M-T: User needs support to use the project label Jan 25, 2024
@filmaj
Copy link
Contributor

filmaj commented Mar 28, 2024

I started to address this in a draft PR: #1762

@seratch do you have opinions on how our socket mode packages should handle "too many websockets" errors, at a high level? The node one does not handle it at all and I believe it can cause problems.

@seratch
Copy link
Member

seratch commented Mar 29, 2024

@filmaj thank you for your detailed investigation of the scenario. I agree handling the 'too_many_websockets' reason pattern in a proper way could be a nice improvement. In reality, the only action SocketModeClient can take is to patiently wait until the Slack server permits a new connection. That being said, modifying the behavior in this manner should certainly be better than currently treating the pattern as a "slack_event" for sure.

@filmaj
Copy link
Contributor

filmaj commented Apr 17, 2024

An RC was released to try to address this:

@filmaj filmaj added this to the socket-mode@1.3.5 milestone Apr 30, 2024
@filmaj
Copy link
Contributor

filmaj commented Apr 30, 2024

The fix has now been released to mainline in 1.3.5 and is out of release candidacy. It will also be present in the soon-to-come 2.0 release of @slack/socket-mode.

@MadrMan
Copy link

MadrMan commented May 4, 2024

@filmaj I seem to be hitting the following now on 1.3.5:

[INFO]  socket-mode:SocketModeClient:0 Going to establish a new connection to Slack ...
/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76
      throw new Error('Unhandled event \'' + event + '\' in state \'' + this.currentState + '\'.');
            ^

Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
    at StateMachine.handleUnhandledEvent (/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76:13)
    at /home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
    at TaskScheduler.execute (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:29:7)
    at TaskScheduler.enqueue (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:19:12)
    at HierarchicalStateMachine.handle (/home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
    at SocketModeClient.onWebSocketMessage (/home/patrick/slack/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
    at WebSocket.onMessage (/home/patrick/slack/node_modules/ws/lib/event-target.js:132:16)
    at WebSocket.emit (node:events:518:28)
    at Receiver.receiverOnMessage (/home/patrick/slack/node_modules/ws/lib/websocket.js:1068:20)
    at Receiver.emit (node:events:518:28)

With debug logging it shows:

[DEBUG]  socket-mode:SocketModeClient:0 Received a message on the WebSocket: {"type":"disconnect","reason":"too_many_websockets","debug_info":{"host":"applink-2"}}
[DEBUG]  socket-mode:SocketModeClient:0 Received "disconnect" (reason: too_many_websockets) message - will attempt reconnect

@filmaj
Copy link
Contributor

filmaj commented May 6, 2024

@MadrMan I filed #1787 to track this.

Worth noting that, while this is a legit problem and we will address it in this package, you should also consider why Slack is telling you you have too many parallel websocket connections open with Slack (10 is the maximum allowed). If you can get the number of parallel websocket connections for this app down under 10, you would mitigate seeing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-triage-skip needs info An issue that is claimed to be a bug and hasn't been reproduced, or otherwise needs more info pkg:socket-mode applies to `@slack/socket-mode` question M-T: User needs support to use the project
Projects
None yet
Development

No branches or pull requests

5 participants