Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeError Cannot read properties of null (reading 'callback') when receiving subscribe response with no inflight subscribe #605

Closed
alecgibson opened this issue Apr 18, 2023 · 4 comments · Fixed by #607

Comments

@alecgibson
Copy link
Collaborator

We're getting this error:

app rollbar com_a_Reedsy_fix_item_Reedsy-Editor_10814_occurrence_322889472575

The error is happening because a subscribe response is being received over the WebSocket, but there's no inflightSubscribe set.

I've rummaged around in the code (which I wrote 😅 🙈 ) and I'm not really sure how we get into this state.

As far as I can tell:

@alecgibson
Copy link
Collaborator Author

Okay, I think I'm getting somewhere with this. The issue appears to be to do with our reconnecting websocket. It looks like this series of events is happening:

  1. Use reconnecting socket to connect to ShareDB
  2. Wait until connection fully established
  3. Call socket.close()
  4. SYNCHRONOUSLY subscribe a document (or do it very, very soon after calling socket.close())
  5. ShareDB hasn't yet received the socket's onclose() call (probably because the socket is still hanging up)
  6. ShareDB calls connection.send() and submits a subscribe request down the (closing) socket
  7. onclose() called, and ShareDB updates its state to disconnected, pushes inflight subscribe back onto pending queue
  8. Reconnect socket
  9. THIS IS THE KEY: for some reason, the reconnecting socket has buffered or retried the message sent whilst closing, and submits this as soon as the socket opens, even before ShareDB has started its handshake. ShareDB proceeds under the assumption that its previous message has been discarded, which it hasn't.
  10. ShareDB finishes its handshake, and retries its pending subscribe
  11. Server processes request that was sent before handshake, and the inflight subscribe calls its callback
  12. The actual request that was sent is processed, and returns, but finds no inflight subscribe and throws

Screenshot 2023-04-26 at 15 18 00

There's a lot of moving parts here, but I'm pretty sure this is the rough shape of the problem.

This could well be a bug in our reconnecting socket library, but regardless of that, I guess it raises a question around how ShareDB should act before its handshake has completed: should the server just bin any messages we receive before this? The client already tries to avoid sending messages before the handshake, so we should be able to assume anything sent before the handshake is rubbish?

@alecgibson
Copy link
Collaborator Author

alecgibson commented Apr 26, 2023

Ha yes, okay reconnecting-websocket will buffer messages if you call send() after the socket has been closed.

I guess this is probably a valid thing for a reconnecting socket to do in order to keep its API roughly aligned with a "vanilla" WebSocket that's just always connected. This particular behaviour can be turned off, so maybe this is a consumer problem...?

Still, I think we should make ShareDB behave "sensibly" if clients decide to retry messages on reconnection, before the handshake has happened. Will put together a PR and we can discuss there I guess.

@alecgibson
Copy link
Collaborator Author

Hmm had a very quick look at this and actually I'm not sure how best to do this without making a breaking change to our protocol.

Outline of v1.0 vs 1.1 protocol here: #341

The crux of it is that v1.0 just sends an init to the client, and doesn't wait for anything back from the client, so as far as a v1.0 server is concerned, it can just start processing any messages that come in as soon as it binds.

A v1.1 server can disregard any messages it receives before the client's hs message, but I think(?) there's no way for a server to really tell the difference between:

  • a legitimate message sent by a v1.0 client
  • a spurious retried message sent by a v1.0 client
  • a spurious retried message sent before handshake by a v1.1 client

We could potentially do something like buffer for some time, or n messages before we assume the handshake is never coming (ie is a v1.0 client), but this starts to feel janky.

Maybe we should just do something simple like log a warning if an Agent receives a message before handshake (which consumers can turn off if they know they still have v1.0 clients kicking about for some reason...?).

alecgibson added a commit that referenced this issue May 2, 2023
At the moment, it's possible for messages to be sent before the client-
server handshake.

Sending messages before the handshake has happened has undefined
behaviour, and can result in errors such as in:
#605

We can't just ignore these messages, because old clients might
potentially be on v1.0 of the client-server protocol, in which the
server informs the client when it's ready, but not the other way around,
so it's impossible to know when a client should be considered "ready",
and its messages acceptable.

Instead, we add a warning for clients on v1.1 who have sent a message
before their handshake. In order to aid with debugging, we keep track of
the first message received, and log it when the handshake is received
(which means that v1.0 clients will never get such a warning).
alecgibson added a commit that referenced this issue May 2, 2023
In our docs, we recommend using [`reconnecting-websocket`][1] to handle
websocket reconnection.

However, [by default][2] that library will [buffer][3] messages when its
underlying socket is closed, which can lead to [undefined behaviour][4]
when ShareDB reconnects, since it works under the assumption that all
messages sent as the socket is closing have been lost (eg pushing
inflight ops back onto the pending queue, etc.).

This change updates our documentation and our examples to set
`{maxEnqueuedMessages: 0}`, which disables this buffering, and should
help to avoid ShareDB reaching undefined states when reconnecting using
this library.

[1]: https://www.npmjs.com/package/reconnecting-websocket
[2]: https://github.com/pladaria/reconnecting-websocket/blob/05a2f7cb0e31f15dff5ff35ad53d07b1bec5e197/reconnecting-websocket.ts#L46
[3]: https://github.com/pladaria/reconnecting-websocket/blob/05a2f7cb0e31f15dff5ff35ad53d07b1bec5e197/reconnecting-websocket.ts#L260
[4]: #605
alecgibson added a commit that referenced this issue May 2, 2023
At the moment, it's possible for messages to be sent before the client-
server handshake.

Sending messages before the handshake has happened has undefined
behaviour, and can result in errors such as in:
#605

We can't just ignore these messages, because old clients might
potentially be on v1.0 of the client-server protocol, in which the
server informs the client when it's ready, but not the other way around,
so it's impossible to know when a client should be considered "ready",
and its messages acceptable.

Instead, we add a warning for clients on v1.1 who have sent a message
before their handshake. In order to aid with debugging, we keep track of
the first message received, and log it when the handshake is received
(which means that v1.0 clients will never get such a warning).
@ericyhwang
Copy link
Contributor

Alec says it's when the sharedb client tries to send data while the socket is in CLOSING. Unfortunately there's no 'closing' event.

In addition to the server change above, we are also considering specifically having the client check for socket.readyState === State.OPEN before trying to send, to catch cases where the socket is in CLOSING state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants