`TypeError Cannot read properties of null (reading 'callback')` when receiving subscribe response with no inflight subscribe #605

alecgibson · 2023-04-18T12:42:35Z

We're getting this error:

The error is happening because a subscribe response is being received over the WebSocket, but there's no inflightSubscribe set.

I've rummaged around in the code (which I wrote 😅 🙈 ) and I'm not really sure how we get into this state.

As far as I can tell:

we should only get a subscribe response after sending a subscribe request
a subscribe request can only be flushed when !!connection.canSend
if the connection is closed before the response is received, the inflightSubscribe is put back into the pendingSubscribe queue
the subscribe will be retried when the socket reconnects
the only way to hit this error state is to have unset (or never set?!) inflightSubscribe, which only happens:
from the stack trace, it's clear we're receiving a message, so we know the socket is connected

The text was updated successfully, but these errors were encountered:

alecgibson · 2023-04-26T14:19:39Z

Okay, I think I'm getting somewhere with this. The issue appears to be to do with our reconnecting websocket. It looks like this series of events is happening:

Use reconnecting socket to connect to ShareDB
Wait until connection fully established
Call socket.close()
SYNCHRONOUSLY subscribe a document (or do it very, very soon after calling socket.close())
ShareDB hasn't yet received the socket's onclose() call (probably because the socket is still hanging up)
ShareDB calls connection.send() and submits a subscribe request down the (closing) socket
onclose() called, and ShareDB updates its state to disconnected, pushes inflight subscribe back onto pending queue
Reconnect socket
THIS IS THE KEY: for some reason, the reconnecting socket has buffered or retried the message sent whilst closing, and submits this as soon as the socket opens, even before ShareDB has started its handshake. ShareDB proceeds under the assumption that its previous message has been discarded, which it hasn't.
ShareDB finishes its handshake, and retries its pending subscribe
Server processes request that was sent before handshake, and the inflight subscribe calls its callback
The actual request that was sent is processed, and returns, but finds no inflight subscribe and throws

There's a lot of moving parts here, but I'm pretty sure this is the rough shape of the problem.

This could well be a bug in our reconnecting socket library, but regardless of that, I guess it raises a question around how ShareDB should act before its handshake has completed: should the server just bin any messages we receive before this? The client already tries to avoid sending messages before the handshake, so we should be able to assume anything sent before the handshake is rubbish?

alecgibson · 2023-04-26T14:25:11Z

Ha yes, okay reconnecting-websocket will buffer messages if you call send() after the socket has been closed.

I guess this is probably a valid thing for a reconnecting socket to do in order to keep its API roughly aligned with a "vanilla" WebSocket that's just always connected. This particular behaviour can be turned off, so maybe this is a consumer problem...?

Still, I think we should make ShareDB behave "sensibly" if clients decide to retry messages on reconnection, before the handshake has happened. Will put together a PR and we can discuss there I guess.

alecgibson · 2023-04-26T16:08:06Z

Hmm had a very quick look at this and actually I'm not sure how best to do this without making a breaking change to our protocol.

Outline of v1.0 vs 1.1 protocol here: #341

The crux of it is that v1.0 just sends an init to the client, and doesn't wait for anything back from the client, so as far as a v1.0 server is concerned, it can just start processing any messages that come in as soon as it binds.

A v1.1 server can disregard any messages it receives before the client's hs message, but I think(?) there's no way for a server to really tell the difference between:

a legitimate message sent by a v1.0 client
a spurious retried message sent by a v1.0 client
a spurious retried message sent before handshake by a v1.1 client

We could potentially do something like buffer for some time, or n messages before we assume the handshake is never coming (ie is a v1.0 client), but this starts to feel janky.

Maybe we should just do something simple like log a warning if an Agent receives a message before handshake (which consumers can turn off if they know they still have v1.0 clients kicking about for some reason...?).

At the moment, it's possible for messages to be sent before the client- server handshake. Sending messages before the handshake has happened has undefined behaviour, and can result in errors such as in: #605 We can't just ignore these messages, because old clients might potentially be on v1.0 of the client-server protocol, in which the server informs the client when it's ready, but not the other way around, so it's impossible to know when a client should be considered "ready", and its messages acceptable. Instead, we add a warning for clients on v1.1 who have sent a message before their handshake. In order to aid with debugging, we keep track of the first message received, and log it when the handshake is received (which means that v1.0 clients will never get such a warning).

In our docs, we recommend using [`reconnecting-websocket`][1] to handle websocket reconnection. However, [by default][2] that library will [buffer][3] messages when its underlying socket is closed, which can lead to [undefined behaviour][4] when ShareDB reconnects, since it works under the assumption that all messages sent as the socket is closing have been lost (eg pushing inflight ops back onto the pending queue, etc.). This change updates our documentation and our examples to set `{maxEnqueuedMessages: 0}`, which disables this buffering, and should help to avoid ShareDB reaching undefined states when reconnecting using this library. [1]: https://www.npmjs.com/package/reconnecting-websocket [2]: https://github.com/pladaria/reconnecting-websocket/blob/05a2f7cb0e31f15dff5ff35ad53d07b1bec5e197/reconnecting-websocket.ts#L46 [3]: https://github.com/pladaria/reconnecting-websocket/blob/05a2f7cb0e31f15dff5ff35ad53d07b1bec5e197/reconnecting-websocket.ts#L260 [4]: #605

At the moment, it's possible for messages to be sent before the client- server handshake. Sending messages before the handshake has happened has undefined behaviour, and can result in errors such as in: #605 We can't just ignore these messages, because old clients might potentially be on v1.0 of the client-server protocol, in which the server informs the client when it's ready, but not the other way around, so it's impossible to know when a client should be considered "ready", and its messages acceptable. Instead, we add a warning for clients on v1.1 who have sent a message before their handshake. In order to aid with debugging, we keep track of the first message received, and log it when the handshake is received (which means that v1.0 clients will never get such a warning).

ericyhwang · 2023-05-02T16:25:41Z

Alec says it's when the sharedb client tries to send data while the socket is in CLOSING. Unfortunately there's no 'closing' event.

In addition to the server change above, we are also considering specifically having the client check for socket.readyState === State.OPEN before trying to send, to catch cases where the socket is in CLOSING state.

alecgibson mentioned this issue May 2, 2023

🔊 Warn when messages are sent before the v1.1 handshake #607

Merged

alecgibson mentioned this issue May 2, 2023

📝 Update docs around reconnecting-websocket #608

Merged

alecgibson closed this as completed in #607 May 2, 2023

alecgibson mentioned this issue Jun 12, 2023

Fix "Cannot read properties of null (reading 'callback')" error occured in Doc.prototype._handleSubscribe #613

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`TypeError Cannot read properties of null (reading 'callback')` when receiving subscribe response with no inflight subscribe #605

`TypeError Cannot read properties of null (reading 'callback')` when receiving subscribe response with no inflight subscribe #605

alecgibson commented Apr 18, 2023

alecgibson commented Apr 26, 2023

alecgibson commented Apr 26, 2023 •

edited

Loading

alecgibson commented Apr 26, 2023

ericyhwang commented May 2, 2023

TypeError Cannot read properties of null (reading 'callback') when receiving subscribe response with no inflight subscribe #605

TypeError Cannot read properties of null (reading 'callback') when receiving subscribe response with no inflight subscribe #605

Comments

alecgibson commented Apr 18, 2023

alecgibson commented Apr 26, 2023

alecgibson commented Apr 26, 2023 • edited Loading

alecgibson commented Apr 26, 2023

ericyhwang commented May 2, 2023

`TypeError Cannot read properties of null (reading 'callback')` when receiving subscribe response with no inflight subscribe #605

`TypeError Cannot read properties of null (reading 'callback')` when receiving subscribe response with no inflight subscribe #605

alecgibson commented Apr 26, 2023 •

edited

Loading