Sync requests immediately return with empty payload #8518
Comments
@KitsuneRal any idea when you first saw this? In the last couple of weeks, or longer?
We can see this happening clearly in the server logs for roughly 1m04s at the mentioned time. We're not entirely sure if this is a regression in v1.21.0 or not. Here are the logs from the server side: https://gist.github.com/anoadragon453/4a54bbe3388cdc9f4d9c05feb1c028d2 We can see that they indeed return immediately for the same access token, even when a My hunch is that somehow this condition is synapse/synapse/handlers/sync.py Lines 306 to 311 in 9ca6341
The parameters to the request look fine, so It would be helpful to add some debug logging here to help track down what exactly is going wrong, although be aware this is a fairly hot path. |
Yes, in something like a couple of weeks.
This seems fairly easy to reproduce with any user who sees no activity in their rooms during the sync timeout: the first /sync times out, returning the same sync token in Look at the logs from that first /sync request:
It builds the empty response when the endpoint is first called, rather than waiting for the timeout. Then it waits for 30 seconds and returns the empty response, so even though there has been traffic in other rooms in the meantime, the sync token is not advanced. Compare with the debug for an empty sync response from synapse 1.20:
... it waits for 30 seconds before calling the Presumably this means that the timeout at https://github.com/matrix-org/synapse/blob/v1.22.0/synapse/notifier.py#L437-L444 is not working correctly. |
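To make the ordering difference described above concrete, here is a toy sketch. It is not Synapse code: `ToyStream`, `bump_later` and both `sync_*` functions are hypothetical names invented for illustration, and the long-poll wait is modelled as a plain sleep because the user in question sees no activity in their own rooms.

```python
import asyncio


class ToyStream:
    """Hypothetical stand-in for the server-wide stream position."""

    def __init__(self, token):
        self.token = token


async def bump_later(stream, delay):
    # Simulates traffic in rooms the syncing user is not in:
    # the global token advances, but the user's waiter is not woken.
    await asyncio.sleep(delay)
    stream.token += 1


async def sync_wait_then_build(stream, since, timeout):
    # Ordering described for 1.20: wait out the timeout, then build the response.
    if stream.token == since:
        await asyncio.sleep(timeout)
    return {"next_batch": stream.token, "events": []}


async def sync_build_then_wait(stream, since, timeout):
    # Ordering described for the affected version: build first, wait, return the snapshot.
    response = {"next_batch": stream.token, "events": []}
    if stream.token == since:
        await asyncio.sleep(timeout)
    return response


async def run(sync_fn):
    stream = ToyStream(token=5)
    bump = asyncio.create_task(bump_later(stream, 0.01))
    result = await sync_fn(stream, since=5, timeout=0.05)
    await bump
    return result["next_batch"]
```

With the second ordering the client gets back the token it sent, so its next request uses the same `since` value again.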
Note that it only happens on matrix-org-hotfixes which has a non-zero |
Preliminary observations; It looks like the Edit: Just read everything, yeah, it looks like that's the case. |
This fixes #8518 by adding a conditional check on `SyncResult` in a function when `prev_stream_token == current_stream_token`, as a sanity check. In `CachedResponse.set.<remove>()`, the result is immediately popped from the cache if the conditional function returns "false". This prevents the caching of a timed-out `SyncResult` (that has `next_key` as the stream key that produced that `SyncResult`). The cache is prevented from returning a `SyncResult` that makes the client request the same stream key over and over again, effectively making it stuck in a loop of requesting and getting a response immediately for as long as the cache keeps those values. Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
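A rough sketch of the idea in this description, under simplifying assumptions: `ConditionalCache`, `ToySyncResult` and `advances_token` are hypothetical names, and unlike the real cache this toy only stores completed results rather than in-flight ones. It only illustrates "do not keep a result whose token did not advance".

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToySyncResult:
    """Hypothetical, heavily reduced stand-in for a sync result."""

    since: str
    next_batch: str


def advances_token(result):
    # The sanity check: only results that move the client forward are worth caching.
    return result.next_batch != result.since


class ConditionalCache:
    """Toy cache that drops any result failing the supplied predicate."""

    def __init__(self):
        self._results = {}

    async def wrap(self, key, should_cache, callback):
        if key in self._results:
            return self._results[key]
        result = await callback()
        self._results[key] = result
        if not should_cache(result):
            # Evict straight away so the same stale answer is never replayed.
            self._results.pop(key, None)
        return result
```

A timed-out result is still returned to the caller that triggered it; it is just never served a second time from the cache.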
Re-opening as the fix has been backed out, see #9358 (comment).
To summarise what was said before: the proximate cause of this is An obvious instance of this would happen on a completely idle homeserver: if there is nothing going on, then there is nothing to advance the tokens. However, that doesn't help explain what's wrong with I think this was significantly exacerbated in v1.22.0 by #8439, in particular https://github.com/matrix-org/synapse/pull/8439/files#diff-d5643d4a94ee928bdd677a4ffbfbf3c571bc0a9f84d00d92dde31d91a6d53533R379. Setting However, this report predates that change. It's possible that this change to Anyway: we need to fix the caching behaviour anyway to fix the idle homeserver case. I think we should also fix |
…0157) This is the first of two PRs which seek to address #8518. This first PR lays the groundwork by extending ResponseCache; a second PR (#10158) will update the SyncHandler to actually use it, and fix the bug. The idea here is that we allow the callback given to ResponseCache.wrap to decide whether its result should be cached or not. We do that by (optionally) passing a ResponseCacheContext into it, which it can modify.
) Fixes #8518 by telling the ResponseCache not to cache the /sync response if the next_batch param is the same as the since token.
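For readers skimming the thread, the two-PR approach can be sketched roughly as follows. This is an illustration only: `ToyCacheContext`, `ToyResponseCache` and `toy_sync` are hypothetical stand-ins, not the actual Synapse classes or signatures, and the toy cache stores only completed results.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyCacheContext:
    """Hypothetical context object the callback may modify."""

    cache_key: str
    should_cache: bool = True


class ToyResponseCache:
    def __init__(self):
        self._results = {}

    async def wrap(self, key, callback, *args):
        if key in self._results:
            return self._results[key]
        context = ToyCacheContext(cache_key=key)
        result = await callback(context, *args)
        # The callback, not the cache, decides whether the result is kept.
        if context.should_cache:
            self._results[key] = result
        return result


async def toy_sync(context, since, current_token):
    next_batch = current_token
    if next_batch == since:
        # Nothing advanced: serving this again would just loop the client.
        context.should_cache = False
    return {"next_batch": next_batch}
```

Compared with a predicate applied to the finished result, passing a context in lets the callback make the decision at the point where it already knows both tokens.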
Description
Occasionally, without any specific action on my end, a normal long-polling sync loop becomes a storm of syncs, as each next request sent from Quotient returns immediately (30-40 ms) with a JSON object that has the full sync response structure (`rooms` with `invite`/`join`/`leave` inside, `account_data`, `to_device`, etc.) and a reasonable-looking `next_batch` but with empty objects (apparently because no events arrived since the last sync). Usually this happens within 30 seconds to 2 minutes, and requests stay with the server in a normal way (seconds) after that.
Steps to reproduce
I couldn't identify a particular pattern. For reference, it occurred between 10:28 and 10:29 CEST with sync coming from my user (`@kitsune`), with `next_batch` at 10:29:07 looking like "s1589639296_757284957_14250253_610667103_445506552_1585308_106571744_371537760_134971"