
Replication request exceeds max size when trying to join room #9956

Closed
deepbluev7 opened this issue May 7, 2021 · 15 comments · Fixed by #10118
Labels
S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. z-p2 (Deprecated Label)

Comments

@deepbluev7
Contributor

Description

I'm getting the following error when trying to join #irc:matrix.org:

2021-05-07 20:16:08,373 - synapse.http.site - 107 - WARNING - sentinel - Aborting connection from IPv4Address(type='TCP', host='127.0.0.1', port=44100) because the request exceeds maximum size
2021-05-07 20:16:08,375 - synapse.http.site - 312 - INFO - sentinel - Connection from IPv4Address(type='TCP', host='127.0.0.1', port=44100) lost before request headers were read
2021-05-07 20:16:08,376 - synapse.http.client - 440 - INFO - POST-3 - Received response to POST http://127.0.0.1:9893/_synapse/replication/remote_join/%21BAXLHOFjvDKUeLafmO%3Amatrix.org/%40deepbluev7%3Aneko.dev/qjkxpFkuVk: 502
2021-05-07 20:16:08,376 - synapse.http.server - 88 - INFO - POST-3 - <SynapseRequest at 0x7f0f95586d00 method='POST' uri='/_matrix/client/r0/join/%23irc%3Amatrix.org' clientproto='HTTP/1.0' site=8086> SynapseError: 502 - Failed to talk to main process
2021-05-07 20:16:08,377 - synapse.access.http.8086 - 387 - INFO - POST-3 - 127.0.0.1 - 8086 - {@deepbluev7:neko.dev} Processed request: 27.573sec/-0.000sec (0.005sec, 0.001sec) (0.002sec/0.008sec/10) 64B 502 "POST /_matrix/client/r0/join/%23irc%3Amatrix.org HTTP/1.0" "mtxclient v0.5.1" [0 dbevts]

I have a few workers: an event_creator and a persister, as well as the main process and a few federation workers. This log is from the event_creator. Increasing the maximum size checked in

if self.content.tell() + len(data) > self._max_request_body_size:
made the join work.
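
For context, that check lives in Synapse's SynapseRequest (synapse/http/site.py). A simplified sketch of the surrounding logic, reconstructed from the quoted line and the log output above (not the exact upstream code):

def handleContentChunk(self, data):
    # Once the body buffered so far plus the incoming chunk would exceed the
    # cap, the connection is aborted; that is what produces the
    # "Aborting connection ... because the request exceeds maximum size"
    # warning in the log above.
    if self.content.tell() + len(data) > self._max_request_body_size:
        logger.warning(
            "Aborting connection from %s because the request exceeds maximum size",
            self.client,
        )
        self.transport.abortConnection()
        return
    super().handleContentChunk(data)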

Steps to reproduce

  • Set up Synapse with an event_creator worker
  • Try to join #irc:matrix.org

Version information

  • Homeserver:

If not matrix.org: neko.dev

  • Version: 1.33.1

  • Install method: custom ebuild

  • Platform: Gentoo
@clokep
Member

clokep commented May 7, 2021

Hmm, we probably shouldn't be limiting the size of replication traffic?

@deepbluev7
Contributor Author

Or at least the limit should probably be fairly big. I have no idea what it currently is, but a few hundred megabytes is probably reasonable?

@babolivier babolivier added P2 S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels May 10, 2021
@babolivier
Contributor

For the record, the current limit is 1024, which sounds quite low to me.
This needs discussion on whether to drop the limit or increase it to a more sensible value.

@MTRNord
Contributor

MTRNord commented May 10, 2021

Having the same issue now in #steamlug:matrix.org, and I got a report that each time my bot tries to join, this produces a "tried to join but made no change" membership event. So it seems like it joins, but Synapse doesn't realize it joined.

@deepbluev7
Contributor Author

Yes, if that request fails a proper join event is still sent; your Synapse just fails to process the response.

@MTRNord
Contributor

MTRNord commented May 10, 2021

Yes, if that request fails a proper join event is still sent; your Synapse just fails to process the response.

Yeah, in this case it caused hundreds of join events, and it most likely also explains the extremely high load I had on join replication on the master.

@clokep
Member

clokep commented May 10, 2021

For the record, the current limit is 1024, which sounds quite low to me.

That's the default in the __init__, but Synapse seems to always create it with at least 200 * 65536, I believe. See #9817.
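
Back-of-the-envelope with those numbers (the constant name here is just for illustration):

MAX_EVENT_SIZE = 65536                        # federation's cap on a single event, in bytes
max_request_body_size = 200 * MAX_EVENT_SIZE  # what Synapse reportedly passes in

print(max_request_body_size)                  # 13107200 bytes
print(max_request_body_size / (1024 * 1024))  # 12.5 -> roughly 12.5 MiB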

@maranda

maranda commented May 21, 2021

For what it's worth, my server (running 1.34) was seemingly affected by this issue as well, and it was a showstopper when joining a very large number of federated rooms. It seems to badly affect Synapse worker setups with sharded event persisters.

@maranda

maranda commented May 23, 2021

Until something better comes from upstream, I temporarily came up with this patch:
https://github.com/maranda/synapse/commit/2567207edf383e8aaccf8c65342e5722f08834d7
It allows setting the max request size limit per worker. I set default_max_request_size: 65535000 in the event persister workers' configuration files, which raises the default size limit about fivefold (to around 64 MiB). It's not a very elegant solution, but it works.
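
Roughly, the approach is this (a sketch of the idea, not the actual diff; default_max_request_size is the option name mentioned above, everything else is illustrative):

DEFAULT_MAX_REQUEST_SIZE = 200 * 65536  # Synapse's usual cap, roughly 12.5 MiB


def get_max_request_body_size(worker_config: dict) -> int:
    # Let each worker's config override the cap, falling back to the default.
    return int(worker_config.get("default_max_request_size", DEFAULT_MAX_REQUEST_SIZE))


# With default_max_request_size: 65535000 in an event persister's config
# file, the listener for that worker would be created with a ~64 MB cap.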

@erikjohnston
Member

I think #10082 might fix this?

@clokep
Member

clokep commented May 27, 2021

I'm hoping it will be fixed by that; I suppose it is still possible to hit this error, but it should be unlikely.

@deepbluev7
Contributor Author

It looks like it would fix it for most cases, but it will still fail with 101 events that are all at the maximum event size, I suppose.

@callahad
Contributor

callahad commented Jun 3, 2021

We should probably reduce the batch size to account for that, just to be certain. @erikjohnston to review.

@erikjohnston
Member

Yeah, it looks like we cap the request size at 200 events' worth, but we're currently batching into 1000 events.
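
Spelling out the worst case with the numbers from this thread (assuming every event in a batch is at the maximum size):

MAX_EVENT_SIZE = 65536                         # max size of one event, in bytes
max_request_body_size = 200 * MAX_EVENT_SIZE   # ~12.5 MiB replication cap
batch_size = 1000                              # current batching on the join path

worst_case_body = batch_size * MAX_EVENT_SIZE  # ~62.5 MiB
print(worst_case_body > max_request_body_size) # True: a full batch can overflow the cap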

@richvdh
Member

richvdh commented Jun 4, 2021

This was introduced by #9817.
