synapse hangs when getting room history #2726

sbug-1bit · 2017-12-03T17:59:16Z

Description

Scrolling up in Riot to get history for one or more room(s) can sometimes cause the server to use huge amount of memory, 100% of CPU and finally become unresponsive.

Steps to reproduce

Join some rooms (e.g #riot:matrix.org and #matrix:matrix.org)
stay in the rooms for some days
scroll up in Riot for the rooms, to the point where the server starts loading

I expect to get room history from the server, but the whole server goes bananas and every client connected get disconnected.

Some stuff from my homeserver.log

//---------------------------------------------//

2017-12-03 16:38:36,902 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180983- {GET-O-207649} Sending request failed to matrix.org: GET matrix://matrix.org/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%24149477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.ovh: NoRouteError('Network is unreachable',)

//---------------------------------------------//

2017-12-03 16:38:42,289 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180985- {GET-O-207654} Sending request failed to matrix.org: GE
T matrix://matrix.org/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%2414
9477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.
ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,291 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180992- {GET-O-207655} Sending request failed to matrix.org: GE
T matrix://matrix.org/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%2414
9477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.
ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,292 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180980- {GET-O-207653} Sending request failed to matrix.org: GE
T matrix://matrix.org/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%2414
9477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.
ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,293 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180974- {GET-O-207646} Sending request failed to matrix.org: GE
T matrix://matrix.org/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%2414
9477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.
ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,294 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180914- {GET-O-207638} Sending request failed to matrix.org: GE
T matrix://matrix.org/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%2414
9477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.
ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,295 - synapse.http.outbound - 236 - INFO - GET-180914- {GET-O-207638} [matrix.org] Result: NoRouteError('Network is unreachable
',)
2017-12-03 16:38:42,296 - synapse.handlers.federation - 788 - ERROR - GET-180914- Failed to backfill from matrix.org because No route to host: 101:
 Network is unreachable.
Traceback (most recent call last):
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/handlers/federation.py", line 761, in try_backfill
    extremities=[e for e in extremities.keys()]
NoRouteError: No route to host: 101: Network is unreachable.

//---------------------------------------------//

2017-12-03 16:38:42,327 - synapse.http.outbound - 154 - INFO - GET-180964- {GET-O-207744} [jki.re] Sending request: GET matrix://jki.re/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%24149477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.ovh
2017-12-03 16:38:42,329 - synapse.http.endpoint - 264 - INFO - - Connecting to 212.71.233.145:8080
2017-12-03 16:38:42,334 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207465} Sending request failed to willworkforcookies.com: PUT matrix://willworkforcookies.com/_matrix/federation/v1/send/1512207355002/: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,337 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180915- {GET-O-207658} Sending request failed to jki.re: GET matrix://jki.re/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%24149477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,340 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180967- {GET-O-207659} Sending request failed to jki.re: GET matrix://jki.re/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%24149477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,341 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207331} Sending request failed to jki.re: PUT matrix://jki.re/_matrix/federation/v1/send/1512207354868/: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,343 - synapse.http.matrixfederationclient - 212 - WARNING - GET-180972- {GET-O-207657} Sending request failed to jki.re: GET matrix://jki.re/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%24149477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.ovh: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,344 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207518} Sending request failed to maunium.net: PUT matrix://maunium.net/_matrix/federation/v1/send/1512207355055/: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,346 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207462} Sending request failed to ozlabs.org: PUT matrix://ozlabs.org/_matrix/federation/v1/send/1512207354999/: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,347 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207485} Sending request failed to status.im: PUT matrix://status.im/_matrix/federation/v1/send/1512207355022/: NoRouteError('Network is unreachable',)
2017-12-03 16:38:42,349 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207380} Sending request failed to berigora.net: PUT matrix://berigora.net/_matrix/federation/v1/send/1512207354917/: NoRouteError('Network is unreachable',)


//---------------------------------------------//

2017-12-03 16:39:48,688 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207396} Sending request failed to nuclearlemons.uk: PUT ma
trix://nuclearlemons.uk/_matrix/federation/v1/send/1512207354933/: NoRouteError('Network is unreachable',)
2017-12-03 16:39:48,693 - twisted - 131 - INFO - - DNSDatagramProtocol starting on 37041
2017-12-03 16:39:48,694 - twisted - 131 - INFO - - Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fbfe9ba1750>
2017-12-03 16:39:48,697 - synapse.http.endpoint - 264 - INFO - - Connecting to 139.162.251.243:8448
2017-12-03 16:39:48,703 - twisted - 131 - INFO - - DNSDatagramProtocol starting on 61735
2017-12-03 16:39:48,704 - twisted - 131 - INFO - - Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fbff1696250>
2017-12-03 16:39:48,708 - synapse.http.endpoint - 264 - INFO - - Connecting to 212.201.68.138:8448
2017-12-03 16:39:48,710 - twisted - 131 - INFO - - Stopping factory _HTTP11ClientFactory(<function quiescentCallback at 0x7fbfea1407d0>, <synapse.h
ttp.endpoint._WrappingEndpointFac object at 0x7fbfb0ac9150>)
2017-12-03 16:39:48,711 - synapse.metrics - 162 - INFO - - Collecting gc 0
2017-12-03 16:39:50,231 - synapse.http.outbound - 236 - INFO - GET-180962- {GET-O-207976} [jki.re] Result: 200 OK
2017-12-03 16:39:50,233 - synapse.http.outbound - 236 - INFO - GET-180986- {GET-O-207975} [jki.re] Result: 200 OK
2017-12-03 16:39:50,239 - synapse.http.outbound - 236 - INFO - - {PUT-O-207536} [ejpcmac.net] Result: 200 OK
2017-12-03 16:39:50,241 - synapse.federation.transaction_queue - 622 - INFO - - TX [ejpcmac.net] {1512207355073} got 200 response
2017-12-03 16:39:50,242 - synapse.federation.transaction_queue - 578 - INFO - - TX [ejpcmac.net] {1512207355459} Sending transaction [1512207355459
], (PDUs: 0, EDUs: 1, failures: 0)
2017-12-03 16:39:50,243 - synapse.http.outbound - 154 - INFO - - {PUT-O-208003} [ejpcmac.net] Sending request: PUT matrix://ejpcmac.net/_matrix/fed
eration/v1/send/1512207355459/
2017-12-03 16:39:50,251 - synapse.http.outbound - 236 - INFO - - {PUT-O-207509} [madduck.net] Result: 200 OK
2017-12-03 16:39:50,252 - synapse.federation.transaction_queue - 622 - INFO - - TX [madduck.net] {1512207355046} got 200 response
2017-12-03 16:39:50,254 - synapse.federation.transaction_queue - 578 - INFO - - TX [madduck.net] {1512207355460} Sending transaction [1512207355460
], (PDUs: 0, EDUs: 1, failures: 0)
2017-12-03 16:39:50,255 - synapse.http.outbound - 154 - INFO - - {PUT-O-208004} [madduck.net] Sending request: PUT matrix://madduck.net/_matrix/fed
eration/v1/send/1512207355460/
2017-12-03 16:39:50,263 - synapse.http.outbound - 236 - INFO - - {PUT-O-207460} [riot.ovh] Result: 200 OK
2017-12-03 16:39:50,265 - synapse.federation.transaction_queue - 622 - INFO - - TX [riot.ovh] {1512207354997} got 200 response
2017-12-03 16:39:50,266 - synapse.federation.transaction_queue - 578 - INFO - - TX [riot.ovh] {1512207355461} Sending transaction [1512207355461], (PDUs: 0, EDUs: 1, failures: 0)
2017-12-03 16:39:50,267 - synapse.http.outbound - 154 - INFO - - {PUT-O-208005} [riot.ovh] Sending request: PUT matrix://riot.ovh/_matrix/federation/v1/send/1512207355461/
2017-12-03 16:39:50,305 - synapse.http.outbound - 236 - INFO - GET-180972- {GET-O-207995} [jki.re] Result: 200 OK
2017-12-03 16:39:50,307 - synapse.http.outbound - 236 - INFO - GET-180977- {GET-O-207996} [jki.re] Result: 200 OK
2017-12-03 16:39:50,317 - synapse.http.outbound - 236 - INFO - GET-180963- {GET-O-207997} [jki.re] Result: 200 OK
2017-12-03 16:39:50,370 - synapse.http.outbound - 236 - INFO - POST-181220- {POST-O-207994} [msgs.tk] Result: 200 OK
2017-12-03 16:39:50,372 - synapse.http.outbound - 154 - INFO - POST-181220- {POST-O-208006} [matrix.org] Sending request: POST matrix://matrix.org/_matrix/federation/v1/get_groups_publicised
2017-12-03 16:39:50,382 - synapse.access.https.8448 - 59 - INFO - PUT-181229- 51.15.53.216 - 8448 - Received request: PUT /_matrix/federation/v1/send/1510421211180/
2017-12-03 16:39:50,386 - synapse.federation.transport.server - 140 - INFO - PUT-181229- Request from matrix.ffslfl.net
2017-12-03 16:39:50,388 - synapse.federation.transport.server - 251 - INFO - PUT-181229- Received txn 1510421211180 from matrix.ffslfl.net. (PDUs: 0, EDUs: 1, failures: 0)
2017-12-03 16:39:50,388 - synapse.util.async - 230 - INFO - PUT-181229- Acquired uncontended linearizer lock 'fed_txn_handler' for key (u'matrix.ffslfl.net', u'1510421211180')
2017-12-03 16:39:50,404 - synapse.access.https.8448 - 59 - INFO - PUT-181230- 5.196.33.160 - 8448 - Received request: PUT /_matrix/federation/v1/send/1510698073360/

//---------------------------------------------//

017-12-03 16:39:54,562 - synapse.federation.transport.server - 140 - INFO - PUT-181241- Request from ghostcorp.net
2017-12-03 16:39:54,563 - synapse.federation.transport.server - 251 - INFO - PUT-181241- Received txn 1508788481345 from ghostcorp.net. (PDUs: 0, EDUs: 1, failures: 0)
2017-12-03 16:39:54,564 - synapse.util.async - 230 - INFO - PUT-181241- Acquired uncontended linearizer lock 'fed_txn_handler' for key (u'ghostcorp.net', u'1508788481345')
2017-12-03 16:39:54,566 - synapse.http.matrixfederationclient - 212 - WARNING - - {PUT-O-207736} Sending request failed to matrix.vops.no: PUT matrix://matrix.vops.no/_matrix/federation/v1/send/1512207355229/: ConnectError('No route to host',)
2017-12-03 16:39:54,568 - twisted - 131 - INFO - - Stopping factory _HTTP11ClientFactory(<function quiescentCallback at 0x7fbfb66ce7d0>, <synapse.http.endpoint._WrappingEndpointFac object at 0x7fbfae507750>)
2017-12-03 16:39:54,603 - synapse.http.endpoint - 264 - INFO - - Connecting to matrix.sjemm.net:8448
2017-12-03 16:39:54,613 - synapse.http.endpoint - 264 - INFO - - Connecting to node5.voycephone.com:8448
2017-12-03 16:39:54,624 - synapse.http.endpoint - 264 - INFO - - Connecting to dmatrix.xyz:8448
2017-12-03 16:39:54,657 - synapse.util.async - 237 - INFO - PUT-181127- Releasing linearizer lock 'fed_room_pdu' for key '!cURbafjkfsMDVwdRDQ:matrix.org'
2017-12-03 16:39:54,660 - synapse.federation.federation_server - 199 - ERROR - PUT-181127- Failed to handle PDU $151231902824UWYNu:willworkforcookies.com
Traceback (most recent call last):
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/federation/federation_server.py", line 191, in process_pdus_for_room
    transaction.origin, pdu
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/federation/federation_server.py", line 584, in _handle_received_pdu
    yield self.handler.on_receive_pdu(origin, pdu, get_missing=True)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/handlers/federation.py", line 193, in on_receive_pdu
    origin, pdu, prevs, min_depth
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/handlers/federation.py", line 296, in _get_missing_events_for_pdu
    timeout=10000,
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/federation/federation_client.py", line 787, in get_missing_events
    timeout=timeout,
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/federation/transport/client.py", line 470, in get_missing_events
    timeout=timeout,
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/http/matrixfederationclient.py", line 388, in post_json
    ignore_backoff=ignore_backoff,
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/http/matrixfederationclient.py", line 136, in _request
    ignore_backoff=ignore_backoff,
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/spacebug/.synapse/local/lib/python2.7/site-packages/synapse/util/retryutils.py", line 82, in get_retry_limiter
    destination=destination,
NotRetryingDestination: Not retrying server willworkforcookies.com.
2017-12-03 16:39:55,076 - synapse.util.async - 237 - INFO - - Releasing linearizer lock '_JoinedHostsCache' for key ()
2017-12-03 16:40:32,964 - synapse.http.outbound - 154 - INFO - GET-181198- {GET-O-208008} [matrix.org] Sending request: GET matrix://matrix.org/_matrix/federation/v1/backfill/!cURbafjkfsMDVwdRDQ:matrix.org/?limit=100&v=%2414946262111053koxQX%3Amatrix.zottel.net&v=%24149477510723IVych%3Amatrix.imcpwn.com&v=%2414958216361415416rMTmt%3Amatrix.org&v=%241501108226525ijeNf%3Amatrixim.cc&v=%2414957447092437gqHKB%3Ariot.ovh
2017-12-03 16:40:34,193 - synapse.util.async - 237 - INFO - PUT-181227- Releasing linearizer lock 'fed_txn_handler' for key (u'chat.droogers.net', u'1511017312765')
2017-12-03 16:40:34,195 - synapse.access.https.8448 - 91 - INFO - PUT-181227- 91.121.163.93 - 8448 - {chat.droogers.net} Processed request: 49373ms (7ms, 4ms) (172ms/3) 11B 200 "PUT /_matrix/federation/v1/send/1511017312765/ HTTP/1.1" "Synapse/0.25.1"
2017-12-03 16:41:19,004 - twisted - 131 - INFO - - Starting factory _HTTP11ClientFactory(<function quiescentCallback at 0x7fbfd6a92b18>, <synapse.http.endpoint._WrappingEndpointFac object at 0x7fbf753e3490>)
2017-12-03 16:41:19,012 - synapse.util.async - 237 - INFO - PUT-181232- Releasing linearizer lock 'fed_txn_handler' for key (u'neon.cat', u'1512303878966')

//---------------------------------------------//

2017-12-03 16:42:33,324 - twisted - 131 - INFO - - (UDP Port 38880 Closed)
2017-12-03 16:42:33,324 - twisted - 131 - INFO - - Stopping protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fbf67c5ea50>
2017-12-03 16:42:33,331 - twisted - 131 - INFO - - DNSDatagramProtocol starting on 16624
2017-12-03 16:42:33,332 - twisted - 131 - INFO - - Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fbf6fe7a690>
2017-12-03 16:42:33,333 - twisted - 131 - INFO - - DNSDatagramProtocol starting on 50570
2017-12-03 16:42:33,334 - twisted - 131 - INFO - - Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7fbf6fe7a810>
2017-12-03 16:42:33,337 - twisted - 131 - INFO - - Stopping factory _HTTP11ClientFactory(<function quiescentCallback at 0x7fbf663f61b8>, <synapse.http.endpoint._WrappingEndpointFac object at 0x7fbf7527f8d0>)
2017-12-03 16:42:33,338 - twisted - 131 - INFO - - Stopping factory _HTTP11ClientFactory(<function quiescentCallback at 0x7fbfb13f1aa0>, <synapse.http.endpoint._WrappingEndpointFac object at 0x7fbfe9e4cf50>)
2017-12-03 16:42:33,339 - synapse.metrics - 162 - INFO - - Collecting gc 2

Thats the last line before the server totally hanged -^

Version information

Homeserver: Was this issue identified on matrix.org or another homeserver?
My own homeserver lithen.net

If not matrix.org:

Version: What version of Synapse is running?
Synapse/0.25.1
Install method: package manager/git clone/pip
Installed using pip
Platform: Tell us about the environment in which your homeserver is operating
- distro, hardware, if it's running in a vm/container, etc.
Platform is GNU/Linux 'Debian Stretch', where synapse and turnserver are running inside an LXC.
I'm using postgresql with synapse. (I also did have this issue when using sqlite, thats why I changed to postgresql to test).
In another LXC, I have apache running as a webserver and redirects matrix.lithen.net to synapse.
I have a SRV record in my DNS for matrix so I can have lithen.net as homeserver.

The text was updated successfully, but these errors were encountered:

ShellCode33 · 2017-12-28T22:07:43Z

Same problem here, I'm running synapse in an LXC (and the LXC is behind a reverse proxy (nginx)) and I cant make it work :(

telnet 192.168.1.6 8448 gives me "Connection refused" how is that possible ?

from nginx logs :
connect() failed (111: Connection refused) while connecting to upstream, client:

Here is my nginx config:
https://0bin.net/paste/miSdHH+lC4vk988p#uTjdp6Ecm0WrshgdQqBFHLOtf+JMXYlLtP6QMybIINA

sbug-1bit · 2017-12-28T22:24:38Z

This is not the same issue that I have.

You should not forward port 8448 in ngnix !
Port 8448 should be forwarded in your router directly to your LXC with synapse. (This port uses TLS).
(This port has nothing to do with running a proxy).

You should forward port 443 for a virtualhost in ngnix, to port 8008 to your LXC with synapse.
So that if someone goes to https://yourdomain.com they come to your webserver, and if they go to https://matrix.yourdomain.com they are proxyed to http://192.168.1.6:8008/

(You will then also need a DNS record for matrix.yourdomain.com pointing to same as yourdomain.com).

For more help I suggest joining #matrix:matrix.org

ShellCode33 · 2017-12-29T10:28:09Z

Hey, thanks for your reply. The port 8008 is well configured, but I'm struggling with 8448. I removed it from the reverse proxy, I'm now trying to perform port forwarding with iptables. I've a bridge interface, all my LXC are in the subnet 192.168.1.0/24, I did the following :
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eno1 -j MASQUERADE

I tried a lot of iptables rules and to perform port forwarding I actually got :

# Matrix port redirection
iptables -t nat -A PREROUTING -p tcp --dport 8448 -j DNAT --to 192.168.1.6:8448
iptables -t nat -A PREROUTING -p udp --dport 8448 -j DNAT --to 192.168.1.6:8448
iptables -A FORWARD -d 192.168.1.6 -p tcp --dport 8448 -j ACCEPT
iptables -A FORWARD -d 192.168.1.6 -p udp --dport 8448 -j ACCEPT
iptables -A FORWARD --in-interface lxc-nat-bridge -j ACCEPT

I receive all messages but I cant send any. When I try to join a room, I got "Internal server error".... Please help 😭

sbug-1bit · 2017-12-29T11:03:20Z

Sorry it was many years ago I used iptables so I think you will find better help somewhere else about that.

ShellCode33 · 2017-12-29T11:07:16Z

@spacebug0 No problem, thanks anyway

sbug-1bit · 2017-12-30T19:34:10Z

Running synapse with metrics enables and Prometheus I got these.

Not sure which graphs are interested for the devs but as one can see something works really hard when getting room history (sometimes).

ara4n · 2017-12-30T23:05:03Z

I finally got a chance to look at this - @spacebug0, sorry for the delay and thanks for your patience.
Looking at the logs (https://lithen.net/files/homeserver.log), it seems that there's a whole stack of calls to /messages for Matrix HQ which never return:

sierra:Desktop matthew$ cat homeserver.log.txt | grep 'request.*/messages'
2017-12-30 18:23:30,885 - synapse.access.http.8008 - 59 - INFO - GET-74- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:24:36,685 - synapse.access.http.8008 - 59 - INFO - GET-170- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:25:35,224 - synapse.access.http.8008 - 59 - INFO - GET-266- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:26:36,649 - synapse.access.http.8008 - 59 - INFO - GET-370- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:27:41,834 - synapse.access.http.8008 - 59 - INFO - GET-459- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:28:24,664 - synapse.access.http.8008 - 59 - INFO - GET-521- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>

Looking at the individual stuck requests, they all seem to get stuck trying to resolve state after checking a given auth event, which is a power level change from July 2015 (https://riot.im/develop/#/room/#matrix:matrix.org/$1436805007295iQZlO:jki.re)

2017-12-30 18:30:27,304 - synapse.handlers.federation - 1800 - INFO - GET-74- Different auth: set([u'$1436805007295iQZlO:jki.re'])
2017-12-30 18:30:27,306 - synapse.state - 393 - INFO - GET-74- Resolving state for !cURbafjkfsMDVwdRDQ:matrix.org with 2 groups

2017-12-30 18:31:50,407 - synapse.handlers.federation - 1800 - INFO - GET-789- Different auth: set([u'$1436805007295iQZlO:jki.re'])
2017-12-30 18:31:50,412 - synapse.state - 393 - INFO - GET-789- Resolving state for !cURbafjkfsMDVwdRDQ:matrix.org with 2 groups

etc

The first one (GET-74) also does a huge amount of backfill attempts (which seem to succeed) before getting stuck on this 'bad' event.

I've asked @spacebug0 to leave the room, but it sounds like state resets cause them to spontaneously rejoin it - and then on calling /messages in the room the problem recurs.

However, after the stack of requests which fail, there are then a few which succeed (after restarting the server):

2017-12-30 18:33:02,372 - synapse.access.http.8008 - 59 - INFO - GET-83- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:33:02,376 - synapse.access.http.8008 - 59 - INFO - GET-84- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:34:55,905 - synapse.access.http.8008 - 91 - INFO - GET-83- 192.168.1.6 - 8008 - {@spacebug:lithen.net} Processed request: 113531ms (15531ms, 824ms) (29653ms/46) 2384B 200 "GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted> HTTP/1.1" "Riot.im/0.7.03 (Linux; U; Android 7.1.2; ZTE A2017U Build/NJH47; Flavour FDroid; MatrixAndroidSDK 0.8.03)"
2017-12-30 18:34:55,910 - synapse.access.http.8008 - 91 - INFO - GET-84- 192.168.1.6 - 8008 - {@spacebug:lithen.net} Processed request: 113533ms (16972ms, 900ms) (41852ms/94) 2380B 200 "GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted> HTTP/1.1" "Riot.im/0.7.03 (Linux; U; Android 7.1.2; ZTE A2017U Build/NJH47; Flavour FDroid; MatrixAndroidSDK 0.8.03)"

Looking at the request where /messages does manage to return (after 113s), the only difference I can see is that it tries to backfill from jki.re rather than matrix.org. And then subsequent ones seem to be quick:

2017-12-30 19:25:54,671 - synapse.access.http.8008 - 59 - INFO - GET-5275- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283929_3853327_24_96380_1621_19_141_10568_1&limit=30&access_token=<redacted>
2017-12-30 19:25:55,754 - synapse.access.http.8008 - 91 - INFO - GET-5275- 192.168.1.6 - 8008 - {@spacebug:lithen.net} Processed request: 1082ms (235ms, 4ms) (764ms/6) 2636B 200 "GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283929_3853327_24_96380_1621_19_141_10568_1&limit=30&access_token=<redacted> HTTP/1.1" "Riot.im/0.7.03 (Linux; U; Android 7.1.2; ZTE A2017U Build/NJH47; Flavour FDroid; MatrixAndroidSDK 0.8.03)"

So I'm wondering if the initial /messages req (GET-74) gets complete stuck, wedging all subsequent ones until the server is restarted, leaking RAM as they stack up. But it's unclear why it gets wedged, and why it succeeds later on when it backfills from jki.re.

A final observation:

2017-12-30 18:26:12,945 - synapse.handlers.federation - 770 - INFO - GET-74- Failed to backfill from matrix.org because 403: Room '!cURbafjkfsMDVwdRDQ:matrix.org' does not exist

...does not look good at all.

@erikjohnston, any idea what's going on here? is it possible that a backfill failure like this would cause this explosion?

aventrax · 2018-01-29T14:54:38Z

I'm in a similar situation, I had given up to try joining big rooms, but even using synapse with 5 users but still "federated" (just to be able to chat with a couple of friends), my VPS (2GB of ram) is suffering too much and the process takes too much memory.

It can happen that for 2-3 days it does not crash, but then (I still ignore the cause) it start eating ram and the oom-killer start killing things.

Initially I was furious because other processes (like my postgres db) was killed randomly, but now I added the MemoryLimit=512M to my /lib/systemd/system/matrix-synapse.service (on Debian 9) and only synapse is killed, other process survive. With this setting my synapse server is killed once a day than is auto-restarted by systemd and on the client side the users are happy, but I hope to have this solved...

richvdh · 2022-06-29T10:03:24Z

Duplicate of #2504

richvdh mentioned this issue Feb 21, 2019

backfill slow and expensive #2504

Open

anoadragon453 mentioned this issue Sep 18, 2020

Use _check_sigs_and_hash_and_fetch to validate backfill requests #8350

Merged

richvdh closed this as completed Jun 29, 2022

richvdh marked this as a duplicate of #2504 Jun 29, 2022

MadLittleMods added the A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) label Dec 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

synapse hangs when getting room history #2726

synapse hangs when getting room history #2726

sbug-1bit commented Dec 3, 2017 •

edited by richvdh

Loading

ShellCode33 commented Dec 28, 2017 •

edited

Loading

sbug-1bit commented Dec 28, 2017

ShellCode33 commented Dec 29, 2017

sbug-1bit commented Dec 29, 2017

ShellCode33 commented Dec 29, 2017

sbug-1bit commented Dec 30, 2017

ara4n commented Dec 30, 2017

aventrax commented Jan 29, 2018 •

edited

Loading

richvdh commented Jun 29, 2022 •

edited

Loading

synapse hangs when getting room history #2726

synapse hangs when getting room history #2726

Comments

sbug-1bit commented Dec 3, 2017 • edited by richvdh Loading

Description

Steps to reproduce

Version information

ShellCode33 commented Dec 28, 2017 • edited Loading

sbug-1bit commented Dec 28, 2017

ShellCode33 commented Dec 29, 2017

sbug-1bit commented Dec 29, 2017

ShellCode33 commented Dec 29, 2017

sbug-1bit commented Dec 30, 2017

ara4n commented Dec 30, 2017

aventrax commented Jan 29, 2018 • edited Loading

richvdh commented Jun 29, 2022 • edited Loading

sbug-1bit commented Dec 3, 2017 •

edited by richvdh

Loading

ShellCode33 commented Dec 28, 2017 •

edited

Loading

aventrax commented Jan 29, 2018 •

edited

Loading

richvdh commented Jun 29, 2022 •

edited

Loading