Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse v1.0.0rc1 significantly increases RAM usage #5395

Closed
evilham opened this issue Jun 7, 2019 · 30 comments
Closed

Synapse v1.0.0rc1 significantly increases RAM usage #5395

evilham opened this issue Jun 7, 2019 · 30 comments
Labels
A-Performance Performance, both client-facing and admin-facing z-bug (Deprecated Label)

Comments

@evilham
Copy link
Contributor

evilham commented Jun 7, 2019

Description

My main Synapse server had been running happily with 4G RAM for some months, after upgrading to v1.0.0rc1 RAM usage increased a lot, leading to system crash.

After disabling presence (usual trick to reduce RAM usage), the RAM consumption went down enough for the server to be usable, but it is still out of the ordinary.

Steps to reproduce

  • upgrade to v1.0.0rc1
  • see RAM usage increase
  • beware of potential server crash

Version information

  • Version: v1.0.0rc1

  • Install method: pip

Note: I waited until another user on #synapse-admins:matrix.org confirmed this.

@fredcy
Copy link

fredcy commented Jun 7, 2019

I'm running synapse v1.0.0rc1 on a vps with 8G RAM. Not long after restarting synapse (less than an hour) I see it using about 2.8g of RES memory per the top command. Unfortunately, I don't have a baseline reading prior to upgrading to v1.0.0rc1.

@aaronraimist
Copy link
Contributor

My RAM usage also spiked immediately after upgrading to 1.0.0rc1. Usually I hover around 2.1-2.8 GB used but it takes a while to get there. After upgrading 1.0.0rc1 my RAM usage was steadily climbing until it got to 3.7GB (out of 3.85GB total) and is currently sitting at 100% CPU on both cores.

@turt2live
Copy link
Member

It's possible that the matrix.org outages are causing problems for people - how are extremities (#1760) looking?

@aaronraimist
Copy link
Contributor

Extremities are ok. Highest is 5. (this is after Synapse OOMed and I stopped it from restarting but I assume it doesn't need to be running)

@aaronraimist
Copy link
Contributor

aaronraimist commented Jun 7, 2019

Haven't setup grafana yet but this gets spammed in the logs as right memory usage is climbing rapidly. When I just restart my server it goes to about normal memory levels but it spikes as soon as I say something in a big room.

Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,138 - synapse.federation.sender.per_destination_queue - 306 - WARNING - federation_transaction_transmission_loop-641 - TX [matrix.3lite.com] Received 401 response to transaction: 401: b'Unauthorized'
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,152 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1153 - {PUT-O-1373} [spinweb.de] Request failed: PUT matrix://spinweb.de/_matrix/federation/v1/send/1559924920618: ConnectionRefusedError('Connection refused',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,388 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1190 - {PUT-O-1370} [matrix.derpaholic.com] Request failed: PUT matrix://matrix.derpaholic.com/_matrix/federation/v1/send/1559924920615: DNSLookupError('no results for hostname lookup: matrix.derpaholic.com',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,389 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-135 - {PUT-O-200} [matrixsoup.pw] Request failed: PUT matrix://matrixsoup.pw/_matrix/federation/v1/send/1559924919575: DNSLookupError('no results for hostname lookup: matrixsoup.pw',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,395 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-561 - {PUT-O-660} [openpsychology.net] Request failed: PUT matrix://openpsychology.net/_matrix/federation/v1/send/1559924919954: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'openpsychology.net'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,396 - synapse.http.matrixfederationclient - 482 - WARNING - POST-663 - {POST-O-1251} [half-shot.uk] Request failed: POST matrix://half-shot.uk/_matrix/federation/v1/get_groups_publicised: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'half-shot.uk'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,397 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-822 - {PUT-O-1024} [bachmeier.xyz] Request failed: PUT matrix://bachmeier.xyz/_matrix/federation/v1/send/1559924920277: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'bachmeier.xyz'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,398 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1054 - {PUT-O-1341} [matrix.hithlum.net] Request failed: PUT matrix://matrix.hithlum.net/_matrix/federation/v1/send/1559924920586: ConnectError('Address not available',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,399 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1217 - {PUT-O-1383} [bisonpeak.com] Request failed: PUT matrix://bisonpeak.com/_matrix/federation/v1/send/1559924920628: ConnectError('Address not available',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,462 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1047 - {PUT-O-1115} [matrix.rollspel.club] Request failed: PUT matrix://matrix.rollspel.club/_matrix/federation/v1/send/1559924920368: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,463 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1159 - {PUT-O-1150} [chat.murkfall.net] Request failed: PUT matrix://chat.murkfall.net/_matrix/federation/v1/send/1559924920403: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,464 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1264 - {PUT-O-1179} [matrix.jamidisi-edu.de] Request failed: PUT matrix://matrix.jamidisi-edu.de/_matrix/federation/v1/send/1559924920432: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,465 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1426 - {PUT-O-1240} [bruder.space] Request failed: PUT matrix://bruder.space/_matrix/federation/v1/send/1559924920493: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,508 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-647 - {PUT-O-1270} [matrix.sayous.me] Request failed: PUT matrix://matrix.sayous.me/_matrix/federation/v1/send/1559924920516: ConnectionRefusedError('Connection refused',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,537 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-36 - {PUT-O-181} [matrix.matthewsmith.id.au] Request failed: PUT matrix://matrix.matthewsmith.id.au/_matrix/federation/v1/send/1559924919556: ConnectError('Host is unreachable',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,550 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1027 - {PUT-O-1338} [matrix.cigliola.com] Request failed: PUT matrix://matrix.cigliola.com/_matrix/federation/v1/send/1559924920583: DNSLookupError('no results for hostname lookup: matrix.cigliola.com',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,560 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1440 - {PUT-O-1243} [cervoi.se] Request failed: PUT matrix://cervoi.se/_matrix/federation/v1/send/1559924920496: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'cervoi.se'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,561 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1254 - {PUT-O-1177} [sciuro.org] Request failed: PUT matrix://sciuro.org/_matrix/federation/v1/send/1559924920430: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'sciuro.org'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,561 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-653 - {PUT-O-963} [haupt.bahnhof.cz] Request failed: PUT matrix://haupt.bahnhof.cz/_matrix/federation/v1/send/1559924920216: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'haupt.bahnhof.cz'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,562 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1169 - {PUT-O-1156} [hax404.de] Request failed: PUT matrix://hax404.de/_matrix/federation/v1/send/1559924920409: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'hax404.de'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,565 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-168 - {PUT-O-439} [ejpcmac.net] Request failed: PUT matrix://ejpcmac.net/_matrix/federation/v1/send/1559924919760: ConnectError('Address not available',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,584 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-564 - {PUT-O-667} [talk.seb.host] Request failed: PUT matrix://talk.seb.host/_matrix/federation/v1/send/1559924919961: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,648 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1163 - {PUT-O-1153} [xx4h.de] Request failed: PUT matrix://xx4h.de/_matrix/federation/v1/send/1559924920406: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'xx4h.de'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,695 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-1378 - {PUT-O-2128} [alternanet.fr] Request failed: PUT matrix://alternanet.fr/_matrix/federation/v1/send/1559924921339: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,724 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-1429 - {PUT-O-2127} [matrix.alicorn.lv] Request failed: PUT matrix://matrix.alicorn.lv/_matrix/federation/v1/send/1559924921338: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,745 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-97 - {PUT-O-198} [linuxgaming.life] Request failed: PUT matrix://linuxgaming.life/_matrix/federation/v1/send/1559924919573: TimeoutError('',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,745 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-878 - {PUT-O-1429} [chat.millcrick.com] Request failed: PUT matrix://chat.millcrick.com/_matrix/federation/v1/send/1559924920670: TimeoutError('',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,761 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1164 - {PUT-O-1367} [tilde.fun] Request failed: PUT matrix://tilde.fun/_matrix/federation/v1/send/1559924920612: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,772 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-1384 - {PUT-O-1226} [matrix.exmachina.community] Request failed: PUT matrix://matrix.exmachina.community/_matrix/federation/v1/send/1559924920479: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,846 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-823 - {PUT-O-1425} [cryptmessage.org] Request failed: PUT matrix://cryptmessage.org/_matrix/federation/v1/send/1559924920668: DNSLookupError('no results for hostname lookup: cryptmessage.org',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,861 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-949 - {PUT-O-1323} [chat.mutnt.io] Request failed: PUT matrix://chat.mutnt.io/_matrix/federation/v1/send/1559924920568: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,862 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-219 - {PUT-O-361} [stevestreza.com] Request failed: PUT matrix://stevestreza.com/_matrix/federation/v1/send/1559924919683: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,863 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-766 - {PUT-O-912} [gauchiste.club] Request failed: PUT matrix://gauchiste.club/_matrix/federation/v1/send/1559924920166: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,864 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-879 - {PUT-O-2132} [tud.at] Request failed: PUT matrix://tud.at/_matrix/federation/v1/send/1559924921342: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,916 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1025 - {PUT-O-1340} [tech9.io] Request failed: PUT matrix://tech9.io/_matrix/federation/v1/send/1559924920585: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'tech9.io'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,918 - synapse.http.matrixfederationclient - 482 - WARNING - POST-838 - {POST-O-2121} [half-shot.uk] Request failed: POST matrix://half-shot.uk/_matrix/federation/v1/get_groups_publicised: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'half-shot.uk'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,919 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1142 - {PUT-O-1364} [matrix.wbatey.co] Request failed: PUT matrix://matrix.wbatey.co/_matrix/federation/v1/send/1559924920609: ResponseNeverReceived:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'matrix.wbatey.co'))])]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,928 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-663 - {PUT-O-966} [pressreleasepoint.com] Request failed: PUT matrix://pressreleasepoint.com/_matrix/federation/v1/send/1559924920219: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,929 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1374 - {PUT-O-938} [chat.askcornerstone.net] Request failed: PUT matrix://chat.askcornerstone.net/_matrix/federation/v1/send/1559924920192: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,931 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1282 - {PUT-O-1400} [matrix.bruder.space] Request failed: PUT matrix://matrix.bruder.space/_matrix/federation/v1/send/1559924920645: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,964 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1179 - {PUT-O-1371} [koopatroopas.net] Request failed: PUT matrix://koopatroopas.net/_matrix/federation/v1/send/1559924920616: ConnectionRefusedError('Connection refused',)
Jun 07 16:35:56 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:56,970 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1445 - {PUT-O-1245} [0xaa55.org] Request failed: PUT matrix://0xaa55.org/_matrix/federation/v1/send/1559924920498: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'0xaa55.org'))])]
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,003 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-1065 - {PUT-O-2143} [maeli.net] Request failed: PUT matrix://maeli.net/_matrix/federation/v1/send/1559924921352: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,021 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1030 - {PUT-O-1110} [londondrugs.com] Request failed: PUT matrix://londondrugs.com/_matrix/federation/v1/send/1559924920363: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'londondrugs.com'))])]
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,032 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-397 - {PUT-O-550} [dh9ph.de] Request failed: PUT matrix://dh9ph.de/_matrix/federation/v1/send/1559924919857: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'dh9ph.de'))])]
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,039 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-1213 - {PUT-O-2142} [comms.rok-on.com] Request failed: PUT matrix://comms.rok-on.com/_matrix/federation/v1/send/1559924921351: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,064 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-749 - {PUT-O-1285} [matrix-test.dev.mccme.ru] Request failed: PUT matrix://matrix-test.dev.mccme.ru/_matrix/federation/v1/send/1559924920530: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'matrix-test.dev.mccme.ru'))])]
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,178 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1108 - {PUT-O-1358} [matrix.balsillie.net] Request failed: PUT matrix://matrix.balsillie.net/_matrix/federation/v1/send/1559924920603: DNSServerError(<Message id=3940 rCode=2 maxSize=0 flags=answer,recDes,recAv queries=[Query('_matrix._tcp.matrix.balsillie.net', 33, 1)]>,)
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,279 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-907 - {PUT-O-1316} [matrix.anatoli.design] Request failed: PUT matrix://matrix.anatoli.design/_matrix/federation/v1/send/1559924920561: DNSLookupError('no results for hostname lookup: matrix.anatoli.design',)
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,323 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-1294 - {PUT-O-2147} [emdete.de] Request failed: PUT matrix://emdete.de/_matrix/federation/v1/send/1559924921355: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,485 - synapse.http.matrixfederationclient - 517 - WARNING - federation_transaction_transmission_loop-1384 - {PUT-O-2148} [matrix.exmachina.community] Request failed: PUT matrix://matrix.exmachina.community/_matrix/federation/v1/send/1559924921356: HttpResponseException("400: b'Bad Request'",)
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,775 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1255 - {PUT-O-1389} [village.mobi] Request failed: PUT matrix://village.mobi/_matrix/federation/v1/send/1559924920634: ConnectionRefusedError('Connection refused',)
Jun 07 16:35:57 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:57,876 - synapse.http.matrixfederationclient - 482 - WARNING - POST-838 - {POST-O-2121} [half-shot.uk] Request failed: POST matrix://half-shot.uk/_matrix/federation/v1/get_groups_publicised: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'half-shot.uk'))])]
Jun 07 16:35:58 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,044 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-828 - {PUT-O-1026} [matrix.liberium.fr] Request failed: PUT matrix://matrix.liberium.fr/_matrix/federation/v1/send/1559924920279: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:58 ubuntu systemd[1]: Stopping Synapse server...
Jun 07 16:35:58 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,617 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1390 - {PUT-O-1412} [matrix.ananstra.org] Request failed: PUT matrix://matrix.ananstra.org/_matrix/federation/v1/send/1559924920657: ConnectionRefusedError('Connection refused',)
Jun 07 16:35:58 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,739 - synapse.http.matrixfederationclient - 482 - WARNING - POST-663 - {POST-O-1251} [half-shot.uk] Request failed: POST matrix://half-shot.uk/_matrix/federation/v1/get_groups_publicised: RequestTransmissionFailed:[VerificationError(errors=[DNSMismatch(mismatched_id=DNS_ID(hostname=b'half-shot.uk'))])]
Jun 07 16:35:58 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,740 - synapse.http.server - 415 - WARNING - POST-663 - Not sending response to request <XForwardedForRequest at 0x7fc8573864a8 method='POST' uri='/_matrix/client/r0/publicised_groups' clientproto='HTTP/1.0' site=8008>, already disconnected.
Jun 07 16:35:58 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,821 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1210 - {PUT-O-1379} [synapse.labhijau.net] Request failed: PUT matrix://synapse.labhijau.net/_matrix/federation/v1/send/1559924920624: DNSLookupError('no results for hostname lookup: synapse.labhijau.net',)
Jun 07 16:35:58 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,865 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1380 - {PUT-O-1409} [matrix.ctf15.com] Request failed: PUT matrix://matrix.ctf15.com/_matrix/federation/v1/send/1559924920654: DNSLookupError('no results for hostname lookup: matrix.ctf15.com',)
Jun 07 16:35:58 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,888 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1212 - {PUT-O-927} [bitnexus.com] Request failed: PUT matrix://bitnexus.com/_matrix/federation/v1/send/1559924920181: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:35:59 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:58,989 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-1436 - {PUT-O-1417} [matrix.linuxlab.sh] Request failed: PUT matrix://matrix.linuxlab.sh/_matrix/federation/v1/send/1559924920662: DNSLookupError('no results for hostname lookup: matrix.linuxlab.sh',)
Jun 07 16:35:59 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:59,120 - synapse.http.site - 203 - WARNING - PUT-1050 - Error processing request <XForwardedForRequest at 0x7fc852dbaa90 method='PUT' uri='/_matrix/federation/v1/send/1559902917769' clientproto='HTTP/1.0' site=8048>: <class 'twisted.internet.error.ConnectionDone'> Connection was closed cleanly.
Jun 07 16:35:59 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:59,131 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-624 - {PUT-O-1264} [matrix.footynews.info] Request failed: PUT matrix://matrix.footynews.info/_matrix/federation/v1/send/1559924920510: ConnectionRefusedError('Connection refused',)
Jun 07 16:35:59 ubuntu matrix-synapse[22132]: 2019-06-07 16:35:59,132 - synapse.http.matrixfederationclient - 482 - WARNING - federation_transaction_transmission_loop-291 - {PUT-O-393} [synapse.finxploit.com] Request failed: PUT matrix://synapse.finxploit.com/_matrix/federation/v1/send/1559924919715: RequestTransmissionFailed:[Error([('SSL routines', 'CONNECT_CR_CERT', 'certificate verify failed')],)]
Jun 07 16:36:00 ubuntu matrix-synapse[28990]: matrix-synapse
Jun 07 16:36:01 ubuntu matrix-synapse[29453]: Error: No such container: matrix-synapse
Jun 07 16:36:01 ubuntu systemd[1]: matrix-synapse.service: Main process exited, code=exited, status=137/n/a
Jun 07 16:36:01 ubuntu systemd[1]: matrix-synapse.service: Failed with result 'exit-code'.
Jun 07 16:36:01 ubuntu systemd[1]: Stopped Synapse server.

@evilham
Copy link
Contributor Author

evilham commented Jun 7, 2019

I have setup grafana, but the graphs are useless if I enable presence, since the server just crashes.
If presence is disabled, RAM usage is pretty much the same as before v1.0.0rc1 with presence enabled, but it does keep working.
Sadly, my graph is not very useful.

@evilham
Copy link
Contributor Author

evilham commented Jun 7, 2019

Just as an update: RAM usage did keep increasing until the host crashed even with presence disabled, apparently that just bought this server some time.

Started a while true; do sv restart matrix-master; date; sleep 1800; done on a screen meanwhile (it's terrible, does the job apparently).

@aaronraimist
Copy link
Contributor

Adding our comments from #synapse:matrix.org:

For me, as long as I don't speak in large rooms the memory usage stays stable. As soon as I try to speak in #synapse:matrix.org or another large room then it crashes.

aaron: wild guess: on this channel there are more servers without valid certs than on other channels, so that makes it worse here?

Seems to be correct.

@ingothierack
Copy link

Im on 0.99.5.2 current. Had activated "federation_verify_certificates" from first install a few weeks ago.
Have memory issues from the start.
Had now deactivated for around 2 hour, and have an stable memoy usage. After set to true, memory increase again.

image

10:20 restart with federation_verify_certificates: false. 12:20 restarted with true again. Memory grows shortly to 3.4 GB from 1.5 GB before. Maybe this is only more caching. But as you can see in the graph, it can groß up to 5.5 GB.
Maybe take a look in the verification process for memory issues.

@JJJollyjim
Copy link

I'm on 0.99.5.2, reproduced @ingothierack's results.

With a procedure of start the server then post in #synapse:matrix.org:

  • federation_verify_certificates: false -> synapse process ends up around 700MB
  • federation_verify_certificates: true -> synapse process grows >1.6GB and crashes the machine (which has always functioned okay on 2GB ram)

@JJJollyjim
Copy link

I added a stub to my synapse install that dumps a tracemalloc snapshot to a file when it receives a SIGUSR1.

This time, I used a procedure of start synapse, start typing in #synapse:matrix.org (thereby sending out typing notifications), watch the memory usage.

Again, with federation_verify_certificates: false, I end up stable (around 1GB), and it doesn't increase if I keep typing.

With federation_verify_certificates: true, the process keeps growing (much faster) until the machine dies.

I sent the SIGUSR1 to the first case once it had settled, and the second case when it was around 1.5GB and about to kill the machine.

Surprisingly, there is very little difference between the two!

The largest difference is +18.3MiB in twisted/internet/defer.py, which has a total of 25.5MiB of allocations... this does not at all explain the massive memory usage.

I see two possibilities:

  • I don't know how to use tracemalloc (likely)
  • All the significant memory usage is from non-python-controlled allocations: openssl or something perhaps?

I might see if I can figure out how to debug/prove the second case.

@JJJollyjim
Copy link

image

KDE heaptrack shows that 408MB (about the increase from don't-verify to verify) is allocated and never freed by openssl.

@evilham
Copy link
Contributor Author

evilham commented Jun 8, 2019

I can can confirm the issue is with the change of default for federation_verify_certificates, so it is not specific to v1.0.0rc1 but was present before.

@JJJollyjim
Copy link

I found this likely relevant issue: pyca/cryptography#4833, which is fixed in the recently-released 2.7 (my version, provided by the latest matrix-provided deb, is 2.6)

@JJJollyjim
Copy link

(I say likely-relevant because the leaks are specifically from ASN.1 parsing, and the issue relates to X509 which is ASN.1-encoded)

@evilham
Copy link
Contributor Author

evilham commented Jun 8, 2019

@JJJollyjim: the server seeing issues on my side has cryptography==2.7, so I don't think it's that specific issue, though it is interesting to keep in mind.

@ara4n
Copy link
Member

ara4n commented Jun 8, 2019

@JJJollyjim do you get backtraces in heaptrack to see where the leaky ASN.1 is being leaked from? Also, do you get the same behaviour if you upgrade to 2.7?

@JJJollyjim
Copy link

JJJollyjim commented Jun 9, 2019

I do have the backtraces (of the C stack frames, not the python ones): all the leaks (actually about 800MB, now I've figured out how to read this) come from _cffi_f_ssl_ctx_load_verify_locations (i.e. the python binding to the openssl function ssl_ctx_load_verify_locations, which apparently is for loading CA certificates). That in turn is called by some unknown python function... _PyMethodDef_RawFastCallKeywords.

I could try and see if there is some allocation tracing thing that can also collect python function names from the stack frames. At a guess though, are we loading the CA store every time we do some sort of ssl thing, and keeping them around, instead of just having one shared copy?

@evilham
Copy link
Contributor Author

evilham commented Jun 9, 2019

Interesting. I've been trying a patch, it seems to work well so far. There is still increased RAM usage but not terribly so, there may still be room for improvement but it was written in 10 mins.

Basically, the way things are, each message sent over federation defines a new class and holds an instance of a class and there appears to be no caching whatsoever, this is the modified file:
https://matrix.unchat.cat/matrix_federation_agent.py
(Not on GH yet, sorry :-))

I'll see if I manage time-wise to add context caching and specifically look at the CA bits, but it'd be useful if someone could confirm that this is an improvement.

PS, the related file:

class EndpointFactory(object):

@JJJollyjim
Copy link

JJJollyjim commented Jun 9, 2019

The patch definitely didn't fix the whole problem for me (can't say if it had any affect, as the memory usage is still at the point where the server crashes)

Setting up a synapse instance on my workstation to test further.

@evilham
Copy link
Contributor Author

evilham commented Jun 9, 2019

Thank you. I saw lower mem usage but not too much.
Care with testing of n a workstation, since the issue is triggered when federating with many servers.

@richvdh
Copy link
Member

richvdh commented Jun 9, 2019

Thanks to everyone who has helped investigate this so far, and sorry we didn't catch it before putting out the RC.

all the leaks (actually about 800MB, now I've figured out how to read this) come from _cffi_f_ssl_ctx_load_verify_locations (i.e. the python binding to the openssl function ssl_ctx_load_verify_locations, which apparently is for loading CA certificates).

This is very helpful indeed. Some background here:

Every time we start an outbound federation connection, ClientTLSOptionsFactory.get_options is called. That in turn calls twisted.internet.ssl.CertificateOptions._makeContext, which creates a fresh OpenSSL context and then calls trustRoot._addCACertsToContext(ctx), which in this case loads the CA files from disk and attaches them to the new context.

In theory that should be fine, because the context (along with the loaded certs) gets thrown away when the connection is closed.

However, in practice, we are making several hundred outbound connections at once, and, thanks to http keepalive and connection pooling, the connections stay open for several minutes. What I therefore suspect is happening is that this is not a true leak, but simply massively increased memory usage thanks to attaching all of the CA certificates to each of the outbound connections. If Synapse lasted long enough to send all of the outbound requests and time out the connections, then we would see memory usage drop again. (The connection keepalive timeout is 120 seconds).

Intuitively it seems stupid that we are reading the certs from disk and storing them in a separate memory store for every single outbound connection; however, I'm not aware of a way to avoid this with the openssl interface. I don't think it is correct to share OpenSSL contexts between connections. Any thoughts or advice from those familiar with the OpenSSL API would be welcome.

@ara4n
Copy link
Member

ara4n commented Jun 9, 2019

https://www.postgresql.org/message-id/E1bsROr-0002Z2-25%40gemulon.postgresql.org implies that it /is/ possible to share contexts between connections, albeit fiddly (which is why postgres stopped doing so)

@richvdh
Copy link
Member

richvdh commented Jun 9, 2019

(#4673 was a bug that happened when we briefly tried sharing contexts: I'm not sure offhand if there is a better solution to that than having one-context-per-connection)

@ara4n
Copy link
Member

ara4n commented Jun 9, 2019

http://openssl.6102.n7.nabble.com/Possibility-to-cache-ca-bundle-and-reuse-it-between-SSL-sessions-td51090.html looks to give most of the answers on how the API is meant to be used for this scenario.

@hawkowl
Copy link
Contributor

hawkowl commented Jun 9, 2019

Right now, you need one, because you set the info callback (which is how you do SNI) on the context, not the connection. twisted/twisted#1128 will make it a little faster, though.

@JJJollyjim
Copy link

JJJollyjim commented Jun 9, 2019

This node.js issue from 2011 also discusses the issue: https://paul.querna.org/articles/2011/04/05/openssl-memory-use/

They link this commit nodejs/node-v0.x-archive@5c35dff which does the caching across contexts, and mention that they plan to make node.crypto reuse contexts in future -- not sure if this ended up happening.

The other issue they discuss in the article - allocation of zlib buffers for TLS compression - is likely no longer relevant since TLS compression was turned off by default in openssl 1.1.0 (2016) to mitigate the CRIME attack

@richvdh
Copy link
Member

richvdh commented Jun 9, 2019

Conclusion of discussion elsewhere is that you probably can share context objects, but have to do some hoop-jumping to get the SNI and cert verification done right.

Those interested can try playing with my rav/shared_ssl_context branch (https://github.com/matrix-org/synapse/tree/rav/shared_ssl_context). No promises that it won't blow up in your face.

@mvgorcum
Copy link
Contributor

mvgorcum commented Jun 9, 2019

For some reason the extreme memory usage seems to have gone away for me, after running RC1 for a day.
[edit] I forgot I restarted the server a couple of hours earlier. The memory usage increased loads again after sending a message in a couple of large rooms.

@neilisfragile neilisfragile added p1 z-bug (Deprecated Label) A-Performance Performance, both client-facing and admin-facing labels Jun 10, 2019
@evilham
Copy link
Contributor Author

evilham commented Jun 11, 2019

Me and others have indeed seen a huge improvement v1.0.0rc{2,3}; thanks to everyone involved.

For the record: this improvement is also showing near 20% less RAM consumption on my biggest matrix server compared to v0.99, good thing this was addressed before v1.0.0 :-).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Performance Performance, both client-facing and admin-facing z-bug (Deprecated Label)
Projects
None yet
Development

No branches or pull requests