Description
The immediate symptom I've been tracking is my postgres database server getting OOMkilled every 24-28 hours.
After setting up metrics for my database box and synapse, and keeping an eye on them for several days, I noticed a clear association between requests to the RoomMessageList servlet and jumps in memory usage on the database server. A stark example:
Furthermore, during times of high RAM usage, I looked at the postgres connections with the highest resident set size (RSS) as reported by ps and compared them with the pg_stat_activity table to see which worker each connection was associated with. When nearing the server's memory limit, the connections for the worker handling this endpoint were using nearly double the RAM of any other connection:
RSS (KB)    application_name
361792      homeserver
362364      homeserver
362500      homeserver
366628      homeserver
368696      homeserver
372620      homeserver
373136      homeserver
375660      homeserver
376176      homeserver
385900      homeserver
605928      room_message_lister
712208      room_message_lister
732016      room_message_lister
734624      room_message_lister
760680      room_message_lister
768392      room_message_lister
779268      room_message_lister
852376      room_message_lister
876140      room_message_lister
922448      room_message_lister
Specifically, this worker is handling all requests for ^/_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages$. I'm using redis replication on the homeserver.
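For reference, that comparison can be reproduced with something along the following lines (a sketch rather than the exact commands used; it assumes passwordless psql access as the postgres OS user):

sudo -u postgres psql -At -c "SELECT pid, application_name FROM pg_stat_activity" |
while IFS='|' read -r pid app; do
    # ps reports the resident set size in KB; skip backends that have already exited.
    rss=$(ps -o rss= -p "$pid") || continue
    printf '%s\t%s\n' "$rss" "$app"
done | sort -n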
Steps to reproduce
I haven't tried to reproduce this in a clean environment, but here's the nginx block for this worker:
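(The block itself wasn't preserved in this copy of the issue. The sketch below shows the typical shape of such a proxy rule; the worker port 8083 and the header directives are illustrative assumptions, not necessarily the real values.)

# Illustrative sketch only: the worker port (8083) and headers are assumptions.
location ~ ^/_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages$ {
    proxy_pass http://localhost:8083;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Host $host;
}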
And here's the worker's config file:
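(Also not preserved in this copy; a representative sketch for a Synapse 1.20-era generic_worker follows, with the listener port, replication port, and log config path as illustrative assumptions.)

# Representative sketch: ports and the log config path are assumptions.
worker_app: synapse.app.generic_worker
worker_name: room_message_lister

worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8083
    resources:
      - names: [client]

worker_log_config: /etc/matrix-synapse/room_message_lister.log.config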
and the worker's systemd unit:
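(Likewise a sketch only; the user, install paths, and virtualenv location are assumptions.)

# Sketch only: user, paths, and virtualenv location are assumptions.
[Unit]
Description=Synapse room_message_lister worker
After=network.target matrix-synapse.service

[Service]
Type=simple
User=synapse
WorkingDirectory=/home/synapse/synapse
ExecStart=/home/synapse/synapse/env/bin/python -m synapse.app.generic_worker \
    --config-path=/home/synapse/synapse/homeserver.yaml \
    --config-path=/home/synapse/synapse/workers/room_message_lister.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target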
I can provide portions of the homeserver config or metrics that might help with debugging, on request.
Version information
Homeserver: https://matrix.cybre.space
Version: {"server_version":"1.20.1 (b=master,86a72d1)","python_version":"3.6.8"}
Install method: pip
Platform: Ubuntu 18.04 VPS, not containerized.
It seems a bit surprising that those are using so much memory. Maybe someone is making a request to retrieve an extremely large amount of messages? It would be useful to see INFO logs for the room_message_lister worker.