Runaway RAM usage #908
Comments
This is not an issue with this repo but rather with Synapse itself: matrix-org/synapse#7339. There is, of course, a solution for this in the repo. TL;DR: don't join large rooms.
Thanks @skepticalwaves. I'm already following most of those recommendations, but the large rooms are part of the reason I'm using this in the first place, and I have accepted the RAM usage of Synapse. The problem I'm experiencing is the dramatically increased RAM usage by Postgres. As I've researched this further, it seems the tuning mentioned in #532 is no longer in the code base. I'm not sure exactly where it's gone, but I do see an option in the .yml to pass some -c flags to Postgres, so I'll experiment with some tuning there. Thanks!
Perhaps you can attach to the postgres container and try analyzing what's going on with pg_top. It's worth figuring out what Postgres is doing before attempting to tune it.
#642 shows how you can pass additional flags to Postgres:
matrix_postgres_process_extra_arguments: [
  "-c 'max_connections=200'"
]
Initially, this seems to have made a dramatic improvement. I will keep an eye on things over the weekend and see whether it holds.
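For lower-RAM setups, memory-related settings can be passed the same way. A minimal sketch, placed in the playbook's host vars file (typically inventory/host_vars/<your-domain>/vars.yml) so it persists across playbook runs; the values are illustrative assumptions, not recommendations from this thread:
matrix_postgres_process_extra_arguments: [
  "-c 'shared_buffers=128MB'",
  "-c 'work_mem=4MB'",
  "-c 'max_connections=100'"
]
Lower shared_buffers and work_mem reduce how much memory Postgres will try to use for caching and per-query sorting/hashing, at the cost of more disk I/O.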
The mem spikes appear randomly, and it's easy to fall for the placebo effect. Keeping that in mind, I've not seen crippling swap usage since I added the following option to my Docker daemon configuration:
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
An update... The changes I made increased the amount of time before an OOM, but eventually the behavior returned (now after a week or two instead of a few days). I tried pg_top but was unable to figure out where the memory was being used. I am wondering if there's a memory leak, but I'm also trying to tweak some other parameters (e.g., disabling huge_pages). I will report back what I find.
Noticed this too on perthchat.org; we currently don't have a room complexity limit. @jgoerzen, you should be aware that disabling fsync is dangerous: it removes atomicity, meaning that an unexpected shutdown of your server could leave your DB corrupted. I like to use the much safer '-c synchronous_commit=off' setting, which maintains atomicity but removes the requirement that DB updates must be written to disk before they are acknowledged.
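Passed the same way as the max_connections example above, that would look something like this (a sketch, assuming matrix_postgres_process_extra_arguments is still the variable to use):
matrix_postgres_process_extra_arguments: [
  "-c 'synchronous_commit=off'"
]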
I continue to see memory usage for the Postgres processes gradually increasing over a period of days until it reaches over 300MB per process and triggers the OOM killer. Changes to settings have sometimes slowed this behavior, but not solved it. This thread https://www.postgresql-archive.org/BUG-16707-Memory-leak-td6161863.html mentions JIT as a possible source of leaks in PostgreSQL 12, so I'll try disabling that next and see what happens. The filesystem here is backed by ZFS and can be trivially rolled back to an earlier snapshot, but your point about fsync is a good one for me and for other readers. I was willing to take the risk for diagnosis given the ZFS backing, but not everyone would be.
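For trying that through the playbook, a hedged sketch that disables JIT (and, per the earlier comment, huge pages) via the same extra-arguments variable:
matrix_postgres_process_extra_arguments: [
  "-c 'jit=off'",
  "-c 'huge_pages=off'"
]
Note that the jit setting only exists on PostgreSQL 11 and later.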
We've also noticed that raising the global cache factor can make it "run away" more slowly; more caching in Synapse reduces the strain on the DB. We jumped from 2.0 to 4.0 and it wasn't as bad. (6 cores, 24GB RAM, ~100 users and 1 worker.)
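In this playbook that is likely controlled by a variable along these lines; matrix_synapse_caches_global_factor is an assumption about the variable name, so check your playbook version's defaults before relying on it:
matrix_synapse_caches_global_factor: 4.0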
matrix-org/synapse#10440 (txn_limit) could help.
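If the playbook has no dedicated variable for it, txn_limit could presumably be set through the generic configuration extension. A sketch, assuming matrix_synapse_configuration_extension_yaml is available and your Synapse version (1.40+) supports database.txn_limit; the extension gets merged into the generated homeserver.yaml, so verify the resulting database section:
matrix_synapse_configuration_extension_yaml: |
  database:
    txn_limit: 10000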
Hi folks,
After running this for a long time, I have recently seen frequent out-of-memory conditions. It's running in a KVM VM, and I've increased its RAM from 3GB to 4GB, then 5GB, then 6GB, and still every few days it all hangs.
This system serves only one user: me. I am in some large channels.
When there are issues, top sorted by RAM shows a number of Postgres processes near the top. I believe it's these Postgres processes that are responsible for all the growth; the Synapse process has stayed at roughly the same size.
I see #532, but it seems to target tuning for very large systems. Moreover, it doesn't seem to edit things in .yml files, which makes me think the changes wouldn't be persistent.
It would be great to see some documentation on how to tune for lower-RAM situations as well as how to make those changes persistent in .yml. Thanks!