Every node.js at 100%CPU usage after v3.0.4 upgrade #17025
Comments
@magicbelette, would you share the number of online users you are serving with your setup? We currently have one VM with 64 cores and around 20 RC instances for around 2000 online users. Yesterday we had an issue where all instances spiked to 100% CPU. We saw the same "Stream broadcast from xxx to xxx with name xxx not authorized" errors in our logs. I hesitate to increase the number of RC instances further because I read that every instance has to contact every other instance, which could create bottlenecks. Version: 2.4.8
@bbrauns thanks for your answer.
Sir, I think 3.* is very unstable. Could you give a hand? But:
OK, that's just a status report. So... could it be due to this change, @rodrigok?
@magicbelette
We had another issue today. Starting a new instance leads to massive CPU usage on all the other RC instances. Starting multiple instances at once brings down the whole cluster.
Got the same issue... Can you try going back to this version? I'm setting up a new cluster to test with 64 instances.
Sorry, I can't test in production. Customers are already mad. Why do you find b95cb64 suspicious?
Just because this was the only change somehow linked to streambroadcast... But it seems that the core team was quite busy tonight, as I saw a lot of
OK, makes sense. But these changes aren't in v3.0.9, or am I wrong?
Those were included in 3.0.8
Hi @magicbelette
Hi, the situation is far more comfortable in version 3.0.9. Plus, this one #17115 should improve performance too |
Description:
After migrating from 2.2.1 to 3.0.4, every node.js instance is stuck at 100% CPU usage and Rocket.Chat becomes unresponsive.
Actual behavior:
The first peak at midnight is the node.js instances restarting for the 3.0.4 upgrade. In the morning the load starts growing as users connect, Rocket.Chat becomes (very) laggy, and the load never decreases. After finally going back to Rocket.Chat 2.2.1, the CPU load decreases, and even though the load is high at the beginning, Rocket.Chat is still usable in 2.2.1.
Server Setup Information:
Additional context
Node.js distribution:
host 1 with 8 CPUs: 7 node.js instances
host 1 with 64 CPUs: 62 node.js instances
host 1 with 32 CPUs: 30 node.js instances
Is it perhaps a bad idea to run a lot of node.js instances on the same server, rather than keeping small 4-CPU servers with 3 node.js instances each?
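For a rough sense of scale: one commenter mentions having read that every instance has to contact every other instance. If that is how the instances talk to each other (an assumption, not confirmed in this thread), the number of instance pairs grows quadratically with the instance count. A minimal Python sketch of that arithmetic, using the instance counts listed above; the function name `full_mesh_links` is hypothetical and purely illustrative:

```python
def full_mesh_links(instances: int) -> int:
    """Number of distinct instance pairs, assuming every instance
    connects to every other instance (full mesh, an assumption)."""
    return instances * (instances - 1) // 2


# Instance counts per host, taken from the distribution listed above.
instances_per_host = [7, 62, 30]
total = sum(instances_per_host)

print(total)                   # 99 instances overall
print(full_mesh_links(total))  # 4851 pairs for this setup
print(full_mesh_links(20))     # 190 pairs for the 20-instance setup in the comments
```

Under that assumption, going from 20 to 99 instances multiplies the pair count by roughly 25, which may be part of why adding instances does not help.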
Relevant logs:
A lot of different errors, but mainly the node.js instances seem to keep discovering each other: