Using this package when running multiple instances #17

Open · elie222 opened this issue Jun 7, 2016 · 9 comments

elie222 commented Jun 7, 2016

I've been reading through the code for this package and I just realised that the following call:
UserPresenceMonitor.start();
should only be made once. Is that correct? I have 40 or so servers observing this info when one would be enough! If this is true, the docs should be updated to make this clearer.
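
A minimal sketch of what running the monitor on a single designated instance can look like, assuming a standard Meteor startup hook; the PRESENCE_MONITOR environment variable is an illustrative convention, not part of this package:

    Meteor.startup(function () {
      // Only the instance launched with PRESENCE_MONITOR=true runs the
      // monitor; all other instances skip it.
      if (process.env.PRESENCE_MONITOR === 'true') {
        UserPresenceMonitor.start();
      }
    });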

@sampaiodiego

hi @elie222
if you call UserPresenceMonitor.start(); on only one server and that server goes down, you'll no longer be able to track users' status.

it is OK to call it on each of multiple servers.

elie222 commented Jun 7, 2016

It seems to be a very high drain on resources though. When you're talking hundreds or thousands of logins, I'm pretty sure this package is causing the CPU to ramp up. But I'm going to check over the next few days in production. I just wanted to make sure I wouldn't break anything by removing the monitor from the other servers.

elie222 commented Jun 8, 2016

So here are the results, which I find a little strange. I had UserPresenceMonitor.start(); running on 44 different servers. I then stopped it on all of those servers and ran it on only 1 server instead. That server is now at close to 100% CPU constantly; apparently handling the entire load was too much for one server.

What I don't understand is how running UserPresenceMonitor.start(); on all the servers mitigated this problem. Surely they'd all be observing all user logins from every server in any case?

@sampaiodiego

hi @elie222 thanks for writing back.

we took a look at the code and some questions came up (the snippet below shows one way to pull these numbers):

  • are you using the package konecty:multiple-instances-status? it's crucial to have both in a cluster setup (I'll try to make this clear in the docs)
  • how many users do you have in the DB (users collection)?
  • how many logged-in users do you have on average?
  • how many documents do you have in the usersSessions collection?

also, it shouldn't hang at 100% CPU if you are using the konecty:multiple-instances-status package.
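
A quick way to gather those numbers from the mongo shell; the status field queried below is the presence status this package writes on user documents, and the queries are a diagnostic sketch, not something the package ships:

    // total users in the users collection
    db.users.count()

    // users currently not marked offline (approximates logged-in users)
    db.users.count({ status: { $ne: 'offline' } })

    // documents in the usersSessions collection
    db.usersSessions.count()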

elie222 commented Jun 8, 2016

Yes. I was running both.

I've had some really strange things happening with my app recently. I'm going to try to read through the code for multiple-instances-status too, to see how it interacts with this package.

I just tried running UserPresenceMonitor.start(); on all 44 instances of my app, but it didn't help. Each instance hit 100% CPU straight away and stayed there.

Here is a typical graph from DO:
[screenshot: DigitalOcean CPU/bandwidth graph, 2016-06-09 1:54 AM]

I am confused as to why bandwidth spikes like that. Any ideas?

Apart from the 44 instances, there were about 1000 clients connected at the time, with perhaps 100-200 people actively using the website.

@rodrigok

Is this still happening?

elie222 commented Sep 10, 2016

So my app is fine these days. I think the big change I made was disconnecting users that had been disconnected for a certain amount of time. I only run user presence on one server, not all servers, because I don't see the need to have multiple servers doing the same task.
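
A minimal sketch of that kind of cleanup, assuming the usersSessions collection is reachable server-side as UsersSessions and that each connection entry carries an _updatedAt timestamp; the collection handle, field name, and 10-minute threshold are illustrative assumptions, not the package's documented API:

    var STALE_AFTER_MS = 10 * 60 * 1000; // illustrative threshold

    Meteor.setInterval(function () {
      var cutoff = new Date(Date.now() - STALE_AFTER_MS);
      // Prune connection entries that have gone quiet, so abandoned
      // connections stop counting toward a user's presence.
      UsersSessions.update(
        { 'connections._updatedAt': { $lt: cutoff } },
        { $pull: { connections: { _updatedAt: { $lt: cutoff } } } },
        { multi: true }
      );
    }, 60 * 1000);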

I would be interested to see how this package works with thousands of concurrent users. Has Rocket.Chat ever had to deal with that, and has it been OK? From my observations it seems like it may be problematic.

@rodrigok

@elie222 It's working great for Rocket.Chat now, but we don't have installations with thousands of online users at the same time.

@AmShaegar13

@sampaiodiego @rodrigok @elie222 This sounds pretty much like the problem we are facing. I am surprised that no one has come across this issue with Rocket.Chat before.

RocketChat/Rocket.Chat#11288
