Can't rejoin room if network connectivity is interrupted #84
Comments
@thazhemadam Interesting that this issue is still occurring. I'm not able to reproduce it from the steps. As of now, the following points have already been implemented, which are similar to the potential solution.
Another idea could be encoding the "Cool nickname" for a device with the MAC address of the device.
Thanks for providing this, I'm able to reproduce the issue now.
Yes, this can be a potential fix too. However, MAC addresses aren't directly accessible from a JavaScript client. Instead, maybe we can assign a random unique id to every client when they use Blaze for the first time.
The major reason for preventing duplicate nicknames in the same room was a loose attempt at preventing the same client from joining the same room twice. Since multiple clients can legitimately have the same nickname, it would make sense to switch to a unique-id-based solution. @thazhemadam I'm marking this issue as a bug for now, as we first need to investigate where the two existing solutions are failing, and if and how that can be fixed.
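The unique-id idea above could look roughly like the sketch below. It is only an illustration, not Blaze's code: the storage key `blaze-client-id`, the helper names, and the in-memory storage stand-in are all made up for the example. In a browser you would pass `window.localStorage` as the storage.

```javascript
// Hypothetical helper: generate a random id. Uses crypto.randomUUID() where
// the runtime provides it, otherwise falls back to a weaker time+random id.
function generateId() {
  const c = globalThis.crypto;
  if (c && typeof c.randomUUID === 'function') {
    return c.randomUUID();
  }
  return Date.now().toString(36) + '-' + Math.random().toString(36).slice(2, 10);
}

// Return the id stored under `key`, creating and persisting one on first use.
// `storage` only needs getItem/setItem, so window.localStorage fits.
function getClientId(storage, key = 'blaze-client-id') {
  let id = storage.getItem(key);
  if (!id) {
    id = generateId();
    storage.setItem(key, id);
  }
  return id;
}

// In-memory stand-in for localStorage, so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => {
      data.set(k, String(v));
    },
  };
}
```

Calling `getClientId` twice with the same storage returns the same id, which is what lets a returning device be recognised even if its nickname clashes.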
@blenderskool Did you try removing Device B from the room when it loses connection? That might fix the problem. How are you keeping track of whether a user (Device) is present in the room or has left?
@RotonEvan I don't think the server (which maintains the list of sockets) can be notified about the loss of a socket when the socket itself loses connection. It currently performs a check on all sockets at a fixed interval to remove dead sockets.
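For readers following along, a periodic dead-socket check of the kind described above is often written as a mark-and-sweep over an `isAlive` flag. The sketch below is an assumption about the general shape, not the actual Blaze server code: the socket objects are hypothetical stand-ins with `ping()` and `terminate()`, and a real server would set `isAlive = true` in its pong handler rather than inside `ping()`.

```javascript
// One sweep of the heartbeat. Sockets that have not answered since the
// previous sweep are terminated and removed; the rest are marked as
// "not yet answered" and pinged again.
function sweepDeadSockets(sockets) {
  const removed = [];
  for (const socket of sockets) {
    if (!socket.isAlive) {
      socket.terminate();
      sockets.delete(socket); // deleting during Set iteration is safe in JS
      removed.push(socket);
      continue;
    }
    socket.isAlive = false; // flipped back to true when the pong arrives
    socket.ping();
  }
  return removed;
}

// Hypothetical fake socket for trying the sketch out. A responsive socket
// "answers" the ping immediately; an unresponsive one never does.
function createFakeSocket(responsive) {
  return {
    isAlive: true,
    terminated: false,
    ping() {
      if (responsive) this.isAlive = true;
    },
    terminate() {
      this.terminated = true;
    },
  };
}
```

On a server this would run from something like `setInterval(() => sweepDeadSockets(sockets), intervalMs)`. Note that in this particular sketch a socket that dies just after answering a ping is only removed on the second sweep after that, so the interval bounds the detection delay only to within roughly two intervals.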
That should help only if you decrease the interval time to 1 sec
Are you setting the
For a solution, since you do not want a client to join the same room twice, we need to detect the network loss on the client side itself and add a network-loss state to the state machine. You need to put this logic in the code. Otherwise, decreasing the socket heartbeat interval to 1 sec might help too, I think. I can help in fixing this. We can discuss and decide on a solution. I'll make a PR if we get a solution. :)
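To make the "network-loss state in the state machine" suggestion concrete, here is a minimal sketch. The state and event names are invented for illustration and do not come from the Blaze client; the only real APIs referred to are the browser's `online` and `offline` events on `window`, mentioned in the comments.

```javascript
// Hypothetical connection states and the events that move between them.
const TRANSITIONS = {
  connected: { offline: 'network-lost', 'socket-closed': 'network-lost' },
  'network-lost': { online: 'rejoining' },
  rejoining: { joined: 'connected', offline: 'network-lost' },
};

// Return the next state, or the current one if the event does not apply.
function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  return next || state;
}

// In a browser this could be wired to the real connectivity events, e.g.:
//   window.addEventListener('offline', () => { state = nextState(state, 'offline'); });
//   window.addEventListener('online', () => { state = nextState(state, 'online'); });
// and the client would attempt to rejoin the room whenever it enters 'rejoining'.
```

This only tells the client that it dropped off. The server still has to learn about it separately, which is where the heartbeat interval comes in.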
Yes, this is the most straightforward fix for now. I'm wondering whether there is another way of doing this without having to check every second (which might become a time-consuming operation with a large number of sockets?)
Yes,
On the client side, this check is being made via the
In summary, I think the best option for the fix is to decrease the duration of the interval, as you mentioned earlier 🙂
Can I work on this? I know it's a really easy fix but might help me in hacktoberfest :p
@RotonEvan Yes sure, go ahead, but may I know the changes you will be making to solve this? Is it decreasing the heartbeat interval to 1s?
@blenderskool you tell me what seems to be the best fix for this issue. Decreasing the interval is a way out. Maybe even the best one as of now.
@RotonEvan Decreasing the interval makes the most sense, as this is exactly why the ping/pong mechanism is mentioned in the WebSockets spec.
This doesn't really sound like a permanent fix to me. The overhead incurred by performing this check every 1s will be much greater. It becomes a bottleneck for scalability. Secondly, what happens if the connection drops while Device B is transferring a file? Does the file get only partly received? Or does the received file get deleted?
Yep, as I said earlier this is a major concern with decreasing the timeout interval which is why it was set to 30s in the first place.
There are two situations for understanding what happens here:
Well, this is a separate issue altogether and probably not what this issue was intended to be.
I think this is also geared towards dealing with file transfer if there's a connection loss, which can be a separate issue.
Adding to what @thazhemadam said earlier, this might be the best approach (with unique ids instead of MAC addresses) till now (if we aren't decreasing the interval of ping/pong) to solve this issue. The server would still however not know that a peer has lost connection, which gives other peers in the same room a false impression that the peer (who lost connection) is still connected.
You aren't logging in any user, right? So in that case a unique ID is a bad idea, as you would have to save the ID in browser cookies. Btw, can you tell me where you are processing the check for whether a user is already in the room? Might get some idea from there...
Umm I don't understand how it's bad because we have to store the id locally. Possibly localStorage as that's where all data is stored.
Check |
Cookies are bad because they're not permanent and aren't under the control of the code. But about localStorage... what data do you store?
We are not using cookies for storing data :)
Maybe you can take a look at the code from your side too 😛
Okay so... the problem is that your system isn't getting updated instantly when there is a connection failure for one device (a consistency issue). To compensate for that we can decrease the heartbeat interval. Now, you say a unique ID is a good solution. Let's first clear up a few things.
The server needs to be updated. The quicker the better. This is the only way you can ensure consistency during a network partition.
Yes, there's an issue already for the same #72
Yes, this is the caveat (quoted below) I mentioned in the earlier messages.
But I think the approach suggested by @thazhemadam was mainly to allow a peer (who lost connection) to rejoin a room within the 30s interval without getting the "User with same name exists in this room" error. If the peer rejoins within that 30s interval, well and good; otherwise the ping/pong would remove the peer after the interval is over. Keep in mind that this approach does not let the server or other peers know that some peer lost connection and left the room.
Any thoughts on how much this would be decreased to? We can't go with |
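The rejoin-within-the-interval approach discussed above might be sketched as follows, assuming each client sends a persistent unique id along with its nickname. The room structure (a `Map` from nickname to `{ clientId, socket }`), the function name, and the return shape are assumptions made for the example; only the error text is taken from the issue.

```javascript
const NAME_TAKEN = 'User with same name exists in this room';

// Hypothetical join check. A name clash is only an error when the entry
// belongs to a different client; the same client coming back replaces its
// own stale socket instead of being rejected.
function joinRoom(room, name, clientId, socket) {
  const existing = room.get(name);
  if (existing && existing.clientId !== clientId) {
    return { joined: false, error: NAME_TAKEN };
  }
  if (existing) {
    existing.socket.close(); // drop the stale socket left over from before the disconnect
  }
  room.set(name, { clientId, socket });
  return { joined: true, rejoined: Boolean(existing) };
}
```

As noted in the thread, this does nothing to tell other peers that the device was briefly gone; it only removes the error for the returning device.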
Normally it takes at least 10s to get the network connection back. However, it is necessary to update the server as early as possible. So I think we can settle somewhere between 5-10s. 7 or 8 seems fine. Tell me one, I'll update the interval in a PR. And then I can try #72 and also the 30s buffer one separately.
@RotonEvan Ok makes sense. We can try between 10s-15s. @RotonEvan @thazhemadam From what I understand from the discussion here and issue #72: suppose that, after the refactoring from issue #72, a peer (let's call it peer 1) with an id that already exists in the room tries to join the room. The server first sends a ping to the already joined peer (let's call this peer 2) in the room. The following cases can occur here:
Got busy with some other work, will update the interval with 12s as of now and create a PR.
@blenderskool yeah, these modifications seem good. #72 needs to get solved.
Heartbeat interval decreased to 12s from 30s.
… room
If a user A is trying to join a room which already has a user B with the same name as A, Blaze first checks if user B is "alive" by sending a PING message and expecting a PONG within some timeframe. If a PONG is received from user B, user A is not let into the room because of the name clash. However, if the PONG is not received from user B within the allotted timeframe, it is assumed that user B has disconnected and the stray socket on the server is closed. User A is also let in because there is no name clash in this scenario. Fixes #84
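The check described in this commit message can be illustrated with the sketch below. It is not the code from the commit: the socket interface (`sendPing`, `onPong`, `close`), the `Map`-based room, the function names, and the default timeout are all hypothetical, chosen so the example runs on its own.

```javascript
// Resolve to true if `socket` answers a PING with a PONG within timeoutMs.
function isPeerAlive(socket, timeoutMs) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      socket.onPong = null;
      resolve(false);
    }, timeoutMs);
    socket.onPong = () => {
      clearTimeout(timer);
      socket.onPong = null;
      resolve(true);
    };
    socket.sendPing();
  });
}

// Let `newSocket` join under `name` unless a live peer already holds that name.
async function tryJoin(room, name, newSocket, timeoutMs = 1000) {
  const existing = room.get(name);
  if (existing) {
    const alive = await isPeerAlive(existing, timeoutMs);
    const current = room.get(name);
    // Reject if the holder answered, or if another peer took the name while we waited.
    if (alive || (current && current !== existing)) {
      return { joined: false, error: 'User with same name exists in this room' };
    }
    existing.close(); // stray socket: assume the old peer has disconnected
  }
  room.set(name, newSocket);
  return { joined: true };
}

// Hypothetical fake socket: a responsive one delivers a PONG asynchronously.
function createFakeSocket(responsive) {
  const socket = {
    closed: false,
    onPong: null,
    sendPing() {
      if (responsive) {
        setTimeout(() => {
          if (socket.onPong) socket.onPong();
        }, 0);
      }
    },
    close() {
      socket.closed = true;
    },
  };
  return socket;
}
```

The cost of this design is that a joining user with a clashing name waits up to the timeout before being let in, but the check only runs on a clash, so it does not add per-second work across all sockets.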
Closing this issue as it has been fixed in commit 076bdf2 and will be released in Blaze v3.0.0
Issue
If one of the devices loses network connectivity after joining a room, said device does not seem to be able to re-join the same room right after.
Steps to Reproduce
A Connection Error that says User with same name exists in this room gets displayed.
Potential Solution
Maintaining a temporary state locally which keeps track of connectivity, maybe?
Periodical ACKs from receiver device might also help.