Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make client and server to resync active connections #74

Merged
merged 23 commits into from
Jun 28, 2024

Conversation

aruiz14
Copy link
Contributor

@aruiz14 aruiz14 commented Apr 5, 2024

Issue: 44576

Relates to #68
Depends on #78

Problem

remotedialer allows multiplexing connections between two peers using a single websocket connections by including a connection ID in the messages and using separate buffers. The protocol specifies different message types for different actions (Connect, Data, Error, Pause, Resume, etc.). In particular, the Error type is used to communicate the other end that a certain connection must be closed. However, depending on the cause of the original error, this message may never be successfully transmitted, as the sender will give up on sending it (#67 adds additional logging for this situation).

When this happens, one of the peers will never receive a termination message for that connection, making the underlying buffers to get stuck on Read() forever, hence causing goroutine and memory leaks.

Solution

This PR adds a new message type to the protocol (Resync), whose payload contains a list of connection IDs. Similarly to how clients sends Ping control messages, Resync messages will periodically tell the receiving peer that any connection not contained in the provided list is no longer needed and can be pruned.

Small caveat: we cannot use Control messages for this purpose, since websocket set a limit of 125 bytes for their payload, which would impose a tight restriction on the number of connections.

CheckList

  • Test

@aruiz14 aruiz14 requested a review from a team as a code owner April 5, 2024 09:55
@aruiz14 aruiz14 requested review from moio, tomleb, rmweir and MbolotSuse April 5, 2024 09:55
session_resync.go Outdated Show resolved Hide resolved
connection.go Show resolved Hide resolved
session_resync.go Outdated Show resolved Hide resolved
message.go Outdated Show resolved Hide resolved
session_resync.go Outdated Show resolved Hide resolved
session_resync.go Outdated Show resolved Hide resolved
connection.go Show resolved Hide resolved
moio
moio previously approved these changes Apr 9, 2024
Copy link
Contributor

@moio moio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks

session_sync.go Show resolved Hide resolved
session.go Outdated Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
session.go Outdated Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
Copy link
Contributor

@tomleb tomleb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to follow what we discussed during our meeting.

Given the potential deadlock mentioned in the comments below, I think this requires more testing. It should be tested in a custom Rancher build and it would be beneficial imo to have integration tests for this, though I understand that right now remotedialer doesn't have this in place.

session_sync.go Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
message.go Show resolved Hide resolved
session.go Outdated Show resolved Hide resolved
session_sync.go Show resolved Hide resolved
types.go Outdated Show resolved Hide resolved
session_sync_test.go Outdated Show resolved Hide resolved
moio
moio previously approved these changes Jun 13, 2024
aruiz14 added a commit to aruiz14/remotedialer that referenced this pull request Jun 13, 2024
Copy link

@maxsokolovsky maxsokolovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are mostly nits at this point, since I am not yet familiar with this project. I may have to come back and think about the solution itself once I learn a bit more about remotedialer.

message.go Outdated Show resolved Hide resolved
session.go Outdated Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
session_sync.go Outdated Show resolved Hide resolved
session_sync_test.go Show resolved Hide resolved
session_sync_test.go Show resolved Hide resolved
session_sync_test.go Show resolved Hide resolved
session_test.go Show resolved Hide resolved
session_test.go Show resolved Hide resolved
session_sync_test.go Outdated Show resolved Hide resolved
session_test.go Outdated Show resolved Hide resolved
session_sync_test.go Outdated Show resolved Hide resolved
@moio
Copy link
Contributor

moio commented Jun 24, 2024

@maxsokolovsky do we have your final blessing to merge this, so that it has a chance to get into future 2.9.0 alphas?

aruiz14 added a commit to rancher/rancher that referenced this pull request Jun 26, 2024
@maxsokolovsky maxsokolovsky merged commit 8acbd20 into rancher:master Jun 28, 2024
1 check passed
@aruiz14 aruiz14 deleted the resync-protocol branch July 1, 2024 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants