-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gbn: set resend timeout dynamically #88
gbn: set resend timeout dynamically #88
Conversation
13d35f1
to
6a54977
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving a review comment regarding the queue handshakeTimeout
below.
Also in general, when making this PR, one alternative approach I considered, was to continuously update resend timeout throughout the lifespan of the connection, and not just set it during the handshake process. I.e. we could instead continuously set it based on how long it took to get a response for random packets we send with some set frequency. Do you think that approach would make more sense? The benefit of that approach would be that resend timeout will reflect if the latency changes throughout the lifetime of the connection, which might be good if we have really long lived connections.
6a54977
to
b033f3d
Compare
Rebased on the latest version of #87. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make things a bit more modular and generic here.
I think it would be cool do having something like:
type TimeoutManager struct {
....
}
NewTimeOutManager(defaultTimeout, useStaticTimeout, multiplier, ...)
(m *TimeoutManager) Sent() {}
(m *TimeoutManager) Received() {}
(m *TimeoutManager) GetTimeout() time.Duration() {}
Then we can use it as follows:
- everywhere in GBN where we want to start a new timer, query the new manager's
GetTimeout
method to get what time to use. - as is done in this PR, do 1
Sent/Received
cycle during the handshake. - Then, for each new sequence number we send & when we get the corresponding ACK - do a Sent/Received cycle.
This will be nice cause then the GBN doesnt need to have all this logic in it and can just trust what the timeout manager says :)
6f87bfa
to
fcc81ed
Compare
Great idea regarding modularising it to a Timeout manager @ellemouton! |
b5a2195
to
fd1f61b
Compare
gbn/gbn_conn.go
Outdated
@@ -118,51 +103,46 @@ func newGoBackNConn(ctx context.Context, sendFunc sendBytesFunc, | |||
|
|||
ctxc, cancel := context.WithCancel(ctx) | |||
|
|||
timeoutManager := NewTimeOutManager(isServer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you think I should dependency inject the timeout manager rather than instantiating it here!
fd1f61b
to
66cc660
Compare
Rebased on #87 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition! Gonna be a game changer!
Left a few comments. Also:
- does this need to be based on gbn: Sender side only packeting resend bug fix #87? afaikt it doesnt depend on that change - so perhaps better to decouple them since it isnt clear which one will get in first yet.
- A test for TimeoutMgr should please be added :)
gbn/timeout_manager.go
Outdated
resendMultiplier int | ||
resendMultiplierMu sync.RWMutex | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you can use functional options for this instead & then dont need the mutex to set it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, thanks! I'm learning a lot by this :)! I'll work on fixing that tomorrow.
gbn/timeout_manager.go
Outdated
|
||
multipliedTimeout := time.Duration(m.resendMultiplier) * responseTime | ||
|
||
if multipliedTimeout > defaultResendTimeout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like default & minimum should be separate values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, agree! Let me know if you'd like me to change the minimum value as well, as I set both to 1sec now.
try rebasing linter fix: https://github.com/lightninglabs/lightning-node-connect/actions/runs/6930173289?pr=92 |
f39b75d
to
95cb429
Compare
Thanks 🔥🚀!! I've addressed your feedback with the latest push 🚀!
I'll provide a more detailed answer to this tomorrow, but will provide a short answer for now. I should probably have mentioned this in the PR description previously, but the reason I've based this on #87, was that I noticed some potential compatibility issues when manually testing this, if only 1 of the 2 counterparties updated to a version of LNC that set the resend timeout to a different value than the default. It led to a resend loop of the packets that seemed potentially even worse than original packeting resend bug, but the fix in #87 largely fixed that if that party had also implemented that fix. I'll do more thorough testing tomorrow though, and see if my initial tests were correct.
This has been added now 🔥 |
95cb429
to
8906c94
Compare
Ok so there turned out to be a few more issues effecting the itests, some that which were pre-existing and some some which where made a bit worse by this PR that now increases the resend timeout: (1) (2) I therefore changed the handshake logic for the server, so that if the server receives a SYNACK or PacketData after having sent it's SYN but timing out in the handshake, it would treat those as the completion of handshake. This was updated with b447950. Do you agree that this is a correct handling in the handshake? (3) @ellemouton, given the discussion in #87 (comment), is this way of ensuring that (4) I should probably add that I haven't really had that bad latency so it hasn't been an issue for me yet, but I've been close to it when testing. Therefore I changed this as a precaution, but will drop this commit if you think it's an unnecessary addition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great investigations and bug hunting! Very excited to see the itests being stable with this PR as well.
Just one comment about a mutex, otherwise this LGTM 🎉
gbn/timeout_manager.go
Outdated
// SetStaticResendTimeout sets the resend timeout to the given value, and also | ||
// marks the timeout manager as using a static resend timeout, which will mean | ||
// that the resend timeout will not be updated dynamically. | ||
func (m *TimeoutManager) SetStaticResendTimeout(resendTimeout time.Duration) { | ||
m.timeoutManagerMu.Lock() | ||
defer m.timeoutManagerMu.Unlock() | ||
|
||
m.resendTimeout = resendTimeout | ||
m.useStaticTimeout = true | ||
} | ||
|
||
// SetHandshakeTimeout sets the handshake timeout. | ||
func (m *TimeoutManager) SetHandshakeTimeout(handshakeTimeout time.Duration) { | ||
m.timeoutManagerMu.Lock() | ||
defer m.timeoutManagerMu.Unlock() | ||
|
||
m.handshakeTimeout = handshakeTimeout | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should not be methods. It doesnt make sense to change them during the lifetime of the timeout manager. These should be functional options instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean specifically multipliedTimeout
and SetStaticResendTimeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those functions have now been removed!
gbn/timeout_manager.go
Outdated
@@ -435,6 +493,8 @@ func (m *TimeoutManager) SetStaticResendTimeout(resendTimeout time.Duration) { | |||
defer m.timeoutManagerMu.Unlock() | |||
|
|||
m.resendTimeout = resendTimeout | |||
m.resendBooster.SetOriginalTimeout(resendTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead set this timeout at the time of initialising the resendBooster. You can do this by just re-shuffling the order of things in NewTimeOutManager
:
1: construct TimeoutMgr
2. Apply it's functional options.
3. initialise boosters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Good idea, I've implemented this with the latest push. It required me to now add the resendBoostPercent
& handshakeTimeout
fields to the TimeoutManager
though, which isn't ideal. But I think the new version is much better than the last version though, so I definitely think that's worth it.
256546e
to
549c11a
Compare
Thanks a lot for the reviews @guggero & @ellemouton! @ellemouton, please specifically check my response to #88 (comment), as that wasn't addressed with this latest push! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending one last comment regarding a possible race.
Also tested this with old server & new client & a restarting mailbox 👍
Really great work 🎉
549c11a
to
9e6babd
Compare
Thanks a lot for the review and testing @ellemouton 🚀🔥!! Let me know if you disagree with #88 (comment), else this should be good to go 🎉🚀!! |
9e6babd
to
97d5f64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff @ViktorTigerstrom!!!
Ok this is basically good to go. Added 2 final thoughts :)
0eee1d2
to
14ae529
Compare
After discussing this with @ellemouton offline, I've updated so that the |
The timeout manager is a manager for all timeout values used by the different components of the gbn package. The timeout manager enables the functionality of dynamically changing the resend timeout value based on how long it takes to receive an response from the counterparty. To make the gbn package more modular, we'll move the responsibility of managing the timeout values of the gbn package to the timeout manager in the next commit.
This commit moves the responsibility of managing the timeout values throughout the gbn package to the `TimeoutManager` struct. Even though the `TimeoutManager` is now used to manage timeouts, the resend timeout is not yet set dynamically for the connections of the gbn package. Support and usage of that functionality will added in the upcoming commits.
This commit changes the gbn `Config` struct to not specifically contain any timeout fields, but instead contain functional options that modifies the `TimeoutManager` struct. These options are then applied to the `TimeoutManager`.
This commit adds support for setting the resend timeout dynamically for a connection. Prior to this commit, the timeout before a client or server resends the queue of packets, was always set to fixed value. A fixed timeout isn't suitable for all connections though, as the latency for different clients varies. With this commit, we instead add support for setting the resend timeout based on: * How long it took for the other party to respond during the handshake process * How long it took for the other party to respond with the connect ACK for a sent data packet. The timeout is set then to the time it took for the server to respond, multiplied by the resendMultiplier, unless the duration is shorter than the default resend timeout. Note though that if a connection's resend timeout is set manually set through the WithStaticResendTimeout, the resend timeout will always be set to that value, and won't be dynamically updated.
With this commit, the resend timeout will now be set dynamically for the connections of the gbn package.
TimeoutBooster is a type can be used to boost the value of a timeout by a specified percentage. The timeout can be boosted multiple times, and number of boosts are then cumulative.
When we need to resend a data packet, or when we need to resend the SYN message during the handshake, due to the other side not responding within given timeout, we will boost the timeout by 50% for each time we need to resend without receiving any response. This ensures that if the original timeouts are set to a too short duration given the current network latency, we will eventually boost the timeouts to a long enough duration that allows the other side to be able to respond within the timeout.
If we have awaited a sync after sending the previous ping, both the pingTicker and pongTicker may have ticked when waiting to sync. In that case, we need to ensure that the pongTicker gets prioritized, which this commit ensures.
This fixes a bug where the server would not complete the handshake if the server restarted the handshake after sending it's SYN, which the client then received and then completed the handshake. This could previously happen if the server timedout when waiting for the clients SYN response, or if the client's SYNACK was lost due to packet loss.
14ae529
to
931a7c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
Based on #87
This PR updates the resend timeout to be dynamically set based on how long it took for the other party to respond during the handshake process.
The timeout is set to the time it took for the server to respond multiplied by the resendMultiplier, unless the duration is shorter than the default resend timeout. If the the resend timeout has been manually set, the resend timeout will always be set to that value, and won't be dynamically set.
Prior to this PR, the timeout before a client resends the queue of packets was always a fixed value. This fixed timeout isn't suitable for all clients as the latency for different clients varies.