-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix frame leak due to undrained send buffer #860
Conversation
connection.go
Outdated
// Remaining frames in sendCh must be drained and released, | ||
// or we will leak frames | ||
defer func() { | ||
for len(c.sendCh) > 0 { | ||
c.opts.FramePool.Release(<-c.sendCh) | ||
} | ||
}() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can act as a "best-effort" release frames, but there's many race conditions which can result in frames being written to sendCh
after the drain here. In the past, TChannel used to close(c.sendCh)
as part of closing a connection, but it was removed as there was many race conditions that could lead to the close happening while some other goroutine wrote to the channel, see
93ef5c1
The same race conditions that would have caused a panic when closing sendCh
will likely cause this drain to "miss" frames from being released.
It's worth keeping in mind that a frame leak from the pool is not a memory leak -- when a connection is closed + removed, the GC will clean up the connection, sencCh
, and all frames in that channel.
I think having the best-effort is OK, but the comment should indicate that it's not a complete solution.
If we want this logic, it should ideally happen as late in the lifecycle for a connection as possible, when we're as close to being confident that there's no other writers to sendCh
. E.g., when the connection state changes to closed,
https://cs.github.com/uber/tchannel-go/blob/751f26a5d1d895affe75a1edd208d63fe3b7b109/connection.go?q=repo%3Auber%2Ftchannel-go+%22connectionStartClose%22#L847
(We could also keep the logic in writeFrames
, and have writeFrames
block waiting for stopCh
before draining sendCh
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good pointers! even if it's best-effort I think it's still useful since it'll help improve the signal of the tests since it won't mask leaks from other code paths.
moved the drain back into writeFrames
and blocked on <-stopCh
connection.go
Outdated
@@ -858,6 +858,14 @@ func (c *Connection) checkExchanges() { | |||
func (c *Connection) close(fields ...LogField) error { | |||
c.log.WithFields(fields...).Debug("Connection closing.") | |||
|
|||
// Remaining frames in sendCh must be drained and released, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was nice that writeFrames
was the only place that consumed from sendCh
. I don't think this method is the right place since it can be called from many places. More details in the next comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, this was my idea - sorry @cinchurge. Seemed like "draining" as part of general shutdown was a reasonable thing to do. But maybe not :)
Codecov Report
@@ Coverage Diff @@
## dev #860 +/- ##
==========================================
+ Coverage 88.99% 89.16% +0.16%
==========================================
Files 43 43
Lines 4435 4439 +4
==========================================
+ Hits 3947 3958 +11
+ Misses 368 363 -5
+ Partials 120 118 -2
Continue to review full report at Codecov.
|
bump, i think i've addressed all the issues, can you guys take another look? |
relay_test.go
Outdated
@@ -831,8 +834,7 @@ func TestRelayStalledConnection(t *testing.T) { | |||
AddLogFilter("Dropping call due to slow connection.", 1, "sendChCapacity", "32"). | |||
SetSendBufferSize(32). // We want to hit the buffer size earlier, but also ensure we're only dropping once the sendCh is full. | |||
SetServiceName("s1"). | |||
SetRelayOnly(). | |||
SetCheckFramePooling() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this removal on purpose?
It's a bit surprising given the diff is expected to make the tests "less leaky", right?
connection.go
Outdated
@@ -891,6 +900,14 @@ func (c *Connection) close(fields ...LogField) error { | |||
// Close starts a graceful Close which will first reject incoming calls, reject outgoing calls | |||
// before finally marking the connection state as closed. | |||
func (c *Connection) Close() error { | |||
// Remaining frames in sendCh must be drained and released, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still want this, or is it a left-over from the previous iteration?
The PR description mentions only changes in writeFrames
, and your other comment says "as discussed, prefer to keep all code that consume sendCh close together", which seems to contradict this piece of code?
a04830d
to
7d13f32
Compare
Follow up to #845 Currently in `Connection.writeFrames()`, we return immediately upon a connection error. However, at that point there may still be frames stuck in the send buffer (`Connection.sendCh`) which haven't been sent due to the connection issue. Leaving it in such a state results in a frame leak since they will never be released. Change `writeFrames()` to always drain the send buffer upon return to fix this behavior.
7d13f32
to
3511384
Compare
* Fix frame leak due to undrained send buffer Follow up to #845 Currently in `Connection.writeFrames()`, we return immediately upon a connection error. However, at that point there may still be frames stuck in the send buffer (`Connection.sendCh`) which haven't been sent due to the connection issue. Leaving it in such a state results in a frame leak since they will never be released. Change `writeFrames()` to always drain the send buffer upon return to fix this behavior.
Follow up to #845
Currently in
Connection.writeFrames()
, we return immediatelyupon a connection error. However, at that point there may still
be frames stuck in the send buffer (
Connection.sendCh
) whichhaven't been sent due to the connection issue. Leaving it in
such a state results in a frame leak since they will never be
released.
Change
writeFrames()
to always drain the send buffer upon returnto fix this behavior.