Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lingering Deadlock when Write and CloseRead called concurrently in 1.8.6 #248

Closed
RTann opened this issue Jun 5, 2020 · 6 comments
Closed
Labels
Milestone

Comments

@RTann
Copy link

RTann commented Jun 5, 2020

In the fix for the deadlock introduced in 1.8.5, we have the following:

err = c.writeFrameMu.lock(ctx)
if err != nil {
	return 0, err
}
defer c.writeFrameMu.unlock()

// If the state says a close has already been written, we wait until
// the connection is closed and return that error.
//
// However, if the frame being written is a close, that means its the close from
// the state being set so we let it go through.
c.closeMu.Lock()
wroteClose := c.wroteClose
c.closeMu.Unlock()
if wroteClose && opcode != opClose {
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case <-c.closed:
		return 0, c.closeErr
	}
}

...

If there is a (*Conn).Write that passes the if condition, then it waits for at least one of two channels to close, while still holding the writeFrameMu lock. Since c.wroteClose is set before (*Conn).CloseRead gets to call (*Conn).writeFrame (see (*Conn).writeClose), it is possible that (*Conn).CloseRead calls (*Conn).writeFrame after (*Conn).Write gets stuck, so (*Conn).CloseRead will get stuck waiting to get writeFrameMu, and no progress can be made.

One option would be to unlock writeFrameMu immediately after passing the if condition. defer c.writeFrameMu.unlock() would need to be removed since the mutex implementation doesn't allow for unlocking twice.

@RTann RTann changed the title Lingering Deadlock when Write and Close called concurrently in 1.8.6 Lingering Deadlock when Write and CloseRead called concurrently in 1.8.6 Jun 5, 2020
@nhooyr
Copy link
Contributor

nhooyr commented Jun 5, 2020

I believe the if condition prevents it. It won't block on the two channels if a close is being written.

I know the code flow is confusing, it'll be cleaned up soon with #239

To be clear, I'm referring to this line: https://github.com/nhooyr/websocket/blob/02861b474d9c29660eff53a3c424d589aaf46d1e/write.go#L259

@nhooyr nhooyr added invalid and removed invalid labels Jun 5, 2020
@nhooyr
Copy link
Contributor

nhooyr commented Jun 5, 2020

Oh never mind, I see what you're saying, the lock would be held up by the goroutine writing so we would never even get to the if statement. 🤦

@nhooyr
Copy link
Contributor

nhooyr commented Jun 5, 2020

Luckily, the close write has a timeout of 5s so this bug has likely had minor impact. Great catch @RTann and more the reason for me to do #239

@nhooyr nhooyr added the bug label Jun 5, 2020
@RTann
Copy link
Author

RTann commented Jun 5, 2020

Right. The goroutine who calls (*Conn).Write gets stuck inside the if while still holding the lock, and the goroutine who calls (*Conn).CloseRead gets stuck trying to obtain the lock, so it never gets to reach the if. Apologies for the confusing wording. I wanted to try to detail the chain of events I observed, which I could have worded better for sure.

nhooyr added a commit that referenced this issue Jul 4, 2020
Closes #248

Luckily, due to the 5s timeout on the close handshake, this would have
had very minimal effects on anyone in production.
@nhooyr nhooyr added this to the v1.8.7 milestone Jul 5, 2020
nhooyr added a commit that referenced this issue Jul 5, 2020
Closes #248

Luckily, due to the 5s timeout on the close handshake, this would have
had very minimal effects on anyone in production.
@nhooyr nhooyr closed this as completed Jan 9, 2021
fortytw2 added a commit to fortytw2/websocket that referenced this issue Jul 7, 2022
fortytw2 added a commit to fortytw2/websocket that referenced this issue Jul 7, 2022
* add SIMD wsmask from @wdvxdr1123

* prevent netconn timer leakage

* fix for coder/websocket#248

* fix for coder/websocket#245

* use net.ErrClosed

* improve unauthorized origin access message

* golint

* go mod tidy
@okhowang
Copy link

Is there any plan for new release?
I notice this MR isn't present in v1.8.7
@nhooyr

@nhooyr
Copy link
Contributor

nhooyr commented Sep 26, 2023

@okhowang See #256

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants