-
Notifications
You must be signed in to change notification settings - Fork 291
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix deadlock caused by race while signal receivers are stopping (#1220)
A user reported a possible deadlock within the signal receivers (#1219). This happens by: * `(*signalReceivers).Stop()` is called, by Shutdowner for instance. * `(*signalReceivers).Stop()` [acquires the lock](https://github.com/uber-go/fx/blob/master/signal.go#L121). * Separately, an OS signal is sent to the program. * There is a chance that `relayer()` is still running at this point if `(*signalReceivers).Stop()` has not yet sent along the `shutdown` channel. * The relayer [attempts to broadcast the signal](https://github.com/uber-go/fx/blob/master/signal.go#L93) received via the `signals` channel. * `Broadcast()` blocks on [trying to acquire the lock](https://github.com/uber-go/fx/blob/master/signal.go#L178). * `(*signalReceivers).Stop()` blocks on [waiting for the `relayer()` to finish](https://github.com/uber-go/fx/blob/master/signal.go#L132) by blocking on the `finished` channel. * Deadlock. Luckily, this is not a hard deadlock, as `Stop` will return if the context times out, but we should still fix it. This PR fixes this deadlock. The idea behind how it does it is based on the observation that the broadcasting logic does not necessarily seem to need to share a mutex with the rest of `signalReceivers`. Specifically, it seems like we can separate protection around the registered `wait` and `done` channels, `last`, and the rest of the fields, since the references to those fields are easily isolated. To avoid overcomplicating `signalReceivers` with multiple locks for different uses, this PR creates a separate `broadcaster` type in charge of keeping track of and broadcasting to `Wait` and `Done` channels. Most of the implementation of `broadcaster` is simply moved over from `signalReceivers`. Having a separate broadcaster type seems actually quite natural, so I opted for this to fix the deadlock. Absolutely open to feedback or taking other routes if folks have thoughts. Since broadcasting is protected separately, this deadlock no longer happens since `relayer()` is free to finish its broadcast and then exit. In addition to running the example provided in the original post to verify, I added a test and ran it before/after this change. Before: ``` $ go test -v -count=10 -run "TestSignal/stop_deadlock" . === RUN TestSignal/stop_deadlock signal_test.go:141: Error Trace: /home/user/go/src/github.com/uber-go/fx/signal_test.go:141 Error: Received unexpected error: context deadline exceeded Test: TestSignal/stop_deadlock ``` (the failure appeared roughly 1/3 of the time) After: ``` $ go test -v -count=100 -run "TestSignal/stop_deadlock" . --- PASS: TestSignal (0.00s) --- PASS: TestSignal/stop_deadlock (0.00s) ``` (no failures appeared)
- Loading branch information
Showing
4 changed files
with
215 additions
and
123 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,179 @@ | ||
// Copyright (c) 2024 Uber Technologies, Inc. | ||
// | ||
// Permission is hereby granted, free of charge, to any person obtaining a copy | ||
// of this software and associated documentation files (the "Software"), to deal | ||
// in the Software without restriction, including without limitation the rights | ||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
// copies of the Software, and to permit persons to whom the Software is | ||
// furnished to do so, subject to the following conditions: | ||
// | ||
// The above copyright notice and this permission notice shall be included in | ||
// all copies or substantial portions of the Software. | ||
// | ||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
// THE SOFTWARE. | ||
|
||
package fx | ||
|
||
import ( | ||
"fmt" | ||
"os" | ||
"sync" | ||
) | ||
|
||
// broadcaster broadcasts signals to registered signal listeners. | ||
// All methods on the broadcaster are concurrency-safe. | ||
type broadcaster struct { | ||
// This lock is used to protect all fields of broadcaster. | ||
// Methods on broadcaster should protect all concurrent access | ||
// by taking this lock when accessing its fields. | ||
// Conversely, this lock should NOT be taken outside of broadcaster. | ||
m sync.Mutex | ||
|
||
// last will contain a pointer to the last ShutdownSignal received, or | ||
// nil if none, if a new channel is created by Wait or Done, this last | ||
// signal will be immediately written to, this allows Wait or Done state | ||
// to be read after application stop | ||
last *ShutdownSignal | ||
|
||
// contains channels created by Done | ||
done []chan os.Signal | ||
|
||
// contains channels created by Wait | ||
wait []chan ShutdownSignal | ||
} | ||
|
||
func (b *broadcaster) reset() { | ||
b.m.Lock() | ||
defer b.m.Unlock() | ||
b.last = nil | ||
} | ||
|
||
// Done creates a new channel that will receive signals being broadcast | ||
// via the broadcaster. | ||
// | ||
// If a signal has been received prior to the call of Done, | ||
// the signal will be sent to the new channel. | ||
func (b *broadcaster) Done() <-chan os.Signal { | ||
b.m.Lock() | ||
defer b.m.Unlock() | ||
|
||
ch := make(chan os.Signal, 1) | ||
// If we had received a signal prior to the call of done, send it's | ||
// os.Signal to the new channel. | ||
// However we still want to have the operating system notify signals to this | ||
// channel should the application receive another. | ||
if b.last != nil { | ||
ch <- b.last.Signal | ||
} | ||
b.done = append(b.done, ch) | ||
return ch | ||
} | ||
|
||
// Wait creates a new channel that will receive signals being broadcast | ||
// via the broadcaster. | ||
// | ||
// If a signal has been received prior to the call of Wait, | ||
// the signal will be sent to the new channel. | ||
func (b *broadcaster) Wait() <-chan ShutdownSignal { | ||
b.m.Lock() | ||
defer b.m.Unlock() | ||
|
||
ch := make(chan ShutdownSignal, 1) | ||
|
||
if b.last != nil { | ||
ch <- *b.last | ||
} | ||
|
||
b.wait = append(b.wait, ch) | ||
return ch | ||
} | ||
|
||
// Broadcast sends the given signal to all channels that have been created | ||
// via Done or Wait. It does not block on sending, and returns an unsentSignalError | ||
// if any send did not go through. | ||
func (b *broadcaster) Broadcast(signal ShutdownSignal) error { | ||
b.m.Lock() | ||
defer b.m.Unlock() | ||
|
||
b.last = &signal | ||
|
||
channels, unsent := b.broadcast( | ||
signal, | ||
b.broadcastDone, | ||
b.broadcastWait, | ||
) | ||
|
||
if unsent != 0 { | ||
return &unsentSignalError{ | ||
Signal: signal, | ||
Total: channels, | ||
Unsent: unsent, | ||
} | ||
} | ||
|
||
return nil | ||
} | ||
|
||
func (b *broadcaster) broadcast( | ||
signal ShutdownSignal, | ||
anchors ...func(ShutdownSignal) (int, int), | ||
) (int, int) { | ||
var channels, unsent int | ||
|
||
for _, anchor := range anchors { | ||
c, u := anchor(signal) | ||
channels += c | ||
unsent += u | ||
} | ||
|
||
return channels, unsent | ||
} | ||
|
||
func (b *broadcaster) broadcastDone(signal ShutdownSignal) (int, int) { | ||
var unsent int | ||
|
||
for _, reader := range b.done { | ||
select { | ||
case reader <- signal.Signal: | ||
default: | ||
unsent++ | ||
} | ||
} | ||
|
||
return len(b.done), unsent | ||
} | ||
|
||
func (b *broadcaster) broadcastWait(signal ShutdownSignal) (int, int) { | ||
var unsent int | ||
|
||
for _, reader := range b.wait { | ||
select { | ||
case reader <- signal: | ||
default: | ||
unsent++ | ||
} | ||
} | ||
|
||
return len(b.wait), unsent | ||
} | ||
|
||
type unsentSignalError struct { | ||
Signal ShutdownSignal | ||
Unsent int | ||
Total int | ||
} | ||
|
||
func (err *unsentSignalError) Error() string { | ||
return fmt.Sprintf( | ||
"send %v signal: %v/%v channels are blocked", | ||
err.Signal, | ||
err.Unsent, | ||
err.Total, | ||
) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.