[bugfix] Address several remaining race conditions to improve stability #325
Conversation
c.wgProc.Wait()
c.capLock.Close()
Uses the upstream improvement from gotools/concurrency
...
pkg/capture/capture.go
Outdated
@@ -347,14 +348,14 @@ func (c *Capture) bufferPackets(buf *LocalBuffer, captureErrors chan error) error {

// Ensure that the buffer is released at the end of the method
defer func() {
	c.capLock.ConsumeUnlockRequest() // Consume the unlock request to continue normal processing
Moved this into the defer to ensure it's always run (even if code is added below in the future). Since the function is on the stack anyway, this has no negative impact.
@@ -580,12 +583,12 @@ func (cm *Manager) performWriteout(ctx context.Context, timestamp time.Time, ifa

writeoutChan := make(chan capturetypes.TaggedAggFlowMap, writeout.WriteoutsChanDepth)
doneChan := cm.writeoutHandler.HandleWriteout(ctx, timestamp, writeoutChan)

cm.Lock()
This was also incorrect. The global lock must encompass the whole (otherwise unprotected) rotation; if it doesn't, a scheduled rotation can run in parallel with one triggered by an interface going down. TBH this is the most likely candidate for the deadlocks, because we've seen them happen quite frequently in exactly these scenarios (interface changes plus a scheduled rotation)...
Damn, now there's a race in the capture manager logic (probably has been there all along, just never got triggered):
On it...
…tions and introduces a new race condition
Improves multiple things across the board, detected when (excessively) running hacksaw-style concurrency tests using an actual deployment and gpctl.

Closes #317