Fix nil pointer panic on Frame.networkIdleCtx #118
LGTM
LGTM, good catch. However, I feel like there's still a race condition here.
We might miss calling the cancel funcs because of the `if f.networkIdleCancelFn != nil` checks, which are not protected by locks: `stopNetworkIdleTimer` can set the cancel func to nil while `startNetworkIdleTimer` is still working. This can result in goroutine leaks for uncancelled funcs (as the context canceller fires up goroutines if we don't call the cancel funcs).
Yeah, this doesn't fix the several race conditions here, just avoids this specific nil panic. I'm wondering if we can do this without storing the We essentially only need this context to dispose of the goroutine that waits on |
Yep ;)
It's actually not an anti-pattern for _pro_s :) It's ok to store a context in a struct as long as it belongs to a single request-flow chain. Also see this discussion.
What about this (not yet tried it :) so ☢️)?

    func (f *Frame) stopNetworkIdleTimer() {
        f.networkIdleMu.Lock()
        defer f.networkIdleMu.Unlock()
        if f.networkIdleCancelFn == nil {
            return
        }
        f.networkIdleCancelFn()
    }

    func (f *Frame) startNetworkIdleTimer() {
        if f.hasLifecycleEventFired(LifecycleEventNetworkIdle) || f.detached {
            return
        }
        go func() {
            f.networkIdleMu.Lock()
            {
                f.networkIdleCtx, f.networkIdleCancelFn = context.WithTimeout(f.ctx, LifeCycleNetworkIdleTimeout)
                defer f.networkIdleCancelFn()
            }
            f.networkIdleMu.Unlock()
            <-f.networkIdleCtx.Done()
            f.manager.frameLifecycleEvent(f.id, LifecycleEventNetworkIdle)
        }()
    }
For very quick requests there's a race condition between when we set `f.networkIdleCtx` to nil[1] as part of the `FrameManager.requestStarted()` call, and when we start the idle timer again in `FrameManager.requestFinished()`[2].

This fix ensures we keep the reference to `networkIdleCtx` as part of the closure in the goroutine created in `Frame.startNetworkIdleTimer()`, and avoid accessing `f.networkIdleCtx` directly.

This could use better synchronization, but it fixed the issue consistently for me when testing with `examples/getattribute.js`, which also reproduces the issue (see https://github.com/grafana/xk6-browser/runs/4225921786).

[1]: https://github.com/grafana/xk6-browser/blob/baaf58caef73370f1659c30b3d3a989ddcd8da27/common/frame.go#L273
[2]: https://github.com/grafana/xk6-browser/blob/baaf58caef73370f1659c30b3d3a989ddcd8da27/common/frame.go#L290

Fixes #109
Yeah, I've seen that discussion, but notice that the mentioned "good" use case for doing so is if it's still passed along as a parameter. Not what we're doing here with overwriting it, nil checks, etc. So I think our usage is what shouldn't be done. And you can see that it complicates things with locks, etc. So if we can find a different approach it would be better. I'll try out your suggestion if nothing else works. :)
Yeah, setting it to
Mine will probably not work but it could with some tweaks, IDK :)
70705e2 to 2f50e4d
@inancgumus Would something like 2f50e4d be a solution? From my tests it seems to work fine for all the examples, but the updated script from Tom fails with:
This also happens on |
Yep, this is the same error that I've seen so far (with Tom's script). Btw, the new solution looks better.
This avoids the need for nil checks and the mutex, so should be race free.
2f50e4d to fa39ea5
LGTM!
For very quick requests there's a race condition between when we set `f.networkIdleCtx` to nil as part of the `FrameManager.requestStarted()` call, and when we start the idle timer again in `FrameManager.requestFinished()`.

This fix uses a channel instead of storing a mutex, which is safer.

Fixes #109