Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nil pointer panic on Frame.networkIdleCtx #118

Merged
merged 2 commits into from
Nov 18, 2021

Conversation

imiric
Copy link
Contributor

@imiric imiric commented Nov 16, 2021

For very quick requests there's a race condition between when we set f.networkIdleCtx to nil as part of the FrameManager.requestStarted() call, and when we start the idle timer again in FrameManager.requestFinished().

This fix uses a channel instead of storing a mutex, which is safer.

Fixes #109

Copy link
Member

@robingustafsson robingustafsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@inancgumus inancgumus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch, however:

I feel like there's still a race condition here.

We might miss to call the cancel funcs because of the if f.networkIdleCancelFn != nil checks. These are not protected by locks.

It's because stopNetworkIdleTimer can set the cancel func to nil while startNetworkIdleTimer is still working. This can result in goroutine leaks for uncancelled funcs (as the context canceller fires up goroutines if we don't call the cancel funcs).

@imiric
Copy link
Contributor Author

imiric commented Nov 18, 2021

Yeah, this doesn't fix the several race conditions here, just avoids this specific nil panic.

I'm wondering if we can do this without storing the Context on Frame (which is an anti-pattern; from the docs: "Do not store Contexts inside a struct type"). And this is not a "timer" anyway :)

We essentially only need this context to dispose of the goroutine that waits on LifeCycleNetworkIdleTimeout. Could it be done with a channel instead? I'll see if I can refactor it...

@inancgumus
Copy link
Member

inancgumus commented Nov 18, 2021

Yeah, this doesn't fix the several race conditions here, just avoids this specific nil panic.

Yep ;)

I'm wondering if we can do this without storing the Context on Frame (which is an anti-pattern; from the docs: "Do not store Contexts inside a struct type"). And this is not a "timer" anyway :)

It's actually not an anti-pattern for _pro_s :) It's ok to store a context in a struct as long as it belongs to a single request-flow chain. Also see this discussion.

We essentially only need this context to dispose of the goroutine that waits on LifeCycleNetworkIdleTimeout. Could it be done with a channel instead? I'll see if I can refactor it...

What about this (not yet tried it :) so ☢️)?

func (f *Frame) stopNetworkIdleTimer() {
	f.networkIdleMu.Lock()
	defer f.networkIdleMu.Unlock()
	if f.networkIdleCancelFn == nil {
		return
	}
	f.networkIdleCancelFn()
}


func (f *Frame) startNetworkIdleTimer() {
	if f.hasLifecycleEventFired(LifecycleEventNetworkIdle) || f.detached {
		return
	}
	go func() {
		f.networkIdleMu.Lock()
		{
			f.networkIdleCtx, f.networkIdleCancelFn = context.WithTimeout(f.ctx, LifeCycleNetworkIdleTimeout)
			defer f.networkIdleCancelFn()
		}
		f.networkIdleMu.Unlock()

		<-f.networkIdleCtx.Done()
		f.manager.frameLifecycleEvent(f.id, LifecycleEventNetworkIdle)
	}()
}

For very quick requests there's a race condition between when we set
`f.networkIdleCtx` to nil[1] as part of the `FrameManager.requestStarted()`
call, and when we start the idle timer again in `FrameManager.requestFinished()`[2].

This fix ensures we keep the reference to `networkIdleCtx` as part of the closure
in the goroutine crated in `Frame.startNetworkIdleTimer()`, and avoid
accessing `f.networkIdleCtx` directly.

This could use better synchronization, but this fixed it consistently for
me by testing with `examples/getattribute.js`, which also reproduces the
issue (see https://github.com/grafana/xk6-browser/runs/4225921786).

[1]: https://github.com/grafana/xk6-browser/blob/baaf58caef73370f1659c30b3d3a989ddcd8da27/common/frame.go#L273

[2]: https://github.com/grafana/xk6-browser/blob/baaf58caef73370f1659c30b3d3a989ddcd8da27/common/frame.go#L290

Fixes #109
@imiric
Copy link
Contributor Author

imiric commented Nov 18, 2021

Yeah, I've seen that discussion, but notice that the mentioned "good" use case for doing so is if it's still passed along as a parameter. Not what we're doing here with overwriting it, nil checks, etc. So I think our usage is what shouldn't be done. And you can see that it complicates things with locks, etc.

So if we can find a different approach it would be better. I'll try out your suggestion if nothing else works. :)

@inancgumus
Copy link
Member

inancgumus commented Nov 18, 2021

Yeah, I've seen that discussion, but notice that the mentioned "good" use case for doing so is if it's still passed along as a parameter. Not what we're doing here with overwriting it, nil checks, etc. So I think our usage is what shouldn't be done. And you can see that it complicates things with locks, etc.

Yeah, setting it to nil etc. seems like an anti-pattern.

So if we can find a different approach it would be better. I'll try out your suggestion if nothing else works. :)

Mine will probably not work but it could with some tweaks, IDK :)

@imiric imiric force-pushed the fix/109-panic-nil-networkidlectx branch from 70705e2 to 2f50e4d Compare November 18, 2021 11:24
@imiric
Copy link
Contributor Author

imiric commented Nov 18, 2021

@inancgumus Would something like 2f50e4d be a solution?

From my tests it seems to work fine for all the examples, but the updated script from Tom fails with:

ERRO[0015] unable to evaluate expression: Cannot find context with specified id (-32000)
running at reflect.methodValueCall (native)
default at file:///home/ivan/Projects/grafana/xk6-browser/examples/49-vistaprint-20211115.js:76:26(9)
        at go.k6.io/k6/js/common.Bind.func1 (native)
        at clickBusinessCards (file:///home/ivan/Projects/grafana/xk6-browser/examples/49-vistaprint-20211115.js:73:17(17))
        at file:///home/ivan/Projects/grafana/xk6-browser/examples/49-vistaprint-20211115.js:18:21(8)  executor=per-vu-iterations scenario=default source=stacktrace
panic: cannot create frame session: unable to add script to evaluate for isolated world: context canceled
running (00m13.9s), 1/1 VUs, 0 complete and 0 interrupted iterations
goroutine 132 [running]:-------------------------] 1 VUs  00m13.9s/10m0s  0/1 iters, 1 per VU
go.k6.io/k6/js/common.Throw(0x111b3c0, {0x15802e0, 0xc00009bd40})
        go.k6.io/k6@v0.35.0/js/common/util.go:36 +0x55
github.com/grafana/xk6-browser/common.k6Throw({0x159fe28, 0xc000e0f740}, {0x132303d, 0x1f}, {0xc0039a3f30, 0x1, 0x1})
        github.com/grafana/xk6-browser@v0.0.0-00010101000000-000000000000/common/helpers.go:338 +0x1ae
github.com/grafana/xk6-browser/common.(*FrameSession).onAttachedToTarget(0xc0025e7800, 0xc000e12720)
        github.com/grafana/xk6-browser@v0.0.0-00010101000000-000000000000/common/frame_session.go:653 +0x1d4
github.com/grafana/xk6-browser/common.(*FrameSession).initEvents.func1()
        github.com/grafana/xk6-browser@v0.0.0-00010101000000-000000000000/common/frame_session.go:217 +0x2f2
created by github.com/grafana/xk6-browser/common.(*FrameSession).initEvents
        github.com/grafana/xk6-browser@v0.0.0-00010101000000-000000000000/common/frame_session.go:180 +0xe7

This also happens on main so it's not related to this change, just mentioning it here. It seems like a new issue. 😞

@inancgumus
Copy link
Member

inancgumus commented Nov 18, 2021

@inancgumus Would something like 2f50e4d be a solution?

From my tests, it seems to work fine for all the examples, but the updated script from Tom fails with:

Yep, this is the same error that I've seen so far (with Tom's script). Btw, the new solution looks better.

This avoids the need for nil checks and the mutex, so should be race free.
@imiric imiric force-pushed the fix/109-panic-nil-networkidlectx branch from 2f50e4d to fa39ea5 Compare November 18, 2021 13:51
@imiric imiric requested a review from inancgumus November 18, 2021 13:52
Copy link
Member

@inancgumus inancgumus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@imiric imiric merged commit b7f5593 into main Nov 18, 2021
@imiric imiric deleted the fix/109-panic-nil-networkidlectx branch November 18, 2021 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nil pointer: Network Idle Context Done
3 participants