Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent deadlock in TestDynamicEnqueueRequest #4949

Merged
merged 3 commits into from
Oct 14, 2021

Conversation

david-kow
Copy link
Contributor

This PR introduces a preventive measure to stop TestDynamicEnqueueRequest from a deadlock. It happens because, contrary to our expectations, we can receive more than one event even though we only do a single update after registering our handler. In my tests I saw we can receive two or even three updates. Since the channel we use is unbuffered and we do a single read, our handler is blocked. As staying in the handler prevents from shutting down gracefully, we timeout after 30 seconds with an error and fail the test.

The change makes the handler respect the provided context, allowing controller-runtime to shut down gracefully.

I've also randomized the name to facilitate running the test multiple times, as it seems the teardown is not thorough and resources with same name clash.

Before the change (although with random resource name) I was getting around 1% failure rate every time on the below:

go test -v -race -count 1000 -tags=integration pkg/controller/common/watches/handler_integration_test.go

After the change running (note 3k count):

go test -v -race -count 3000 -tags=integration pkg/controller/common/watches/handler_integration_test.go

Passes without any failures.

Fixes #4692.

@david-kow david-kow added >flaky_test >test Related to unit/integration/e2e tests labels Oct 14, 2021
Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well seen! When I ran the test command with -race and -count I lacked ambition with only -count 100. With -count 1000, I've failures "which is great" but a lower rate failure than you: 0.1% or 0.2%.

LGTM!

pkg/controller/common/watches/handler_integration_test.go Outdated Show resolved Hide resolved
Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>
@david-kow david-kow merged commit 4aa7e51 into elastic:master Oct 14, 2021
@david-kow david-kow deleted the fix_testdynamicenqueuerequest branch October 14, 2021 11:02
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
* Prevent deadlock in TestDynamicEnqueueRequest

* Randomize watched resource name

* Update pkg/controller/common/watches/handler_integration_test.go

Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>

Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>flaky_test >test Related to unit/integration/e2e tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestDynamicEnqueueRequest is flaky
3 participants