feat: Add time.Sleep to mitigate race condition. #1923

Merged on Oct 4, 2024 (2 commits)

marianogappa
Contributor

The ShuffleQueue scheduler strategy has an infrequent race condition, as explained by the comment:

	// A race condition is possible when the last active table asynchronously
	// queues a relation. The table finishes (calling `.Done()`) a moment
	// before the queue receives the `.Push()`. At this point, the queue is
	// empty and there are no active workers.
	//
	// A moment later, the queue receives the `.Push()` and queues a new task.
	//
	// This is a very infrequent case according to tests, but it happens.

After many attempts at a more elegant solution, I finally gave in and added:

time.Sleep(10 * time.Millisecond)

Looks ugly, but after running the tests 300 times (so around 3000 syncs), it works 🤷

✓ cloudquery/plugin-sdk main* $ go test ./scheduler -count=100 -run TestScheduler             ⏱ 15:04:12
ok  	github.com/cloudquery/plugin-sdk/v4/scheduler	143.523s
✓ cloudquery/plugin-sdk main* $ go test ./scheduler -count=100 -run TestScheduler              ⏱ 15:06:56
ok  	github.com/cloudquery/plugin-sdk/v4/scheduler	142.796s
✓ cloudquery/plugin-sdk main* $ go test ./scheduler -count=100 -run TestScheduler              ⏱ 15:09:22
ok  	github.com/cloudquery/plugin-sdk/v4/scheduler	144.304s

@marianogappa marianogappa marked this pull request as ready for review October 4, 2024 14:15
@marianogappa marianogappa requested a review from a team as a code owner October 4, 2024 14:15
@github-actions github-actions bot added the feat label Oct 4, 2024
@marianogappa
Contributor Author

lol, there's a unit test failure, but that one wasn't caused by this code:

panic: test timed out after 10m0s
	running tests:
		TestScheduler_Cancellation (9m51s)
		TestScheduler_Cancellation/should_not_consume_all_message_on_cancel_shuffle (9m51s)

Note the strategy is shuffle, not shuffle-queue:

t.Run(fmt.Sprintf("%s_%s", tc.name, strategy.String())
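To make the naming argument concrete, here is a small sketch of how the subtest name in the panic output is put together. The case name and the strategy strings are assumptions inferred from that output, and `subtestName` is a hypothetical helper. It only mimics the `%s_%s` format plus the way `go test` rewrites spaces to underscores in subtest names.

```go
package main

import (
	"fmt"
	"strings"
)

// subtestName mimics how the name in the panic output is formed: the
// "%s_%s" format from the test, with spaces rewritten to underscores the
// way `go test` does when it reports subtest names.
func subtestName(parent, caseName, strategy string) string {
	name := fmt.Sprintf("%s_%s", caseName, strategy)
	return parent + "/" + strings.ReplaceAll(name, " ", "_")
}

func main() {
	// The case name and strategy strings below are assumptions, not copied
	// from the test file.
	const caseName = "should not consume all message on cancel"
	for _, strategy := range []string{"shuffle", "shuffle-queue"} {
		fmt.Println(subtestName("TestScheduler_Cancellation", caseName, strategy))
	}
}
```

Under those assumptions, a failure in the shuffle-queue variant would end in `_shuffle-queue`, whereas the timed-out test ends in `_shuffle`.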

@marianogappa marianogappa merged commit 83dfcad into main Oct 4, 2024
8 checks passed
@marianogappa marianogappa deleted the mariano/mitigate-race-condition branch October 4, 2024 14:47
kodiakhq bot pushed a commit that referenced this pull request Oct 7, 2024
🤖 I have created a release *beep* *boop*
---


## [4.66.0](v4.65.0...v4.66.0) (2024-10-07)


### Features

* Add time.Sleep to mitigate race condition. ([#1923](#1923)) ([83dfcad](83dfcad))


### Bug Fixes

* **deps:** Update aws-sdk-go-v2 monorepo ([#1926](#1926)) ([4fc8896](4fc8896))
* **deps:** Update module google.golang.org/grpc to v1.67.1 ([#1925](#1925)) ([5e0305d](5e0305d))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).