Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TestPartitionReader_ShouldNotMissRecordsIfFetchRequestContainPartialFailuresWithConcurrentFetcherIsUsed flakyness #9967

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

pracucci
Copy link
Collaborator

What this PR does

In #9963 I've introduced TestPartitionReader_ShouldNotMissRecordsIfFetchRequestContainPartialFailuresWithConcurrentFetcherIsUsed. After merging, I realised locally that sometimes the test hang.

I spent 1h debugging it and it's related to the case the kfake control return an error (not a response with error). I'm quite sure the issue has not been introduced with the changes in #9963, but I still haven't understood if the issue is in kfake (can't find a proof there) or an issue in the franz-go library.

I need more time to further investigate it. In the meanwhile I suggest to remove that type of error: the error still make sense due to the error injected by "Simulate a 10% Kafka error rate".

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

…ialFailuresWithConcurrentFetcherIsUsed flakyness

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci requested a review from a team as a code owner November 20, 2024 17:23
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the hanging behaviour looks like the behaviour when returning a nil, nil, true:

// If the control function returns true, then either the response is written
// back to the client or, if there the control function returns an error, the
// client connection is closed. If both returns are nil, then the cluster will
// loop continuing to read from the client and the client will likely have a
// read timeout at some point.

@pracucci
Copy link
Collaborator Author

the hanging behaviour looks like the behaviour when returning a nil, nil, true:

Yesteday I investigated it a bit and I'm not 💯 sure about it. In case of nil, err, true kfake should close the connection (without sending any response).

@pracucci pracucci merged commit 693efbc into main Nov 21, 2024
29 checks passed
@pracucci pracucci deleted the fix-flaky-test branch November 21, 2024 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants