"empty" ProducerMessage may lead to the producer hang #2150
Commits referencing this issue:
- niamster added a commit to niamster/sarama (Mar 20, 2022)
- dnwe added a commit (Mar 30, 2022)
- docmerlin added a commit to influxdata/kapacitor (May 24, 2022): "This bug can cause the producer to hang IBM/sarama#2150"
- docmerlin added a commit to influxdata/kapacitor (May 24, 2022): "fix(kafka): updated kafka client to fix a bug in the library. This bug can cause the producer to hang IBM/sarama#2150" and "chore: go mod tidy"
Thanks for providing those details.
I am not sure if this is linked to the current issue (deadlock regression), but something is indeed not right, and we should probably create another issue for that particular scenario.
Here are some things I found:
- `brokerProducer` is blocked sending a success: https://github.com/Shopify/sarama/blob/f1bc44e541eecf45f935b97db6a457740aaa073e/async_producer.go#L1155
- the `syncProducer` `successes` goroutine is blocked forwarding a success (and therefore blocking the `brokerProducer`): https://github.com/Shopify/sarama/blob/f1bc44e541eecf45f935b97db6a457740aaa073e/sync_producer.go#L132 and https://github.com/Shopify/sarama/blob/f1bc44e541eecf45f935b97db6a457740aaa073e/sync_producer.go#L96
Now what is really interesting is that the `expectation` field on a `ProducerMessage` used by the `syncProducer` is a channel that is always buffered with a capacity of 1. So in theory it should never block the `syncProducer` `successes` goroutine, yet that is what seems to happen.

The `null` key, `null` value record you see in the topic makes me think that:
- an "empty" `ProducerMessage` ends up being sent to the remote broker
- that message, with a `nil` `expectation` field, ends up being sent as a success to the `syncProducer` `successes` goroutine
- sending on a `nil` channel blocks forever (see the sketch below), therefore blocking the `brokerProducer` and preventing more records from being produced
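To make the blocking mechanics concrete, here is a minimal standalone Go sketch (not Sarama code; the `producerMessage` type below is invented for illustration) showing why a send on the capacity-1 `expectation` channel never blocks while a send on a `nil` channel can never proceed:

```go
package main

import "fmt"

// producerMessage is a stand-in for illustration only; in Sarama the real
// field is ProducerMessage.expectation, attached by the SyncProducer.
type producerMessage struct {
	expectation chan error
}

func main() {
	// Normal case: the SyncProducer attaches a buffered channel of capacity 1,
	// so forwarding a success never blocks.
	normal := &producerMessage{expectation: make(chan error, 1)}
	normal.expectation <- nil
	fmt.Println("buffered expectation: send returned immediately")

	// Failure case: an "empty" internal message has a nil expectation channel.
	// A plain send on a nil channel blocks forever; the select with a default
	// branch only demonstrates that the send can never proceed.
	empty := &producerMessage{}
	select {
	case empty.expectation <- nil:
		fmt.Println("unreachable: a send on a nil channel never succeeds")
	default:
		fmt.Println("nil expectation: a blocking send here would hang the goroutine forever")
	}
}
```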
Now such "empty" `ProducerMessage`s used by the `AsyncProducer` can be:
- a `shutdown` message, but those do not traverse the `dispatcher`: https://github.com/Shopify/sarama/blob/f1bc44e541eecf45f935b97db6a457740aaa073e/async_producer.go#L337-L340
- a `syn` message, but those should not end up in a `produceSet`: https://github.com/Shopify/sarama/blob/f1bc44e541eecf45f935b97db6a457740aaa073e/async_producer.go#L818-L827
- a `fin` message, and those could end up in a `produceSet` somehow I suppose: https://github.com/Shopify/sarama/blob/f1bc44e541eecf45f935b97db6a457740aaa073e/async_producer.go#L829-L861
As `fin` messages are used during retries, it might be the root cause of the hanging if somehow they escape the `AsyncProducer` and end up on the broker and in a success channel. It would be great to confirm this is the case and ideally have a simple unit test for that scenario.
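Until such a test exists, one way to picture the hypothesis (purely an illustration, not the actual Shopify/sarama source or its eventual fix) is a defensive guard in the goroutine that forwards successes back to the `SyncProducer`: skipping messages whose `expectation` is `nil` would turn the silent hang into an observable, logged event. All names below are invented for this sketch.

```go
package main

import "log"

// Invented stand-in for this sketch; not Sarama's internal type.
type producerMessage struct {
	expectation chan error // nil on leaked "empty" (syn/fin-style) messages
}

// forwardSuccesses mimics the success-forwarding loop described above,
// with a defensive nil check added.
func forwardSuccesses(successes <-chan *producerMessage) {
	for msg := range successes {
		if msg.expectation == nil {
			// An "empty" message escaped the AsyncProducer: log it and move
			// on instead of blocking forever on a nil channel send.
			log.Printf("success with nil expectation (leaked internal message): %+v", msg)
			continue
		}
		msg.expectation <- nil // buffered with capacity 1, never blocks
	}
}

func main() {
	successes := make(chan *producerMessage, 2)
	successes <- &producerMessage{expectation: make(chan error, 1)} // normal message
	successes <- &producerMessage{}                                 // leaked "empty" message
	close(successes)
	forwardSuccesses(successes) // returns instead of hanging
}
```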
If you reduce `connections.max.idle.ms` and `Net.ReadTimeout` you might be able to reproduce it faster.
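For example, a minimal reproduction setup could look like the sketch below. Only `Net.ReadTimeout` (a real Sarama config field) and the broker property `connections.max.idle.ms` come from the suggestion above; the broker address, topic name, and the specific timeout values are assumptions for illustration.

```go
package main

import (
	"log"
	"time"

	"github.com/Shopify/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true  // required by the SyncProducer
	cfg.Net.ReadTimeout = 5 * time.Second // default is 30s; lower it to hit idle disconnects sooner

	// Broker side (server.properties), assumed to be lowered as well:
	//   connections.max.idle.ms=10000

	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Keep producing; with aggressive idle disconnects the retry path
	// (fin messages) is exercised much more often.
	for {
		if _, _, err := producer.SendMessage(&sarama.ProducerMessage{
			Topic: "test-topic",
			Value: sarama.StringEncoder("ping"),
		}); err != nil {
			log.Println("produce error:", err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```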
@hxiaodon Would you mind creating another issue?
Originally posted by @slaunay in #2133 (comment)