Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consumerGroup will deadlock when handling errors #1554

Closed
LihuaWu opened this issue Dec 5, 2019 · 7 comments
Closed

consumerGroup will deadlock when handling errors #1554

LihuaWu opened this issue Dec 5, 2019 · 7 comments
Labels
stale Issues and pull requests without any recent activity

Comments

@LihuaWu
Copy link

LihuaWu commented Dec 5, 2019

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama Kafka Go
commit:c82acaf7831341c57794be79578431c927540840 2.3.1 1.13.3
Configuration

What configuration values are you using for Sarama and Kafka?

config.Consumer.Return.Errors = true
Logs

When filing an issue please provide logs from Sarama and Kafka if at all
possible. You can set sarama.Logger to a log.Logger to capture Sarama debug
output.

logs: CLICK ME

[kafka.log](https://github.com/Shopify/sarama/files/3925236/kafka.log)

Problem Description

lock field in struct ConsumerGroup is a Mutex. https://github.com/Shopify/sarama/blob/c82acaf7831341c57794be79578431c927540840/consumer_group.go#L63
And it is used in the following codes

  1. https://github.com/Shopify/sarama/blob/c82acaf7831341c57794be79578431c927540840/consumer_group.go#L155
  2. https://github.com/Shopify/sarama/blob/c82acaf7831341c57794be79578431c927540840/consumer_group.go#L433

Consume will call handleError in several conditions which will cause deadlock. For example:
https://github.com/Shopify/sarama/blob/c82acaf7831341c57794be79578431c927540840/consumer_group.go#L688
https://github.com/Shopify/sarama/blob/c82acaf7831341c57794be79578431c927540840/consumer_group.go#L694
etc.

@stevenvdr
Copy link

@roblaszczak The lock in the error function was introduced in commit 93f4001 . Maybe the lock should be moved to the registerBroker function instead? The deregisterBroker function already has the lock, so I would think that will fix the issue? (I'm not familiar with the code though, so I could be overlooking something.)

@stevenvdr
Copy link

I think the mutex lock can even be safely removed in the handleError function (stevenvdr@8847ac5). Not sure how to properly test though

@roblaszczak
Copy link
Contributor

stevenvdr/sarama@8847ac5 shouldn't be merged, because it will remove fixes for race conditions from #1531 (BTW it would be nice to have some regression test for this in Sarama tests ;), currently it can be checked with Watermill tests, simple description of how to test it is here: #1531).

I don't have a lot of time to check it today, but maybe redesigning mutexes a bit may fix the issue (maybe separated mutex just for close?).

@stevenvdr
Copy link

@roblaszczak I've had a look, but I'm not familiar enough with how everything is set up. Could you have a look?

@sysadmind
Copy link

I think the fix was merged in #1581. I tested with my local consumer group and I no longer get the deadlock.

@roblaszczak
Copy link
Contributor

@stevenjm

I checked with the last release, and it's fine now:

robert@dell ~/src/watermill-pubsubs/watermill-kafka$ go test ./... -short -race -count=1                                                                                                                                                                                                                                                                                                                                                                                           ✹ ✭master 
ok  	github.com/ThreeDotsLabs/watermill-kafka/v2/pkg/kafka	20.096s

@ghost
Copy link

ghost commented Apr 26, 2020

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@ghost ghost added the stale Issues and pull requests without any recent activity label Apr 26, 2020
@ghost ghost closed this as completed Mar 17, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Issues and pull requests without any recent activity
Projects
None yet
Development

No branches or pull requests

4 participants