
Why does the example consumergroup not work correctly? Only one client can consume messages; the other clients do not work #1516

Closed
meiyang1990 opened this issue Oct 23, 2019 · 16 comments
Labels
needs-more-info question stale Issues and pull requests without any recent activity

Comments

@meiyang1990

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama | Kafka | Go
Configuration

What configuration values are you using for Sarama and Kafka?

Logs

When filing an issue please provide logs from Sarama and Kafka if at all
possible. You can set sarama.Logger to a log.Logger to capture Sarama debug
output.

logs:

Problem Description
@d1egoaz
Contributor

d1egoaz commented Oct 28, 2019

Hey @meiyang1990
What are the arguments that you're using in order to test the consumer?

What is your topic configuration? How many partitions does the topic have?

@wyndhblb
Contributor

wyndhblb commented Nov 13, 2019

I can confirm this is also happening.

TL;DR: sarama "rebalancing" is not working correctly (if it ever did).

Specs: Kafka 2.2, sarama 1.24.1, docker-compose based network/install (one Kafka container, not multiple, but also confirmed on a real production system with 7 Kafka nodes).
Config: all the defaults sarama sets except the below (this also fails with conf.Version set to every version from 1.0.0 through 2.3.0):

conf.Metadata.RefreshFrequency = 120 * time.Second
conf.Metadata.Full = true
conf.Consumer.Fetch.Min = 1
conf.Consumer.Fetch.Max = 1000
conf.ClientID = "some string"
conf.Consumer.Group.Rebalance.Strategy = sarama.BalanceStrategyRange
conf.Consumer.Offsets.CommitInterval = time.Second
conf.Version = sarama.V1_0_0_0
  1. Start a test producer:

Sent 5200  messages so far to topic test ...
Sent 5220  messages so far to topic test ...
Sent 5240  messages so far to topic test ...
  2. Fire up a consumer:

2019-11-13T03:35:41.120Z	INFO	example	log/with.go:13	claimed kafka topics{"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "claims": {"test":[0,1,2,3,4,5,6,7]}}
2019-11-13T03:35:41.120Z	INFO	example	log/with.go:13	starting message consumer loop	{"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "topic": "test", "partition": 7}
  3. Watch the Kafka consumer group:

watch -t -d kafka-consumer-groups.sh --bootstrap-server=localhost:9092 --describe --group group-example

TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                    HOST            CLIENT-ID
test            6          732             732             0               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
test            0          716             716             0               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
test            7          705             707             2               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
test            5          707             708             1               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
test            1          704             704             0               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
test            4          695             695             0               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
test            3          691             691             0               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
test            2          736             736             0               example-group-consumer-b6a9f0fd-6c9f-4f9f-88ea-dccd4aaaa97b    /172.22.0.6     example-group-consumer
  4. Fire up another consumer (given the default "range" strategy, one expects partitions to split evenly across the same group: 0-3 on one consumer, 4-7 on the other).

  5. What really happens:

5a. Kafka goes into rebalance mode:


TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
test            6          760             760             0               -               -               -
test            0          742             745             3               -               -               -
test            7          725             727             2               -               -               -
test            5          741             745             4               -               -               -
test            1          720             722             2               -               -               -
test            4          726             727             1               -               -               -
test            3          713             714             1               -               -               -
test            2          768             772             4               -               -               -

Then moments later, still only ONE consumer is consuming (in this case the new one).

5b. Consumer 2 claims everything:

2019-11-13T03:38:49.601Z	INFO	example	log/with.go:13	claimed kafka topics{"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"], "claims": {"test":[0,1,2,3,4,5,6,7]}}

5c. The watcher on the Kafka node itself confirms that the new consumer claimed everything:


TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                    HOST            CLIENT-ID
test            6          780             781             1               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
test            0          759             760             1               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
test            7          744             744             0               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
test            5          763             763             0               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
test            1          736             736             0               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
test            4          738             738             0               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
test            3          734             734             0               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
test            2          784             784             0               example-group-consumer-d237b3c9-b3ce-402c-8565-869f80cd6c7b    /172.22.0.6     example-group-consumer
  6. Kill (stop) consumer number 2:
2019-11-13T03:41:39.495Z	INFO	example	log/with.go:13	consumer cleanup	{"lib": "xxxx", "name": "consumergroup", "topics": ["test"], "group": "group-example", "brokers": ["kafka:9092"]}
Consumer stopped:
 Messages consumed:  361
 Errors received:  0
  7. Consumer number 1 never "takes over" and fills in the gaps; it just sits dead.

  8. The Kafka watcher is also empty of consumers:

Consumer group 'group-example' has no active members.

TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
test            6          811             847             36              -               -               -
test            0          793             828             35              -               -               -
test            7          772             819             47              -               -               -
test            5          792             820             28              -               -               -
test            1          767             809             42              -               -               -
test            4          763             794             31              -               -               -
test            3          752             775             23              -               -               -
test            2          806             838             32              -               -               -

@vikrampunchh

Same issue. @d1egoaz @meiyang1990

@gauravds

Same with me.

@pavius
Contributor

pavius commented Jan 22, 2020

The above is a symptom of not looping on Consume(). Consume() will exit without error when a rebalance occurs, and it is up to the user to call it again when this happens.

Under the hood, when a rebalance occurs all sessions are torn down completely (briefly no members exist, and therefore no partitions are handled by anyone); when you re-call Consume(), a new session is brought up which should get its share of the partitions.

@ghost

ghost commented Apr 21, 2020

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@ghost ghost added the stale Issues and pull requests without any recent activity label Apr 21, 2020
@d1egoaz
Contributor

d1egoaz commented Apr 22, 2020

Some weeks ago we added some comments covering what @pavius described above.
It was confusing; I ran into the same issue once.

#1602

@meiyang1990 @wyndhblb @vikrampunchh @gauravds
could you please confirm that this was the issue?


@ghost ghost added the stale Issues and pull requests without any recent activity label Mar 16, 2021
@TomWright

@d1egoaz I am still getting this issue.

I am starting 3 consumers, all in the same consumer group.

The call to Consume() is within a loop and I can see (from logs) that the consumers are being rebalanced as the following consumers come online.

Before publishing any messages I can see that all 3 consumers have called Consume and are waiting for messages, however when publishing only the last consumer to come online actually receives any messages.

I have the consumer group rebalance strategy set to round robin.

Any ideas what's going on here?

@ghost ghost removed the stale Issues and pull requests without any recent activity label Mar 18, 2021
@d1egoaz
Contributor

d1egoaz commented Mar 18, 2021

@d1egoaz I am still getting this issue.

I am starting 3 consumers, all in the same consumer group.

The call to Consume() is within a loop and I can see (from logs) that the consumers are being rebalanced as the following consumers come online.

Before publishing any messages I can see that all 3 consumers have called Consume and are waiting for messages, however when publishing only the last consumer to come online actually receives any messages.

I have the consumer group rebalance strategy set to round robin.

Any ideas what's going on here?

I wonder if you could paste some code showing your consumer subscription/consume workflow.
What's the topic configuration?

@TomWright

TomWright commented Mar 18, 2021

@d1egoaz I've created an example here: https://github.com/TomWright/sarama-example

There are some basic instructions in the readme.

It uses this repo to run the consumer: https://github.com/TomWright/gracesarama/blob/main/consumer.go#L48

@d1egoaz
Contributor

d1egoaz commented Mar 18, 2021

@d1egoaz I've created an example here: https://github.com/TomWright/sarama-example

There are some basic instructions in the readme.

It uses this repo to run the consumer: https://github.com/TomWright/gracesarama/blob/main/consumer.go#L48

thanks!

I'm looking at this specifically:

KAFKA_CREATE_TOPICS: "my-topic:1:1"

edit:
I just noticed this is not used, and Kafka is configured to create topics automatically when someone produces to them; by default the topic will be created with only 1 partition.

If this is how your test is configured, that's the reason messages only go to one member.

A claim (topic/partition) can only be read by a single member of a group, though a member can consume multiple partitions/topics. In your case, even if you have N members, only one (holding the single partition) will consume messages. Partitions define the concurrency of a topic: if you want to consume faster with more members, you should increase the partition count.

Try adding more partitions to your topic first.
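To see why a single-partition topic funnels everything to one member: sarama's default hash partitioner hashes the message key and takes the result modulo the partition count, so with one partition every keyed message lands on partition 0. A stdlib-only sketch of that idea (`partitionFor` is an illustrative helper, not sarama's API):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor mirrors the idea behind a hash partitioner:
// FNV-1a of the key, modulo the number of partitions.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	// With one partition, every key maps to partition 0, so only
	// one group member ever receives messages.
	fmt.Println(partitionFor("order-42", 1), partitionFor("order-43", 1)) // prints "0 0"
	// With three partitions, keys can spread across partitions
	// and therefore across group members.
	fmt.Println(partitionFor("order-42", 3))
}
```

This is why adding partitions (next comment) lets more members of the group receive messages.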

@d1egoaz
Contributor

d1egoaz commented Mar 18, 2021

After I manually deleted and recreated the topic with more partitions, I can see more members consuming the messages:

❯ kafka-topics --delete --zookeeper localhost:2181 --topic  my-topic

❯ kafka-topics --create --zookeeper localhost:2181 --partitions 3 --replication-factor 1 --topic my-topic

BTW, I don't think the Confluent image supports KAFKA_CREATE_TOPICS; that's from this image: https://github.com/wurstmeister/kafka-docker#automatically-create-topics
In your case, after changing the image, it should be KAFKA_CREATE_TOPICS: "my-topic:3:1"

(screenshot: multiple group members receiving messages)

@TomWright

Ah, I see. That makes sense. Thanks for your feedback on this! Now that you mention it, it's obvious 🤦‍♂️


@github-actions github-actions bot added the stale Issues and pull requests without any recent activity label Aug 24, 2023
@dnwe
Collaborator

dnwe commented Aug 24, 2023

Reading back through this issue, it looks like all problems were resolved.
