First offset starts at 1 instead of 0 #1135
Comments
@bdelbosc this is not a bug. The first batch is a control batch (a Raft control batch). You will start seeing the same thing as soon as you start using transactions. The Kafka protocol allows arbitrary control batches in the log. See this: and this: https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_msgset_reader.c#L860
Thanks for your feedback. I don't use transactions, and I was not aware of these extra "control batch" messages. Is it possible to disable this first Raft control batch?
@bdelbosc - yeah, the Java client should ignore control batches though. cc: @mmaslankaprv and @dotnwat
@bdelbosc I am reading your question again, and the Kafka driver actually already handles that. That is, the consumer group advances its offset even when the batches are control batches. So to answer your question: you consume, and then check the offset delta between the consumer group and the tail of the log; that should give you the number of offsets. But offsets != messages (it's very close). For example, as Kafka keeps adding features, control batches will show up more and more in the log.
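A minimal sketch of the offset-delta lag described above, in plain Java so it runs without a broker. The partition keys and the offset values are made up for illustration; in a real consumer the two maps would be filled from the client (for example from `consumer.endOffsets` and the group's committed offsets).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone sketch: partition names and offsets below are invented for illustration.
final class OffsetLagSketch {

    // Lag in offsets (not messages): log end offset minus the group's committed offset.
    static Map<String, Long> offsetLag(Map<String, Long> endOffsets, Map<String, Long> committed) {
        Map<String, Long> lag = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            // Assumption: a partition without a committed offset is treated as committed at 0.
            long position = committed.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), Math.max(0L, e.getValue() - position));
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<String, Long> end = new LinkedHashMap<>();
        end.put("events-0", 11L); // e.g. 1 control batch followed by 10 data records
        end.put("events-1", 5L);

        Map<String, Long> committed = new LinkedHashMap<>();
        committed.put("events-0", 11L); // fully consumed; the position moved past the control batch too
        committed.put("events-1", 2L);

        System.out.println(offsetLag(end, committed));
    }
}
```

Because the committed position also advances over control batches, a fully caught-up group reports a lag of 0 here even though the partition holds fewer data records than offsets.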
I understand, still, when there is no transaction there is no "control batch" message and the lag can be given by the difference between I will check if the first offset of a topic is different from the first message offset in order to correct the lag (this check can be done only once). Thank you.
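The one-time check proposed here could look roughly like the sketch below. The method names and offsets are hypothetical, not part of any Kafka client API, and the correction only holds while the leading batch is the only non-data entry in the log.

```java
// Stand-alone sketch; method names and offsets are illustrative, not a Kafka client API.
final class FirstOffsetCorrection {

    // Number of offsets at the head of the log that are not data records.
    // Intended to be computed once per partition and then reused.
    static long leadingGap(long beginningOffset, long firstDataOffset) {
        if (firstDataOffset < beginningOffset) {
            throw new IllegalArgumentException("first data offset precedes the log start");
        }
        return firstDataOffset - beginningOffset;
    }

    // Offset span of the partition minus the leading gap.
    // Assumption: no other control batches exist between the first data record and the end.
    static long estimatedRecords(long beginningOffset, long endOffset, long gap) {
        return Math.max(0L, endOffset - beginningOffset - gap);
    }

    public static void main(String[] args) {
        // Log starting with one control batch: offsets 0..10, first data record at offset 1.
        long gap = leadingGap(0L, 1L);
        System.out.println(estimatedRecords(0L, 11L, gap));

        // Log without a leading control batch: offsets 0..9, first data record at offset 0.
        System.out.println(estimatedRecords(0L, 10L, leadingGap(0L, 0L)));
    }
}
```

Both cases in `main` describe a partition with ten data records; the gap value is what makes the two estimates agree.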
Sure thing. That just happens to be how it is today. In the future, Kafka (upstream Apache) may introduce arbitrary control batches in the log. It is part of the spec.
makes sense. can one do
Sure thing. I bet you can also introspect whether it was a control batch or not. I haven't looked at the Java API deeply, but it is usually pretty complete. There can be a couple of control batches if there is a lot of network instability, e.g. when we elect a new Raft leader during some failure conditions (say you crash a node).
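To make the effect of several control batches concrete, here is a small self-contained sketch. `Entry` is a hypothetical stand-in for a log entry, not a class from the Kafka Java client, and the sample log is invented: one control batch at the start and one more after an assumed leader election.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone model of a partition log; Entry is a hypothetical type, not a Kafka client class.
final class ControlBatchCount {

    static final class Entry {
        final long offset;
        final boolean control; // true for a control batch, false for a data record

        Entry(long offset, boolean control) {
            this.offset = offset;
            this.control = control;
        }
    }

    // Number of offsets covered by the log, which is what an end-minus-beginning delta reports.
    static long offsetSpan(List<Entry> log) {
        if (log.isEmpty()) {
            return 0L;
        }
        return log.get(log.size() - 1).offset - log.get(0).offset + 1;
    }

    // Number of entries an application would actually see as messages.
    static long dataRecords(List<Entry> log) {
        long count = 0;
        for (Entry e : log) {
            if (!e.control) {
                count++;
            }
        }
        return count;
    }

    // Invented sample: control batches at offsets 0 and 4, data records elsewhere.
    static List<Entry> sampleLog() {
        return Arrays.asList(
                new Entry(0, true),
                new Entry(1, false), new Entry(2, false), new Entry(3, false),
                new Entry(4, true),
                new Entry(5, false), new Entry(6, false));
    }

    public static void main(String[] args) {
        List<Entry> log = sampleLog();
        System.out.println("offset span:  " + offsetSpan(log));
        System.out.println("data records: " + dataRecords(log));
    }
}
```

With a second control batch in the middle of the log, a single correction for the first offset is no longer enough; the difference between the span and the record count grows with every control batch.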
Any time!
This is causing problems when using the consumer.endOffsets API to get the number of records in a partition. Here is a Java unit test that fails with Redpanda:
https://gist.github.com/bdelbosc/006ca2905f6922994c025f783e42afec
the expected output with Kafka: