
First offset starts at 1 instead of 0 #1135

Closed
bdelbosc opened this issue Apr 12, 2021 · 6 comments
bdelbosc commented Apr 12, 2021

This is causing problems when using the consumer.endOffsets API to get the number of records in a partition.

Here is a Java unit test that fails with Redpanda:
https://gist.github.com/bdelbosc/006ca2905f6922994c025f783e42afec

CREATE topic test-1618228135744-testSendMessage
Message appended to topic: test-1618228135744-testSendMessage partition:0  offset:1
java.lang.AssertionError: 
Expected :1
Actual   :2

The expected output with Kafka:

CREATE topic test-1618227819300-testSendMessage
Message appended to topic: test-1618227819300-testSendMessage partition:0  offset:0
DELETE topic test-1618227819300-testSendMessage
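The off-by-one can be illustrated without a broker. This is a hypothetical sketch (not from the gist above) of the record-count arithmetic the failing test relies on, using the offset values reported in this issue:

```java
// Illustration of the off-by-one; the offset values are hypothetical.
public class OffsetCountSketch {
    // Record count estimated the way the failing test does:
    // end offset minus beginning offset.
    static long estimatedCount(long beginningOffset, long endOffset) {
        return endOffset - beginningOffset;
    }

    public static void main(String[] args) {
        // Kafka: the first user record lands at offset 0, so the end offset is 1.
        System.out.println(estimatedCount(0, 1)); // 1: matches the single record

        // Redpanda: a Raft control batch occupies offset 0 and the user
        // record lands at offset 1, so the end offset is 2.
        System.out.println(estimatedCount(0, 2)); // 2: one more than the record count
    }
}
```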
emaxerrno (Contributor) commented:

@bdelbosc this is not a bug: the first batch is a control batch (a Raft control batch). You will start seeing the same thing with Kafka as soon as you start doing transactions. The Kafka protocol allows for arbitrary control batches in the log.

See this:

IBM/sarama#1898

and this:

https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_msgset_reader.c#L860
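For context on the librdkafka code linked above: in the Kafka v2 record-batch format, a control batch is flagged by bit 5 of the batch `attributes` field (bit 4 marks a transactional batch), which is how clients recognize and skip these batches. A minimal sketch of that flag check:

```java
public class ControlBatchFlag {
    // Bit 5 (0x20) of the record-batch attributes marks a control batch
    // in the Kafka v2 batch format; bit 4 (0x10) marks a transactional batch.
    static boolean isControlBatch(short attributes) {
        return (attributes & 0x20) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isControlBatch((short) 0x20)); // control batch -> true
        System.out.println(isControlBatch((short) 0x00)); // ordinary batch -> false
    }
}
```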


bdelbosc commented Apr 12, 2021

Thanks for your feedback. I don't use transactions and I was not aware of these extra "control batch" messages.
Unfortunately, KafkaConsumer#beginningOffsets returns the offset of this control batch (offset 0 in my case),
so I have no proper way to know the number of records in a partition.

Is it possible to disable this first Raft control batch?
Or is there a way for a client to know that the target cluster is Redpanda, so it can avoid this off-by-one problem?

emaxerrno (Contributor) commented:

@bdelbosc - yeah, though the Java client should ignore control batches.

cc: @mmaslankaprv and @dotnwat

emaxerrno (Contributor) commented:

@bdelbosc I am reading your question again; the Kafka driver actually already handles that.

That is, the consumer group advances the offset even when it encounters control batches.

So to answer your question: you consume, then check the offset delta between the consumer group position and the tail of the log; that gives you the number of offsets. But offsets != messages (though it is very close).

For example, as Kafka keeps adding features, control batches will show up more and more in the log.

bdelbosc (Author) commented:

I understand. Still, when there are no transactions there are no "control batch" messages, and the lag can be computed as the difference between KafkaConsumer#endOffsets and KafkaConsumer#committed (or KafkaConsumer#beginningOffsets if the consumer group has not committed any position yet).
This is efficient because there is no need to call KafkaConsumer#poll, and the lag can be reported outside of the consumer processing.
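The lag computation described above can be sketched with plain maps, standing in for the per-partition results of endOffsets, committed, and beginningOffsets. This is an illustrative sketch with hypothetical offset values, not code from the project:

```java
import java.util.HashMap;
import java.util.Map;

public class LagSketch {
    // Lag per partition = log-end offset minus the committed offset,
    // falling back to the beginning offset when nothing is committed yet.
    static Map<Integer, Long> lag(Map<Integer, Long> endOffsets,
                                  Map<Integer, Long> committed,
                                  Map<Integer, Long> beginningOffsets) {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long base = committed.getOrDefault(e.getKey(),
                    beginningOffsets.getOrDefault(e.getKey(), 0L));
            result.put(e.getKey(), e.getValue() - base);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values: partition 1 has no committed position.
        Map<Integer, Long> end = Map.of(0, 10L, 1, 5L);
        Map<Integer, Long> committed = Map.of(0, 7L);
        Map<Integer, Long> beginning = Map.of(0, 0L, 1, 2L);
        System.out.println(lag(end, committed, beginning));
    }
}
```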

I will check whether the first offset of the topic differs from the offset of the first message, in order to correct the lag (this check only needs to be done once).
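The one-time correction mentioned above could look like the following hypothetical sketch: the gap between the partition's beginning offset and the offset of the first fetched record is a constant that can be subtracted from every subsequent lag computation.

```java
public class LagCorrectionSketch {
    // One-time correction: if the first fetched record's offset is greater
    // than the partition's beginning offset, the gap is occupied by
    // control batches and can be subtracted from the computed lag.
    static long correction(long beginningOffset, long firstRecordOffset) {
        return firstRecordOffset - beginningOffset;
    }

    public static void main(String[] args) {
        // Redpanda in this issue: beginning offset 0, first record at offset 1.
        System.out.println(correction(0, 1)); // 1
        // Kafka without transactions: no gap.
        System.out.println(correction(0, 0)); // 0
    }
}
```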

Thank you.

emaxerrno (Contributor) commented:

> I understand, still, when there is no transaction there is no "control batch" message and the lag can be given by the

Sure thing. That just happens to be how it is today. In the future, Kafka (upstream Apache) may introduce arbitrary control batches in the log; it is part of the spec.

> difference between KafkaConsumer#endOffsets and KafkaConsumer#committed (or KafkaConsumer#beginningOffsets if the consumer group has not committed any position).
> This is efficient because there is no need to call KafkaConsumer#poll and the lag can be reported outside of the consumer processing.

Makes sense. Can one do poll(0)?

> I will check if the first offset of a topic is different from the first message offset in order to correct the lag (this check can be done only once).

Sure thing. I bet you can also introspect whether a batch was a control batch or not. I haven't looked at the Java API deeply, but it is usually pretty complete. There can be a couple of control batches if there is a lot of network instability, e.g. when we elect a new Raft leader during failure conditions (say you crash a node).

> Thank you.

any time!
