Sarama (probably) incorrectly assumes FetchResponseV4 contains RecordBatches when it contains MessageSets #988

jurriaan · 2017-11-21T15:06:42Z

I got the error kafka: insufficient data to decode packet, more bytes expected while consuming from some partitions. I tracked this down (by placing panics in real_decoder.go, because there were no stack traces) to this line:
https://github.com/Shopify/sarama/blob/master/record_batch.go#L153

Somehow the array length was corrupted here. I dumped the raw response from kafka and compared it to the RecordBatch spec. It didn't match up, the magic number was incorrect (should be 2 according to https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol). But it was 1.

I eventually worked out that this thing Sarama was trying to decode wasn't a RecordBatch but a MessageSet.

It seems that https://github.com/Shopify/sarama/blob/master/fetch_response.go#L83 is not the correct way of determining if the response contains a messageset or a recordbatch.

If I understand the following comment in https://github.com/apache/kafka/blob/1.0/clients/src/main/java/org/apache/kafka/common/requests/FetchResponse.java#L94-L95 correctly it should be possible for FetchResponse v4 to contain the older format:

// The v4 Fetch Response adds features for transactional consumption (the aborted transaction list and the
// last stable offset). It also exposes messages with magic v2 (along with older formats).

I worked around this issue by switching back the version to V0_10_0_0 for now.

The text was updated successfully, but these errors were encountered:

jurriaan · 2017-11-21T15:10:06Z

cc @wladh

eapache · 2017-11-21T16:38:39Z

So, it looks like the correct thing to do is to peek 16 bytes forward and check the version, since it's at the same offset regardless of the format?

jurriaan · 2017-11-21T21:16:29Z

@eapache Exactly, I think that's why they designed it this way.

Dived into the Kafka source to see how it's handled there, and they also use the version (magic byte) for this:

https://github.com/apache/kafka/blob/06e662c0fd527c1175008f31cb8a9ba9c0c5ddb6/clients/src/main/java/org/apache/kafka/common/record/ByteBufferLogInputStream.java#L57-L69 :

        byte magic = buffer.get(buffer.position() + MAGIC_OFFSET);


        ByteBuffer batchSlice = buffer.slice();
        batchSlice.limit(batchSize);
        buffer.position(buffer.position() + batchSize);


        if (magic < 0 || magic > RecordBatch.CURRENT_MAGIC_VALUE)
            throw new CorruptRecordException("Invalid magic found in record: " + magic);


        if (magic > RecordBatch.MAGIC_VALUE_V1)
            return new DefaultRecordBatch(batchSlice);
        else
            return new AbstractLegacyRecordBatch.ByteBufferLegacyRecordBatch(batchSlice);

tcrayford · 2017-11-22T12:54:49Z

@jurriaan at a guess, I'd think it's potentially possible for kafka to return the older format if you upgraded your broker from a previous version to a 0.11+ version. Does that seem to fit with your case?

jurriaan · 2017-11-22T17:22:58Z

@tcrayford correct, forgot to mention that, before this happened I updated the message version in the broker config from 0.10 to 1.0

wladh · 2017-11-28T14:50:50Z

Yeah, it seems I was under mistaken assumption that the messages would be converted to v2 (it seems the conversion only happens downwards).
Are you preparing a PR or should I do it? (sorry for the delayed response, I was away for a couple of weeks).

eapache · 2017-11-28T15:20:02Z

No problem. I haven't had time to look at this properly yet, so if you want to grab it that would be great. You probably have more context in that chunk of code still anyway.

wladh mentioned this issue Nov 28, 2017

Determine the records type based on the magic number not API version #990

Merged

eapache closed this as completed Nov 29, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sarama (probably) incorrectly assumes FetchResponseV4 contains RecordBatches when it contains MessageSets #988

Sarama (probably) incorrectly assumes FetchResponseV4 contains RecordBatches when it contains MessageSets #988

jurriaan commented Nov 21, 2017 •

edited

Loading

jurriaan commented Nov 21, 2017

eapache commented Nov 21, 2017

jurriaan commented Nov 21, 2017

tcrayford commented Nov 22, 2017

jurriaan commented Nov 22, 2017

wladh commented Nov 28, 2017 •

edited

Loading

eapache commented Nov 28, 2017

Sarama (probably) incorrectly assumes FetchResponseV4 contains RecordBatches when it contains MessageSets #988

Sarama (probably) incorrectly assumes FetchResponseV4 contains RecordBatches when it contains MessageSets #988

Comments

jurriaan commented Nov 21, 2017 • edited Loading

jurriaan commented Nov 21, 2017

eapache commented Nov 21, 2017

jurriaan commented Nov 21, 2017

tcrayford commented Nov 22, 2017

jurriaan commented Nov 22, 2017

wladh commented Nov 28, 2017 • edited Loading

eapache commented Nov 28, 2017

jurriaan commented Nov 21, 2017 •

edited

Loading

wladh commented Nov 28, 2017 •

edited

Loading