
fail to decode some gzip messages #635

Closed
tcrayford opened this issue Mar 27, 2016 · 3 comments


@tcrayford (Contributor)

Hi,

I've been fuzzing Sarama's message encoding/decoding with go-fuzz. I think this has been done in the past, but there still appear to be some issues remaining. I haven't tried any parts of the protocol other than message.go.

The fuzzer for now looks like this:

package sarama

import (
    "fmt"
    "reflect"
)

func Fuzz(data []byte) int {
    for _, codec := range []CompressionCodec{CompressionNone, CompressionGZIP, CompressionSnappy} {
        // first check value encoding
        message := Message{
            Value: data,
            Codec: codec,
        }
        packet, err := encode(&message)
        if err != nil {
            panic(fmt.Sprintf("couldn't encode message with fuzzed value: %v %v", err, message))
        }

        var decoded Message
        err = decode(packet, &decoded)

        if err != nil {
            panic(fmt.Sprintf("couldn't decode message with fuzzed value: err=%v, message=%v data=%v codec=%v", err, packet, data, codec))
        }

        if !reflect.DeepEqual(decoded.Value, data) {
            panic(fmt.Sprintf("decoded incorrect message payload %v from %v, should have had %v codec=%v", decoded.Value, packet, data, codec))
        }

        if decoded.Codec != codec {
            panic(fmt.Sprintf("decoded incorrect codec whilst fuzzing value %v", decoded.Codec))
        }

        // second check using the fuzz input as a key
        message = Message{
            Key:   data,
            Value: []byte{},
            Codec: codec,
        }
        packet, err = encode(&message)
        if err != nil {
            panic(fmt.Sprintf("couldn't encode message with fuzzed key: %v %v", err, message))
        }

        err = decode(packet, &decoded)

        if err != nil {
            panic(fmt.Sprintf("couldn't decode message with fuzzed key: %v, %v", err, packet))
        }

        if !reflect.DeepEqual(decoded.Key, data) {
            panic(fmt.Sprintf("decoded incorrect message key %v from %v, should have had %v", decoded.Key, packet, data))
        }

        if decoded.Codec != codec {
            panic(fmt.Sprintf("decoded incorrect codec whilst fuzzing key %v", decoded.Codec))
        }
    }
    return 1
}

The only crashes I've found so far are with GZIP encoding, where sarama errors with "kafka: error decoding packet: unexpected messageFormat". Here's a sample panic message (split up with some newlines for readability).

panic: couldn't decode message with fuzzed value: err=kafka: error decoding packet: unexpected messageFormat 48
message=[12 40 129 46 0 1 255 255 255 255 0 0 0 37 31 139 8 0 0 9 110 136 0 255 90 96 240 238 189 65 151 129 129 193 94 3 8 0 4 0 0 255 255 101 149 29 154 17 0 0 0]
data=[160 48 238 239 48 138 48 48 48 189 48 48 48 48 48 48 48]
codec=1

Breaking the message up further, it looks like there's some kind of bug with sarama's parsing code:

message=[
12 40 129 46 crc
0 message format (note: not 48)
1 attribute (gzip encoded)
255 255 255 255 key (empty)
0 0 0 37 message length
31 139 8 0 0 9 110 136 0 255 90 96 240 238 189 65 151 129 129 193 94 3 8 0 4 0 0 255 255 101 149 29 154 17 0 0 0
]

I don't fully understand what's going on here, so I'm working on narrowing it down, but I felt like it'd be worth opening an issue in the meantime in case anybody runs into this in production. Can you see any issues with the fuzzer? I copied and pasted liberally from message_test.go, applying the obvious "serialization should roundtrip" property.

@eapache (Contributor) commented Mar 27, 2016

The Kafka protocol requires the contents of a compressed message to be another message-set. Please see the spec for more details.

@tcrayford (Contributor, Author)

Ah, so the fuzzer is wrong then. Thank you. I'm somewhat familiar with the spec as of a few years ago (I was a client implementor back then), but haven't kept completely up to date with it. As a note, with the compression codec fixed to CompressionNone, sarama passes this fuzz test perfectly (at least so far).

thanks.

@eapache (Contributor) commented Mar 28, 2016

A few other misc notes:

  • Yes, I did some fuzzing in the past (see e.g. Fix two decoding bugs found by go-fuzz #523). For that pass I took the approach of trying to decode the fuzzed data as a full packet, permitting decoding errors (since the packet is likely malformed) but searching for panics and other misbehaviours. Because of that approach, I don't think I ever bothered checking that serialization round-trips.
  • This code is simple and well-tested enough that I doubt you'll find anything interesting, but by all means keep trying and let us know if you do.
  • The Kafka protocol is subtle and a bit weird, especially around compression.
