Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

producer: ensure that the management message (fin) is never "leaked" #2181

Closed
wants to merge 1 commit into from

Conversation

niamster
Copy link
Contributor

Since async producer now support multiple inflight messages
thanks to #1686 and #2094, it now may "leak"
the "fin" internal management message to Kafka (and to the client)
when broker producer is reconnecting to Kafka broker and retries
multiple inflight messages at the same time.

Steps to reproduce: start async producer and restart Kafka broker (potentially multiple times).
This trigger leader update and retires current broker producer.
There's a chance that newly created broker producer is used to retry an
inflight message so it is also marked as "dying" but it does not contain
any explicit errors.
This results in a "fin" message to be sent to Kafka and reported as successful
operation to the client.

Versions affected: v1.31.0 and above.

Sarama logs of the event (with the patch applied):

[Sarama] 2022/03/16 09:54:31 producer/broker/10001 starting up
[Sarama] 2022/03/16 09:54:31 producer/broker/10001 state change to [open] on my-topic/26
[Sarama] 2022/03/16 09:54:31 producer/leader/my-topic/26 selected broker 10001
[Sarama] 2022/03/16 09:54:31 producer/leader/my-topic/26 state change to [flushing-1]
[Sarama] 2022/03/16 09:54:31 producer/leader/my-topic/26 state change to [normal]
[Sarama] 2022/03/16 09:54:31 producer/leader/my-topic/26 state change to [retrying-1]
[Sarama] 2022/03/16 09:54:31 producer/broker/10001 state change to [dying-1] on my-topic/26
[Sarama] 2022/03/16 09:54:31 producer/leader/my-topic/26 abandoning broker 10001
[Sarama] 2022/03/16 09:54:31 producer/broker/10001 input chan closed
[Sarama] 2022/03/16 09:54:33 producer/broker/10001 state change to [open] on my-topic/26
[Sarama] 2022/03/16 09:54:33 producer/leader/my-topic/26 selected broker 10001
[Sarama] 2022/03/16 09:54:33 producer/leader/my-topic/26 state change to [flushing-1]
[Sarama] 2022/03/16 09:54:33 producer/leader/my-topic/26 state change to [normal]

Kafka version: 2.7.1
Go version: 1.17.7

Kafka producer config:

cfg.Producer.Retry.Backoff = 2 * time.Second
cfg.Producer.Return.Successes = true
cfg.Producer.Return.Errors = true
cfg.Producer.Flush.Messages = 1000
cfg.Producer.Flush.Frequency = 100 * time.Millisecond
cfg.ChannelBufferSize = 1024

Since async producer now support multiple inflight messages
thanks to IBM#1686 and
IBM#2094, it now may "leak"
the "fin" internal management message to Kafka (and to the client)
when broker producer is reconnecting to Kafka broker and retries
multiple inflight messages at the same time.
@niamster
Copy link
Contributor Author

Hum, CLA CI check still fails 🤔

@niamster niamster closed this Mar 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant