regression: enabling rackawareness causes severe throughput drops #2071
Comments
@dnwe opened as per your request
Historically (before protocol version 11), if we attempted to consume from a follower we would get a NotLeaderForPartition response and move our consumer to the new leader. However, since v11 the Kafka broker treats us just like any other follower and permits us to consume from any replica; it is up to us to monitor metadata to determine when the leadership has changed. This change modifies the handleResponse func to check the topic partition leadership against the current broker (in the absence of a preferredReadReplica) and to trigger a re-creation of the consumer for that partition. Contributes-to: #1927
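For illustration, here is a rough sketch of that check, assuming a helper that compares the partition's current leader against the broker the fetch went to. The function name and signature are made up for this sketch, not Sarama's actual internals:

```go
package rackaware

import "github.com/Shopify/sarama"

// shouldRecreateConsumer is an illustrative sketch of the check described
// above, not the actual Sarama internals.
func shouldRecreateConsumer(client sarama.Client, topic string, partition int32,
	fetchedFrom *sarama.Broker, preferredReadReplica int32) (bool, error) {
	// A broker-supplied preferred read replica takes precedence: keep
	// fetching from that replica rather than forcing a move back to the leader.
	if preferredReadReplica >= 0 {
		return false, nil
	}

	// No preference in the response: check whether the broker we fetched
	// from is still the partition leader.
	leader, err := client.Leader(topic, partition)
	if err != nil {
		return false, err
	}

	// If leadership has moved (or we were on a follower with no preference),
	// the partition consumer should be torn down and re-created so it
	// reconnects to the current leader.
	return leader.ID() != fetchedFrom.ID(), nil
}
```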
Thank you!
Anything I can do to assist here? Rolling back the commit might help, but it would also break that fix, so...
@lizthegrey apologies, I've actually been off sick this week so I hadn't had a chance to code up a fix for this yet, but I do aim to look at it soon. Essentially I believe the issue is that Kafka only ever computes a "preferred read replica" when your FetchRequest has gone to the leader of the partition and you've provided a client RackID. In that case the FetchResponse contains the preferred replica (if it differs from the leader) and omits any data, and a well-behaved client should then disconnect and start fetching from that preferred replica instead. However, when you then send FetchRequests to the follower, the "preferred read replica" field is omitted from its responses, and hence #1936 kicks in: it thinks no preference exists and that you're consuming from a non-leader, so it forces the consumer back to the real leader. Hence the massive drop in throughput you see, as the consumer flip-flops between brokers on every FetchResponse.
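To make the intended redirect concrete, here is a minimal sketch of how a client might react to the leader's FetchResponse when it has supplied a RackID. The names are illustrative and -1 stands in for "no preference"; this is not Sarama's actual code:

```go
package rackaware

// pickFetchBroker is an illustrative sketch: given the partition leader's ID
// and the PreferredReadReplica from the leader's FetchResponse (-1 when the
// broker expressed no preference), it returns the broker ID the client
// should fetch from next.
func pickFetchBroker(leaderID, preferredReadReplica int32) int32 {
	if preferredReadReplica >= 0 && preferredReadReplica != leaderID {
		// The leader picked a replica in our rack; the redirecting
		// response carries no records. Disconnect and fetch from the
		// preferred replica from now on.
		return preferredReadReplica
	}
	// No differing preference: keep fetching from the leader.
	return leaderID
}
```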
A FetchResponse from a follower will _not_ contain a PreferredReadReplica. It seems the partitionConsumer would overwrite the previously assigned value from the leader with -1, which would then trigger the "reconnect to the current leader" changes from #1936, causing a flip-flop effect. Contributes-to: #2071 Signed-off-by: Dominic Evans <dominic.evans@uk.ibm.com>
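A minimal sketch of the fix as described, under the assumption that the consumer keeps the last value the leader assigned. The names are illustrative, not the actual diff in #2076:

```go
package rackaware

// invalidPreferredReplica mirrors the protocol's -1 sentinel for
// "no preferred read replica"; the name is an assumption for illustration.
const invalidPreferredReplica int32 = -1

// rememberPreferredReplica only overwrites the remembered value when the
// response actually names a replica.
func rememberPreferredReplica(current, fromResponse int32) int32 {
	if fromResponse == invalidPreferredReplica {
		// Follower responses never carry the field; keep the value the
		// leader assigned earlier instead of resetting it, which is what
		// caused the reconnect-to-leader flip-flop.
		return current
	}
	return fromResponse
}
```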
@lizthegrey so I've pushed up #2076 as a tentative fix, but (full disclaimer) I haven't yet actually tried this out against a real cluster, so I'll need to run through that and code up some test cases before I can merge. If you'd like to try it out in the meantime, that would be really helpful :)
We have test clusters for this exact reason. I will flip it on and let you know.
Fixed by #2076 and confirmed working in Honeycomb production. |
Sorry, I think this Sarama code change made the consumer client ignore Kafka's leader replica (preferred replica). Once the ISR changes, the consumer client can no longer fetch from the leader replica. Kafka has since fixed selecting the preferred replica from the ISR: https://github.com/apache/kafka/pull/12877/files#diff-78812e247ffeae6f8c49b1b22506434701b1e1bafe7f92ef8f8708059e292bf0
Versions
Regression has been bisected to 1aac8e5
See #1927 (comment) for another report besides mine.
Configuration
Pertinent config variables:
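A minimal sketch of the kind of consumer configuration involved, assuming the Sarama RackID is populated from the AZ; the version and values here are placeholders, not the reporter's actual config:

```go
package rackaware

import "github.com/Shopify/sarama"

// newConsumerConfig is a placeholder sketch, not the reporter's actual
// configuration.
func newConsumerConfig(az string) *sarama.Config {
	cfg := sarama.NewConfig()
	// Follower fetching requires FetchRequest v11+, i.e. Kafka 2.3.0 or newer.
	cfg.Version = sarama.V2_3_0_0
	// RackID is sent in FetchRequests; the broker uses it to pick a preferred
	// read replica in the same rack/AZ. Leaving it empty (the "AZ not
	// populated" case below) effectively disables follower fetching.
	cfg.RackID = az
	return cfg
}
```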
Problem Description
When AZ is not populated (due to a bug on our side, hah), or when FlagKafkaRackAwareFollowerFetch is false, things behave normally. But throughput drops by 75% if the flag is set and the Sarama library is at or past 1aac... https://share.getcloudapp.com/nOu54vNq
There is a corresponding lag in timestamps between producer and consumer: https://share.getcloudapp.com/DOu6LBAQ