Refactor subscriber #121

horkhe · 2017-10-23T19:06:07Z

The subscriber used to set watches only on changes in the number of consumer group members. To detect changes in member subscriptions (the list of topics each member consumes), a trick was used: a member removed its registration and then added it again with an updated topic list. To avoid unnecessary rebalancing during such temporary deregistrations, an internal RebalancingDelay timeout postponed rebalancing in the expectation that a registration might follow.

In this PR we vendored a Mailgun branch of the kazoo library that allows setting watches on member subscriptions, so deregistering to update the list of topics is no longer necessary. As a result the logic became simpler.

Besides, several unnecessary optimizations were removed; one of them was the cause of #120.
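To illustrate why a watch on member count alone is not enough, here is a minimal sketch of the comparison a subscriber has to make once it can see member subscriptions. It is not the code from this PR and does not use the kazoo API; the `subscriptions` type and the `sameTopics` and `changed` helpers are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// subscriptions maps a group member ID to the topics it consumes.
// Hypothetical type, used only for this illustration.
type subscriptions map[string][]string

// sameTopics reports whether a and b contain the same topics, ignoring order.
func sameTopics(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	// Sort copies so the caller's slices are left untouched.
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// changed reports whether the set of members or any member's topic list
// differs between two snapshots.
func changed(prev, next subscriptions) bool {
	if len(prev) != len(next) {
		return true
	}
	for id, topics := range next {
		old, ok := prev[id]
		if !ok || !sameTopics(old, topics) {
			return true
		}
	}
	return false
}

func main() {
	prev := subscriptions{"m1": {"a"}, "m2": {"b"}}
	next := subscriptions{"m1": {"a", "c"}, "m2": {"b"}}
	// The member count is the same in both snapshots, but m1 now consumes
	// an extra topic, so a count-only watch would miss this change.
	fmt.Println(changed(prev, next))
}
```

With a count-only watch the two snapshots in `main` look identical, which is why the old code had to deregister and register again to make the change visible.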

thrawn01 · 2017-10-23T19:46:47Z

config/config.go

@@ -419,7 +417,7 @@ func defaultProxyWithClientID(clientID string) *Proxy {
 	c.Producer.RetryMax = 6
 	c.Producer.ShutdownTimeout = 30 * time.Second

-	c.Consumer.AckTimeout = 15 * time.Second
+	c.Consumer.AckTimeout = 45 * time.Second


Why are the ack timeouts so aggressive? If scout moves to a model of acking only after an event has been processed successfully, we will hit this timeout in the case of a network outage or temporary disruption, and the event will then be processed twice by scout. It is better to have a longer timeout to mitigate such instances. 300 seconds is relatively standard for network communication. In scout's case, gocql has a network "connectivity" timeout of 30 seconds, but will retry each node in the cluster with a 3-second connection timeout. That means that by the time gocql has returned an error, scout will log the event as unprocessed and will not ack it. scout will then ask pixy for the same event again and try processing it again, all within the 300-second ack timeout window.

Also, shouldn't c.Consumer.SubscriptionTimeout be larger than the AckTimeout?

Your thoughts, please.

We can make it 300 seconds, no problem. But note that we already override this default in Radar, setting it to 90 seconds.

There used to be a requirement to make SubscriptionTimeout larger than AckTimeout, because otherwise AckTimeout was not honored on rebalancing, but there is no such requirement anymore. The relation between the timeouts is as follows: if consume requests for a group-topic stop coming for SubscriptionTimeout, then rebalancing happens as soon as the last pending request is acked or AckTimeout elapses. Until rebalancing is triggered, any incoming request resets SubscriptionTimeout and consumption of the group-topic resumes.

Note that Scout consumes events in auto-ack mode so AckTimeout is not used.

thrawn01 · 2017-10-24T16:29:18Z

A lot of this is over my head, because I don't have a full running theory of how kafka-pixy works. At some point in the future I would love to corner you and ask a bunch of questions. =)

thrawn01 suggested changes Oct 23, 2017

horkhe added 6 commits October 24, 2017 10:04

Adjust some default config values

92e7492

Do not ignore redundant updates (fixes #120)

9c89cb8

Vendor latest zookeeper-go and kazoo

eddc23a

Switch to Mailgun clone of wvanbergen/kazoo

1393a2f

Watch for group member subscription changes

d2bf590

We used to set watch only on the group membership. To notify group members about subscription changes members had to deregister and then register again.

Build with Golang v1.9.1

615fc7e

horkhe force-pushed the maxim/develop branch from 4107324 to 615fc7e October 24, 2017 07:35

horkhe added 2 commits October 24, 2017 11:04

Fix flaky subscriber TestMembershipChanges test

446d6f1

Report triggered ZooKeeper watch in log

10958c6

thrawn01 approved these changes Oct 24, 2017

horkhe added 2 commits October 25, 2017 11:55

Make use of kazoo UpdateRegistration function

93c0b10

Resubmit topics on claim failure

e645105

horkhe force-pushed the maxim/develop branch from 16b6ea1 to e645105 October 25, 2017 11:58

horkhe merged commit 6354fd1 into master Oct 25, 2017
