Skip to content
This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

Set subscription readiness based on consumer group status #182

Merged
merged 10 commits into from
Dec 1, 2020

Conversation

devguyio
Copy link
Contributor

@devguyio devguyio commented Nov 11, 2020

Fixes #134
Fixes #98
Fixes knative/eventing-contrib#1556

Proposed Changes

  • Set subscription readiness based on consumer group status in the consolidated Kafka
  • Set's KafkaChannel consolidated dispatcher to read-only
  • Enables HA in KafkaChannel consolidated dispatcher

Release Note

- A bug was fixed in the consolidated KafkaChannel where subscriptions would show up in the channel's `status.subscribers` before the dispatcher becomes ready to dispatch messages for those subscribers.
- Consolidated KafkaChannel dispatcher's horizontal scalability works now seamlessly with reconciler leader election.

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2020
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 11, 2020
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 11, 2020
@codecov
Copy link

codecov bot commented Nov 12, 2020

Codecov Report

Merging #182 (d2d1e09) into master (e71a1b1) will decrease coverage by 0.77%.
The diff coverage is 62.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #182      +/-   ##
==========================================
- Coverage   75.99%   75.21%   -0.78%     
==========================================
  Files         112      114       +2     
  Lines        4274     4459     +185     
==========================================
+ Hits         3248     3354     +106     
- Misses        831      901      +70     
- Partials      195      204       +9     
Impacted Files Coverage Δ
...consolidated/reconciler/controller/kafkachannel.go 43.43% <28.84%> (-2.97%) ⬇️
pkg/channel/consolidated/kafka/admin.go 79.41% <79.41%> (ø)
...ed/reconciler/controller/consumer_group_watcher.go 79.71% <79.71%> (ø)
...annel/distributed/common/metrics/stats_reporter.go 25.00% <0.00%> (-45.84%) ⬇️
...el/distributed/dispatcher/dispatcher/dispatcher.go 68.08% <0.00%> (-7.45%) ⬇️
pkg/apis/bindings/v1beta1/kafka_types.go 100.00% <0.00%> (ø)
pkg/apis/bindings/v1alpha1/kafka_types.go 100.00% <0.00%> (ø)
pkg/apis/bindings/v1beta1/kafka_lifecycle.go 50.00% <0.00%> (ø)
pkg/apis/bindings/v1alpha1/kafka_lifecycle.go 50.00% <0.00%> (ø)
...rce/reconciler/source/resources/receive_adapter.go 100.00% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e71a1b1...d2d1e09. Read the comment docs.

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2020
@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 20, 2020
@devguyio devguyio changed the title [WIP] [POC] Set subscription readiness based on consumer group status [POC] Set subscription readiness based on consumer group status Nov 23, 2020
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 23, 2020
@devguyio devguyio changed the title [POC] Set subscription readiness based on consumer group status [WIP] Set subscription readiness based on consumer group status Nov 23, 2020
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 23, 2020
@matzew
Copy link
Contributor

matzew commented Nov 30, 2020

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 30, 2020
@slinkydeveloper
Copy link
Contributor

/lgtm
robust work!

@pierDipi
Copy link
Member

pierDipi commented Nov 30, 2020

Very nice! I want this to become a library usable by the Broker impl as well! (not in this PR)

@@ -35,8 +35,5 @@ func main() {
ctx = injection.WithNamespaceScope(ctx, ns)
}

// Do not run the dispatcher in leader-election mode
ctx = sharedmain.WithHADisabled(ctx)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you suggesting that we keep leader election disabled since this is just a read only reconciler? are there down sides of enabling it and observing how it behaves in this mode? since now we set up the consumers on all replicas with ObserveKind and on the leader with ReconcileKind

Copy link
Member

@pierDipi pierDipi Dec 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end result is the same since we're doing the same thing in both ObserveKind and ReconcileKind but we don't really need to have replicas competing to acquire locks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that the preference is to have HA enabled, with ReconcileKind and ObserveKind , over disabling HA and having replicas with ReconcileKind since this works as intended. I might be mistaken. In all cases, I'd appreciate if this detail doesn't stop the merging of the PR if it's functionally correct.

I'm working on implementing a mix of ReconcileKind and ObserveKind in the dispatcher in a PoC as was suggested here knative/eventing-contrib#1446 (comment) , so I need HA eventually anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really need to have replicas competing to acquire locks.

That. Allowing HA seems confusing to me as there is no need it.

* Use sets.String for internal cache instead of string slice
* Allow for a single callback per watcher
* Synchronization on Terminate and Forget
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 1, 2020
@devguyio
Copy link
Contributor Author

devguyio commented Dec 1, 2020

@pierDipi @matzew all comments addressed

@knative-metrics-robot
Copy link

The following is the coverage report on the affected files.
Say /test pull-knative-sandbox-eventing-kafka-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/channel/consolidated/kafka/admin.go Do not exist 84.6%
pkg/channel/consolidated/reconciler/controller/consumer_group_watcher.go Do not exist 81.5%
pkg/channel/consolidated/reconciler/controller/kafkachannel.go 53.5% 52.0% -1.5

@@ -35,8 +35,5 @@ func main() {
ctx = injection.WithNamespaceScope(ctx, ns)
}

// Do not run the dispatcher in leader-election mode
ctx = sharedmain.WithHADisabled(ctx)

Copy link
Member

@pierDipi pierDipi Dec 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end result is the same since we're doing the same thing in both ObserveKind and ReconcileKind but we don't really need to have replicas competing to acquire locks.

Comment on lines 611 to 621
func (r *Reconciler) FinalizeKind(ctx context.Context, kc *v1beta1.KafkaChannel) pkgreconciler.Event {
// Do not attempt retrying creating the client because it might be a permanent error
// in which case the finalizer will never get removed.
if kafkaClusterAdmin, err := r.createClient(ctx, kc); err == nil && r.kafkaConfig != nil {
if kafkaClusterAdmin, err := r.createClient(); err == nil && r.kafkaConfig != nil {
if err := r.deleteTopic(ctx, kc, kafkaClusterAdmin); err != nil {
return err
}
}
r.consumerGroupWatcher.Forget(string(kc.ObjectMeta.UID))
return newReconciledNormal(kc.Namespace, kc.Name) //ok to remove finalizer
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be called on non-leader replicas, so to properly clean up resources we have to implement ReadOnlyFinalizer or keep HA disabled and have only ReconcileKind and FinalizeKind.
I think the latter solution is better (less code, no wasted resource to acquire locks, etc).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierDipi sorry I got a little confused, this is the controller not the dispatcher. The controller runs in HA with a single leader, and stand-by replicas. We don't need this called on non-leader replicas cause they don't set watchers first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused too :)

But I think the comment is valid (in the wrong place) since the dispatcher doesn't implement Finalizer at all, and that's a bug that we can address later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue: #245

Copy link
Member

@pierDipi pierDipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 1, 2020
@slinkydeveloper
Copy link
Contributor

/approve

@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: devguyio, pierDipi, slinkydeveloper

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 1, 2020
@knative-prow-robot knative-prow-robot merged commit 87238dc into knative-extensions:master Dec 1, 2020
@devguyio devguyio deleted the cgs-sync branch December 1, 2020 12:42
devguyio added a commit to devguyio/eventing-contrib that referenced this pull request Dec 1, 2020
Set subscription readiness based on consumer group status (knative#182)

    * WIP: Set subscription readiness based on consumer group status

    * Refactor KafkaWatcher to an edge-triggered, level driven design

    * Refactor the test to prevent a map race

    * goimports and gofmt comments

    * Add tests for the KafkaWatcher and reconnect for AdminClient

    * Improve consumer group watcher code aethetics and robustness

    * Make variable names more descriptive for consumer group watcher and adimn client
    * Forget watcher when channel is deleted
    * Terminate old watcher and free the cache when config is created

    * Set the KafkaChannel consolidated dispatcher reconciler to be read-only

    * Enable HA for KafkaChannel(consolidated) dispatcher

    * Run update-codegen.sh

    * Optimize consumer group watcher

    * Use sets.String for internal cache instead of string slice
    * Allow for a single callback per watcher
    * Synchronization on Terminate and Forget
knative-prow-robot pushed a commit to knative/eventing-contrib that referenced this pull request Dec 1, 2020
Set subscription readiness based on consumer group status (#182)

    * WIP: Set subscription readiness based on consumer group status

    * Refactor KafkaWatcher to an edge-triggered, level driven design

    * Refactor the test to prevent a map race

    * goimports and gofmt comments

    * Add tests for the KafkaWatcher and reconnect for AdminClient

    * Improve consumer group watcher code aethetics and robustness

    * Make variable names more descriptive for consumer group watcher and adimn client
    * Forget watcher when channel is deleted
    * Terminate old watcher and free the cache when config is created

    * Set the KafkaChannel consolidated dispatcher reconciler to be read-only

    * Enable HA for KafkaChannel(consolidated) dispatcher

    * Run update-codegen.sh

    * Optimize consumer group watcher

    * Use sets.String for internal cache instead of string slice
    * Allow for a single callback per watcher
    * Synchronization on Terminate and Forget
@matzew
Copy link
Contributor

matzew commented Dec 2, 2020

@devguyio can you backport this to the release-0.19 branch ?

devguyio added a commit to devguyio/eventing-kafka that referenced this pull request Dec 2, 2020
…tensions#182)

* WIP: Set subscription readiness based on consumer group status

* Refactor KafkaWatcher to an edge-triggered, level driven design

* Refactor the test to prevent a map race

* goimports and gofmt comments

* Add tests for the KafkaWatcher and reconnect for AdminClient

* Improve consumer group watcher code aethetics and robustness

* Make variable names more descriptive for consumer group watcher and adimn client
* Forget watcher when channel is deleted
* Terminate old watcher and free the cache when config is created

* Set the KafkaChannel consolidated dispatcher reconciler to be read-only

* Enable HA for KafkaChannel(consolidated) dispatcher

* Run update-codegen.sh

* Optimize consumer group watcher

* Use sets.String for internal cache instead of string slice
* Allow for a single callback per watcher
* Synchronization on Terminate and Forget
knative-prow-robot pushed a commit that referenced this pull request Dec 2, 2020
* WIP: Set subscription readiness based on consumer group status

* Refactor KafkaWatcher to an edge-triggered, level driven design

* Refactor the test to prevent a map race

* goimports and gofmt comments

* Add tests for the KafkaWatcher and reconnect for AdminClient

* Improve consumer group watcher code aethetics and robustness

* Make variable names more descriptive for consumer group watcher and adimn client
* Forget watcher when channel is deleted
* Terminate old watcher and free the cache when config is created

* Set the KafkaChannel consolidated dispatcher reconciler to be read-only

* Enable HA for KafkaChannel(consolidated) dispatcher

* Run update-codegen.sh

* Optimize consumer group watcher

* Use sets.String for internal cache instead of string slice
* Allow for a single callback per watcher
* Synchronization on Terminate and Forget
matzew pushed a commit to matzew/eventing-kafka that referenced this pull request Dec 9, 2020
…tensions#182) (knative-extensions#249)

* WIP: Set subscription readiness based on consumer group status

* Refactor KafkaWatcher to an edge-triggered, level driven design

* Refactor the test to prevent a map race

* goimports and gofmt comments

* Add tests for the KafkaWatcher and reconnect for AdminClient

* Improve consumer group watcher code aethetics and robustness

* Make variable names more descriptive for consumer group watcher and adimn client
* Forget watcher when channel is deleted
* Terminate old watcher and free the cache when config is created

* Set the KafkaChannel consolidated dispatcher reconciler to be read-only

* Enable HA for KafkaChannel(consolidated) dispatcher

* Run update-codegen.sh

* Optimize consumer group watcher

* Use sets.String for internal cache instead of string slice
* Allow for a single callback per watcher
* Synchronization on Terminate and Forget
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
7 participants