
Remove vmi/vm watch and reduce default sync time to 90 seconds #216

Merged
Merged 1 commit into kubernetes-sigs:main on Jan 10, 2023

Conversation

davidvossel
Contributor

Issues #100 and #78 have a common root cause: both stem from tracking VM/VMIs on external infra.

So far we've been treating external infra as an advanced use case rather than the default one. As a result, external infra has been a second-class citizen next to the use case where the capi components and the VM/VMIs run on the same k8s cluster.

I'd like to return to a single code path that satisfies both the use case where the capk/capi components run on the same cluster as the VM/VMIs (the centralized infra use case) and the use case where the controller components run on a separate cluster from the VM/VMIs (the external infra use case).

To achieve this, we can't assume the VM/VMI objects are even registered on the same cluster as the capi/capk controllers, which means we can't watch those objects by default using the in-cluster config. I propose we go back to syncing the KubeVirtMachine and KubeVirtCluster objects regularly in order to pick up VM/VMI changes (basically polling). For polling to be responsive enough to pick up things like IP changes after a VM reboot, I think we should lower the default polling interval to 60 seconds.
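A minimal sketch of what relying on a periodic resync (rather than VM/VMI watches) could look like with controller-runtime, assuming the v0.14-era `manager.Options.SyncPeriod` field; the `newManager` helper is illustrative and the 90-second value matches the default in this PR's title:

```go
package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func newManager() (ctrl.Manager, error) {
	// Re-reconcile everything each SyncPeriod instead of watching VM/VMIs.
	// This works even when the VM/VMI CRDs are not registered on the cluster
	// the controllers run against (the external infra case).
	syncPeriod := 90 * time.Second

	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		SyncPeriod: &syncPeriod,
	})
}
```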

Reduce default sync time to 90 seconds for reconcile loops

Signed-off-by: David Vossel <davidvossel@gmail.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 6, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 6, 2023
@davidvossel
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jan 6, 2023
@coveralls

Pull Request Test Coverage Report for Build 3856719842

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.2%) to 52.655%

Totals Coverage Status
  • Change from base Build 3599569808: +1.2%
  • Covered Lines: 952
  • Relevant Lines: 1808

💛 - Coveralls

@qinqon
Contributor

qinqon commented Jan 6, 2023

I think we can do both, so we have responsiveness and we don't miss anything. What do you think?

@davidvossel
Contributor Author

> I think we can do both, so we have responsiveness and we don't miss anything. What do you think?

How would you do this?

@qinqon
Contributor

qinqon commented Jan 9, 2023

> I think we can do both, so we have responsiveness and we don't miss anything. What do you think?
>
> How would you do this?

You have a pair of "knobs":

I think with both you can have both features.

@davidvossel
Contributor Author

davidvossel commented Jan 9, 2023

> I think with both you can have both features.

The problem is that we need the capk controller to be able to function when the VM/VMI objects are not registered in the infra cluster. The watches fail when there are no VM/VMI CRDs registered.

I could dynamically detect whether VM/VMIs are present at launch and only watch if they are, but I'd rather have a single code path to test rather than multiple (one when the CRDs are available, one when they are not). So just using a reduced sync period with no watches seems to satisfy that requirement, at the cost of efficiency, which might be okay for us at our current scale.
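For reference, the dynamic-detection option mentioned above (and not taken here) could look roughly like the sketch below, using the client-go discovery client; `kubevirtAPIAvailable` is a hypothetical helper name, and probing `kubevirt.io/v1` as the group/version is an assumption:

```go
package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// kubevirtAPIAvailable reports whether the infra cluster serves the
// kubevirt.io/v1 group/version (VirtualMachine, VirtualMachineInstance).
// A watch on VM/VMIs would only be registered when this returns true.
func kubevirtAPIAvailable(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	if _, err := dc.ServerResourcesForGroupVersion("kubevirt.io/v1"); err != nil {
		// Treat any lookup failure (typically NotFound when the CRDs are
		// absent) as "not available" for the purpose of this check.
		return false, nil
	}
	return true, nil
}
```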

@qinqon
Contributor

qinqon commented Jan 9, 2023

> I think with both you can have both features.
>
> The problem is that we need the capk controller to be able to function when the VM/VMI objects are not registered in the infra cluster. The watches fail when there are no VM/VMI CRDs registered.
>
> I could dynamically detect whether VM/VMIs are present at launch and only watch if they are, but I'd rather have a single code path to test rather than multiple (one when the CRDs are available, one when they are not). So just using a reduced sync period with no watches seems to satisfy that requirement, at the cost of efficiency, which might be okay for us at our current scale.

An alternative is to run the polling only for external clusters and, from the poller, enqueue requests to the controller so they enter the Reconcile function; that way we don't have to remove the watchers.
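A rough sketch of that alternative, assuming controller-runtime v0.14-era APIs (`source.Channel`, the builder's `Watches`) and the provider's `infrav1.KubevirtMachine` types; `setupWithPolling`, the import path, and the 90-second interval are illustrative, not the merged change:

```go
package controllers

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"

	infrav1 "sigs.k8s.io/cluster-api-provider-kubevirt/api/v1alpha1"
)

func setupWithPolling(mgr ctrl.Manager, r reconcile.Reconciler) error {
	events := make(chan event.GenericEvent)

	// Poller: every 90s push one GenericEvent per KubevirtMachine so its
	// Reconcile runs and re-reads VM/VMI state from the external cluster.
	go func() {
		for range time.Tick(90 * time.Second) {
			var machines infrav1.KubevirtMachineList
			if err := mgr.GetClient().List(context.Background(), &machines); err != nil {
				continue
			}
			for i := range machines.Items {
				events <- event.GenericEvent{Object: &machines.Items[i]}
			}
		}
	}()

	// The usual watch on KubevirtMachine stays in place; the channel source
	// layers the poll-driven enqueue on top of it.
	return ctrl.NewControllerManagedBy(mgr).
		For(&infrav1.KubevirtMachine{}).
		Watches(&source.Channel{Source: events}, &handler.EnqueueRequestForObject{}).
		Complete(r)
}
```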

@qinqon
Contributor

qinqon commented Jan 10, 2023

/lgtm
Let's hope we don't hit the control plane too much.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit cb28bf6 into kubernetes-sigs:main Jan 10, 2023