
Remove vmi/vm watch and reduce default sync time to 90 seconds #216

Merged
Merged 1 commit into kubernetes-sigs:main on Jan 10, 2023

Conversation

davidvossel
Contributor

Issues #100 and #78 have a common root cause: both stem from tracking VM/VMIs on external infra.

So far we've been treating external infra as an advanced use case rather than the default one. As a result, external infra has been a second-class citizen next to the use case where the capi components and the VM/VMIs run on the same k8s cluster.

I'd like to return to a single code path that satisfies both the use case where the capk/capi components run on the same cluster as the VM/VMIs (the centralized infra use case) and the use case where the controller components run on a separate cluster from the VM/VMIs (the external infra use case).

To achieve this, we can't assume the VM/VMI objects are even registered on the same cluster as the capi/capk controllers, which means we can't watch those objects by default using the in-cluster config. I propose we go back to syncing the KubeVirtMachine and KubeVirtCluster objects regularly in order to pick up VM/VMI changes (basically polling). For polling to be responsive enough to pick up things like IP changes after a VM reboot, I think we should lower the default polling interval to 60 seconds.
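A minimal sketch of what relying on a periodic resync (rather than VM/VMI watches) could look like with controller-runtime, assuming the v0.14-era `manager.Options.SyncPeriod` field; the `newManager` helper is illustrative and the 90-second value matches the default in this PR's title:

```go
package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func newManager() (ctrl.Manager, error) {
	// Re-reconcile everything each SyncPeriod instead of watching VM/VMIs.
	// This works even when the VM/VMI CRDs are not registered on the cluster
	// the controllers run against (the external infra case).
	syncPeriod := 90 * time.Second

	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		SyncPeriod: &syncPeriod,
	})
}
```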

Reduce default sync time to 90 seconds for reconcile loops

Signed-off-by: David Vossel <davidvossel@gmail.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 6, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 6, 2023
@davidvossel
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jan 6, 2023
@coveralls

Pull Request Test Coverage Report for Build 3856719842

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.2%) to 52.655%

Totals Coverage Status
  • Change from base Build 3599569808: +1.2%
  • Covered Lines: 952
  • Relevant Lines: 1808

💛 - Coveralls

@qinqon
Contributor

qinqon commented Jan 6, 2023

I think we can do both, so we have responsiveness and we don't miss anything. What do you think?

@davidvossel
Contributor Author

> I think we can do both, so we have responsiveness and we don't miss anything. What do you think?

How would you do this?

@qinqon
Contributor

qinqon commented Jan 9, 2023

> I think we can do both, so we have responsiveness and we don't miss anything. What do you think?
>
> How would you do this?

You have a pair of "knobs":

I think with both you can have both features.

@davidvossel
Contributor Author

davidvossel commented Jan 9, 2023

> I think with both you can have both features.

The problem is that we need the capk controller to be able to function when the VM/VMI objects are not registered in the infra cluster. The watches fail when there are no VM/VMI CRDs registered.

I could dynamically detect whether VM/VMIs are present at launch and only watch if they are, but I'd rather have a single code path to test rather than multiple (one when the CRDs are available, one when they are not). So just using a reduced sync period with no watches seems to satisfy that requirement, at the cost of efficiency, which might be okay for us at our current scale.
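For reference, the dynamic-detection option mentioned above (and not taken here) could look roughly like the sketch below, using the client-go discovery client; `kubevirtAPIAvailable` is a hypothetical helper name, and probing `kubevirt.io/v1` as the group/version is an assumption:

```go
package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// kubevirtAPIAvailable reports whether the infra cluster serves the
// kubevirt.io/v1 group/version (VirtualMachine, VirtualMachineInstance).
// A watch on VM/VMIs would only be registered when this returns true.
func kubevirtAPIAvailable(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	if _, err := dc.ServerResourcesForGroupVersion("kubevirt.io/v1"); err != nil {
		// Treat any lookup failure (typically NotFound when the CRDs are
		// absent) as "not available" for the purpose of this check.
		return false, nil
	}
	return true, nil
}
```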

@qinqon
Contributor

qinqon commented Jan 9, 2023

> I think with both you can have both features.
>
> The problem is that we need the capk controller to be able to function when the VM/VMI objects are not registered in the infra cluster. The watches fail when there are no VM/VMI CRDs registered.
>
> I could dynamically detect whether VM/VMIs are present at launch and only watch if they are, but I'd rather have a single code path to test rather than multiple (one when the CRDs are available, one when they are not). So just using a reduced sync period with no watches seems to satisfy that requirement, at the cost of efficiency, which might be okay for us at our current scale.

An alternative is to run the polling only for external clusters and, from the poller, enqueue requests to the controller so they enter the Reconcile function; that way we don't have to remove the watchers.
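A rough sketch of that alternative, assuming controller-runtime v0.14-era APIs (`source.Channel`, the builder's `Watches`) and the provider's `infrav1.KubevirtMachine` types; `setupWithPolling`, the import path, and the 90-second interval are illustrative, not the merged change:

```go
package controllers

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"

	infrav1 "sigs.k8s.io/cluster-api-provider-kubevirt/api/v1alpha1"
)

func setupWithPolling(mgr ctrl.Manager, r reconcile.Reconciler) error {
	events := make(chan event.GenericEvent)

	// Poller: every 90s push one GenericEvent per KubevirtMachine so its
	// Reconcile runs and re-reads VM/VMI state from the external cluster.
	go func() {
		for range time.Tick(90 * time.Second) {
			var machines infrav1.KubevirtMachineList
			if err := mgr.GetClient().List(context.Background(), &machines); err != nil {
				continue
			}
			for i := range machines.Items {
				events <- event.GenericEvent{Object: &machines.Items[i]}
			}
		}
	}()

	// The usual watch on KubevirtMachine stays in place; the channel source
	// layers the poll-driven enqueue on top of it.
	return ctrl.NewControllerManagedBy(mgr).
		For(&infrav1.KubevirtMachine{}).
		Watches(&source.Channel{Source: events}, &handler.EnqueueRequestForObject{}).
		Complete(r)
}
```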

@qinqon
Contributor

qinqon commented Jan 10, 2023

/lgtm
Let's hope we don't hit the control plane too much.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit cb28bf6 into kubernetes-sigs:main Jan 10, 2023