OCPBUGS-13558: fix: refactor node removal controller #479


jakobmoellerdev
Contributor

@jakobmoellerdev jakobmoellerdev commented Nov 8, 2023

This refactors the node removal controller so that it no longer requires a finalizer on the node.

It does this by watching LVMVolumeGroupNodeStatus instead of Node as its base reconciled object. This allows us to drop the Field Indexer for Nodes as well as 2 permissions on the Node reconciler. Additionally, it should make deletion much more stable. The reconciler now runs on all topologies rather than only on SNO, because even an SNO cluster can technically add worker nodes.
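As a rough illustration of what reconciling on the status object means, a Node event can be translated into a reconcile request for the matching LVMVolumeGroupNodeStatus. The sketch below is not the PR's code: `Request` and `RequestForNode` are hypothetical names, and it assumes that each status object carries the same name as its node and lives in the operator's namespace.

```go
package main

// Request is a hypothetical stand-in for a reconcile request key
// (namespace plus name); it is not a type from the PR.
type Request struct {
	Namespace string
	Name      string
}

// RequestForNode maps a Node event to the status object that should be
// reconciled. Assumption: the LVMVolumeGroupNodeStatus is named after
// the node and is stored in the operator namespace.
func RequestForNode(operatorNamespace, nodeName string) []Request {
	if nodeName == "" {
		// Nothing to enqueue for an event without a node name.
		return nil
	}
	return []Request{{Namespace: operatorNamespace, Name: nodeName}}
}
```

With a mapping like this, the reconciler only ever receives status keys, so it does not need an index over Nodes to find the object it acts on.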

Cases:

  1. Node and NodeStatus are present - nothing happens
  2. Node is present but NodeStatus isn't - removal reconciler ignores the node because it finds no status to check
  3. NodeStatus is present but Node isn't - removal controller fetches the node status on startup and on cache refresh, sees that the node is gone, and triggers a delete for the node status
  4. NodeStatus is present and Node is about to get deleted - removal controller gets an event for the Node deletion and immediately reacts by removing the node status
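
The four cases above can be summarized as a small decision function. This is a minimal sketch under stated assumptions, not the controller's actual implementation: `Action`, `Observed`, and `Decide` are hypothetical names, and "about to get deleted" is modeled as a boolean flag on the observed node.

```go
package main

// Action is a hypothetical name for what the removal controller does
// with an LVMVolumeGroupNodeStatus; it is not taken from the PR's code.
type Action int

const (
	// Ignore means nothing needs to happen.
	Ignore Action = iota
	// DeleteStatus means the node status should be deleted.
	DeleteStatus
)

// Observed describes what the controller sees for one node name.
type Observed struct {
	NodeExists   bool // a Node object with this name is present
	NodeDeleting bool // the Node is in the process of being deleted
	StatusExists bool // an LVMVolumeGroupNodeStatus is present
}

// Decide maps the four cases from the description to an Action.
func Decide(o Observed) Action {
	if !o.StatusExists {
		// Case 2: there is no status to check, so the node is ignored.
		return Ignore
	}
	if !o.NodeExists || o.NodeDeleting {
		// Case 3 (node already gone) and case 4 (node being deleted).
		return DeleteStatus
	}
	// Case 1: node and status are both present.
	return Ignore
}
```

For example, `Decide(Observed{StatusExists: true})` returns `DeleteStatus`, which corresponds to case 3, where the status outlived its node.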

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) labels Nov 8, 2023
@openshift-ci-robot

@jakobmoellerdev: This pull request references Jira Issue OCPBUGS-13588, which is invalid:

  • expected the bug to be open, but it isn't
  • expected the bug to target the "4.15.0" version, but no target version was set
  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is Closed (Not a Bug) instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 8, 2023
Contributor

openshift-ci bot commented Nov 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jakobmoellerdev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2023
Signed-off-by: Jakob Möller <jmoller@redhat.com>
@jakobmoellerdev jakobmoellerdev force-pushed the OCPBUGS-13588-always-start-node-removal-controller branch from 028b28c to 18828f8 Compare November 8, 2023 11:10
@jakobmoellerdev jakobmoellerdev changed the title from "OCPBUGS-13588: fix: refactor node removal controller" to "OCPBUGS-13558: fix: refactor node removal controller" Nov 8, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Nov 8, 2023
@openshift-ci-robot

@jakobmoellerdev: This pull request references Jira Issue OCPBUGS-13558, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @radeore

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Nov 8, 2023
@openshift-ci openshift-ci bot requested a review from radeore November 8, 2023 11:12
@jakobmoellerdev
Contributor Author

/retest

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@jakobmoellerdev
Contributor Author

/testlvm-operator-e2e-aws

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@jakobmoellerdev
Contributor Author

/retest

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

@suleymanakbas91
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 8, 2023
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD cf96cf3 and 2 for PR HEAD 18828f8 in total

@jakobmoellerdev
Contributor Author

/test lvm-operator-e2e-aws

Contributor

openshift-ci bot commented Nov 8, 2023

@jakobmoellerdev: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-merge-bot openshift-merge-bot bot merged commit 7e6c201 into openshift:main Nov 8, 2023
@openshift-ci-robot

@jakobmoellerdev: Jira Issue OCPBUGS-13558: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-13558 has been moved to the MODIFIED state.


Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

3 participants