
feat: cleanup volume groups on CR deletion #53

Closed
wants to merge 1 commit into from

Conversation

leelavg
Contributor

@leelavg leelavg commented Dec 30, 2021

Signed-off-by: Leela Venkaiah G lgangava@redhat.com

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 30, 2021
@leelavg leelavg requested review from sp98 and nbalacha and removed request for sp98 and nbalacha December 30, 2021 07:14
// get limited number of logical volume CRs to verify their presence
lvs := &topolvmv1.LogicalVolumeList{}
if err := r.Client.List(ctx, lvs, &crc.ListOptions{Limit: int64(1)}); err == nil {
// do not proceed with cleanup if there are logicalVolumes
Contributor

Aren't we supposed to check whether the LV CR is mapped to the VG somehow? Right now we abort VG deletion if at least one LV CR is found. This won't be very useful when cleaning up individual VGs (for example, when the user removes a VG from the spec).

Contributor Author

  • That can be tweaked, but I don't want to do that in this PR.
  • The flow will be more or less the same; however, the controller itself will need some extra checks as well.

Contributor

So the question is: even if the user is deleting the entire cluster, do we want to skip all VG deletion when even one LV CR is present, or do we delete whatever VGs can be deleted in the cluster?

Contributor Author

Yes, until we have an implementation of the device selector and specific VG deletion, I'd like to abort any deletion.

When we implement option 2, we can spend time on the extra processing.
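
For illustration, a minimal sketch of the per-VG check discussed above. This is not code from this PR: the helper and reconciler names are hypothetical, and it assumes the topolvm LogicalVolume spec exposes the device class it was provisioned from.

// vgHasLogicalVolumes reports whether any LogicalVolume CR still references the
// given volume group. Hypothetical helper, not part of this PR.
func (r *VGReconciler) vgHasLogicalVolumes(ctx context.Context, vgName string) (bool, error) {
	lvs := &topolvmv1.LogicalVolumeList{}
	if err := r.Client.List(ctx, lvs); err != nil {
		return false, err
	}
	for _, lv := range lvs.Items {
		// assumption: the LogicalVolume spec carries the device class it belongs to
		if lv.Spec.DeviceClass == vgName {
			return true, nil
		}
	}
	return false, nil
}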

Contributor

It would be better to have a vgCleanupAll function which calls vgCleanup(vgName) for each VG.

Contributor Author

The current function does the same; it removes the volume groups one by one based on the vgName entries in the lvmd config file.

}
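
A rough sketch of the vgCleanupAll shape suggested above, driving a per-VG helper. Names are illustrative, and it assumes the device-class name matches the VG name, as this PR does elsewhere.

// vgCleanupAll walks the device classes recorded in the lvmd config and removes the
// corresponding volume groups one by one, collecting failures. Sketch only.
func vgCleanupAll(lvmdConfig *lvmCMD.Config) error {
	var errs []string
	for _, dc := range lvmdConfig.DeviceClasses {
		// assumption: device-class name == volume group name
		if err := vgCleanup(dc.Name); err != nil {
			errs = append(errs, fmt.Sprintf("%s: %v", dc.Name, err))
		}
	}
	if len(errs) > 0 {
		return fmt.Errorf("failed to clean up volume groups: %s", strings.Join(errs, "; "))
	}
	return nil
}

// vgCleanup removes a single volume group by delegating to the RemoveVolumeGroup helper.
func vgCleanup(vgName string) error {
	return RemoveVolumeGroup(vgName)
}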

// set node uid as finalizer on CR
if !controllerutil.ContainsFinalizer(lvmCluster, vgFinalizer) {
Contributor

Can't we use NodeName itself as the finalizer?

Contributor Author

  • IMHO, the UID is the standard way for these types of tasks.
  • We can customize the finalizer to have the prefix node-uid/<uid> for easy reference (see the sketch below).
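
A sketch of the node-UID finalizer idea from the reply above. The node-uid/<uid> format and the reconciler type are assumptions for illustration, not the PR's exact code.

// buildNodeFinalizer fetches the node this vgmanager pod runs on and derives a
// per-node finalizer of the form "node-uid/<uid>". Hypothetical helper.
func (r *VGManagerReconciler) buildNodeFinalizer(ctx context.Context, nodeName string) (string, error) {
	node := &corev1.Node{}
	if err := r.Client.Get(ctx, types.NamespacedName{Name: nodeName}, node); err != nil {
		return "", err
	}
	return fmt.Sprintf("node-uid/%s", node.GetUID()), nil
}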


// get all the volume groups from lvmd file
lvmdConfig := &lvmCMD.Config{
Contributor

This should be a global var in the package or an argument to the function. Or an arg in the reconciler.

Contributor Author

Why should this be the case? My rationale is: if we fail to delete any VG, we need to read the remaining content from the config file in the next reconcile, and this struct would have to be re-declared again anyway.

Contributor

This struct should be the same one created by the VGmanager when creating the file. Don't regenerate it each time.

Contributor

Actually, scratch the last comment. What we should be doing is saving the socket and filename that were used when the file was created and using those in the struct.
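
For reference, one way to reuse what vgmanager already wrote instead of re-declaring the struct inline: read the lvmd config back from the saved file. A sketch only; the function name is hypothetical, and it assumes the config is YAML decodable via sigs.k8s.io/yaml into lvmCMD.Config.

// loadLvmdConfig re-reads the lvmd config file previously written by vgmanager so the
// same socket name and device classes are reused during cleanup.
func loadLvmdConfig(path string) (*lvmCMD.Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read lvmd config %s: %w", path, err)
	}
	lvmdConfig := &lvmCMD.Config{}
	if err := yaml.Unmarshal(data, lvmdConfig); err != nil {
		return nil, fmt.Errorf("failed to parse lvmd config %s: %w", path, err)
	}
	return lvmdConfig, nil
}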

func RemoveVolumeGroup(name string) error {

// find volume group which contains name and list of PVs belonging to VG
vg, err := FindVolumeGroup(name)
Contributor

Unrelated to this PR, but don't export this function if it is not used outside the package.

Contributor Author

  • There are many functions like this; I'll take this up in a separate PR.

@openshift-ci openshift-ci bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 1, 2022
@leelavg leelavg marked this pull request as ready for review January 6, 2022 11:26
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 6, 2022
var fileName = "vg-pool-"
var vgName = "test-vg"

func setup(name string) string { //nolint:golint,unused
Contributor

Is the test creating a new VG and then deleting it?
If yes, do we really need to do that?

Contributor Author

  1. Currently this test will not work in CI, as nsenter doesn't work there and the test is skipped; it needs to be refactored to be testable.
  2. The intention is to create a volume group manually (eventually via vgmanager functions) and remove it via the function introduced in this PR; in case deletion fails, teardown will remove it quietly (see the sketch below).
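
A sketch of that setup/teardown shape: create a throwaway VG on a loop device, and remove it quietly on teardown even if the code under test already deleted it. Commands and names are illustrative; the real test wraps these calls with nsenter and is skipped in CI.

// setupVG creates a loopback-backed volume group for the test and returns the loop device.
func setupVG(t *testing.T, vgName string) string {
	t.Helper()
	img := filepath.Join(t.TempDir(), "disk.img")
	if err := exec.Command("truncate", "-s", "1G", img).Run(); err != nil {
		t.Skipf("cannot create backing file: %v", err)
	}
	out, err := exec.Command("losetup", "--find", "--show", img).Output()
	if err != nil {
		t.Skipf("cannot attach loop device: %v", err)
	}
	loopDev := strings.TrimSpace(string(out))
	if err := exec.Command("vgcreate", vgName, loopDev).Run(); err != nil {
		t.Skipf("cannot create volume group: %v", err)
	}
	return loopDev
}

// teardownVG removes the VG quietly, ignoring errors if it is already gone.
func teardownVG(vgName, loopDev string) {
	_ = exec.Command("vgremove", "-f", vgName).Run()
	_ = exec.Command("losetup", "-d", loopDev).Run()
}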

Contributor

@jmolmo jmolmo left a comment

It seems OK to me; just take a look at my comment about the error log when findvolume fails.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 7, 2022
@openshift-ci

openshift-ci bot commented Jan 7, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jmolmo, leelavg

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@leelavg leelavg requested a review from nbalacha January 10, 2022 06:30
@openshift-ci

openshift-ci bot commented Jan 10, 2022

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2022
- Remove volume groups on the node when the LVMCluster CR is deleted
- Remove the LVM PVs which are part of the volume group as well
- On every reconcile, try to delete VGs sequentially and, on failure,
  update the lvmd config file and retry
- Do not touch VGs that are not created by VG Manager

Signed-off-by: Leela Venkaiah G <lgangava@redhat.com>
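
A condensed sketch of the reconcile-time cleanup flow the commit message describes: try to delete each VG, rewrite the lvmd config with whatever is left, and retry on the next reconcile. The function and type names are assumptions for illustration, not the PR's exact code.

// cleanupVGs tries to delete the VG behind every device class; anything that fails stays
// in the lvmd config so the next reconcile retries it. Sketch only.
func (r *VGManagerReconciler) cleanupVGs(lvmdConfig *lvmCMD.Config, configPath string) error {
	var remaining []*lvmCMD.DeviceClass
	for _, dc := range lvmdConfig.DeviceClasses {
		if err := RemoveVolumeGroup(dc.Name); err != nil {
			r.Log.Error(err, "failed to remove volume group, will retry", "vgName", dc.Name)
			remaining = append(remaining, dc)
		}
	}
	if len(remaining) == 0 {
		// every VG is gone, so the lvmd config file itself can be removed
		return os.Remove(configPath)
	}
	// keep only the pending device classes so the next reconcile retries them
	lvmdConfig.DeviceClasses = remaining
	data, err := yaml.Marshal(lvmdConfig)
	if err != nil {
		return err
	}
	if err := os.WriteFile(configPath, data, 0600); err != nil {
		return err
	}
	return fmt.Errorf("%d volume group(s) still pending deletion", len(remaining))
}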
Contributor

@sp98 sp98 left a comment

So we are always assuming that we delete all the VGs when the user uninstalls the LVM operator. Can there be scenarios where the user wants to delete the operator but still use the Persistent Volumes created by it? That is, delete the operator but don't delete any data.

// not able to start cleanup, no need to overwrite config file, just return error
return fmt.Errorf("failed to initiate cleanup of Volume Groups")
} else if done == len(lvmdConfig.DeviceClasses)-1 || len(lvmdConfig.DeviceClasses) == 0 {
// every volume group is deleted and config file can be deleted
Contributor

When one of the VG deletion requests fails, will done still be equal to len(lvmdConfig.DeviceClasses)-1, and will we still remove the config file?
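
One way to avoid that ambiguity is to count only successful deletions and remove the config file only when the count covers every device class. A sketch under that assumption; lvmdConfigFilePath is a hypothetical constant.

// done counts only device classes whose VG was actually removed.
done := 0
for _, dc := range lvmdConfig.DeviceClasses {
	if err := RemoveVolumeGroup(dc.Name); err != nil {
		r.Log.Error(err, "failed to remove volume group", "vgName", dc.Name)
		continue
	}
	done++
}
if done == len(lvmdConfig.DeviceClasses) {
	// only remove the config file once every volume group has actually been deleted
	return os.Remove(lvmdConfigFilePath)
}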

for _, dc := range lvmdConfig.DeviceClasses {
pending = append(pending, dc.Name)
}
r.Log.Info("device classes are pending for deletion", "deviceClasses", pending)
Contributor

Do we really need another list pending here, or can we just use lvmdConfig.DeviceClasses again?

Suggested change
r.Log.Info("device classes are pending for deletion", "deviceClasses", pending)
r.Log.Info("device classes are pending for deletion", "deviceClasses", lvmdConfig.DeviceClasses)


// set node uid as finalizer on CR
if !controllerutil.ContainsFinalizer(lvmCluster, vgFinalizer) {
controllerutil.AddFinalizer(lvmCluster, vgFinalizer)
Contributor

If each vgmanager daemonset pod adds its own finalizer to the LVMCluster CR, then we might have issues cleaning up the finalizers. For example, if the NodeSelector of the vgmanager daemonset is updated, we might end up with stale finalizers on the LVMCluster CR.

@openshift-ci

openshift-ci bot commented Jan 14, 2022

@leelavg: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 14, 2022
@leelavg
Contributor Author

leelavg commented Jan 17, 2022

@leelavg leelavg closed this Jan 17, 2022