feat: cleanup volume groups on CR deletion #53
Conversation
// get limited number of logical volume CRs to verify their presence
lvs := &topolvmv1.LogicalVolumeList{}
if err := r.Client.List(ctx, lvs, &crc.ListOptions{Limit: int64(1)}); err == nil {
	// do not proceed with cleanup if there are logicalVolumes
Aren't we supposed to check whether the LV CR is mapped to the VG somehow? Right now we are aborting VG deletion if at least one LV CR is found. This won't be very useful when cleaning up individual VGs (for example, when the user removes a VG from the spec).
- That can be tweaked, but I don't want to do that in this PR.
- The flow will be more or less the same; however, the controller itself will have some extra checks as well.
So the question would be: even if the user is deleting the entire cluster, do we want to skip all VG deletion if even one LV CR is present? Or do we delete whatever VGs can be deleted in the cluster?
Yes, until we have an implementation of the device selector and deletion of specific VGs, I'd like to abort any deletion.
When we implement option 2, we can spend time on the extra processing.
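For illustration, a minimal sketch of the per-VG check being discussed. This is not code from the PR; it assumes topolvm's LogicalVolume spec carries a DeviceClass field that can be matched to the VG's device class, and reuses the reconciler's client and the topolvmv1 import shown in the diff (the receiver name is illustrative):

```go
// lvsExistForDeviceClass reports whether any LogicalVolume CR still references
// the given device class, i.e. whether the backing VG is still in use.
// Hypothetical helper; not part of this PR.
func (r *VGReconciler) lvsExistForDeviceClass(ctx context.Context, deviceClass string) (bool, error) {
	lvs := &topolvmv1.LogicalVolumeList{}
	if err := r.Client.List(ctx, lvs); err != nil {
		return false, err
	}
	for _, lv := range lvs.Items {
		// assumption: the LV's device class identifies the backing VG
		if lv.Spec.DeviceClass == deviceClass {
			return true, nil
		}
	}
	return false, nil
}
```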
It would be better to have a vgCleanupAll function which calls vgCleanup(vgName) for each VG.
The current function does the same; it removes the volume groups one by one, based on the vgName entries in the lvmd config file.
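A minimal sketch of that shape, for reference only: vgCleanupAll and vgCleanup are the names suggested above rather than functions from this PR, RemoveVolumeGroup is the function the PR adds, and the fmt and strings imports are assumed:

```go
// vgCleanupAll removes the volume group behind each name collected from the
// lvmd config file, continuing past individual failures so one stuck VG does
// not block the others.
func vgCleanupAll(vgNames []string) error {
	var failed []string
	for _, name := range vgNames {
		if err := vgCleanup(name); err != nil {
			failed = append(failed, fmt.Sprintf("%s: %v", name, err))
		}
	}
	if len(failed) > 0 {
		return fmt.Errorf("failed to clean up volume groups: %s", strings.Join(failed, "; "))
	}
	return nil
}

// vgCleanup removes a single volume group; in this PR the equivalent work is
// done by RemoveVolumeGroup.
func vgCleanup(vgName string) error {
	return RemoveVolumeGroup(vgName)
}
```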
}

// set node uid as finalizer on CR
if !controllerutil.ContainsFinalizer(lvmCluster, vgFinalizer) {
Can't we use NodeName itself as the finalizer?
- IMHO, the UID is the standard way for these types of tasks.
- We can customize the finalizer to have the prefix node-uid/<uid> for easy reference (see the sketch below).
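A rough sketch of that node-uid/<uid> prefix idea, assuming the reconciler has access to the Node object; the helper name and the reconcile-loop usage are illustrative, not the PR's actual code:

```go
// buildNodeFinalizer derives the per-node finalizer from the node's UID, with
// a "node-uid/" prefix for easy reference, e.g. "node-uid/5a1b...".
func buildNodeFinalizer(node *corev1.Node) string {
	return "node-uid/" + string(node.GetUID())
}

// Inside the reconcile loop the usage could look like this (sketch):
//
//	vgFinalizer := buildNodeFinalizer(node)
//	if !controllerutil.ContainsFinalizer(lvmCluster, vgFinalizer) {
//		controllerutil.AddFinalizer(lvmCluster, vgFinalizer)
//		if err := r.Client.Update(ctx, lvmCluster); err != nil {
//			return err
//		}
//	}
```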
}

// get all the volume groups from lvmd file
lvmdConfig := &lvmCMD.Config{
This should be a global var in the package or an argument to the function. Or an arg in the reconciler.
Why should this be the case? My rationale is: if we fail to delete any VG, then we need to read the remaining content from the config file in the next reconcile, and this struct has to be re-declared again anyway.
This struct should be the same one created by the VGmanager when creating the file. Don't regenerate it each time.
Actually, scratch the last comment. What we should be doing is saving the socket and filename that were used when the file was created and using those in the struct.
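A sketch of that approach, under the assumption that the reconciler records the config path and socket name at write time and re-reads the file on each reconcile. The type and field names, the yaml package, and the exact shape of lvmCMD.Config are assumptions on top of what the diff shows:

```go
// The reconciler remembers where the lvmd config was written and which socket
// it referenced, instead of re-declaring those values during cleanup.
type VGReconciler struct {
	client.Client
	Log            logr.Logger
	LvmdConfigPath string // path used when the file was written (assumed field)
	LvmdSocketName string // socket recorded at the same time (assumed field)
}

// loadLvmdConfig re-reads the on-disk config so each reconcile works from the
// device classes that are still pending deletion.
func (r *VGReconciler) loadLvmdConfig() (*lvmCMD.Config, error) {
	data, err := os.ReadFile(r.LvmdConfigPath)
	if err != nil {
		return nil, err
	}
	lvmdConfig := &lvmCMD.Config{SocketName: r.LvmdSocketName}
	if err := yaml.Unmarshal(data, lvmdConfig); err != nil {
		return nil, err
	}
	return lvmdConfig, nil
}
```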
func RemoveVolumeGroup(name string) error {

	// find volume group which contains name and list of PVs belonging to VG
	vg, err := FindVolumeGroup(name)
Unrelated to this PR, but don't export this function if it is not being used outside the package.
- There are many functions like this; will take this up in a separate PR.
pkg/vgmanager/lvm_test.go
Outdated
var fileName = "vg-pool-"
var vgName = "test-vg"

func setup(name string) string { //nolint:golint,unused
Is the test creating a new VG and then deleting it?
If yes, do we really need to do that?
- Currently this test will not work in CI, as nsenter doesn't work there and the test is being skipped; it needs to be refactored to be testable.
- The intention is to create a volume group manually (eventually via vgmanager functions) and remove it via the function being used in this PR; in case deletion fails, teardown will remove it quietly (a rough sketch of the intended shape is below).
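A rough sketch of that intended test shape, assuming the existing setup helper creates the VG, a hypothetical teardown counterpart removes it quietly if the test fails part-way, and os, os/exec, and testing are imported:

```go
func TestRemoveVolumeGroup(t *testing.T) {
	// These tests manipulate real LVM state, so skip wherever that is not
	// possible (e.g. CI, where nsenter is unavailable).
	if os.Getuid() != 0 {
		t.Skip("requires root to create and remove volume groups")
	}
	if _, err := exec.LookPath("nsenter"); err != nil {
		t.Skip("nsenter is not available in this environment")
	}

	// setup is the helper from this file; teardown is a hypothetical
	// counterpart that removes the VG quietly if RemoveVolumeGroup fails.
	name := setup(vgName)
	defer teardown(name)

	if err := RemoveVolumeGroup(name); err != nil {
		t.Fatalf("RemoveVolumeGroup(%q) failed: %v", name, err)
	}
}
```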
It seems OK to me; just take a look at my comment about the error log when FindVolumeGroup fails.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jmolmo, leelavg. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
New changes are detected. LGTM label has been removed.
- Remove volume groups on the node when the LVMCluster CR is deleted
- Remove LVM PVs which are part of the volume group as well
- On every reconcile, try to delete VGs sequentially; on failure, update the lvmd config file and retry
- Do not touch VGs that were not created by VG Manager

Signed-off-by: Leela Venkaiah G <lgangava@redhat.com>
So we are always assuming that we should delete all the VGs when the user uninstalls the LVM operator. Can there be scenarios where the user wants to delete the operator but still wants to use the Persistent Volumes created by it? That is, delete the operator but don't delete any data.
// not able to start cleanup, no need to overwrite config file, just return error
return fmt.Errorf("failed to initiate cleanup of Volume Groups")
} else if done == len(lvmdConfig.DeviceClasses)-1 || len(lvmdConfig.DeviceClasses) == 0 {
	// every volume group is deleted and config file can be deleted
When one of the VG deletion requests fails, will done still be equal to len(lvmdConfig.DeviceClasses)-1, and will we still remove the config file?
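To make the concern concrete, here is a sketch of counting only the successful deletions and removing the config file only when every device class is gone. It is illustrative only; lvmdConfigFilePath is a hypothetical variable, and the assumption that the device class name matches the VG name is noted in the comments:

```go
// Track how many device classes were actually cleaned up in this pass.
done := 0
for _, dc := range lvmdConfig.DeviceClasses {
	// assumption: the device class name matches the VG name in this operator's config
	if err := RemoveVolumeGroup(dc.Name); err != nil {
		r.Log.Error(err, "failed to remove volume group", "deviceClass", dc.Name)
		continue
	}
	done++
}

if done == len(lvmdConfig.DeviceClasses) {
	// every volume group is gone, so the lvmd config file can be removed too
	if err := os.Remove(lvmdConfigFilePath); err != nil {
		return fmt.Errorf("failed to remove lvmd config file: %w", err)
	}
} else {
	// keep the file so the next reconcile retries the remaining device classes
	r.Log.Info("volume groups still pending deletion", "done", done, "total", len(lvmdConfig.DeviceClasses))
}
```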
for _, dc := range lvmdConfig.DeviceClasses {
	pending = append(pending, dc.Name)
}
r.Log.Info("device classes are pending for deletion", "deviceClasses", pending)
Do we really need another list (pending) over here, or can we just use lvmdConfig.DeviceClasses again?
r.Log.Info("device classes are pending for deletion", "deviceClasses", pending)
r.Log.Info("device classes are pending for deletion", "deviceClasses", lvmdConfig.DeviceClasses)
// set node uid as finalizer on CR
if !controllerutil.ContainsFinalizer(lvmCluster, vgFinalizer) {
	controllerutil.AddFinalizer(lvmCluster, vgFinalizer)
If each vgManager daemonset pod is adding its own finalizer to the LVMCluster CR, then we might have issues cleaning up those finalizers. For example, if the NodeSelector of the vgManager daemonset is updated, we might be left with stale finalizers on the LVMCluster CR.
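A rough sketch of the kind of pruning that could address this; everything here is an assumption layered on the finalizer scheme above (including the lvmv1alpha1 alias for the LVMCluster API), not part of the PR:

```go
// pruneStaleNodeFinalizers drops node-uid/<uid> finalizers whose node no
// longer exists. A fuller version would also drop finalizers for nodes that
// still exist but no longer match the vgManager daemonset's NodeSelector.
func pruneStaleNodeFinalizers(ctx context.Context, c client.Client, lvmCluster *lvmv1alpha1.LVMCluster) error {
	nodes := &corev1.NodeList{}
	if err := c.List(ctx, nodes); err != nil {
		return err
	}
	current := map[string]bool{}
	for _, n := range nodes.Items {
		current["node-uid/"+string(n.GetUID())] = true
	}

	changed := false
	// iterate over a copy so removing finalizers doesn't affect the loop
	for _, f := range append([]string{}, lvmCluster.GetFinalizers()...) {
		if strings.HasPrefix(f, "node-uid/") && !current[f] {
			controllerutil.RemoveFinalizer(lvmCluster, f)
			changed = true
		}
	}
	if changed {
		return c.Update(ctx, lvmCluster)
	}
	return nil
}
```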
@leelavg: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: Leela Venkaiah G lgangava@redhat.com