DataVolume Controller uses VolumeCloneSource Populator #2750

Merged
merged 19 commits into from
Jun 29, 2023

mhenriks
Member

@mhenriks mhenriks commented Jun 13, 2023

What this PR does / why we need it:

The DataVolume controller should use our populators internally for CSI storage provisioners. This PR builds off of #2722 to include VolumeCloneSource populators.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

I think everything is here, but I bet there will be more whack-a-mole with the func tests.

Release note:

DataVolume Controller uses VolumeCloneSource Populator

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Jun 13, 2023
@mhenriks
Member Author

/hold

@kubevirt-bot kubevirt-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL labels Jun 13, 2023
@@ -468,20 +468,32 @@ func (r *PvcCloneReconciler) isSourceReadyToClone(datavolume *cdiv1.DataVolume)

// detectCloneSize obtains and assigns the original PVC's size when cloning using an empty storage value
func (r *PvcCloneReconciler) detectCloneSize(syncState *dvSyncState) (bool, error) {
var targetSize int64
Member Author

Special note on all the changes in this function: they were all done to get this crazy test to work:

It("bz:2079781 Should clone data from filesystem to block, when using storage API ", func() {

@mhenriks
Member Author

/test pull-containerized-data-importer-e2e-ceph

Collaborator

@alromeros alromeros left a comment

Very good start! Just some comments after an initial review.

@@ -84,7 +85,7 @@ const (
// MessageSmartCloneInProgress provides a const to form snapshot for smart-clone is in progress message
MessageSmartCloneInProgress = "Creating snapshot for smart-clone is in progress (for pvc %s/%s)"
// MessageCloneFromSnapshotSourceInProgress provides a const to form clone from snapshot source is in progress message
MessageCloneFromSnapshotSourceInProgress = "Creating PVC from snapshot source is in progress (for snapshot %s/%s)"
MessageCloneFromSnapshotSourceInProgress = "Creating PVC from snapshot source is in progress (for %s %s/%s)"
Collaborator

Since you are now requiring the source type as an argument, maybe change the message so it can be used for both snapshots and PVCs?

Member Author

Yeah, I think that error string can be improved.

Member Author

Actually, I'm not sure what you're asking. The message is used whenever we create a PVC from a snapshot, which happens with PVC smart clone and with snapshot clone. Are you suggesting the message be used when a PVC is snapshotted as well? That only happens with PVC smart clone. Examples would be great.

Contributor

The name of the variable should change; maybe drop the "snapshot".

Member Author

Hmmm... I think the "source" part may be the most confusing, since someone may think of the DataVolume source rather than the PVC dataSource.
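To make the naming question concrete, here is a small standalone sketch of how the revised format string renders once the source kind is passed as the first argument. The helper name cloneInProgressMessage and the kind strings "pvc" and "snapshot" are illustrative assumptions; the controller may pass different values.

```go
package main

import "fmt"

// Format string copied from the diff above; the new first %s is the source kind.
const MessageCloneFromSnapshotSourceInProgress = "Creating PVC from snapshot source is in progress (for %s %s/%s)"

// cloneInProgressMessage is a hypothetical helper that fills in the kind,
// namespace and name of the object the target PVC is being created for.
func cloneInProgressMessage(kind, namespace, name string) string {
	return fmt.Sprintf(MessageCloneFromSnapshotSourceInProgress, kind, namespace, name)
}

func main() {
	// PVC smart clone path (illustrative values).
	fmt.Println(cloneInProgressMessage("pvc", "default", "source-pvc"))
	// Snapshot clone path (illustrative values).
	fmt.Println(cloneInProgressMessage("snapshot", "default", "source-snap"))
}
```

The rendered text still says "from snapshot source" on both paths, which matches the point above that a PVC is created from a snapshot in both smart clone and snapshot clone.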

}

// MergePatch patches a resource
func MergePatch(ctx context.Context, args *PatchArgs) error {
Collaborator

Nice!

@@ -178,6 +187,67 @@ func (r *CloneReconcilerBase) ensureExtendedToken(pvc *corev1.PersistentVolumeCl
return nil
}

func (r *CloneReconcilerBase) reconcileVolumeCloneSourceCR(syncState *dvSyncState, kind string) error {
dv := syncState.dvMutated
cloneSource := &cdiv1.VolumeCloneSource{}
Collaborator

I think it'd improve readability to use volumeCloneSource and volumeCloneSourceName to differentiate from the actual clone source.

Member Author

sure

sourcePvc, err := r.findSourcePvc(syncState.dvMutated)
if err != nil {
return false, err
}

// because of filesystem overhead calculations when cloning
// even if storage size is requested we have to calucuale source size
Collaborator

Typo: calucuale

// TODO: Fix this in next PR that uses actual size also in validation
isPermissiveClone = sourceCapacity.CmpInt64(targetSize) == 1
} else {
isPermissiveClone = requestedSize.CmpInt64(targetSize) >= 0
Collaborator

I guess this has to do with the test you commented on above, right? But why would we need to set the permissiveClone annotation if the comparison result is equal to or greater than 0?

Member Author

requestedSize may still be smaller than the source capacity (as in that contrived test case), but the populator layer doesn't know that and will understandably not let you clone your 4G source to a 2G target.

@@ -203,10 +203,15 @@ func (r *UploadReconciler) reconcilePVC(log logr.Logger, pvc *corev1.PersistentV
}

if len(podsUsingPVC) > 0 {
es, err := cc.GetAnnotatedEventSource(context.TODO(), r.client, pvc)
Collaborator

Sorry to bother you this much about styling, but I feel like these abbreviated variable names work for more common objects like DVs or PVCs. In most cases I think it's better to use the full name, wdyt?

Member Author

Not sure what you are suggesting here...you don't like 'es'?

Collaborator

Yeah, I've seen some of these in the PR and I always think it'd be better to have the full name, like cloneType instead of ct. Just a small complaint though, not a must.

Member Author

I don't necessarily disagree with you. I typically use descriptive names (I think). But I also like to use short names to signify "this is something you shouldn't really care about", similar to 'i' in loops, for variables that are very short in scope.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2023
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2023
@mhenriks
Member Author

/retest-required

Contributor

@ShellyKa13 ShellyKa13 left a comment

Looks good! I have some comments. As usual, I went over it commit by commit, so some were already addressed.

@@ -20,7 +20,6 @@ import (
"context"
Contributor

I do love the "bye bye" commit message, but maybe share why it is being removed?

Member Author

Pretty sure you know why, but since populators are handling all CSI storage, the DataVolume controller will only be doing host-assisted clones.

Contributor

I meant share it in the commit message.

@@ -582,3 +590,77 @@ func (r *SnapshotCloneReconciler) isSnapshotValidForClone(snapshot *snapshotv1.V
}
return true, nil
}

func newPvcFromSnapshot(obj metav1.Object, name string, snapshot *snapshotv1.VolumeSnapshot, targetPvcSpec *corev1.PersistentVolumeClaimSpec) (*corev1.PersistentVolumeClaim, error) {
Contributor

Is this function relevant to this commit? I don't see it being used anywhere.

Member Author

It was in the smart clone controller file, which was deleted, but this function and a couple of other definitions are still needed.

const snapshotCloneControllerName = "datavolume-snapshot-clone-controller"
const (
//AnnSmartCloneRequest sets our expected annotation for a CloneRequest
AnnSmartCloneRequest = "k8s.io/SmartCloneRequest"
Contributor

If newPvcFromSnapshot is not relevant to this commit (as asked in another comment), then I guess these consts aren't either?

Member Author

remnants from a deleted file

// GetCloneSourceNameAndNamespace returns the name and namespace of the cloning source
func GetCloneSourceNameAndNamespace(dv *cdiv1.DataVolume) (name, namespace string) {
var sourceName, sourceNamespace string
// GetCloneSourceInfo returns the name and namespace of the cloning source
Contributor

You can also update the comment to say that it returns sourceType.
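A standalone sketch of what a GetCloneSourceInfo-style helper with an updated doc comment might look like. The struct types are simplified, hypothetical mirrors of the DataVolume source fields (the real cdiv1 types differ, and sourceRef/DataSource resolution is left out), and defaulting an empty source namespace to the DataVolume's namespace is an assumption about the intended behavior.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the DataVolume clone source fields.
type pvcSource struct{ Namespace, Name string }
type snapshotSource struct{ Namespace, Name string }

type dataVolume struct {
	Namespace string
	PVC       *pvcSource
	Snapshot  *snapshotSource
}

// getCloneSourceInfo returns the type, name and namespace of the cloning source.
// An empty source namespace is assumed to default to the DataVolume's namespace.
func getCloneSourceInfo(dv *dataVolume) (sourceType, name, namespace string) {
	switch {
	case dv.PVC != nil:
		sourceType, name, namespace = "pvc", dv.PVC.Name, dv.PVC.Namespace
	case dv.Snapshot != nil:
		sourceType, name, namespace = "snapshot", dv.Snapshot.Name, dv.Snapshot.Namespace
	default:
		return "", "", ""
	}
	if namespace == "" {
		namespace = dv.Namespace
	}
	return sourceType, name, namespace
}

func main() {
	dv := &dataVolume{Namespace: "target-ns", Snapshot: &snapshotSource{Name: "snap"}}
	fmt.Println(getCloneSourceInfo(dv))
}
```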

cc.AddAnnotation(claimCpy, AnnCloneError, lastError.Error())

if !apiequality.Semantic.DeepEqual(pvc, claimCpy) {
if err := r.client.Update(ctx, claimCpy); err != nil {
if err := r.patchClaim(ctx, log, pvc, claimCpy); err != nil {
r.log.V(1).Info("error setting error annotations")
Contributor

Use log here instead of r.log?

},
Spec: cdiv1.VolumeCloneSourceSpec{
Source: corev1.TypedLocalObjectReference{
Kind: "PersistentVolumeClaim",
Contributor

I guess you meant to put Kind: kind here.
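For reference, a standalone sketch of the fix the reviewer is pointing at: the source reference takes its Kind from the kind parameter of reconcileVolumeCloneSourceCR rather than a hard-coded "PersistentVolumeClaim". The types below only mirror the shape of corev1.TypedLocalObjectReference (APIGroup, Kind, Name) so the example compiles without Kubernetes modules; newVolumeCloneSourceSpec is a hypothetical helper.

```go
package main

import "fmt"

// typedLocalObjectReference mirrors the fields of corev1.TypedLocalObjectReference.
type typedLocalObjectReference struct {
	APIGroup *string
	Kind     string
	Name     string
}

// volumeCloneSourceSpec is a hypothetical stand-in for cdiv1.VolumeCloneSourceSpec.
type volumeCloneSourceSpec struct {
	Source typedLocalObjectReference
}

// newVolumeCloneSourceSpec builds the source reference from the caller's kind,
// so the same code path serves PVC and VolumeSnapshot sources.
func newVolumeCloneSourceSpec(kind, name string, apiGroup *string) volumeCloneSourceSpec {
	return volumeCloneSourceSpec{
		Source: typedLocalObjectReference{APIGroup: apiGroup, Kind: kind, Name: name},
	}
}

func main() {
	// VolumeSnapshot belongs to the snapshot.storage.k8s.io API group.
	snapshotGroup := "snapshot.storage.k8s.io"
	spec := newVolumeCloneSourceSpec("VolumeSnapshot", "my-snap", &snapshotGroup)
	fmt.Println(spec.Source.Kind, *spec.Source.APIGroup, spec.Source.Name)
}
```

PVCs live in the core API group, so for kind "PersistentVolumeClaim" the APIGroup would stay nil.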

return nil
}

return r.reconcileVolumeCloneSourceCR(syncState, "PersistentVolumeClaim")
Contributor

Calling this function even when not using populators will work, but I'm not sure if we should.

var desiredCloneAnnotations []string

func init() {
desiredCloneAnnotations = append(desiredCloneAnnotations, desiredAnnotations...)
Contributor

I think you can just add AnnCloneOf to the desiredAnnotations list; it won't hurt anything. Maybe the name is not the best and should change; maybe you have a better name.
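The two options in this thread are keeping a separate clone-only list built in init (as in the diff) or folding AnnCloneOf into desiredAnnotations. A standalone sketch of the init approach, with hypothetical annotation keys standing in for the real CDI constants, shows why it copies first: appending directly to desiredAnnotations could write into its backing array if the slice ever had spare capacity.

```go
package main

import "fmt"

// Hypothetical annotation keys standing in for the real CDI constants.
var desiredAnnotations = []string{"example.io/annotation-a", "example.io/annotation-b"}

const annCloneOf = "example.io/clone-of"

var desiredCloneAnnotations []string

func init() {
	// Appending to the nil slice allocates a fresh backing array, so adding
	// annCloneOf below can never modify desiredAnnotations.
	desiredCloneAnnotations = append(desiredCloneAnnotations, desiredAnnotations...)
	desiredCloneAnnotations = append(desiredCloneAnnotations, annCloneOf)
}

func main() {
	fmt.Println(len(desiredAnnotations), len(desiredCloneAnnotations))
	fmt.Println(desiredCloneAnnotations[len(desiredCloneAnnotations)-1])
}
```

Folding AnnCloneOf into desiredAnnotations, as suggested, removes the init entirely, at the cost of the key also appearing in the list for non-clone DataVolumes, which the reviewer expects to be harmless.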

}

if dv.DeletionTimestamp != nil {
if err := r.reconcileVolumeCloneSourceCR(syncState); err != nil {
Contributor

Are you sure we don't also want to call this when the DV has succeeded?

Member Author

It will get called here:

if err := r.reconcileVolumeCloneSourceCR(&syncRes); err != nil {

Contributor

@ShellyKa13 ShellyKa13 Jun 21, 2023

Member Author

Unless there is an unexpected error (which is retried), it should get called. But I suppose the behavior of the base controller could change and mess things up as well. This is such spaghetti code. IMO controllers should always execute the same code block at each reconcile. I don't like that the derived controller sync is not called if the DataVolume is deleted.

Anyway, my intention was to reduce special cases and have the cleanup function do less, because I don't think it should exist in the first place.

@mhenriks mhenriks changed the title [WIP] DataVolume Controller uses VolumeCloneSource Populator DataVolume Controller uses VolumeCloneSource Populator Jun 22, 2023
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 22, 2023
@mhenriks
Member Author

@arnongilboa - please check general integration with DataVolume controllers. A lot of code is gone!
@akalenyu - please check out clone from snapshot path

@akalenyu
Collaborator

akalenyu commented Jun 22, 2023

Not sure what that consistent after-suite failure is about; it seems some DV has a finalizer remaining:

Some content in the namespace has finalizers remaining: cdi.kubevirt.io/dataVolumeFinalizer in 1 resource instances}]}}]

We might want to dump out some info about dangling resources after the func test runs.

Collaborator

@akalenyu akalenyu left a comment

@akalenyu - please check out clone from snapshot path

Looks good. We should also have most permutations covered by the snapclone func tests, so that makes me feel better.

My only concern was that we are losing coverage in the host-assisted path.

return syncRes, err
}
targetHostAssistedPvc, err := r.createPvcForDatavolume(datavolume, pvcSpec, r.updateAnnotations)
targetPvc, err := r.createPvcForDatavolume(datavolume, pvcSpec, pvcModifier)
Collaborator

I went ahead and debugged the sizeless clone failure for VolumeSnapshot clones. Since we now create the target PVC through createPvcForDatavolume, nothing takes care of filling in the missing size:

spec.resources[storage]: Invalid value: \"0\": must be greater than zero

I think we can simply have something similar to this func

syncState.pvcSpec.Resources.Requests[corev1.ResourceStorage] = targetCapacity

(but much simpler: it would just take the size from the snapshot)
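A minimal sketch of the suggested fix, assuming the size comes from the snapshot's status.restoreSize (a real field on snapshot.storage.k8s.io/v1 VolumeSnapshot, a resource.Quantity pointer that stays nil until the CSI driver reports it). The types, the fillSizeFromSnapshot name, and the use of byte counts are hypothetical simplifications, not the actual CDI change.

```go
package main

import (
	"errors"
	"fmt"
)

const gi = int64(1) << 30

// snapshotStatus is a hypothetical stand-in for VolumeSnapshotStatus;
// RestoreSize is nil until the snapshot reports it.
type snapshotStatus struct {
	RestoreSize *int64
}

// pvcSpec is a hypothetical stand-in for the PVC resource requests (bytes).
type pvcSpec struct {
	Requests map[string]int64
}

const resourceStorage = "storage"

// fillSizeFromSnapshot copies the snapshot's restore size into the storage
// request when the DataVolume did not specify a size, avoiding the
// "must be greater than zero" validation error.
func fillSizeFromSnapshot(spec *pvcSpec, status snapshotStatus) error {
	if spec.Requests == nil {
		spec.Requests = map[string]int64{}
	}
	if spec.Requests[resourceStorage] > 0 {
		return nil // an explicit size was requested; keep it
	}
	if status.RestoreSize == nil {
		return errors.New("snapshot restore size is not available yet")
	}
	spec.Requests[resourceStorage] = *status.RestoreSize
	return nil
}

func main() {
	restoreSize := 2 * gi
	spec := &pvcSpec{}
	if err := fillSizeFromSnapshot(spec, snapshotStatus{RestoreSize: &restoreSize}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(spec.Requests[resourceStorage] == 2*gi) // true
}
```

Returning an error while restoreSize is unset lets the reconciler retry later instead of creating a PVC with a zero request.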

@@ -772,11 +723,6 @@ var _ = Describe("all clone tests", func() {
targetDiskImagePath = testBaseDir
}

if cloneType == "snapshot" && sourceRef {
Collaborator

🙏

}

f.ForceBindIfWaitForFirstConsumer(targetPvc)

cloneType := utils.GetCloneType(f.CdiClient, dataVolume)
if cloneType != "copy" {
Skip("only valid for copy clone")
Collaborator

wasn't this skip enough?

Member Author

With populators, we don't know the clone type until the populator starts working (triggered by the first consumer).

Expect(pvc.Annotations[controller.AnnCloneRequest]).To(HaveSuffix(suffix))
Expect(pvc.Spec.DataSource).To(BeNil())
Expect(pvc.Spec.DataSourceRef).To(BeNil())
if pvc.Spec.DataSourceRef == nil {
Collaborator

Maybe we should use a more reliable way of ensuring that a dumb clone took place? IIUC, when this runs on a CSI lane the check is not exercised.

Member Author

This is specifically for checking the non-CSI case. With populators, the PVC that was used for the dumb clone is deleted after rebinding to the target. But we can check the cloneType annotation on the DV.
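A sketch of the check suggested above, reading the clone type from the DataVolume instead of from the intermediate PVC that the populator deletes. The annotation key below is an assumption (confirm it against the CDI source), and usedHostAssistedClone is a hypothetical helper; "copy" is the clone type string the func tests already compare against.

```go
package main

import "fmt"

// annCloneType is an assumed annotation key for the clone strategy recorded
// on the DataVolume; confirm the exact key in the CDI source.
const annCloneType = "cdi.kubevirt.io/cloneType"

// usedHostAssistedClone reports whether the DataVolume's annotations record a
// host-assisted ("copy") clone. A nil map simply yields false.
func usedHostAssistedClone(dvAnnotations map[string]string) bool {
	return dvAnnotations[annCloneType] == "copy"
}

func main() {
	fmt.Println(usedHostAssistedClone(map[string]string{annCloneType: "copy"}))
	fmt.Println(usedHostAssistedClone(map[string]string{annCloneType: "snapshot"}))
}
```

In a Gomega assertion this would read roughly as Expect(dv.Annotations[annCloneType]).To(Equal("copy")), using the same assumed key.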

@akalenyu
Collaborator

/test pull-containerized-data-importer-e2e-destructive
/test pull-cdi-linter

destructive lane fails because of #2744
apidocs fails because of [kubevirt-dev] k/k: pull-kubevirt-api failing due to missing gradle dependency

@mhenriks
Member Author

/retest-required

@akalenyu
Collaborator

akalenyu commented Jun 25, 2023

One failure seems to be exclusive to this PR:
[test_id:1360]: DV conditions have Ready=False but "phase": "Succeeded":
https://search.ci.kubevirt.io/?search=1360&maxAge=336h&context=1&type=bug%2Bissue%2Bjunit&name=containerized-data-importer&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Another one isn't:
[test_id:8569]: the DV Update call fails (confusing from gomega output but this is what happens):
https://search.ci.kubevirt.io/?search=8569&maxAge=336h&context=1&type=bug%2Bissue%2Bjunit&name=containerized-data-importer&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Expect(f.CdiClient.CdiV1beta1().DataVolumes(dvNamespace).Update(context.TODO(), dv, metav1.UpdateOptions{})).Error().ToNot(HaveOccurred())

@akalenyu
Collaborator

/test pull-containerized-data-importer-e2e-nfs
/test pull-containerized-data-importer-e2e-destructive

https://search.ci.kubevirt.io/?search=+preallocate+data+on+target+PVC&maxAge=336h&context=1&type=bug%2Bissue%2Bjunit&name=containerized-data-importer&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
This seems to fail exclusively on this PR - suggests preallocation not applied sometimes?

Let me know how you want to proceed... for the other flakes could use a review at
#2769
#2744

@mhenriks
Member Author

Thanks @akalenyu, I think I have a lead on the failure in this PR. I'll take a look at the fixes you referenced.

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
also add more validation to some snapshot clone tests

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Using the controller-runtime Patch API with the controller-runtime cached client seems to be a pretty bad fit, at least given the way the CR API is designed, where an old object is compared to a new one.

I like patch in theory though and will revisit

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
It was possible to miss the "preallocation applied" annotation otherwise.

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
@akalenyu
Collaborator

/test pull-containerized-data-importer-e2e-nfs

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Should have been done back when annotations were added to "progress".

Also, if the PVC is bound, do not call the phase Reconcile functions, only Status.

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
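A minimal sketch of the control flow in the commit message above, assuming each phase exposes a mutating Reconcile step and a read-only Status step. The phase interface, recordingPhase, and syncPhases are hypothetical names, not CDI's actual types.

```go
package main

import "fmt"

// phase is a hypothetical interface: Reconcile may create or mutate objects,
// Status only reports progress.
type phase interface {
	Reconcile() error
	Status() error
}

// recordingPhase logs which methods were called, for illustration.
type recordingPhase struct {
	name  string
	calls *[]string
}

func (p recordingPhase) Reconcile() error {
	*p.calls = append(*p.calls, p.name+".Reconcile")
	return nil
}

func (p recordingPhase) Status() error {
	*p.calls = append(*p.calls, p.name+".Status")
	return nil
}

// syncPhases always runs Status, but skips Reconcile once the target PVC is bound.
func syncPhases(pvcBound bool, phases []phase) error {
	for _, p := range phases {
		if !pvcBound {
			if err := p.Reconcile(); err != nil {
				return err
			}
		}
		if err := p.Status(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var calls []string
	phases := []phase{recordingPhase{name: "prep", calls: &calls}}
	_ = syncPhases(false, phases)
	_ = syncPhases(true, phases)
	fmt.Println(calls) // [prep.Reconcile prep.Status prep.Status]
}
```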
@mhenriks
Member Author

/unhold

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 27, 2023
@kubevirt-bot
Contributor

kubevirt-bot commented Jun 28, 2023

@mhenriks: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-cdi-apidocs
Commit: 99bf0dd
Required: false
Rerun command: /test pull-cdi-apidocs

@mhenriks
Member Author

/retest-required

Collaborator

@akalenyu akalenyu left a comment

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 28, 2023
@akalenyu
Collaborator

whoops
/lgtm cancel
/approve

@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 28, 2023
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akalenyu

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2023
Contributor

@ShellyKa13 ShellyKa13 left a comment

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2023
@kubevirt-bot kubevirt-bot merged commit 4ce9272 into kubevirt:main Jun 29, 2023
@mhenriks
Member Author

/cherrypick release-v1.57

@kubevirt-bot
Contributor

@mhenriks: new pull request created: #2783

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL
5 participants