Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ clusterctl: migrate CRDs during clusterctl upgrade #6749

Merged
merged 1 commit into from
Jun 30, 2022

Conversation

sbueringer
Copy link
Member

@sbueringer sbueringer commented Jun 27, 2022

Credits to @fabriziopandini for the initial implementation

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #6674
Fixes #6691

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 27, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 27, 2022
@sbueringer sbueringer force-pushed the pr-crd-migration branch 2 times, most recently from b81de81 to 347a719 Compare June 27, 2022 13:29
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 27, 2022
@sbueringer
Copy link
Member Author

/test pull-cluster-api-e2e-full-main

@sbueringer
Copy link
Member Author

/cherry-pick release-1.1

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.1 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Copy link
Member Author

/cherry-pick release-1.2

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer sbueringer changed the title [WIP] ✨ clusterctl: migrate CRDs during clusterctl upgrade ✨ clusterctl: migrate CRDs during clusterctl upgrade Jun 27, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 27, 2022
@sbueringer
Copy link
Member Author

/test pull-cluster-api-e2e-full-main

@sbueringer
Copy link
Member Author

PR should be ready for review. I wrote something about the limitations here: #6674 (comment)

@sbueringer
Copy link
Member Author

@fabriziopandini
Copy link
Member

@furkatgofurov7 @schrej it will be great if you can test if the proposed solution fixes the issue

@sbueringer
Copy link
Member Author

Flake
/test pull-cluster-api-e2e-full-main

Copy link
Member

@chrischdi chrischdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks very clean and good to me.

Just some small nits / questions from my side 👍

cmd/clusterctl/main.go Outdated Show resolved Hide resolved
cmd/clusterctl/client/cluster/crd_migration.go Outdated Show resolved Hide resolved
cmd/clusterctl/client/cluster/crd_migration.go Outdated Show resolved Hide resolved
cmd/clusterctl/client/cluster/crd_migration.go Outdated Show resolved Hide resolved
@MaxRink
Copy link
Contributor

MaxRink commented Jun 28, 2022

@fabriziopandini @sbueringer we just tried that upgrading with this PR and it failed upgrading metal³
logs: https://gist.github.com/MaxRink/fa3b27b4193a1a4bb484e47197713b88
On a first glance it looks to me linke the order of operation is wrong, the migration should run before any of the providers get scaled down so that conversion webhooks are still available

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 29, 2022
@sbueringer
Copy link
Member Author

/retest

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jun 29, 2022

@sbueringer: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-full-main 3a30fbc link false /test pull-cluster-api-e2e-full-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sbueringer
Copy link
Member Author

/retest

@sbueringer
Copy link
Member Author

@MaxRink @schrej @furkatgofurov7 Latest version should work now. I verified it by upgrading CAPI: v0.3.14=>v0.4.8=>main (and dropped v1alpha3 from main).

Would be great if you can verify.

@fabriziopandini
Copy link
Member

/lgtm
/approve

/hold
for @MaxRink @schrej @furkatgofurov7 to make another round of tests

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 29, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 29, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2022
@furkatgofurov7
Copy link
Member

furkatgofurov7 commented Jun 29, 2022

@MaxRink @schrej @furkatgofurov7 Latest version should work now. I verified it by upgrading CAPI: v0.3.14=>v0.4.8=>main (and dropped v1alpha3 from main).

Would be great if you can verify.

/lgtm /approve

/hold for @MaxRink @schrej @furkatgofurov7 to make another round of tests

@sbueringer @fabriziopandini awesome work, thanks! we have tested it on Metal3 provider and it works pretty well 🎉

Here the results I got for reference:

Before this patch: Using the clusterctl binary built from the main branch and upgrading from v1a3==>v1a4=>v1beta1 (provider versions v1a4==>v1a5=>v1beta1):

./clusterctl upgrade apply --contract v1beta1
Checking cert-manager version...
Deleting cert-manager Version="v1.5.3"
Installing cert-manager Version="v1.8.2"
I0629 12:16:03.291677 3755760 request.go:601] Waited for 1.032853447s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:38293/apis/bootstrap.cluster.x-k8s.io/v1alpha4?timeout=30s
Waiting for cert-manager to be available...
Performing upgrade...
Scaling down Provider="infrastructure-metal3" Version="v0.5.10" Namespace="capm3-system"
Deleting Provider="infrastructure-metal3" Version="v0.5.10" Namespace="capm3-system"
Installing Provider="infrastructure-metal3" Version="v1.1.2" TargetNamespace="capm3-system"
Error: action failed after 10 attempts: failed to patch provider object: CustomResourceDefinition.apiextensions.k8s.io "metal3clusters.infrastructure.cluster.x-k8s.io" is invalid: status.storedVersions[0]: Invalid value: "v1alpha4": must appear in spec.versions

After this patch: next, with the clusterctl binary built using the changes on this branch:

./clusterctl upgrade apply --contract v1beta1
Checking cert-manager version...
Cert-manager is already up to date
Performing upgrade...
CR migration required kind="Metal3Cluster" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3Cluster"
CR migration required kind="Metal3DataClaim" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3DataClaim"
CR migration required kind="Metal3Data" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3Data"
CR migration required kind="Metal3DataTemplate" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3DataTemplate"
CR migration required kind="Metal3Machine" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3Machine"
CR migration required kind="Metal3MachineTemplate" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3MachineTemplate"
CR migration required kind="Metal3Remediation" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3Remediation"
CR migration required kind="Metal3RemediationTemplate" storedVersionsToDelete="v1alpha4" storedVersionsToPreserve="v1alpha5"
Migrating CRs, this operation may take a while... kind="Metal3RemediationTemplate"
Scaling down Provider="infrastructure-metal3" Version="v0.5.10" Namespace="capm3-system"
Deleting Provider="infrastructure-metal3" Version="v0.5.10" Namespace="capm3-system"
Installing Provider="infrastructure-metal3" Version="v1.1.2" TargetNamespace="capm3-system"

./clusterctl upgrade plan
Checking cert-manager version...
Cert-Manager is already up to date

Checking new release availability...

Latest release available for the v1beta1 API Version of Cluster API (contract):

NAME                    NAMESPACE                           TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-kubeadm       capi-kubeadm-bootstrap-system       BootstrapProvider        v1.1.4            Already up to date
control-plane-kubeadm   capi-kubeadm-control-plane-system   ControlPlaneProvider     v1.1.4            Already up to date
cluster-api             capi-system                         CoreProvider             v1.1.4            Already up to date
infrastructure-metal3   capm3-system                        InfrastructureProvider   v1.1.2            Already up to date

You are already up to date!

@chrischdi
Copy link
Member

Just a single question.

/lgtm

Copy link
Contributor

@ykakarap ykakarap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@sbueringer
Copy link
Member Author

/hold cancel

Give the positive feedback + it would be nice to have it running in CI a bit before the v1.1.5 release

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 30, 2022
@k8s-ci-robot k8s-ci-robot merged commit c8053fb into kubernetes-sigs:main Jun 30, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Jun 30, 2022
@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #6792

In response to this:

/cherry-pick release-1.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #6793

In response to this:

/cherry-pick release-1.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer sbueringer deleted the pr-crd-migration branch June 30, 2022 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle status.storedVersions migration during upgrades Clusterctl upgrade fails due to cert-manager CRDs
8 participants