Upgrade capm3, bmo and ironic version to v0.4.2 #554

Closed
jgu17 opened this issue Jun 1, 2021 · 20 comments
Labels
1-Core (Relates to airshipctl core components, i.e. Go code) · enhancement (New feature or request) · priority/critical (Items critical to be implemented, usually by the next release) · size l

Comments

@jgu17
Contributor

jgu17 commented Jun 1, 2021

Problem description (if applicable)
The Metal3 Cluster API provider (CAPM3) and the Bare Metal Operator (BMO) have released v0.4 (latest: v0.4.2, released on June 1, 2021). The release notes list a number of new features: https://github.com/metal3-io/cluster-api-provider-metal3/releases.

v0.4.0
  • v1alpha4 API
  • Support for Metadata and Network data in Cloud-init
  • Metadata and network data templating for Machine deployments and KCPs
  • Raw image streaming
  • BMO is now deployed as part of CAPM3
  • Support for IP Address Management as part of metadata templating
  • Pivoting support

v0.4.1
  • Adopt BMO kubebuilder changes (#137)
  • Add Support for Ironic Basic auth and TLS in deployment templates (#134)

v0.4.2
  • Add live-iso support to CAPI Metal3 provider (#189)
  • Add unit test and documentation for TemplateReference (#190)
  • Add nodeReuse feature (#169)
  • Uplift BMO in go.mod (#193)
  • Add golint and generate test in travis (#192)
  • Adds a new controller to synchronize labels between BMHs and Kubernetes Nodes (#152)
  • Install kustomize also in Mac OS. (#188)
  • Update Essential tooling for CAPM3 (#181)
  • Update go.mod (#153)
  • Add manifest linting script (#149)
  • Add nodeDrainTimeout for example cluster template (#142)

Proposed change
Upgrade to the latest release of CAPM3 and BMO.

Potential impacts
v0.4.1 added TLS and basic auth variables to the CAPM3 deployment templates. Could AS2 take advantage of these?

@jgu17 jgu17 added enhancement New feature or request triage Needs evaluation by project members labels Jun 1, 2021
@jgu17 jgu17 added this to the Future milestone Jun 1, 2021
@jgu17
Contributor Author

jgu17 commented Jun 1, 2021

A few notes:

  • In BMO v0.4, the Ironic deployment manifests became standalone. AS2 should follow suit.
  • After the kubebuilder changes in BMO, some of the images became distroless and no longer have a shell, so their entry points need to be adjusted.
  • The hardware profile properties no longer appear in the upstream BMO CRD. Do we still need to keep a local CRD schema, as in AS 2.0?

@eak13

eak13 commented Jun 1, 2021

@jgu17, we have issue #518 to investigate upgrading the components, but we wanted to wait until after v2.1 was out the door to understand what impact the upgrades might have on Airship before proceeding. We might be able to combine these two issues. If you're OK with it, I can copy the #518 description over to this issue, since you've provided more detail here. LMK if that's OK.

@jgu17
Contributor Author

jgu17 commented Jun 1, 2021


@ak3216 Sure, makes sense to combine. It doesn't matter to me which one you want to keep.

@eak13

eak13 commented Jun 1, 2021

Before upgrading we need to understand the following (pulled over from #518).

  • Do we want to expand this to include all of CAPI?
  • Are there breaking changes that require coding/configuration changes for Metal3 and CAPI components?
  • Are there bug fixes in the upgrades of either CAPI or Metal3 that fix issues encountered when deploying? If so, are there workarounds in place today that would need to be removed or deprecated? May need input from @sb464f and others who have been running deployments.
  • Are there any new features in the upgrades of either CAPI or Metal3 that address missing capabilities which have hindered the deployments? In utilizing these new features, what workarounds are in place today that would need to be removed or deprecated? May need input from @sb464f and others who have been running deployments.
  • Can we upgrade the components separately, or do they need to be upgraded together?

Sources

Cluster API: https://github.com/kubernetes-sigs/cluster-api/releases
Metal3: https://github.com/metal3-io/cluster-api-provider-metal3
The output of this issue should answer the above questions and enable a discussion of the path forward during a Design Call. From there, new issues will be created to address upgrading the components.

@eak13

eak13 commented Jun 1, 2021

Notes from @Arvinderpal from #518
Here are my notes from the PTG on upgrading. Please let me know if further info is needed.

Upgrade to v0.4 (aka v1alpha4)

Upgrading CAPI and Provider Components (e.g. v0.3.16 --> v0.4.0)

The clusterctl upgrade command can be used to upgrade the version of the Cluster API providers (CRDs, controllers) installed into a management cluster. See CAPI upgrade docs.

Upgrading API object (e.g. v1alpha3 --> v1alpha4)

clusterctl does not upgrade Cluster API objects (Clusters, MachineDeployments, Machines, etc.); upgrading such objects is the responsibility of the provider's controllers.
Controllers like CAPM3 have conversion functions built in, so conversion should be seamless.

Clusterctl Library

airshipctl consumes clusterctl as a library. As of April 16, we are importing v0.3.13.

  • Unless we plan to adopt the CAPI Provider Operator (Proposal Doc), airshipctl code changes should be minimal with respect to clusterctl.

Kubernetes Version Upgrades

  • CAPI / KCP v1alpha4 won't be able to manage Kubernetes clusters < v1.18
  • In general, supported versions will be limited. See thread here.
  • The requirement comes primarily through the dependence on kubeadm. Currently, the kubeadm bootstrapper imports kubeadm API types.
  • CAPI/kubeadm do not handle users' API objects (e.g. Deployments, ConfigMaps, etc.). For example, if an API is deprecated, the user must use either kubectl or helm to update their manifests.
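The v1.18 floor above can be applied as a pre-upgrade gate on workload clusters. Below is a minimal Go sketch, assuming versions arrive as `vMAJOR.MINOR[.PATCH]` strings; `supportedByV1alpha4` and `minSupportedMinor` are hypothetical names, not part of clusterctl or CAPI. Only the lower bound is checked; the limited support window mentioned above would also need an upper bound.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minSupportedMinor is a hypothetical constant encoding the note that
// CAPI/KCP v1alpha4 cannot manage Kubernetes clusters older than v1.18.
const minSupportedMinor = 18

// parseMajorMinor extracts MAJOR and MINOR from strings such as "v1.19.4".
// Anything after the minor component (patch, pre-release) is ignored.
func parseMajorMinor(version string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed major in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed minor in %q: %w", version, err)
	}
	return major, minor, nil
}

// supportedByV1alpha4 reports whether a cluster's Kubernetes version is at
// or above the v1.18 floor described in the upgrade notes.
func supportedByV1alpha4(version string) (bool, error) {
	major, minor, err := parseMajorMinor(version)
	if err != nil {
		return false, err
	}
	if major != 1 {
		return major > 1, nil
	}
	return minor >= minSupportedMinor, nil
}

func main() {
	for _, v := range []string{"v1.17.9", "v1.18.0", "v1.21.2"} {
		ok, err := supportedByV1alpha4(v)
		fmt.Println(v, ok, err)
	}
}
```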

@jgu17
Contributor Author

jgu17 commented Jun 1, 2021

@digambar's comments: "CAPM3 has hard dependency on BMO and ip-address-manager so you end up getting dependency mismatch error if you don't upgrade both the controller. I recommend to upgrade both CAPM3 and ip-address-manager along with BMO."

It is starting to look like we may need a single patchset to upgrade CAPM3, BMO, Ironic, and IPAM because of the hard dependencies among them.
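The "upgrade them together" rule could be enforced mechanically before such a patchset merges. Below is a minimal Go sketch, assuming the current and target versions of the coupled components are available as plain strings; `checkCoupledUpgrade` is a hypothetical helper, and the version strings in `main` are placeholders, not real release pins.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkCoupledUpgrade enforces the "upgrade them together" rule: if any
// component in the coupled set changes version, every component must change.
// current and target map component names (e.g. "capm3") to version strings.
func checkCoupledUpgrade(current, target map[string]string) error {
	if len(current) != len(target) {
		return fmt.Errorf("current lists %d components but target lists %d", len(current), len(target))
	}
	names := make([]string, 0, len(current))
	for name := range current {
		if _, ok := target[name]; !ok {
			return fmt.Errorf("component %q has no target version", name)
		}
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for stable messages

	var changed, unchanged []string
	for _, name := range names {
		if current[name] == target[name] {
			unchanged = append(unchanged, name)
		} else {
			changed = append(changed, name)
		}
	}
	if len(changed) > 0 && len(unchanged) > 0 {
		return fmt.Errorf("partial upgrade: %s changed but %s did not",
			strings.Join(changed, ", "), strings.Join(unchanged, ", "))
	}
	return nil
}

func main() {
	// Placeholder versions for illustration only.
	current := map[string]string{"capm3": "old", "bmo": "old", "ipam": "old"}
	partial := map[string]string{"capm3": "new", "bmo": "new", "ipam": "old"}
	fmt.Println(checkCoupledUpgrade(current, partial))
}
```

Ironic could be added to the same map if its image tag is pinned alongside the others.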

@jgu17 jgu17 changed the title Upgrade capm4, bmo and ironic version to v0.4 Upgrade capm3, bmo and ironic version to v0.4 Jun 1, 2021
@jezogwza jezogwza modified the milestones: Future, v2.2 Jun 2, 2021
@jezogwza jezogwza added 1-Core Relates to airshipctl core components (i.e. go code) and removed triage Needs evaluation by project members labels Jun 2, 2021
@SirishaGopigiri
Contributor

please assign it to me

@eak13

eak13 commented Jun 24, 2021

@SirishaGopigiri done!

@jgu17
Contributor Author

jgu17 commented Jun 24, 2021

Quoting @SirishaGopigiri: "please assign it to me"

@SirishaGopigiri I already started work on it: https://review.opendev.org/c/airship/airshipctl/+/793254. It had been on temporary hold because I needed a virtual dev environment to continue testing and development, and it is also planned only for the 2.2 release. I rebased the PS yesterday and was going to resume the work now that I have a lab environment. Do you want to work on it together, since it is actually quite a big piece, or do you want to take it over entirely?

@eak13 eak13 assigned jgu17 and unassigned SirishaGopigiri Jun 24, 2021
@michaelfix

@jgu17, please include either the Relates-To or Closes tag in your PS(s) when they're related to issues. This tag helps us track and correlate issues to PSs, and if the Closes tag is used, a bot will automatically close the respective GitHub issue. https://github.com/airshipit/airshipctl/blob/master/CONTRIBUTING.md#submitting-changes. Thanks.

@jezogwza jezogwza added the priority/critical Items critical to be implemented, usually by the next release label Jul 7, 2021
@jezogwza jezogwza modified the milestones: v2.2, v2.1 Jul 7, 2021
@eak13

eak13 commented Jul 9, 2021

We need this uplift as a stepping stone before #518 so we're not skipping versions from 0.3.x to 0.5.x. Additionally, upgrading should address #558 and #559, though #558 has a dependency on metal3-io/ironic-image#266.

@eak13 eak13 changed the title Upgrade capm3, bmo and ironic version to v0.4 Upgrade capm3, bmo and ironic version to v0.4.2 Jul 9, 2021
@jgu17
Contributor Author

jgu17 commented Jul 14, 2021

Upgrading BMO and Ironic works up to the point where CAPM3 throws a "node01 not found" error when updating the BareMetalHost. I suspect it could be a version mismatch, so I am going to update CAPM3 to the same sub-minor version as BMO and Ironic. @SirishaGopigiri, I want to double-check with you: do you already have any work in progress on the CAPM3 version upgrade?

@SirishaGopigiri
Contributor

@jgu17 the intention is to use CAPM3 v0.5.0, but I don't think it is out yet, so I haven't started anything.

@Arvinderpal
Contributor

@jgu17 Can you paste the exact error from CAPM3? Make sure that CAPM3 and BMO are installed in the same namespace, for example: capm3-system.

@jgu17
Contributor Author

jgu17 commented Jul 14, 2021

@Arvinderpal here the error message from the capm3 pod:
I0714 06:16:01.350175 1 metal3cluster_controller.go:108] Metal3Cluster-controller "msg"="Reconciling metal3Cluster" "cluster"="target-cluster" "metal3-cluster"={"Namespace":"target-infra","Name":"target-cluster"}
I0714 06:16:01.350717 1 metal3machine_manager.go:721] controllers/Metal3Machine/Metal3Machine-controller "msg"="Annotated host not found" "cluster"="target-cluster" "machine"="cluster-controlplane-pk64z" "metal3-cluster"="target-cluster" "metal3-machine"={"Namespace":"target-infra","Name":"cluster-controlplane-kpslt"} "host"="target-infra/node01"
I0714 06:16:01.350907 1 metal3machine_manager.go:607] controllers/Metal3Machine/Metal3Machine-controller "msg"="Updating machine" "cluster"="target-cluster" "machine"="cluster-controlplane-pk64z" "metal3-cluster"="target-cluster" "metal3-machine"={"Namespace":"tart-infra","Name":"cluster-controlplane-kpslt"}
I0714 06:16:01.351102 1 metal3machine_manager.go:721] controllers/Metal3Machine/Metal3Machine-controller "msg"="Annotated host not found" "cluster"="target-cluster" "machine"="cluster-controlplane-pk64z" "metal3-cluster"="target-cluster" "metal3-machine"={"Namespace":"target-infra","Name":"cluster-controlplane-kpslt"} "host"="target-infra/node01"
E0714 06:16:01.352444 1 controller.go:237] controller "msg"="Reconciler error" "error"="failed to update BaremetalHost: host not found for machine cluster-controlplane-pk64z" "controller"="metal3machine" "name"="cluster-controlplane-kpslt" "namespace"="target-infra" "reconcilerGroup"="infrastructure.cluster.x-k8s.io" "reconcilerKind"="Metal3Machine"
I0714 06:16:01.372335 1 metal3cluster_controller.go:108] Metal3Cluster-controller "msg"="Reconciling metal3Cluster" "cluster"="target-cluster" "metal3-cluster"={"Namespace":"target-infra","Name":"target-cluster"}
I0714 06:16:02.353483 1 metal3machine_manager.go:721] controllers/Metal3Machine/Metal3Machine-controller "msg"="Annotated host not found" "cluster"="target-cluster" "machine"="cluster-controlplane-pk64z" "metal3-cluster"="target-cluster" "metal3-machine"={"Namespace":"target-infra","Name":"cluster-controlplane-kpslt"} "host"="target-infra/node01"
I0714 06:16:02.353549 1 metal3machine_manager.go:607] controllers/Metal3Machine/Metal3Machine-controller "msg"="Updating machine" "cluster"="target-cluster" "machine"="cluster-controlplane-pk64z" "metal3-cluster"="target-cluster" "metal3-machine"={"Namespace":"tart-infra","Name":"cluster-controlplane-kpslt"}
I0714 06:16:02.353605 1 metal3machine_manager.go:721] controllers/Metal3Machine/Metal3Machine-controller "msg"="Annotated host not found" "cluster"="target-cluster" "machine"="cluster-controlplane-pk64z" "metal3-cluster"="target-cluster" "metal3-machine"={"Namespace":"target-infra","Name":"cluster-controlplane-kpslt"} "host"="target-infra/node01"

@Arvinderpal
Contributor

What is {"Namespace":"tart-infra", ? Is that a typo?

All the resources should be in the same namespace. For example, the M3M and BMH should be in the same namespace (e.g. metal3). Can you run kubectl get on those two and check whether they are in the same namespace?

@jgu17
Contributor Author

jgu17 commented Jul 14, 2021

tart-infra must have been a copy-and-paste error when I pasted the logs here. The namespaces for CAPM3 and BMO/Ironic were actually different even before this upgrade effort. Here is the kubectl output for the CAPM3 pods and the BMO/Ironic pods (the namespace names are the same before and after the BMO version upgrade):
capm3-system capm3-controller-manager-6d8c546c5-86lhd 2/2 Running 0 4m34s
capm3-system capm3-ipam-controller-manager-6465c8fbf5-zckqm 2/2 Running 0 4m34s
metal3 ironic-dc446964-2ppsr 4/4 Running 0 8m
metal3 metal3-baremetal-operator-549c5598f-sxzvl 3/3 Running 0 8m

This setup worked before the upgrade, and after the upgrade it still works for bare-metal provisioning of the first controller node. However, after the upgrade it fails when trying to repurpose the ephemeral node.

@Arvinderpal
Contributor

OK. The second part of my question was: where do the M3M and BMH resources reside? The error you pasted says that CAPM3 could not find the BMH target-infra/node01. Do you see node01 in target-infra?
It might also be easier to jump on a call to debug this further.
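Part of this namespace check can be scripted once the Metal3Machine's annotations have been fetched (e.g. via kubectl get). Below is a minimal Go sketch, not part of CAPM3 or airshipctl. It assumes CAPM3 records the claimed host on the Metal3Machine in an annotation keyed `metal3.io/BareMetalHost` with a `namespace/name` value, which should be verified against the deployed CAPM3 version; `annotatedHost` and `checkHostNamespace` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// hostAnnotation is assumed to be the annotation key CAPM3 uses to record
// which BareMetalHost a Metal3Machine has claimed (value: "namespace/name").
// Verify it against the CAPM3 version actually deployed.
const hostAnnotation = "metal3.io/BareMetalHost"

// annotatedHost splits the host reference stored on a Metal3Machine.
func annotatedHost(annotations map[string]string) (namespace, name string, err error) {
	ref, ok := annotations[hostAnnotation]
	if !ok {
		return "", "", fmt.Errorf("annotation %q not set", hostAnnotation)
	}
	parts := strings.SplitN(ref, "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("annotation value %q is not namespace/name", ref)
	}
	return parts[0], parts[1], nil
}

// checkHostNamespace flags a Metal3Machine whose claimed host lives in a
// different namespace, the mismatch being ruled out in this thread.
func checkHostNamespace(m3mNamespace string, annotations map[string]string) error {
	ns, name, err := annotatedHost(annotations)
	if err != nil {
		return err
	}
	if ns != m3mNamespace {
		return fmt.Errorf("host %s/%s is outside the Metal3Machine namespace %q", ns, name, m3mNamespace)
	}
	return nil
}

func main() {
	annotations := map[string]string{hostAnnotation: "target-infra/node01"}
	fmt.Println(checkHostNamespace("target-infra", annotations))
	fmt.Println(checkHostNamespace("metal3", annotations))
}
```

This only checks that the reference and the Metal3Machine agree on a namespace; whether node01 actually exists in target-infra still has to be confirmed against the live cluster.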

@lb4368

lb4368 commented Sep 7, 2021

Closing this as upgrade is being handled via #518 and #558.

@lb4368 lb4368 closed this as completed Sep 7, 2021
Projects
None yet
Development

No branches or pull requests

7 participants