Bump Rancher to v2.7.9 and RKE2 to v1.26.11+rke2r1 #605

Merged
bk201 merged 2 commits into harvester:master from the bump-rancher-279 branch on Jan 3, 2024

Conversation

bk201
Member

@bk201 bk201 commented Nov 28, 2023

Problem:

We need to bump Rancher to v2.7.9 and RKE2 to v1.26.11+rke2r1.

Solution:
Bump Rancher to v2.7.9 and RKE2 to v1.26.11+rke2r1.

Related Issue:
harvester/harvester#4782

Test plan:

@bk201
Member Author

bk201 commented Nov 28, 2023

New nodes can't join, and Rancher keeps complaining:

2023/11/17 08:55:44 [INFO] [planner] rkecluster fleet-local/local: non-ready bootstrap machine(s) custom-3eecdaaeac24 and join url to be available on bootstrap node

@FrankYang0529
Member

There is a new deployment, cattle-provisioning-capi-system/capi-controller-manager. It keeps logging errors like:

controller.go:329] "Reconciler error" err="failed to create cluster accessor: error creating client for self-hosted cluster: error creating dynamic rest mapper for remote cluster \"fleet-local/local\": the server has asked for the client to provide credentials" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-local/custom-df4677630f12" namespace="fleet-local" name="custom-df4677630f12" reconcileID=23ddc77d-3015-4699-970a-4b359211d2e4

The error originates at https://github.com/kubernetes-sigs/cluster-api/blob/1c3a1526f101d4b07d2eec757fe75e8701cf6212/controllers/remote/cluster_cache.go#L164-L168.

The REST config used there comes from the secret fleet-local/local-kubeconfig. Its content looks like this:

data:
  apiServerCA: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJ2VENDQVdPZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQkdNUnd3R2dZRFZRUUtFeE5rZVc1aGJXbGoKYkdsemRHVnVaWEl0YjNKbk1TWXdKQVlEVlFRRERCMWtlVzVoYldsamJHbHpkR1Z1WlhJdFkyRkFNVGN3TVRNNQpOakV6T0RBZUZ3MHlNekV5TURFd01qQXlNVGhhRncwek16RXhNamd3TWpBeU1UaGFNRVl4SERBYUJnTlZCQW9UCkUyUjVibUZ0YVdOc2FYTjBaVzVsY2kxdmNtY3hKakFrQmdOVkJBTU1IV1I1Ym1GdGFXTnNhWE4wWlc1bGNpMWoKWVVBeE56QXhNemsyTVRNNE1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRW1FWG1acGhpTXd1UQpOTm1qR21xQk9lL25VYkx5NGdoNE1QendoT2JKMlI0QjBIajZvZzNMZ2NwdThveXN6SDZINDVqWGlhWU1wS2o0ClczSXlmaXluQjZOQ01FQXdEZ1lEVlIwUEFRSC9CQVFEQWdLa01BOEdBMVVkRXdFQi93UUZNQU1CQWY4d0hRWUQKVlIwT0JCWUVGQUFTM3RtdVZlSDdHcGFOR2tJTjY4NUJ2bGRXTUFvR0NDcUdTTTQ5QkFNQ0EwZ0FNRVVDSUhrZwppRm1hZUpWcE43L0xPcjR3V3ZpdlhSU3Y4OFR1eENmcWU3Zk5RT2ZzQWlFQXZtNUdESzc2YjZpUER1UmNXSFJ1CmFtaTM5MThpKzVtTTd2WUNCSEJaQVZnPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  apiServerURL: aHR0cHM6Ly8xMC41My4xNzkuMjIz
  token: dS1tbzc3M3l0dHQ0Ojl3dHc2NzZoN3Bmajh0NmticDhtOThqdHZtcGtobjl0cGRic2tsd3NyNzdqY3ZwMm1qbmQ1Zw==
  value: YXBpVmVyc2lvbjogdjEKY2x1c3RlcnM6Ci0gY2x1c3RlcjoKICAgIGNlcnRpZmljYXRlLWF1dGhvcml0eS1kYXRhOiBMUzB0TFMxQ1JVZEpUaUJEUlZKVVNVWkpRMEZVUlMwdExTMHRDazFKU1VKMlZFTkRRVmRQWjBGM1NVSkJaMGxDUVVSQlMwSm5aM0ZvYTJwUFVGRlJSRUZxUWtkTlVuZDNSMmRaUkZaUlVVdEZlRTVyWlZjMWFHSlhiR29LWWtkc2VtUkhWblZhV0VsMFlqTktiazFUV1hkS1FWbEVWbEZSUkVSQ01XdGxWelZvWWxkc2FtSkhiSHBrUjFaMVdsaEpkRmt5UmtGTlZHTjNUVlJOTlFwT2FrVjZUMFJCWlVaM01IbE5la1Y1VFVSRmQwMXFRWGxOVkdoaFJuY3dlazE2UlhoTmFtZDNUV3BCZVUxVWFHRk5SVmw0U0VSQllVSm5UbFpDUVc5VUNrVXlValZpYlVaMFlWZE9jMkZZVGpCYVZ6VnNZMmt4ZG1OdFkzaEtha0ZyUW1kT1ZrSkJUVTFJVjFJMVltMUdkR0ZYVG5OaFdFNHdXbGMxYkdOcE1Xb0tXVlZCZUU1NlFYaE5lbXN5VFZSTk5FMUdhM2RGZDFsSVMyOWFTWHBxTUVOQlVWbEpTMjlhU1hwcU1FUkJVV05FVVdkQlJXMUZXRzFhY0docFRYZDFVUXBPVG0xcVIyMXhRazlsTDI1VllreDVOR2RvTkUxUWVuZG9UMkpLTWxJMFFqQklhalp2WnpOTVoyTndkVGh2ZVhONlNEWklORFZxV0dsaFdVMXdTMm8wQ2xjelNYbG1hWGx1UWpaT1EwMUZRWGRFWjFsRVZsSXdVRUZSU0M5Q1FWRkVRV2RMYTAxQk9FZEJNVlZrUlhkRlFpOTNVVVpOUVUxQ1FXWTRkMGhSV1VRS1ZsSXdUMEpDV1VWR1FVRlRNM1J0ZFZabFNEZEhjR0ZPUjJ0SlRqWTROVUoyYkdSWFRVRnZSME5EY1VkVFRUUTVRa0ZOUTBFd1owRk5SVlZEU1Voclp3cHBSbTFoWlVwV2NFNDNMMHhQY2pSM1YzWnBkbGhTVTNZNE9GUjFlRU5tY1dVM1prNVJUMlp6UVdsRlFYWnROVWRFU3pjMllqWnBVRVIxVW1OWFNGSjFDbUZ0YVRNNU1UaHBLelZ0VFRkMldVTkNTRUphUVZablBRb3RMUzB0TFVWT1JDQkRSVkpVU1VaSlEwRlVSUzB0TFMwdAogICAgc2VydmVyOiBodHRwczovLzEwLjUzLjE3OS4yMjMvazhzL2NsdXN0ZXJzL2xvY2FsCiAgbmFtZTogY2x1c3Rlcgpjb250ZXh0czoKLSBjb250ZXh0OgogICAgY2x1c3RlcjogY2x1c3RlcgogICAgdXNlcjogdXNlcgogIG5hbWU6IGRlZmF1bHQKY3VycmVudC1jb250ZXh0OiBkZWZhdWx0CmtpbmQ6IENvbmZpZwpwcmVmZXJlbmNlczoge30KdXNlcnM6Ci0gbmFtZTogdXNlcgogIHVzZXI6CiAgICB0b2tlbjogdS1tbzc3M3l0dHQ0Ojl3dHc2NzZoN3Bmajh0NmticDhtOThqdHZtcGtobjl0cGRic2tsd3NyNzdqY3ZwMm1qbmQ1Zwo=

The apiServerURL points to the service cattle-system/rancher. I think there may be a credentials error between the CAPI controller and Rancher.
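To double-check where the secret points, the base64 fields can be decoded locally. The following is a minimal sketch, not part of this PR; the helper name `decode_secret_field` is hypothetical, and the input is the `apiServerURL` value copied from the secret above.

```python
import base64


def decode_secret_field(encoded: str) -> str:
    """Decode one base64-encoded value from a Kubernetes Secret's `data` map.

    Hypothetical helper for illustration only.
    """
    return base64.b64decode(encoded).decode("utf-8")


# apiServerURL value copied verbatim from fleet-local/local-kubeconfig above.
api_server_url = decode_secret_field("aHR0cHM6Ly8xMC41My4xNzkuMjIz")
print(api_server_url)  # prints https://10.53.179.223
```

The same helper can be applied to the `value` field to read the embedded kubeconfig and compare its server address against the cattle-system/rancher service.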

@FrankYang0529
Member

After a few days of investigation, the root cause may be that CAPI can't connect to the Rancher service. In the Machine CR [1], the NodeHealthy condition is False with reason WaitingForNodeRef. According to the CAPI controller source code, the Machine CR passes the ProviderID check, but r.Tracker.GetClient returns an error, so the controller never reaches the remaining code that would fix the Machine CR (marking NodeHealthy as True).

The error message in the CAPI controller is:

"Reconciler error" err="failed to create cluster accessor: error creating client for self-hosted cluster: error creating dynamic rest mapper for remote cluster \"fleet-local/local\":
the server has asked for the client to provide credentials"
controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-local/custom-11a1422a6c10" namespace="fleet-local" name="custom-11a1422a6c10" reconcileID=77a0f08a-c816-4431-a2d9-e082c2d37064

Some people say this error is caused by a version mismatch between client-go and the server [2]. Rancher v2.7.9 uses CAPI v1.4.4, which uses controller-runtime v0.14.5, which in turn uses client-go v0.26.1. I tried different RKE2 versions, such as v1.26.1+rke2r1 and v1.26.10+rke2r2, with this Rancher update, but neither fixed the issue.
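As a rough way to compare the versions in that chain, the sketch below extracts the minor version from each tag and reports the difference. It relies on the fact that client-go v0.X.Y tracks Kubernetes v1.X.Y; the function names `kubernetes_minor` and `minor_skew` are hypothetical, and this is only an illustration, not a check that Rancher or CAPI performs.

```python
import re


def kubernetes_minor(version: str) -> int:
    """Return the minor version from a tag such as "v1.26.11+rke2r1" or "v0.26.1".

    client-go v0.X.Y corresponds to Kubernetes v1.X.Y, so the minor number
    is directly comparable between the two tag styles.
    """
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(2))


def minor_skew(client_go: str, server: str) -> int:
    """Absolute difference in minor versions between client-go and the server."""
    return abs(kubernetes_minor(client_go) - kubernetes_minor(server))


# Versions mentioned in this conversation.
for rke2 in ("v1.26.1+rke2r1", "v1.26.10+rke2r2", "v1.26.11+rke2r1"):
    print(rke2, minor_skew("v0.26.1", rke2))
```

All three RKE2 versions share minor version 26 with client-go v0.26.1, so the reported skew is 0 in each case. That is consistent with the observation that switching between these RKE2 versions did not change the outcome.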

[1] Machine CR

apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/5yTT3PTMBDFvwqzZzu15L/xDBcYTh04BIb7WlonIrLkkdYpTCffnbEpIWlpoD3aem/1232rexiIUSMjtPeAznlGNt7F+dN330hxJF4F41cKmS2tjL8xGlqY3IAOt6TTAdXOOILkWYO/cxTS7WEPLdwcRPLm1jj99jOpQPxPm8OBoAU1RfZDKgSKQkqslMj+yxpHVLPfeoUWjgmoQEuLX8xAkXEYoXWTtQlY7MgujSs7Raaw+p7umziXe/jxm+VXreQvOu84eJuOFt0s5DBdm8sO4w5ayIu6qmWmSYs+y8tcNiXKpirWFeZZjVRlWvX9WkACYU9n/mewHonOmdLg7RnYpZJY6auCh6TTZQHWpS67qsmwrJu+kE0lhawKWWdCrEXZ1LXO1lkptGwaRNkpKYuyyLtc9b3U+mnxOx/2FC7uPyZwNf3zfHtLxOkp5TiSmqPsvOfIAcclV+96s91Qvyz7aL5SiMY7aB+hHOZR742b29zcfnh3qvFanuNpVz5dRmVcHzBymBRPgV5G9n5h+Hh6fa9Gi4w8xYtpbQj1D2h7tJGeUv45Ox5/BgAA///dMbyVQgQAAA
    objectset.rio.cattle.io/id: unmanaged-machine
    objectset.rio.cattle.io/owner-gvk: /v1, Kind=Secret
    objectset.rio.cattle.io/owner-name: custom-11a1422a6c10
    objectset.rio.cattle.io/owner-namespace: local
    pre-terminate.delete.hook.machine.cluster.x-k8s.io/rke-bootstrap-cleanup: rke-bootstrap-controller
  creationTimestamp: "2023-12-12T15:27:40Z"
  finalizers:
  - machine.cluster.x-k8s.io
  generation: 3
  labels:
    cluster.x-k8s.io/cluster-name: local
    cluster.x-k8s.io/control-plane: "true"
    objectset.rio.cattle.io/hash: 3476720ded1f0353285a286496a307ae60dcff91
    rke.cattle.io/cluster-name: local
    rke.cattle.io/control-plane-role: "true"
    rke.cattle.io/etcd-role: "true"
    rke.cattle.io/machine-id: 95d5b680a578f42862126427011915877d09051d288aa2bc224543b3cff2dde
    rke.cattle.io/worker-role: "true"
  name: custom-11a1422a6c10
  namespace: fleet-local
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: local
    uid: 0ae3d452-3dda-4aa2-855b-98dd57dd81e7
  resourceVersion: "3531"
  uid: a80a16c1-6313-45b5-a960-22118529f0da
spec:
  bootstrap:
    configRef:
      apiVersion: rke.cattle.io/v1
      kind: RKEBootstrap
      name: custom-11a1422a6c10
      namespace: fleet-local
    dataSecretName: custom-11a1422a6c10-machine-bootstrap
  clusterName: local
  infrastructureRef:
    apiVersion: rke.cattle.io/v1
    kind: CustomMachine
    name: custom-11a1422a6c10
    namespace: fleet-local
  nodeDeletionTimeout: 10s
  providerID: rke2://harvester-node-0
status:
  addresses:
  - address: 192.168.3.30
    type: InternalIP
  - address: harvester-node-0
    type: Hostname
  bootstrapReady: true
  conditions:
  - lastTransitionTime: "2023-12-12T15:27:40Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-12-12T15:27:40Z"
    status: "True"
    type: BootstrapReady
  - lastTransitionTime: "2023-12-12T15:27:40Z"
    status: "True"
    type: InfrastructureReady
  - lastTransitionTime: "2023-12-12T15:27:40Z"
    reason: WaitingForNodeRef
    severity: Info
    status: "False"
    type: NodeHealthy
  - lastTransitionTime: "2023-12-12T15:27:40Z"
    status: "True"
    type: PlanApplied
  - lastTransitionTime: "2023-12-12T15:27:54Z"
    status: "True"
    type: Reconciled
  infrastructureReady: true
  lastUpdated: "2023-12-12T15:27:54Z"
  observedGeneration: 2
  phase: Provisioned
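
To spot the blocking condition quickly in a status like the one above, the conditions list can be filtered for entries whose status is not "True". This is a minimal sketch; the `conditions` list is trimmed by hand from the Machine CR above (timestamps omitted), and `failing_conditions` is a hypothetical helper.

```python
from typing import Dict, List

# Conditions copied from the Machine CR above, with timestamps omitted.
conditions: List[Dict[str, str]] = [
    {"type": "Ready", "status": "True"},
    {"type": "BootstrapReady", "status": "True"},
    {"type": "InfrastructureReady", "status": "True"},
    {
        "type": "NodeHealthy",
        "status": "False",
        "reason": "WaitingForNodeRef",
        "severity": "Info",
    },
    {"type": "PlanApplied", "status": "True"},
    {"type": "Reconciled", "status": "True"},
]


def failing_conditions(conds: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return every condition whose status is anything other than "True"."""
    return [c for c in conds if c.get("status") != "True"]


for cond in failing_conditions(conditions):
    print(cond["type"], cond.get("reason", ""))  # prints NodeHealthy WaitingForNodeRef
```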

[2] https://stackoverflow.com/questions/76177011/error-validating-data-the-server-has-asked-for-the-client-to-provide-credential

@bk201 bk201 changed the title Bump Rancher to v2.7.9 Bump Rancher to v2.7.9 and RKE2 to v1.26.11+rke2r1 Dec 13, 2023
@bk201 bk201 force-pushed the bump-rancher-279 branch 2 times, most recently from 47e6d04 to f942ed5 Compare December 18, 2023 09:54
@bk201 bk201 marked this pull request as ready for review December 19, 2023 08:28
Signed-off-by: Kiefer Chang <kiefer.chang@suse.com>
Signed-off-by: Kiefer Chang <kiefer.chang@suse.com>
Member

@FrankYang0529 FrankYang0529 left a comment

LGTM. I can upgrade from v1.2.1 and provision in an air-gapped environment.

Contributor

@ibrokethecloud ibrokethecloud left a comment

lgtm. thanks.

@bk201 bk201 merged commit 6fe0734 into harvester:master Jan 3, 2024
5 checks passed
@FrankYang0529
Member

@mergify backport v1.2

mergify bot commented Feb 2, 2024

backport v1.2

✅ Backports have been created

3 participants