issue during installation when rancher is not ready #627

Closed
alknopfler opened this issue Aug 1, 2024 · 2 comments · Fixed by #705
Labels
area/build-and-release Indicates issue or PR related to build or release kind/enhancement Categorizes issue or PR as related to a new feature.

Comments

@alknopfler

What steps did you take and what happened?

To reproduce you can do:

  • Helm chart to install rancher
  • Helm chart to install rancher-turtles

To reproduce, launch both Helm charts at the same time: Rancher takes longer to come up, and the Rancher Turtles install fails because some CRDs are not yet present.

+ helm_v3 install --namespace rancher-turtles-system --create-namespace --version 0.1.0+up0.9.1 --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rancher-turtles /tmp/rancher-turtles.tgz
Error: INSTALLATION FAILED: failed pre-install: unable to build kubernetes object for deleting hook rancher-turtles/templates/pre-install-job.yaml: resource mapping not found for name: "embedded-cluster-api" namespace: "" from "": no matches for kind "Feature" in version "management.cattle.io/v3"
ensure CRDs are installed first
+ exit

What did you expect to happen?

The expected behavior would be:

  • Rancher Turtles should wait to be installed: Helm retries should install it once the CRDs and resources from Rancher have been created.
  • However, a cleanup process starts in rancher-turtles, and that cleanup fails:
rancher-turtles-system            rancher-capiprovider-cleanup-bwwd5                       0/1     Error              0               7m14s
rancher-turtles-system            rancher-capiprovider-cleanup-jq6cn                       0/1     Error              0               11m
rancher-turtles-system            rancher-capiprovider-cleanup-lgk5t                       0/1     Error              0               9m55s
rancher-turtles-system            rancher-capiprovider-cleanup-n8rs4                       0/1     Error              0               113s
rancher-turtles-system            rancher-capiprovider-cleanup-rbj72                       0/1     Error              0               12m
rancher-turtles-system            rancher-capiprovider-cleanup-tnn5j                       0/1     Error              0               11m
rancher-turtles-system            rancher-capiprovider-cleanup-xtdlk                       0/1     Error              0               12m

and the logs for those pods show:

mgmt-cluster-network:~ # k logs rancher-capiprovider-cleanup-rbj72 -n rancher-turtles-system
error: the server doesn't have a resource type "capiproviders"

so the release cannot be uninstalled.

  • After the timeout, the cleanup pods disappear and the helm-install-rancher-turtle pod goes into CrashLoopBackOff. Uninstalling the release manually also fails:
mgmt-cluster-network:~ # helm uninstall rancher-turtles --namespace rancher-turtles-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/rancher/rke2/rke2.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /etc/rancher/rke2/rke2.yaml
Error: 1 error occurred:
	* timed out waiting for the condition

and on the next retry to install rancher-turtles, the logs show:

+ helm_v3 install --namespace rancher-turtles-system --create-namespace --version 0.1.0+up0.9.1 --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rancher-turtles /tmp/rancher-turtles.tgz
Error: INSTALLATION FAILED: cannot re-use a name that is still in use

because the previous release could not be deleted by the cleanup process.

How to reproduce it?

  • helm install rancher
  • helm install rancher-turtles

Rancher Turtles version

0.9.1

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug

@salasberryfin
Contributor

Thanks @alknopfler for reporting this. After some investigation, this is not only an issue when installing Turtles before Rancher (a hard dependency for Turtles) is ready. It can also occur whenever an error during installation leaves the cluster with missing CRDs, specifically capiproviders, which the pre-delete hook then tries to clean up. This causes the cleanup pod to keep erroring and prevents the chart from being completely uninstalled. We should consider protecting users from this situation.
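One way such protection could look, sketched under the assumption that the pre-delete cleanup job runs a shell script with kubectl: check whether the capiproviders resource type is served before deleting, and treat its absence as nothing to clean up. The function names `has_resource_type` and `cleanup_capiproviders` and the exact `kubectl delete` invocation are hypothetical; this is not the chart's actual cleanup script, nor the change made in #705.

```shell
# Hypothetical helper: succeed if a resource type named $1 appears in a
# `kubectl api-resources -o name` listing read from stdin. CRD-backed
# types are listed as <plural>.<group>, core types as just <plural>.
has_resource_type() {
  grep -Eq "^${1}([.]|\$)"
}

# Hypothetical guarded cleanup: skip the delete when the capiproviders
# type is not served (e.g. the CRDs were never installed) instead of
# failing and blocking `helm uninstall`. Note: this also skips cleanup
# if `kubectl api-resources` cannot list any types at all.
cleanup_capiproviders() {
  if ! kubectl api-resources -o name | has_resource_type capiproviders; then
    echo "capiproviders resource type not found; nothing to clean up"
    return 0
  fi
  kubectl delete capiproviders --all --all-namespaces
}
```

Matching on the resource name from `kubectl api-resources` avoids hard-coding the API group of the CAPIProvider CRD in the guard.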

@kkaempf kkaempf added this to the August release milestone Aug 6, 2024
@salasberryfin
Contributor

Thanks to @alknopfler, who confirmed that they have been able to implement a temporary workaround for this issue: making sure the Turtles installation is only applied after the Rancher pods (and therefore the Rancher CRDs) are available.

The issue originates because they have to build their own stack until rancher/highlander#47 is unblocked and we can provide CAPI-related images via Prime registry.

After discussing how to proceed, we thought the sensible thing to do is to try to get the publishing of images to the Prime registry unblocked, so Edge doesn't have to maintain the ad-hoc stack they had to create. Until this is resolved, their temporary solution can be used instead, and we'll re-evaluate when the images are finally available. If this situation persists, we may need to revisit this.
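The workaround of gating the Turtles install on Rancher being ready could be sketched as follows. This is a minimal sketch, not the script Edge actually uses: it assumes Rancher runs as `deployment/rancher` in the `cattle-system` namespace, that `features.management.cattle.io` is the CRD behind the `Feature` kind the pre-install hook needs, and it reuses the chart path from the logs above. The helper names `wait_until` and `install_turtles_after_rancher` are hypothetical.

```shell
# Hypothetical helper: retry a command until it succeeds or the timeout
# (in seconds) elapses. Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Hypothetical wrapper: install rancher-turtles only once Rancher is up.
# Assumes deployment/rancher in cattle-system and that the "Feature" kind
# is served by the features.management.cattle.io CRD.
install_turtles_after_rancher() {
  # Wait for the Rancher Deployment to finish rolling out.
  wait_until 900 10 kubectl -n cattle-system rollout status deployment/rancher --timeout=10s || return 1
  # Wait for the Feature CRD to exist, then for the API server to serve it.
  wait_until 300 5 kubectl get crd features.management.cattle.io || return 1
  kubectl wait --for=condition=Established --timeout=120s crd/features.management.cattle.io || return 1
  helm install rancher-turtles /tmp/rancher-turtles.tgz \
    --namespace rancher-turtles-system --create-namespace
}
```

Calling `install_turtles_after_rancher` in place of the direct `helm install` means the pre-install hook only runs once the Rancher CRDs exist, so the failed-install and stuck-cleanup path described above is never entered. The CRD is polled with `kubectl get` first because `kubectl wait` fails immediately on a resource that does not exist yet.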

@kkaempf kkaempf added kind/enhancement Categorizes issue or PR as related to a new feature. area/build-and-release Indicates issue or PR related to build or release labels Aug 13, 2024
hardys added a commit to hardys/charts that referenced this issue Aug 28, 2024
hardys added a commit to hardys/charts that referenced this issue Aug 29, 2024
This is a workaround for:
rancher/turtles#627

Also it appears this is not currently working correctly ref:
rancher/turtles#704
@Danil-Grigorev Danil-Grigorev self-assigned this Aug 30, 2024
hardys added a commit to hardys/charts that referenced this issue Sep 24, 2024
hardys added a commit to suse-edge/charts that referenced this issue Sep 25, 2024