-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to new_cluster.go break OpenStack creation (and others?) #14546
Comments
@olemarkus @zetaab Any ideas? |
@kciredor could you please provide the exact command you ran to reproduce? |
Looks specific to OpenStack, as that is the only cloud provider other than AWS that calls into the cloud to get the default instance type. |
Here's the full command I used @johngmyers:
Besides the bug I guess it would make sense to be able to specify the bastion instance type from the commandline. I can specify master and nodes already. |
I added a |
Right, makes sense @johngmyers, I generally like to "pin" things so that would be my motivation for setting the bastion instance type from the cli. What remains is that for now I can't deploy Kops 1.25 on OpenStack. Perhaps @olemarkus or @zetaab can assist? |
Let me have a look at how easy of a fix this is. I unfortunately don't have access to an openstack environment anymore, but hopefully this can be reproduced with integration tests. |
If you want I can test patch(es) on our OS, let me know please if I can help. |
You can test this one: #14630 |
I was not able to build your branch @olemarkus, so instead applied your patch to the v1.25.2 branch, which builds fine. Now it proceeds beyond the previous panic but then panics again:
I've also tried your patch on the master branch but unable to build, same as your branch:
|
/reopen |
@olemarkus: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This one is a bit more tricky to solve with tests, I think ... @zetaab have you encountered this? |
We are not using this part of the code at all. We create all clusters using yaml spec, not from cli parameters |
That last error seems to be from a task though ... |
Same command as before @olemarkus |
The stack trace suggests that no |
@olemarkus do you think @johngmyers latest comment could be of help with the latest issue? |
Yeah it most certainly is. |
I looked through the code and it's quite convoluted as @johngmyers said. I refactored a bit in #14742 so you may try that one. There was one case that would result in a Still can you give that patch a try? If that one doesn't work, can you paste your |
Thanks @olemarkus. I was able to build your commit with the change and this resulted in:
My spec.cloudConfig.openStack:
|
Thanks. Doesn't seem like it helped much ... but at least the code is cleaner :P |
#14743 should make the npr go away, but still ... you shouldn't end up there with the config above. |
#14744 should probably fix things. |
I've tried #14744 by building your fork branch @olemarkus and running it on two OpenStack environments. Output:
So it seems to go through the lb part, but ends up with a different problem now. |
Progress on some level at least :) |
There is some weird logic around the use of private topology, gossip, and public load balancer. |
One more step: #14806 |
Thanks @olemarkus. With kOps on OpenStack after creating a cluster with a working kOps version I end up with a loadbalancer in front of the k8s api having an external ip address taken from the pool, which is firewalled by a security group for my given cidr. So I don't know the api external ip until after the cluster has been created. Output using #14806:
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
@olemarkus this is still a thing actually |
I believe this has been fixed, yes. /close |
@olemarkus: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/kind bug
1. What
kops
version are you running? The commandkops version
, will displaythis information.
v1.25.2
2. What Kubernetes version are you running?
kubectl version
will print theversion if a cluster is running or provide the Kubernetes version specified as
a
kops
flag.v1.25.x
3. What cloud provider are you using?
OpenStack
4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster
according to https://kops.sigs.k8s.io/getting_started/openstack/5. What happened after the commands executed?
Panic, stack trace:
6. What did you expect to happen?
Creation of a cluster without problems.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml
to display your cluster manifest.You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the
-v 10
flag.Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
No problems with kOps 1.24.x, this started with 1.25.x. I've tried to find out where the issue occurs in the code and traced it to https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L423 which has a reference to cloud which is nil because it's declared here https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L276 but not assigned any value. The only cloud that gets assigned a value appears to be AWS here https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L286.
The reason this cloud = nil problem hurts is because you cannot assign a machine type from the kops create cluster commandline to bastion hosts, you can only assign machine types to masters and nodes. So it's trying to think of a machine type itself and calls a defaultMachineType which requires a cloud instance.
It appears this part of the code went through quite a large refactor between 1.24.x and 1.25.x and now it's broken at least on OpenStack. I've tried two different OpenStack cloud providers.
The text was updated successfully, but these errors were encountered: