Improve validation of PodCIDR and ServiceClusterIPRange #15623
Conversation
/retest
/cc @justinsb
```diff
@@ -460,6 +461,7 @@ func Test_Validate_AdditionalPolicies(t *testing.T) {
 	Networking: kops.NetworkingSpec{
 		NetworkCIDR: "10.10.0.0/16",
 		NonMasqueradeCIDR: "100.64.0.0/10",
+		PodCIDR: "100.96.0.0/11",
```
Shouldn't PodCIDR and ServiceClusterIPRange both be part of NonMasqueradeCIDR?
ServiceClusterIPRange doesn't have to be part of NonMasqueradeCIDR. That is because only things within the cluster network can talk to ServiceClusterIPRange and kube-proxy reroutes ServiceClusterIPRange before masquerading. GCE currently doesn't require ServiceClusterIPRange to be in NonMasqueradeCIDR; this PR drops that validation for all of the other cloud providers as well.
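For illustration, here is a minimal Go sketch (hypothetical helper and example values, not the actual kops validation code) of the containment check that this PR stops requiring on non-GCE clouds:

```go
package main

import (
	"fmt"
	"net/netip"
)

// cidrWithin reports whether inner is fully contained in outer.
// Hypothetical helper for illustration only.
func cidrWithin(outer, inner netip.Prefix) bool {
	return outer.Bits() <= inner.Bits() && outer.Contains(inner.Masked().Addr())
}

func main() {
	nonMasqueradeCIDR := netip.MustParsePrefix("100.64.0.0/10")
	serviceClusterIPRange := netip.MustParsePrefix("100.64.0.0/13")

	// Before this PR, non-GCE clouds required this to be true; the
	// requirement is dropped because kube-proxy reroutes service traffic
	// before the masquerade rules apply.
	fmt.Println(cidrWithin(nonMasqueradeCIDR, serviceClusterIPRange)) // true
}
```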
In that case, is the NonMasqueradeCIDR still used for anything? It was dropped by kubelet when the Docker shim was removed.
It's used by ContainerdBuilder and AWSCloudControllerManagerOptionsBuilder. It is also used as an exclusion for the egress proxy.
The use by AWSCloudControllerManagerOptionsBuilder looks to me like it's a bug. It should be using the PodCIDR instead.
Yup, same thought. Not sure about the other places, but I suspect we may get rid of it also.
The PR looks ok as is. We can merge it and see about NonMasqueradeCIDR later. Your choice.
We can handle AWSCloudControllerManagerOptionsBuilder as a separate PR. That one we might want to backport.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: hakman. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
We are running a number of production clusters on AWS that were set up with kops 1.18 or even earlier. We've been upgrading kops and k8s over the past years and are currently on kops 1.27 and k8s 1.25. We are expanding our infra into other regions and aim to keep our older clusters upgraded too. We've recently kicked off the creation of new clusters, which we do directly with kops 1.28. We are following the same subnetting scheme that we used for all of our clusters, but we've hit some problems in kops 1.28, which leaves us in doubt about how to continue. Here's the particular problem we are facing when trying to create a new cluster:
Those subnets do indeed overlap, but that hadn't been an issue until kops 1.28, and we've been able to track the change down to this PR, which introduced the serviceClusterIPRange check, i.e.:
Since our clusters used to work well while those subnets were overlapping, could we make this check optional? Is there any risk in adding a command-line parameter that disables this check? Will we hit any issues in kops 1.28 and later and k8s 1.28 if this check is not done? @johngmyers may I ask you to elaborate on the reason for this change?
@minkimipt thanks for reporting and sorry about the issue. IIRC when we merged this we believed the overlapping configuration was not supported / caused other problems. However, as it's a regression and existing clusters at least appear to be working fine, I think we should reduce this to a warning / disable it entirely while we track down exactly what goes wrong when these CIDRs overlap and, if it is indeed a problem, figure out how to move the CIDRs to be non-overlapping. I propose we track this as #16340
@justinsb @johngmyers we are running into the same issue, except with the subnet range. We are trying to update a cluster where the subnet and service IP range overlap, and we are hitting this validation, which prevents the update: I have the same questions here:
I see we've reverted the pod overlap validation: a1bba9d
@ebdekock we were able to upgrade kops to 1.28.5, where that validation was disabled, in a cluster where those networks overlap. Just confirming that it's not a problem anymore.
I was waiting for the 1.28.5 release, as I thought it would also fix my issue, but our overlap is with the subnet CIDR and not the PodCIDR
The issue is that if the podCIDR and serviceClusterIPRange overlap, then a pod could be assigned the same IP as a service, and traffic intended for the pod would be routed to the wrong place. Similarly, if the subnet range overlaps with the serviceClusterIPRange, then a node could be assigned the same IP as a service, causing routing problems for host-network pods. I would say that "appear to be working fine" merely means the problems are rarely detected, not that they aren't there. I would not be in favor of weakening the validations and would suggest the clusters' networking configurations be fixed.
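As a concrete illustration of the collision described above (using hypothetical, deliberately overlapping example ranges, not anyone's real cluster config), the same address can be valid in both an overlapping PodCIDR and ServiceClusterIPRange:

```go
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	// Hypothetical overlapping ranges, purely for illustration.
	podCIDR := netip.MustParsePrefix("100.96.0.0/11")
	serviceRange := netip.MustParsePrefix("100.100.0.0/16")

	ip := netip.MustParseAddr("100.100.0.10")

	// The same address falls inside both ranges, so a pod and a ClusterIP
	// service could end up with identical IPs, and traffic to one of them
	// would be routed to the wrong destination.
	fmt.Println(podCIDR.Contains(ip), serviceRange.Contains(ip)) // true true
}
```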
Yup, that makes total sense and we agree that fixing the underlying issue is the best way forward. We would like to get the cluster updated and need to prioritize getting the networking fixed. Would it be possible to get this validation behind a temporary flag (that's off by default) to give us time to correct the issue?
We have fixed the networking as you suggested and no longer need this, thanks!
Fixes #15034
Removes the requirement (on non-GCE) that the ServiceClusterIPRange be within NonMasqueradeCIDR, as kube-proxy reroutes the ServiceClusterIPRange before masquerading.
Prohibits overlap between the PodCIDR and the ServiceClusterIPRange
Prohibits overlap between either the PodCIDR or the ServiceClusterIPRange and any subnet CIDR or IPv6CIDR (except that IPv6 doesn't have a PodCIDR, so that pair can't overlap).
Adds validation of any PodCIDR and requires that it be within any NonMasqueradeCIDR.
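For reference, a rough, self-contained sketch of the checks listed above, using hypothetical helper names and example CIDRs (the actual kops validation lives in the cluster-spec validation code and is structured differently):

```go
package main

import (
	"fmt"
	"net/netip"
)

// overlap reports whether two prefixes share any addresses.
// Hypothetical helpers for illustration only.
func overlap(a, b netip.Prefix) bool {
	return a.Contains(b.Masked().Addr()) || b.Contains(a.Masked().Addr())
}

// within reports whether inner is fully contained in outer.
func within(outer, inner netip.Prefix) bool {
	return outer.Bits() <= inner.Bits() && outer.Contains(inner.Masked().Addr())
}

// validateNetworking mirrors the rules described in the PR summary: the
// PodCIDR must be inside the NonMasqueradeCIDR, must not overlap the
// ServiceClusterIPRange, and neither may overlap any subnet CIDR.
func validateNetworking(nonMasquerade, podCIDR, serviceRange netip.Prefix, subnets []netip.Prefix) []string {
	var errs []string
	if !within(nonMasquerade, podCIDR) {
		errs = append(errs, "podCIDR must be within nonMasqueradeCIDR")
	}
	if overlap(podCIDR, serviceRange) {
		errs = append(errs, "podCIDR must not overlap serviceClusterIPRange")
	}
	for _, s := range subnets {
		if overlap(s, podCIDR) || overlap(s, serviceRange) {
			errs = append(errs, fmt.Sprintf("subnet %s must not overlap podCIDR or serviceClusterIPRange", s))
		}
	}
	return errs
}

func main() {
	// Example values loosely based on the kops defaults; no errors expected.
	errs := validateNetworking(
		netip.MustParsePrefix("100.64.0.0/10"), // nonMasqueradeCIDR
		netip.MustParsePrefix("100.96.0.0/11"), // podCIDR
		netip.MustParsePrefix("100.64.0.0/13"), // serviceClusterIPRange
		[]netip.Prefix{netip.MustParsePrefix("172.20.32.0/19")},
	)
	fmt.Println(errs) // []
}
```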