Support Multiple Cluster CIDRs #2594

Merged (1 commit, Sep 3, 2021)

Conversation

rahulkjoshi (Contributor)

Issue: #2593

@k8s-ci-robot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress) on Apr 1, 2021
@k8s-ci-robot (Contributor)

Welcome @rahulkjoshi!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot (Contributor)

Hi @rahulkjoshi. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-ok-to-test label (Indicates a PR that requires an org member to verify it is safe to test) on Apr 1, 2021
@k8s-ci-robot added the following labels on Apr 1, 2021: kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory), sig/network (Categorizes an issue or PR as relevant to SIG Network), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA), and size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files).
@rahulkjoshi changed the title from "Support Multiple Cluster CIDRs" to "WIP: Support Multiple Cluster CIDRs" on Apr 1, 2021
@rahulkjoshi force-pushed the cluster-cidr branch 2 times, most recently from 94f8c10 to 36849ed, on April 1, 2021 at 19:54
@aojea (Member) commented on Apr 1, 2021:

/cc

@k8s-ci-robot requested a review from aojea on April 1, 2021 at 21:28
@aramase (Member) commented on Apr 1, 2021:

/cc

@rahulkjoshi (Contributor, Author)

/cc @sdmodi

@k8s-ci-robot (Contributor)

@rahulkjoshi: GitHub didn't allow me to request PR reviews from the following users: sdmodi.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @sdmodi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Review thread on keps/sig-network/2594-multiple-cluster-cidrs/README.md (outdated, resolved)

Today, IP ranges for podCIDRs for nodes are allocated from a single range allocated to the cluster (cluster CIDR). Each node gets a range of a fixed size from the overall cluster CIDR. The size is specified during cluster startup time and cannot be modified later on.

This proposal enhances how pod CIDRs are allocated for nodes by adding a new CIDR allocator that can be controlled by a new resource `ClusterCIDRRange`. This would enable users to dynamically allocate more IP ranges for pods.
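To make the shape of the proposal concrete, here is a minimal sketch of what such a resource could look like as Go API types. All field names below are illustrative assumptions, not the KEP's actual schema:

```go
// Illustrative sketch only -- type and field names are assumptions,
// not the KEP's actual API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ClusterCIDRRange describes one pool of pod IPs from which the new
// allocator can carve per-node pod CIDRs.
type ClusterCIDRRange struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec ClusterCIDRRangeSpec `json:"spec"`
}

// ClusterCIDRRangeSpec holds the pool definition.
type ClusterCIDRRangeSpec struct {
	// CIDR is the range to allocate node pod CIDRs from, e.g. "10.0.0.0/8".
	CIDR string `json:"cidr"`
	// PerNodeMaskSize is the prefix length each node receives; 24 means
	// every node gets a /24 out of CIDR.
	PerNodeMaskSize int32 `json:"perNodeMaskSize"`
	// NodeSelector limits which nodes may be allocated from this pool;
	// nil selects all nodes.
	NodeSelector *metav1.LabelSelector `json:"nodeSelector,omitempty"`
}
```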
A contributor commented:

Tying into what was discussed in the SIG meeting, it would be worth considering, instead of adding a new config object to extend the controller-manager's --allocate-node-cidrs functionality, deprecating --allocate-node-cidrs and adding a new config object through which the network plugin can explain to other components how it is allocating pod IPs itself.

Reply:

Right now the way this is scoped, we do not require existing Node IPAM allocators to populate what ranges have been allocated. If we did this, then it would force existing implementations to create a new object when a new range is allocated.

I view this as a good second step if there is consensus that this is a good idea. We can start with just requiring the object for input purposes and then enhance it to be consumable by other parts of the system.

The contributor replied:

Oh, I just meant having the network plugin create an object (or objects) saying "The following CIDRs are in use by the pod network: ...", not necessarily saying which specific subnets of it have been allocated where.


This proposal enhances how pod CIDRs are allocated for nodes by adding a new CIDR allocator that can be controlled by a new resource 'CIDRRange'. This enables users to dynamically allocate more IP ranges for pods. In addition, it gives users the capability to control what ranges are allocated to specific nodes as well as the size of the pod CIDR allocated to these nodes.
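For a sense of what controlling the pod CIDR size buys: in IPv4, a per-node prefix of length m yields 2^(32-m) addresses, so a /24 gives a node 256 pod IPs while a /26 gives 64. A quick sketch of that arithmetic (illustrative only):

```go
package main

import "fmt"

// podIPsPerNode returns the number of IPv4 addresses in a per-node
// sub-range with the given prefix length.
func podIPsPerNode(perNodeMaskSize uint) int {
	return 1 << (32 - perNodeMaskSize)
}

func main() {
	for _, m := range []uint{24, 25, 26} {
		fmt.Printf("per-node /%d -> %d pod IPs\n", m, podIPsPerNode(m))
	}
}
```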

### User Stories (Optional)
A contributor commented:

I think a good user story is "I want to add more pods to a node after filling up its CIDR, but I don't want to have to kill and recreate the node's existing pods". This is something Calico (IIRC) supports, by letting you add additional CIDRs to a node rather than making you drop its existing CIDR before assigning it a new one.

(And yes, it would be complicated to add this to kcm, and it would require changing the semantics of the podCIDRs field on the node, etc. My point was more that this is something people are doing already, without having changed kcm, and maybe that's a better model.)

A member replied:

But this is NOT the story being told here. Unless I gravely misunderstand, this proposes multiple CIDRs for the cluster, but still one per node. (we probably want that TOO but it's not this KEP)

Reply:

Yes. We are still saying only one per node.

We do need to decide whether it is a better model to let CNI implementations do this. The fundamental question is whether we want Kubernetes to do the IPAM or not. Today, Kubernetes does IPAM out of the box, but it lacks a lot of functionality. We are proposing to enhance these capabilities.

The member replied:

That is the extreme other end of the spectrum: "We never should have done this, we're sorry to have teased you, please do it yourself."

We'd still need a resource (or 2) and admission controller(s). Fundamentally the model would be the same, just not built-in.

I'm not outright against adding this as another built-in, as long as we can do it with reasonable complexity. It definitely pushes the bounds, and we need to think hard about it.

Six further review threads on keps/sig-network/2594-multiple-cluster-cidrs/README.md (outdated, resolved)

The controller will wait until the first Node is added before picking a mode. So
users who want to specify dual-stack must first add 2 `ClusterCIDRConfigs` (one
each for IPv4 and IPv6). Only after creating those resources should Nodes be added.
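For example, a dual-stack cluster would be bootstrapped by creating one config per IP family before any Node registers. The struct below is a hypothetical stand-in for the proposed ClusterCIDRConfig (its fields are assumptions, not the KEP's schema):

```go
package main

import "fmt"

// clusterCIDRConfig is a hypothetical stand-in for the proposed resource.
type clusterCIDRConfig struct {
	Name            string
	CIDR            string
	PerNodeMaskSize int
}

func main() {
	// Create both families up front, before the first Node is added,
	// so the controller picks dual-stack mode.
	configs := []clusterCIDRConfig{
		{Name: "pods-v4", CIDR: "10.0.0.0/8", PerNodeMaskSize: 24},
		{Name: "pods-v6", CIDR: "fd00:10::/48", PerNodeMaskSize: 64},
	}
	for _, c := range configs {
		fmt.Printf("%s: %s (per-node /%d)\n", c.Name, c.CIDR, c.PerNodeMaskSize)
	}
}
```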
A member commented:

Just to confirm: migrating a node from single-stack to dual-stack is not possible; if you have single-stack nodes, you have to create new nodes, right?

The author replied:

That's correct. What I'm unsure of is whether we want to support migrating the cluster from single-stack to dual-stack while nodes are running.

For example, the model proposed here says that if any Node is in single-stack mode, the controller will only allocate single-stack CIDRs to new nodes, even if both IPv4 and IPv6 ClusterCIDRConfigs are provided. The only way to migrate your cluster is to add both IPv4 and IPv6 ClusterCIDRConfigs and then delete all nodes; once you start adding nodes again, the controller will allocate them in dual-stack mode. Is this an acceptable workflow for us? I have seen concern about mixing single-stack and dual-stack nodes, hence this setup.
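A sketch of the mode rule as the author frames it (a hypothetical helper, not the controller's actual code):

```go
package main

import "fmt"

// allocateDualStack sketches the rule described above: new nodes get
// dual-stack pod CIDRs only if both families have configured ranges and
// no existing node is already single-stack.
func allocateDualStack(haveIPv4, haveIPv6 bool, podCIDRsPerNode []int) bool {
	for _, n := range podCIDRsPerNode {
		if n < 2 {
			// An existing single-stack node pins the cluster to single-stack.
			return false
		}
	}
	return haveIPv4 && haveIPv6
}

func main() {
	// Both families configured, but one single-stack node exists: stay single-stack.
	fmt.Println(allocateDualStack(true, true, []int{1})) // false
	// Empty cluster with both families configured: allocate dual-stack.
	fmt.Println(allocateDualStack(true, true, nil)) // true
}
```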

The member replied:

I would go for implementing something simple and let the users do the heavy lifting with the APIs we provide; they can delete nodes, reassign, deal with the pod migration, and so on.

The author replied:

Agreed, I've reworked the section to that effect. Let me know if it sounds good.


The initial release of this feature will be as a CRD and custom controller.
However, as part of the alpha -> beta graduation, this feature will be merged
into the core Kubernetes API.
A member commented:

Our Gateway API friends did this dance recently (kubernetes-sigs/gateway-api#707); we should ask them about their experience.
cc: @robscott

A member added:

We can omit this comment if we don't start with CRDs/a custom controller; see #2594 (comment).

The author replied:

Sounds like we can dodge this work 😄

@rahulkjoshi (Contributor, Author)

@thockin @aojea @mskrocki Let's get this finalized and merged in. The KEP deadline is fast approaching and we still need PRR sign-off. I think they stop accepting new requests soon.

@rahulkjoshi force-pushed the cluster-cidr branch 2 times, most recently from 79fade9 to 7d0fc9c, on August 31, 2021 at 20:52
@mskrocki (Contributor) commented on Sep 1, 2021:

LGTM

@thockin (Member) left a review:

Thanks!

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Sep 3, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rahulkjoshi, sdmodi, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels:
- approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
- cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
- kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory.)
- lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
- ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
- sig/network (Categorizes an issue or PR as relevant to SIG Network.)
- size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)