Implement MultiClusterCIDR API in flannel
This API requires Kubernetes 1.26 and is available for vxlan,
wireguard and host-gw backends.

Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
thomasferrandiz committed Dec 12, 2022
1 parent 402276f commit 243cf7e
Showing 22 changed files with 866 additions and 191 deletions.
137 changes: 137 additions & 0 deletions Documentation/MultiClusterCIDR/README.md
@@ -0,0 +1,137 @@
Flannel provides experimental support for the new [MultiClusterCIDR API](https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2593-multiple-cluster-cidrs) introduced as an alpha feature in Kubernetes 1.26.

## Prerequisites
* A cluster running Kubernetes 1.26 (this was tested on version `1.26.0-alpha.1`)
* Flannel version `0.21.0` or later
* The MultiClusterCIDR API can be used with the vxlan, wireguard and host-gw backends

*Note*: once a PodCIDR has been allocated to a node, it cannot be modified or removed, so you must configure the MultiClusterCIDR resources before adding new nodes to your cluster.

## How to use the MultiClusterCIDR API
### Enable the new API in the control plane
* Edit `/etc/kubernetes/manifests/kube-controller-manager.yaml` and add the following lines in the `spec.containers.command` section:
```
- --cidr-allocator-type=MultiCIDRRangeAllocator
- --feature-gates=MultiCIDRRangeAllocator=true
```

* Edit `/etc/kubernetes/manifests/kube-apiserver.yaml` and add the following line in the `spec.containers.command` section:
```
- --runtime-config=networking.k8s.io/v1alpha1
```

Both components should restart automatically, and a default ClusterCIDR resource will be created based on the usual `--pod-network-cidr` parameter.

For example:
```bash
$ kubectl get clustercidr
NAME                   PERNODEHOSTBITS   IPV4            IPV6                 AGE
default-cluster-cidr   8                 10.244.0.0/16   2001:cafe:42::/112   24h

$ kubectl describe clustercidr default-cluster-cidr
Name:            default-cluster-cidr
Labels:          <none>
Annotations:     <none>
NodeSelector:
PerNodeHostBits: 8
IPv4:            10.244.0.0/16
IPv6:            2001:cafe:42::/112
Events:          <none>
```
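
If the `clustercidr` resource type does not show up, you can check that the alpha API group is being served and that the controller manager picked up the new flags (a quick sanity check, assuming a kubeadm-style control plane):
```bash
# The v1alpha1 networking API group should be served once the apiserver restarts
kubectl get --raw /apis/networking.k8s.io/v1alpha1

# The controller manager should be running with the new allocator flags
kubectl -n kube-system describe pod -l component=kube-controller-manager | grep -E 'cidr-allocator-type|feature-gates'
```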

### Enable the new feature in flannel
This feature is disabled by default. To enable it, add the following flag to the args of the `kube-flannel` container:
```
- --use-multi-cluster-cidr
```
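
In the `kube-flannel.yml` manifest, the container spec would then look something like this (a sketch; the other args shown are the usual defaults and may differ in your deployment):
```yaml
containers:
- name: kube-flannel
  image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.2
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --use-multi-cluster-cidr
```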

Since you will specify the subnets to use for pod IP addresses through the new API, you do not need the `Network` and `IPv6Network` sections in the flannel configuration. Your flannel configuration could thus look like this:
```json
{
  "EnableIPv6": true,
  "Backend": {
    "Type": "host-gw"
  }
}
```


If you leave them in, flannel will simply ignore them.
NOTE: this only applies when the MultiClusterCIDR API is in use.
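
The same minimal configuration works for the other supported backends, for instance with vxlan (a sketch):
```json
{
  "EnableIPv6": true,
  "Backend": {
    "Type": "vxlan"
  }
}
```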

### Configure the required `clustercidr` resources
Before adding nodes to the cluster, you need to add new `clustercidr` resources.

For example:
```yaml
apiVersion: networking.k8s.io/v1alpha1
kind: ClusterCIDR
metadata:
  name: my-cidr-1
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - "worker1"
  perNodeHostBits: 8
  ipv4: 10.248.0.0/16
  ipv6: 2001:cafe:43::/112
---
apiVersion: networking.k8s.io/v1alpha1
kind: ClusterCIDR
metadata:
  name: my-cidr-2
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - "worker2"
  perNodeHostBits: 8
  ipv4: 10.247.0.0/16
  ipv6: ""
```
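
Apply the manifest as usual and check that the resources are registered (the file name here is illustrative):
```bash
kubectl apply -f my-clustercidrs.yaml
kubectl get clustercidr
```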
For more details on the `spec` section, see the [feature specification page](https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2593-multiple-cluster-cidrs#expected-behavior).

*WARNING*: all the fields in the `spec` section are immutable.

For more information on Node Selectors, see [the Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).

### Add nodes to the cluster
The new nodes will be allocated a `PodCIDR` based on the configured `clustercidr` resources.
flannel will ensure connectivity between all the pods, regardless of the subnet in which each pod's IP address has been allocated.
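
Once the nodes have joined, you can verify which PodCIDRs were allocated to each of them (a quick check):
```bash
kubectl get nodes -o custom-columns='NAME:.metadata.name,PODCIDRS:.spec.podCIDRs'
```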

## Notes on the subnet.env file
flanneld writes a file (located by default at `/run/flannel/subnet.env`) that is used by the flannel CNI plugin, which the kubelet calls every time a pod is added to or removed from the node. This file changes slightly with the new API: `FLANNEL_NETWORK` and `FLANNEL_IPV6_NETWORK` become lists of CIDRs instead of single CIDR entries. They hold the list of CIDRs declared in the `clustercidr` resources of the API, and flanneld updates the file every time a new `clustercidr` is created.

As an example, it could look like this:
```bash
FLANNEL_NETWORK=10.42.0.0/16,192.168.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_IPV6_NETWORK=2001:cafe:42::/56
FLANNEL_IPV6_SUBNET=2001:cafe:42::1/64,2001:cafd:42::1/64
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
```
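
Because the file uses shell variable syntax, scripts can source it directly; with the new API, the network variables may contain comma-separated lists (a minimal sketch):
```bash
# Source the file written by flanneld (default location)
. /run/flannel/subnet.env
# FLANNEL_NETWORK may now hold several CIDRs, one per configured clustercidr
echo "$FLANNEL_NETWORK" | tr ',' '\n'
```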

## Notes on using IPv6 with the MultiClusterCIDR API
The feature is fully compatible with IPv6 and dual-stack networking.
Each `clustercidr` resource can include an IPv4 and/or an IPv6 subnet.
If both are provided, the PodCIDR allocated based on this `clustercidr` will be dual-stack.
The controller allows you to use IPv4, IPv6 and dual-stack `clustercidr` resources all at the same time, to facilitate cluster migrations.
As a result, it is up to you to ensure that your IP allocation remains consistent.

If you want to use dual-stack networking with the new API, we recommend that you do not specify the `--pod-network-cidr` flag to `kubeadm` when installing the cluster so that you can manually configure the controller later.
In that case, when you edit `/etc/kubernetes/manifests/kube-controller-manager.yaml`, add:
```
- --cidr-allocator-type=MultiCIDRRangeAllocator
- --feature-gates=MultiCIDRRangeAllocator=true
- --cluster-cidr=10.244.0.0/16,2001:cafe:42::/112 # replace with your own default cluster CIDR
- --node-cidr-mask-size-ipv6=120
- --allocate-node-cidrs
```
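
For illustration, such a cluster could be bootstrapped like this (a sketch; all other `kubeadm` options are up to you):
```bash
# No --pod-network-cidr here: pod subnets will come from the ClusterCIDR
# resources once the controller manager flags above are in place.
sudo kubeadm init
```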
4 changes: 2 additions & 2 deletions Documentation/kube-flannel-psp.yml
```diff
@@ -166,8 +166,8 @@ spec:
       serviceAccountName: flannel
       initContainers:
       - name: install-cni-plugin
-       #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
-        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
+       #image: flannelcni/flannel-cni-plugin:v1.1.2 for ppc64le and mips64le (dockerhub limitations may apply)
+        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.2
         command:
         - cp
         args:
```
15 changes: 11 additions & 4 deletions Documentation/kube-flannel.yml
```diff
@@ -31,6 +31,13 @@ rules:
   - nodes/status
   verbs:
   - patch
+- apiGroups:
+  - "networking.k8s.io"
+  resources:
+  - clustercidrs
+  verbs:
+  - list
+  - watch
 ---
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
@@ -123,8 +130,8 @@ spec:
       serviceAccountName: flannel
       initContainers:
       - name: install-cni-plugin
-       #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
-        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
+       #image: flannelcni/flannel-cni-plugin:v1.1.2 #for ppc64le and mips64le (dockerhub limitations may apply)
+        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.2
         command:
         - cp
         args:
@@ -135,7 +142,7 @@ spec:
         - name: cni-plugin
           mountPath: /opt/cni/bin
       - name: install-cni
-       #image: flannelcni/flannel:v0.20.2 for ppc64le and mips64le (dockerhub limitations may apply)
+       #image: flannelcni/flannel:v0.20.2 #for ppc64le and mips64le (dockerhub limitations may apply)
         image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.2
         command:
         - cp
@@ -150,7 +157,7 @@ spec:
           mountPath: /etc/kube-flannel/
       containers:
       - name: kube-flannel
-       #image: flannelcni/flannel:v0.20.2 for ppc64le and mips64le (dockerhub limitations may apply)
+       #image: flannelcni/flannel:v0.20.2 #for ppc64le and mips64le (dockerhub limitations may apply)
         image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.2
         command:
         - /opt/bin/flanneld
```
7 changes: 6 additions & 1 deletion backend/ipip/ipip.go
```diff
@@ -89,7 +89,12 @@ func (be *IPIPBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGroup,
         return nil, fmt.Errorf("failed to acquire lease: %v", err)
     }
 
-    link, err := be.configureIPIPDevice(n.SubnetLease, subnet.GetFlannelNetwork(config))
+    net, err := config.GetFlannelNetwork(&n.SubnetLease.Subnet)
+    if err != nil {
+        return nil, err
+    }
+
+    link, err := be.configureIPIPDevice(n.SubnetLease, net)
 
     if err != nil {
         return nil, err
```
6 changes: 5 additions & 1 deletion backend/udp/udp_amd64.go
```diff
@@ -78,11 +78,15 @@ func (be *UdpBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGroup, c
         return nil, fmt.Errorf("failed to acquire lease: %v", err)
     }
 
+    net, err := config.GetFlannelNetwork(&l.Subnet)
+    if err != nil {
+        return nil, err
+    }
     // Tunnel's subnet is that of the whole overlay network (e.g. /16)
     // and not that of the individual host (e.g. /24)
     tunNet := ip.IP4Net{
         IP:        l.Subnet.IP,
-        PrefixLen: subnet.GetFlannelNetwork(config).PrefixLen,
+        PrefixLen: net.PrefixLen,
     }
 
     return newNetwork(be.sm, be.extIface, cfg.Port, tunNet, l)
```
12 changes: 10 additions & 2 deletions backend/vxlan/vxlan.go
```diff
@@ -191,12 +191,20 @@ func (be *VXLANBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGroup,
     // This IP is just used as a source address for host to workload traffic (so
     // the return path for the traffic has an address on the flannel network to use as the destination)
     if config.EnableIPv4 {
-        if err := dev.Configure(ip.IP4Net{IP: lease.Subnet.IP, PrefixLen: 32}, subnet.GetFlannelNetwork(config)); err != nil {
+        net, err := config.GetFlannelNetwork(&lease.Subnet)
+        if err != nil {
+            return nil, err
+        }
+        if err := dev.Configure(ip.IP4Net{IP: lease.Subnet.IP, PrefixLen: 32}, net); err != nil {
             return nil, fmt.Errorf("failed to configure interface %s: %w", dev.link.Attrs().Name, err)
         }
     }
     if config.EnableIPv6 {
-        if err := v6Dev.ConfigureIPv6(ip.IP6Net{IP: lease.IPv6Subnet.IP, PrefixLen: 128}, subnet.GetFlannelIPv6Network(config)); err != nil {
+        net, err := config.GetFlannelIPv6Network(&lease.IPv6Subnet)
+        if err != nil {
+            return nil, err
+        }
+        if err := v6Dev.ConfigureIPv6(ip.IP6Net{IP: lease.IPv6Subnet.IP, PrefixLen: 128}, net); err != nil {
             return nil, fmt.Errorf("failed to configure interface %s: %w", v6Dev.link.Attrs().Name, err)
         }
     }
```
11 changes: 10 additions & 1 deletion backend/wireguard/device.go
```diff
@@ -219,12 +219,21 @@ func (dev *wgDevice) upAndAddRoute(dst *net.IPNet) error {
         return fmt.Errorf("failed to set interface %s to UP state: %w", dev.attrs.name, err)
     }
 
+    err = dev.addRoute(dst)
+    if err != nil {
+        return fmt.Errorf("failed to add route to destination (%s) to interface (%s): %w", dst, dev.attrs.name, err)
+    }
+    return nil
+}
+
+func (dev *wgDevice) addRoute(dst *net.IPNet) error {
     route := netlink.Route{
         LinkIndex: dev.link.Attrs().Index,
         Scope:     netlink.SCOPE_LINK,
         Dst:       dst,
     }
-    err = netlink.RouteAdd(&route)
-
+    err := netlink.RouteAdd(&route)
     if err != nil {
         return fmt.Errorf("failed to add route %s: %w", dev.attrs.name, err)
     }
```
16 changes: 12 additions & 4 deletions backend/wireguard/wireguard.go
```diff
@@ -149,7 +149,7 @@ func (be *WireguardBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGr
         }
         publicKey = dev.attrs.publicKey.String()
     } else {
-        return nil, fmt.Errorf("No valid Mode configured")
+        return nil, fmt.Errorf("no valid Mode configured")
     }
 
     subnetAttrs, err := newSubnetAttrs(be.extIface.ExtAddr, be.extIface.ExtV6Addr, config.EnableIPv4, config.EnableIPv6, publicKey)
@@ -168,17 +168,25 @@ func (be *WireguardBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGr
     }
 
     if config.EnableIPv4 {
-        err = dev.Configure(lease.Subnet.IP, subnet.GetFlannelNetwork(config))
+        net, err := config.GetFlannelNetwork(&lease.Subnet)
+        if err != nil {
+            return nil, err
+        }
+        err = dev.Configure(lease.Subnet.IP, net)
         if err != nil {
             return nil, err
         }
     }
 
     if config.EnableIPv6 {
+        ipv6net, err := config.GetFlannelIPv6Network(&lease.IPv6Subnet)
+        if err != nil {
+            return nil, err
+        }
         if cfg.Mode == Separate {
-            err = v6Dev.ConfigureV6(lease.IPv6Subnet.IP, subnet.GetFlannelIPv6Network(config))
+            err = v6Dev.ConfigureV6(lease.IPv6Subnet.IP, ipv6net)
         } else {
-            err = dev.ConfigureV6(lease.IPv6Subnet.IP, subnet.GetFlannelIPv6Network(config))
+            err = dev.ConfigureV6(lease.IPv6Subnet.IP, ipv6net)
         }
         if err != nil {
             return nil, err
```