
Proposal: enhance cluster networking capabilities. #637

Merged
merged 1 commit into from
Dec 22, 2021

Conversation

DrmagicE
Member

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:
/kind bug
/kind documentation
/kind enhancement
/kind good-first-issue
/kind feature
/kind question
/kind design
/sig ai
/sig iot
/sig network
/sig storage

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


other Note

@openyurt-bot
Collaborator

@DrmagicE: GitHub didn't allow me to assign the following users: your_reviewer.

Note that only openyurtio members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this: [the PR description quoted above]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openyurt-bot added the kind/feature label Nov 29, 2021
@openyurt-bot added the size/L (100-499) label Nov 29, 2021
@DrmagicE changed the title from "add cluster networking enhancement proposal" to "Proposal: enhance cluster networking capabilities." Nov 29, 2021
@Congrool
Member

Congrool commented Nov 30, 2021

Hi, it's really an exciting feature, but I still have some questions here.

  1. How do nodepools get their podCIDR? Is it the responsibility of this network solution, or does it depend on other components, such as yurt-app-manager?
  2. We know that flannel also allocates a podCIDR for each node. How do we ensure that the nodepool podCIDR contains the podCIDRs of all its member nodes? (It may be one of the CNI compatibility problems that we will encounter later.)
  3. If we select some nodes to form a new nodepool and want to join it into the network, do the original pods need to restart to get new podIPs according to the nodepool podCIDR? Or, on the contrary, is it the original podIPs that determine the nodepool podCIDR? (In the latter case, we can also solve problem 2 if there is no conflict.)
  4. Host network subnet conflict is a really difficult problem to solve, because we cannot determine which nodepool to send the packet to. I think, in such a scenario, the network cannot be absolutely transparent to the application. Maybe we can support it in the application layer and let the application itself make the determination.

@DrmagicE
Member Author

@Congrool Hi, thanks for your feedback.

  1. How do nodepools get their podCIDR? Is it the responsibility of this network solution, or does it depend on other components, such as yurt-app-manager?
  2. We know that flannel also allocates a podCIDR for each node. How do we ensure that the nodepool podCIDR contains the podCIDRs of all its member nodes? (It may be one of the CNI compatibility problems that we will encounter later.)

podCIDR is a field of the node resource. We can track all podCIDRs belonging to a nodepool via the list/watch mechanism.

[root@master-1 /]# kubectl get nodes master-1 -oyaml | grep podCIDR
        f:podCIDR: {}
        f:podCIDRs:
  podCIDR: 10.244.0.0/24
  podCIDRs:

But we should notice that not all CNIs respect podCIDR. If we use flannel, we will be fine, because flannel respects podCIDR.
However, some CNIs, like calico, do not respect podCIDR. We would have to figure out another way to get the podCIDR of a node for such CNIs. In other words, how to get the podCIDR of a node may vary from CNI to CNI.

I suggest we start with the flannel CNI, which is simple and widely used in OpenYurt.

  3. If we select some nodes to form a new nodepool and want to join it into the network, do the original pods need to restart to get new podIPs according to the nodepool podCIDR? Or, on the contrary, is it the original podIPs that determine the nodepool podCIDR? (In the latter case, we can also solve problem 2 if there is no conflict.)

No, they don't need a restart. The solution introduced in this proposal will not change the podCIDR on the node and will not affect the IP allocation of pods; IP allocation is still managed by the CNI.

  4. Host network subnet conflict is a really difficult problem to solve, because we cannot determine which nodepool to send the packet to. I think, in such a scenario, the network cannot be absolutely transparent to the application. Maybe we can support it in the application layer and let the application itself make the determination.

Yes, I am still trying to figure out a way to solve that problem. It is an inevitable issue if we want this solution to replace YurtTunnel.

@rambohe-ch
Member

@DrmagicE @Congrool podCIDR is allocated to every node by the rangeAllocator in kube-controller-manager, and a NodePool contains the names of the nodes that reside in it, so YurtGateway can calculate all podCIDRs by list/watching NodePools and nodes, and does not need to consider whether the CNI solution is flannel or calico.
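
For example (a minimal sketch; it assumes member nodes carry a nodepool label such as apps.openyurt.io/nodepool, which is an assumption here rather than part of this proposal), the podCIDRs of a pool could be collected with:

# List the podCIDR of every node in nodepool "poolA" (label key and pool name are illustrative).
kubectl get nodes -l apps.openyurt.io/nodepool=poolA \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'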

@DrmagicE
Member Author

@rambohe-ch Thanks for your reply.
I am not very experienced with calico, but I found some information here:
projectcalico/calico#2592 (comment)
This comment indicates that calico's IPAM plugin doesn't respect the value given in Node.Spec.PodCIDR, which means a pod IP allocated by calico's IPAM may not belong to Node.Spec.PodCIDR. In such a case, Node.Spec.PodCIDR does not cover all pod IPs of that node, and some of the pod IPs will not be included in the VPN tunnel, thus losing connectivity.
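
A quick way to check how a particular CNI behaves (a sketch; <node-name> is a placeholder) is to compare the pod IPs on a node with that node's podCIDR:

# Show a node's podCIDR and the IPs of the pods scheduled on it.
kubectl get node <node-name> -o jsonpath='{.spec.podCIDR}'
kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>
# If some pod IPs fall outside the podCIDR, that CNI's IPAM does not respect Node.Spec.PodCIDR.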

@rambohe-ch
Member

@DrmagicE Thanks for your feedback. If calico's IPAM does not respect node.Spec.PodCIDR, maybe we need to consider another solution instead of PodCIDR. At first, we can start with flannel.

A new component that is responsible for configuring route tables, route policies and other network-related configurations on nodes.
YurtRouter is a daemonset and is deployed in all nodepools that participate in the VPN tunnel.

### Network Reachability Requirement
Member

This requirement is pretty challenging on the edge side. Have you considered running a service on the cloud to help establish the tunnel between different nodepools?

Member

@yixingjia edge yurt-gateways can connect with each other through the cloud yurt-gateway, and the detailed info will be added after the next community meeting discussion.

Member

Ok, then we can consider renaming the yurt-gateway on the cloud to something like yurt-cloud-gateway, and calling the gateway on the edge yurt-edge-gateway. SDN has similar implementations for these kinds of requirements.

Member

@yixingjia we will discuss the component names in another proposal.

Thus, YurtGateway is not aware of a failover on the other side, and when a failover occurs, the VPN tunnel is broken.

To fix that:
1. YurtGateway should be able to detect the VPN status. Once it detects a failover on the other side, it will try to connect to the backup on the other side.
Member

@vincent-pli Dec 2, 2021

How does the YurtGateway learn the new active gateway in the target nodepool when a failover occurs there? I remember we have leader election to handle the SPOF, but how do the others know exactly who won the election?

Member Author

@vincent-pli Hi, sorry for the late reply. Based on discussions at our latest community meeting, we may not support H/A in our first release. We need to think more carefully about how to achieve H/A, especially under node autonomy circumstances.

Welcome to get involved and share your ideas.

@adamzhoul
Member

hi @DrmagicE

can you add some details about how to configure vxlan to redirect traffic to the gateway node, and more importantly, how the vxlan device routes packets to IPsec on the gateway node?

I'm a little confused.

thanks

@DrmagicE
Member Author


Hi, "redirect vxlan traffic to gateway node" is based on IP packet forwarding. We can configure the IP route table of non-gateway nodes via the ip r add command. I think from the perspective of the Linux routing table, there is not much difference between routing vxlan packets and normal IP packets.

For vxlan mode, the routing rules are as same as host-gw mode. Here is an example shown in the proposal:

$ ip rule
0:	from all lookup local
# Set up route policy for pod CIDR of nodepoolB to use cross_edge_table.
32764:	from all to 10.244.2.0/24 lookup cross_edge_table
32765:	from all to 10.244.3.0/24 lookup cross_edge_table
32766:	from all lookup main
32767:	from all lookup default
$ ip r list table cross_edge_table
# 10.0.20.13 is the private IP of the local gateway node.
# We need to set a smaller MTU for IPsec traffic, the concrete number is yet to be determined.
default via 10.0.20.13 dev eth0 mtu 1400
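
For reference, a minimal sketch of how such entries might be created on a non-gateway node (the table number, addresses and MTU are illustrative values taken from the example above, not a confirmed part of the design):

# Register the custom routing table (the number 100 is arbitrary).
echo "100 cross_edge_table" >> /etc/iproute2/rt_tables
# Route policy: traffic to nodepoolB's pod CIDRs looks up cross_edge_table.
ip rule add to 10.244.2.0/24 lookup cross_edge_table
ip rule add to 10.244.3.0/24 lookup cross_edge_table
# Default route in that table points at the local gateway node, with a smaller MTU for IPsec overhead.
ip route add default via 10.0.20.13 dev eth0 mtu 1400 table cross_edge_table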

@rambohe-ch
Member

@DrmagicE I will merge this pull request, and the detailed design, such as the API, will be discussed in the raven repo (https://github.com/openyurtio/raven).

@DrmagicE
Member Author

@rambohe-ch Ok.

@rambohe-ch
Member

/lgtm
/approve

@adamzhoul
Member


thanks for the reply @DrmagicE, and sorry for the late response.

I tried to direct traffic from nodeA -> nodeB using a vxlan device myself, and encountered problems.

  1. On both nodes, add a vxlan device:
ip link add vxlan type vxlan \
    id 1 \
    dstport 4789 \
    local ${localMachineIp} \
    nolearning \
    dev eth0
  2. Add a test IP on nodeA and configure it to go through the vxlan device:
# manually add a route to test traffic
ip route add ${githubIp} dev vxlan

bridge fdb append ${vxlanMacNodeB} dev vxlan dst ${nodeBIP}
ip neigh add ${githubIp} lladdr ${vxlanMacNodeB} dev vxlan
  3. ping $githubIp on nodeA.

Running tcpdump -i vxlan on nodeB captures the ICMP requests but no ICMP replies, and nothing is captured on eth0.

In short, simply adding the vxlan device and the route table entries is not enough to redirect the traffic.

What I did to make it work (sketched after this list):

  1. give both vxlan devices an IP, so that packets can then be captured on eth0
  2. add iptables -t nat -A POSTROUTING -s {vxlanIP}/16 -j MASQUERADE to do SNAT when traffic goes out through eth0
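
A minimal sketch of those extra steps (the vxlan subnet 172.16.0.0/16 and the interface names are placeholders chosen for illustration, not values from the proposal):

# On nodeA (nodeB would get e.g. 172.16.0.2/16).
ip addr add 172.16.0.1/16 dev vxlan
ip link set vxlan up
# Make sure the kernel forwards packets between interfaces.
sysctl -w net.ipv4.ip_forward=1
# SNAT traffic from the vxlan subnet when it leaves via eth0.
iptables -t nat -A POSTROUTING -s 172.16.0.0/16 -o eth0 -j MASQUERADE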

Maybe I am making things too complicated.
If I misunderstood your point, or if you have a better solution, please let me know, thanks.

Oh, by the way, I can't simply use the route table to redirect traffic from nodeA -> nodeB, because in a cloud environment ARP is answered by the cloud gateway device, so a packet that leaves nodeA and arrives at that gateway will never reach nodeB (the dst IP is a pod IP, not nodeB's IP). This is why we have to rely on vxlan.

@rambohe-ch
Member

@adamzhoul we can discuss the details of raven at https://github.com/openyurtio/raven, and I will merge this pull request first.

@rambohe-ch
Member

/lgtm
/approve

@openyurt-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DrmagicE, rambohe-ch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openyurt-bot added the approved label Dec 22, 2021
@openyurt-bot merged commit d37ce4f into openyurtio:master Dec 22, 2021
MrGirl pushed a commit to MrGirl/openyurt that referenced this pull request Mar 29, 2022
Co-authored-by: zhanglifang@chinatelecom.cn <zhanglifang@chinatelecom.cn>