Making IPIP/tunnel and override-nexthop independent #1025
Conversation
This is a fix for the issue described in #1022
@murali-reddy I think that the concern that you had in #518 is still valid. When kube-router creates the tunnel interface, it is going to connect it to what kube-router thinks of as the node IP; however, the traffic might be ingressing on a different interface. I can see how some users who understand their network well may desire this functionality, and I think that kube-router should be flexible enough to provide it. However, I'm thinking that this PR should also include some sort of documentation warning users about combining these options and the potential risks involved. What are your thoughts?
Tunnels are established across the nodes in different subnets with only Kubernetes node IPs as source and destinations. I only thought of the case where nodes (in the same or different subnets) doing iBGP peering should see the "next hop" as only Kubernetes node IPs. So in @yydzhou's use-case they don't want to advertise pod CIDRs to the external BGP peers, so they will not have any issues. But you are right, we need to understand the impact on network topologies where a node is multi-homed and connected to two upstream routers for redundancy, and where the pod CIDRs are advertised to the external BGP peers.
In general we need to document how to use these knobs (iBGP peering, external BGP peering, route reflectors etc.) for different topologies, and give a bit of prescriptive configuration of these knobs for various topologies.
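As a concrete illustration of the tunnel behaviour described above, the sketch below creates an IPIP tunnel whose endpoints are the local and remote Kubernetes node IPs. This is a minimal, hypothetical sketch: the helper name, the tunnel-naming convention, and the use of the `ip` CLI via `os/exec` are assumptions for illustration only, not kube-router's actual implementation.

```go
package main

import (
	"fmt"
	"net"
	"os/exec"
)

// createIPIPTunnel is a hypothetical helper: it creates an IPIP tunnel whose
// endpoints are the local and remote Kubernetes node IPs, which is the
// behaviour discussed above. Real kube-router logic differs in details.
func createIPIPTunnel(localNodeIP, remoteNodeIP net.IP) error {
	// Tunnel name derived from the remote node IP (illustrative convention only).
	name := fmt.Sprintf("tun-%s", remoteNodeIP.String())

	// Equivalent to: ip tunnel add <name> mode ipip local <local> remote <remote>
	out, err := exec.Command("ip", "tunnel", "add", name,
		"mode", "ipip",
		"local", localNodeIP.String(),
		"remote", remoteNodeIP.String()).CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to create tunnel %s: %v (%s)", name, err, out)
	}

	// Bring the tunnel link up: ip link set <name> up
	if out, err := exec.Command("ip", "link", "set", name, "up").CombinedOutput(); err != nil {
		return fmt.Errorf("failed to bring up tunnel %s: %v (%s)", name, err, out)
	}
	return nil
}

func main() {
	// Example node IPs; in a real cluster these would come from the Node objects.
	_ = createIPIPTunnel(net.ParseIP("10.0.1.10"), net.ParseIP("10.0.2.20"))
}
```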
I did some experimentation and found that there is still an issue, even if I didn't advertise the pod CIDR to external BGP. @murali-reddy @aauren Is this behavior expected, or should it be fixed as a bug in this case?
@yydzhou Looks like a bug. Your external BGP peer advertised a default route to the node, so the tunnel was created based on that route. Ideally this check should be a check of whether the next hop can be reached directly over L2; that would solve the problem.
@murali-reddy Thanks for the info. Yes, the default route was advertised by our public network router, so the tunnel was created based on that. However, there is still a problem, as you pointed out, with the check at https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/routing/network_routes_controller.go#L536. I think we should have a separate PR to handle that, as I can imagine there would be some specific discussion regarding how to handle it.
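For illustration, the suggested "can the next hop be reached directly over L2" check could look roughly like the sketch below, which asks whether the next hop falls inside a prefix configured on one of the host's interfaces. This is a sketch of the idea under assumed names, not the actual kube-router check at the line linked above.

```go
package main

import (
	"fmt"
	"net"
)

// nextHopIsDirectlyReachable is a sketch of the check suggested above: rather
// than comparing node-IP subnets, ask whether the next hop falls inside a
// prefix configured on one of this host's interfaces, i.e. whether it is
// reachable at L2 without going through a router.
func nextHopIsDirectlyReachable(nextHop net.IP) (bool, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return false, err
	}
	for _, iface := range ifaces {
		// Skip interfaces that are down, and loopbacks.
		if iface.Flags&net.FlagUp == 0 || iface.Flags&net.FlagLoopback != 0 {
			continue
		}
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		for _, addr := range addrs {
			if ipNet, ok := addr.(*net.IPNet); ok && ipNet.Contains(nextHop) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	reachable, err := nextHopIsDirectlyReachable(net.ParseIP("192.168.1.1"))
	fmt.Println(reachable, err)
}
```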
@aauren Could you please point out what documentation content is needed for this PR? I am happy to enhance the documentation, but I am not sure I can give an accurate description of the potential issue.
Sorry it's taken me a while to respond. The logic here is complex and the possible ways that people might combine options can be nuanced. I think that at a bare minimum we need to warn people who are combining these options together.
Specifically, people need to take care when combining `--override-nexthop` and `--enable-overlay`. Unfortunately, at this point, the number of resulting combinations, flows, and possibilities is pretty vast, and probably the best thing to do would be to warn them and then add the code references to the documentation so that they can reason about how the options apply to their own topology.
So anyway, some sort of warning about that should be added to the BGP doc: https://github.com/cloudnativelabs/kube-router/blob/master/docs/bgp.md
Additionally, adding your use-case to some sort of BGP use-case documentation that we could add to would be helpful for the project and for others.
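Purely as an illustration of what such a guard-rail might look like, a startup warning could be logged when the two options are combined. The config struct and field names below are hypothetical, not kube-router's real options.

```go
package main

import "log"

// Config is a stand-in for kube-router's real options struct; the field names
// here are assumptions for illustration only.
type Config struct {
	EnableOverlay   bool // --enable-overlay
	OverrideNextHop bool // --override-nexthop
}

// warnOnRiskyCombinations logs a hint when options that interact in subtle
// ways are enabled together, pointing users at the BGP documentation.
func warnOnRiskyCombinations(cfg Config) {
	if cfg.EnableOverlay && cfg.OverrideNextHop {
		log.Println("WARNING: --enable-overlay and --override-nexthop are both set; " +
			"make sure you understand how next-hop rewriting interacts with the " +
			"overlay tunnels, see docs/bgp.md")
	}
}

func main() {
	warnOnRiskyCombinations(Config{EnableOverlay: true, OverrideNextHop: true})
}
```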
@aauren The documentation is added; I mostly copied your comments with a little editing. I didn't find a separate doc to add the use case to, so I just put the example use case within the description.
docs/bgp.md
Outdated
In a scenario there are multiple groups of nodes in different subnets and user wants to peer mulitple upstream routers for each of their node (.e.g., a cluster has seperate public and private networking, and the nodes in this cluster are located into two different racks, which has their own routers. So there would be two upstream routers for each node, and there are two different subnets in this case), The override-nexthop and tunnel cross subnet features need to be used together to achive the goal.

to support the above case, user need to set `--enable-overlay` and `--override-nexthop` to true together. This configuration would have the following effect.
Suggested change:

A common scenario exists where each node in the cluster is connected to two upstream routers that are in two different subnets. For example, one router is connected to a public network subnet and the other router is connected to a private network subnet. Additionally, nodes may be split across different subnets (e.g. different racks) each of which has their own routers.

In this scenario, `--override-nexthop` can be used to correctly peer with each upstream router, ensuring that the BGP next-hop attribute is correctly set to the node's IP address that faces the upstream router. The `--enable-overlay` option can be set to allow overlay/underlay tunneling across the different subnets to achieve an interconnected pod network.

This configuration would have the following effects:
I took a second pass at the wording to hopefully make it a little more concise. However, before changing anything I would recommend getting feedback from @murali-reddy.
@aauren Thank you very much for the feedback. I have integrated your comments in a new commit.
@murali-reddy Would you please take a look and let me know if there is anything else I need to add?
```diff
- if (!sameSubnet || nrc.overlayType == "full") && !nrc.overrideNextHop && nrc.enableOverlays {
+ if (!sameSubnet || nrc.overlayType == "full") && nrc.enableOverlays {
```

This is the only code change, which anyway makes sense, since tunnels should be established only with peer Kubernetes nodes and not with external peers; the check for `nrc.overrideNextHop` is not needed when deciding whether to create the tunnel.
Yes. We still need to check that tunnels are established, or rather that the overlay network is confined to Kubernetes nodes only. Will open a separate tracking issue. Thanks for the PR @yydzhou. LGTM
Since both IPIP/tunnel and override-nexthop have their own parameters to enable/disable them, it's better to make the switches independent of each other. This PR is needed to support the case where both the cross-subnet iBGP peering (via tunnel) and multiple upstream router peering (via override-nexthop) features are needed.
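To summarize the effect of the patch, the sketch below mirrors the changed condition: after this PR, `--override-nexthop` no longer factors into whether an overlay tunnel is created. The helper function is hypothetical; kube-router evaluates this condition inline.

```go
package main

import "fmt"

// shouldCreateTunnel mirrors the condition changed in this PR.
// (Hypothetical helper for illustration; not part of kube-router.)
func shouldCreateTunnel(sameSubnet bool, overlayType string, enableOverlays bool) bool {
	// Before the patch the condition was:
	//   (!sameSubnet || overlayType == "full") && !overrideNextHop && enableOverlays
	// After the patch, overrideNextHop is no longer part of the decision:
	return (!sameSubnet || overlayType == "full") && enableOverlays
}

func main() {
	// Nodes in different subnets with overlays enabled: a tunnel is created
	// regardless of whether --override-nexthop is set.
	fmt.Println(shouldCreateTunnel(false, "subnet", true)) // true
}
```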