-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a Node's primary NIC as the secondary OVS bridge physical interface #6108
Conversation
2fccd34
to
1805b30
Compare
/cc @jianjuns |
1805b30
to
32825e4
Compare
pkg/agent/secondarynetwork/init.go
Outdated
if err != nil { | ||
return err | ||
} | ||
|
||
phyInterfaces := make([]string, len(secNetconfig.OVSBridges[0].PhysicalInterfaces)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move these two lines to be inside the if len(secNetconfig.OVSBridges[0].PhysicalInterfaces) == 1
block? Here we can do phyInterfaces := secNetconfig.OVSBridges[0].PhysicalInterfaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So secNetConfig is passed as a pointer, and I'm updating the name of the interface eth0
-> eth0~
just before calling connectPhyInterfacesToOVSBridge(phyInterfaces)
.
phyInterfaces := secNetconfig.OVSBridges[0].PhysicalInterfaces
this would create a shallow copy and renaming interface name in phyInterfaces
would mutate the original config struct, right?
pkg/agent/agent_linux.go
Outdated
klog.ErrorS(err, "Configure route to uplink interface failed", "uplink", uplinkName) | ||
} | ||
} | ||
util.RestorePhyInterfaceConfiguration(i.ovsBridge, i.nodeConfig.UplinkNetConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it may be safer to follow your new way of getting the interface config again from the system, so even the config (IP, routes, etc.) have been changed after the interface is moved the bridge, we can restore the changes. What you think? @gran-vmv : thoughts?
It is good to check if some fields in nodeConfig.UplinkNetConfig
can be removed, if we just re-fetch them from the system and they are not used anywhere else.
32825e4
to
66f782c
Compare
pkg/agent/secondarynetwork/init.go
Outdated
copy(phyInterfaces, secNetconfig.OVSBridges[0].PhysicalInterfaces) | ||
if len(secNetconfig.OVSBridges[0].PhysicalInterfaces) == 1 { | ||
externalIDs := map[string]interface{}{ | ||
interfacestore.AntreaInterfaceTypeKey: interfacestore.AntreaUplink, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Fine to leave it in the caller then.
pkg/agent/agent_linux.go
Outdated
klog.ErrorS(err, "Configure route to uplink interface failed", "uplink", uplinkName) | ||
} | ||
} | ||
util.RestoreHostInterfaceConfiguration(i.ovsBridge, i.nodeConfig.UplinkNetConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy back my earlier comments that got resolved after code update:
I feel it may be safer to follow your new way of getting the interface config again from the system, so even the config (IP, routes, etc.) have been changed after the interface is moved the bridge, we can restore the changes. What you think? @gran-vmv : thoughts?
It is good to check if some fields in nodeConfig.UplinkNetConfig can be removed, if we just re-fetch them from the system and they are not used anywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I f my understanding is correct, your point is when trying to restore the configurations, the steps is like this, 1) read configurations (Name, MAC, IP, Route) from the OVS internal port (host Interface), 2) remove OVS ports including both host Interface and uplink, 3) move the configurations read in step 1) to the uplink ?
This solution sounds good to me, the advantage is it can recover the runtime configurations which are updated in the period since the uplink was moved to OVS, which may not exist in the memory. An precondition is this restore method must be called after the uplink was successfully moved to OVS, but not be used a rollback during the period when moving the uplink to OVS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think IP and route are more important.
An precondition is this restore method must be called after the uplink was successfully moved to OVS, but not be used a rollback during the period when moving the uplink to OVS.
This is a good point. @aroradaman : could you check we can run into this case - restoration after host interface connection failure, that all host interface configurations are moved to the bridge port?
66f782c
to
0a5d866
Compare
pkg/agent/util/net_linux.go
Outdated
if err := RenameInterface(linkConfig.Name, bridgedName); err != nil { | ||
return "", false, err | ||
} | ||
if _, err := bridge.CreateInternalPort(linkConfig.Name, int32(linkConfig.OFPort), linkConfig.MAC.String(), externalIDs); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here linkConfig.OFPort
is supposed to be prepared for the uplink not the host interface (internal port), it seems a misuse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm passing the following config to PrepareHostInterfaceConnection
from ConnectUplinkToOVSBridge
now.
&config.AdapterNetConfig{
Name: uplinkNetConfig.Name,
OFPort: i.nodeConfig.HostInterfaceOFPort,
Index: uplinkNetConfig.Index,
MAC: uplinkNetConfig.MAC,
IPs: uplinkNetConfig.IPs,
Routes: uplinkNetConfig.Routes,
}
pkg/agent/agent_linux.go
Outdated
klog.ErrorS(err, "Configure route to uplink interface failed", "uplink", uplinkName) | ||
} | ||
} | ||
util.RestoreHostInterfaceConfiguration(i.ovsBridge, i.nodeConfig.UplinkNetConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I f my understanding is correct, your point is when trying to restore the configurations, the steps is like this, 1) read configurations (Name, MAC, IP, Route) from the OVS internal port (host Interface), 2) remove OVS ports including both host Interface and uplink, 3) move the configurations read in step 1) to the uplink ?
This solution sounds good to me, the advantage is it can recover the runtime configurations which are updated in the period since the uplink was moved to OVS, which may not exist in the memory. An precondition is this restore method must be called after the uplink was successfully moved to OVS, but not be used a rollback during the period when moving the uplink to OVS.
0a5d866
to
27c1934
Compare
pkg/agent/util/net_linux.go
Outdated
if err := DeleteOVSPort(brName, linkConfig.Name); err != nil { | ||
klog.ErrorS(err, "Delete OVS port failed", "port", linkConfig.Name) | ||
} | ||
if err := DeleteOVSPort(brName, bridgedName); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To Wenying's point, if the uplink port is not created, which means the physical interface has not been moved to the bridge, we should not restore the IPs & routes from the OVS internal port to the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for all these operations, we should just ignore the not-exist error, as the func can be called after a partial failure happened at interface moving, and so possible the interface is not moved and the OVS port is not created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check to exit early, if the bridged port doesn't exists (eth0~).
And for all these operations, we should just ignore the not-exist error
DeleteOVSPort runs the del command with --if-exists
flag, so those errors will be ignored. And also the first condition would now prevent this.
27c1934
to
8b79d5a
Compare
pkg/agent/secondarynetwork/init.go
Outdated
if len(secNetConfig.OVSBridges[0].PhysicalInterfaces) == 1 { | ||
iface, ifaceIPs, ifaceRoutes, err := util.GetInterfaceConfig(secNetConfig.OVSBridges[0].PhysicalInterfaces[0]) | ||
if err != nil { | ||
klog.ErrorS(err, "failed to restore ovs bridge") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should log the exact error: klog.ErrorS(err, "Failed to get interface config", "interface", secNetConfig.OVSBridges[0].PhysicalInterfaces[0])
.
Can we know If the interface does not exist, and not log that? This is not very important.
8b79d5a
to
7f45a55
Compare
pkg/agent/util/net_linux.go
Outdated
|
||
// RestoreHostInterfaceConfiguration restore the configuration from bridge back to host interface, revering the | ||
// actions taken in PrepareHostInterfaceConnection. | ||
func RestoreHostInterfaceConfiguration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-checked the interface move and restore code paths, and I see the following steps:
Move host interface bridge:
a1. rename host interface
a2. create internal port
a3. move config to internal port
a4. move host interface to bridge
Restore host interface:
b1. get internal port config
b2. delete internal port
b3. delete host interface bridge port
b4. rename host interface
b5. restore host interface config
If we failed at a2 earlier, then the code will not go to b3.
So, I am thinking about the following changes:
- even b1 fails and b2 fails because of port does not exist, we should still do b3 and b4.
- we should also ignore port not exist for b3
- probably move b1 into
RestoreHostInterfaceConfiguration
; but we can pass the saved IPs&routes to the func if you want to fall back the saved state ifGetInterfaceConfig
fails (in my mind this is not very necessary and we can just rely onGetInterfaceConfig
). - if IPs or routes are nil, we skip restoring them in b5.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we execute the steps of RestoreHostInterfaceConfiguration
in reverse of the order in which they were executed in PrepareHostInterfaceConnection
? And we only execute steps when required:
- we can call delete-port (both internal and host) with --if-exists
- move ips if required (nil config means ips are already present with the interface, right?)
- rename interface if both eth0~ exists and eth0 doesn't exist.
(this is similar to what you suggested above, but following the reverse order of execution)
Init()
- PrepareHostInterfaceConnection (
eth0 idx:3
)- a1. Rename (
eth0 idx:3
) -> (eth0~ idx:3
) - a2. Create internal (
eth0 idx:50
) - a3. MoveConfig (
eth0 idx:50
)
- a1. Rename (
- a4. CreateUplinkPort(
eth0~ idx:3
)
Restore()
- RestoreHostInterfaceConfiguration (
eth0 idx:50
)- b1. Get config from (
eth0 idx:50
)
pre req: to get the config - b2. Delete (
eth0~ idx:3
)
(reverse of a4, if port doesn't exist) - b3. MoveConfig (
eth0~ idx:3
)
(reverse of a3, if config is non-empty) - b4. Delete Internal Port (
eth0 idx:50
)
(reverse of a2, if port doesn't exist) - b5. Rename (
eth0~ idx:3
) -> (eth0 idx:3
)
(reverse of a1)
- b1. Get config from (
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Souds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have used the original order only, somehow reverse order was not working.
(Have added the checks before performing any action.)
pkg/agent/agent_linux.go
Outdated
@@ -166,30 +166,6 @@ func (i *Initializer) saveHostRoutes() error { | |||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we can remove saveHostRoutes
and change prepareOVSBridgeForK8sNode
to use GetInterfaceConfig
. Could you check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wenyingd : I saw there is a difference between GetAllIPNetsByName
and GetIPNetsByLink
called by GetInterfaceConfig
that the former ignores link local addresses. Should we always ignore link local addresses? Could/should we change GetIPNetsByLink
to do that too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we can remove saveHostRoutes and change prepareOVSBridgeForK8sNode to use GetInterfaceConfig. Could you check?
It may not work for some corner cases if the uplink is configured by DHCP, and static routes exist on the host before attaching it to OVS. If the NIC is configured with DHCP, the routes on the host interface are configured from DHCP responses, then the static routes are possibly lost if we don't do it in the program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we always ignore link local addresses? Could/should we change GetIPNetsByLink to do that too?
Having checked the code, it sounds good to me that ignoring the link-local addresses in GetIPNetsByLink
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went thought relevant functions today and agree they are kind of messy. I'm working on a patch trying to sort out these functions and moving business specific code out of util package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is true. For this PR, do you agree to change GetIPNetsByLink
to ignore link local addresses? We can leave refactoring (like removing GetAllIPNetsByName
or not) to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetInterfaceConfig
is used by ExternalNode feature, not related to Egress.
Actually both GetIPNetsByLink
and GetAllIPNetsByName
are used to get the IPs that will be moved to OVS bridge port, one for K8s bridging mode, one for ExternalNode. Since they are essentially the same, I don't think there is a problem to make both ignore link local addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. then let us change GetIPNetsByLink
to ignore link local addresses. @aroradaman
7f45a55
to
eb0dead
Compare
pkg/agent/agent_linux.go
Outdated
klog.ErrorS(err, "Configure route to uplink interface failed", "uplink", uplinkName) | ||
} | ||
} | ||
util.RestoreHostInterfaceConfiguration(i.ovsBridge, i.nodeConfig.UplinkNetConfig.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can skip setting link index, IP, routes in prepareOVSBridgeForK8sNode
.
pkg/agent/agent_linux.go
Outdated
@@ -166,30 +166,6 @@ func (i *Initializer) saveHostRoutes() error { | |||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eb0dead
to
7007314
Compare
@jianjuns I'm facing a weird problem.
Do we need to provide some |
pkg/agent/agent_linux.go
Outdated
@@ -70,22 +70,25 @@ func (i *Initializer) prepareOVSBridgeForK8sNode() error { | |||
uplinkNetConfig := i.nodeConfig.UplinkNetConfig | |||
uplinkNetConfig.Name = adapter.Name | |||
uplinkNetConfig.MAC = adapter.HardwareAddr | |||
uplinkIPs, err := getAllIPNetsByName(adapter.Name) | |||
_, uplinkIPs, uplinkRoutes, err := getInterfaceConfig(adapter.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we need not to get uplink IPs and Routes here, but can just get them in PrepareHostInterfaceConnection
.
pkg/agent/agent_linux.go
Outdated
} | ||
var uplinkV4Routes []interface{} | ||
for _, route := range uplinkRoutes { | ||
if route.(netlink.Route).Gw.To4() == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess no need to skip IPv6 routes. At least no harm to move them too.
pkg/agent/util/net_linux.go
Outdated
// actions taken in PrepareHostInterfaceConnection. | ||
func RestoreHostInterfaceConfiguration(brName string, interfaceName string) { | ||
klog.InfoS("Restoring bridge config to host interface") | ||
if interfaceName != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do the check in the caller. It looks unnatural to pass a "" interfaceName.
pkg/agent/util/net_linux.go
Outdated
// RestoreHostInterfaceConfiguration restore the configuration from bridge back to host interface, reverting the | ||
// actions taken in PrepareHostInterfaceConnection. | ||
func RestoreHostInterfaceConfiguration(brName string, interfaceName string) { | ||
klog.InfoS("Restoring bridge config to host interface") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you increase verbose level for this log?
pkg/agent/util/net_linux.go
Outdated
} | ||
} | ||
} | ||
klog.InfoS("Finished to restore bridge config to host interface") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we log bridge name and interface name too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should return at some errors and do not log this message which sounds like restoration succeeded.
pkg/agent/util/net_linux.go
Outdated
if interfaceName != "" { | ||
bridgedName := GenerateUplinkInterfaceName(interfaceName) | ||
// restore if interface eth0~ exists | ||
if exists, err := interfaceExists(bridgedName); exists || err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just log an error and return if err != nil. The operation should not fail and if it can really fail the following operations (and we do not know if they are really nedded) can fail too.
@wenyingd : do you know the |
May I know the default value for config Besides, it seems some re-creation on "eth0" happened as we may see this log intermittently, which may happen when the netlink doesn't exist on the host. If that is true, we may see that a change history on the the value of "disable_ipv6", -1 -> 1.
A strange observation is why the initial value of eth0 is "0" if the default is 1, did you manually set it? |
@wenyingd
Netlink is just an API, even if it disappears for a while it shouldn't affect the underlying value which we set initially, right? |
I wonder the thing is it doesn't disappear for a while, but is deleted and then re-created. So the latter is a new one without your previous manual configurations. To fix it, can we set the default value as "0" in advance. Then even for the new created ports, it can be enabled. |
7007314
to
4afe74c
Compare
thanks @wenyingd!
@jianjuns should we update the default config |
pkg/agent/util/net_linux.go
Outdated
} | ||
|
||
// rename host interface(eth0~ -> eth0) | ||
if err = RenameInterface(bridgedName, interfaceName); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But here, even interfaceName
does not exist, we still want to rename eth0~ back to eth0.
Maybe we can add a boolean to record if an interface exists or not. Maybe the code is simpler to return at errors, or when bridgedName does not exist.
For IPv6 cluster, I didn't remember we have steps in the code to modify sysctl configurations dedicated for IPv6, so we may require that IPv6 is enabled by default on any Nodes in the cluster if they plan to run IPv6 (including the IPv6 configurations enabled on interface and the network forwarding configurations). We used to hit issues that IPv6 networking forwarding is not enabled by default in our lab env before (not received similar reports from users), and the solution is to manually enable it in the env as it is one-time setting.
|
2f64197
to
ac7192f
Compare
ac7192f
to
69241fb
Compare
/test-all |
@aroradaman : seems there are some Windows build errors. Could you check the failed tests like "Go / Build Antrea Windows binaries"? |
69241fb
to
10ff49d
Compare
/test-all |
There is another error. Check "Go / Golangci-lint (ubuntu-latest)". @aroradaman |
10ff49d
to
94e1327
Compare
|
Could you share more info for me to understand the failure? @aroradaman |
If primary nic is configured with secondary bridge antrea-agent will move primary nic configuration(IPs and Routes) to the secondary bridge, and will revert the change on shutdown. Signed-off-by: Daman Arora <aroradaman@gmail.com>
94e1327
to
a0bbf0c
Compare
I was directly accessing the first ovs bridge of sec net config which was causing panic / index error. secNetConfig.OVSBridges[0].PhysicalInterfaces I thought the validation required at least one bridge for the secondary network config. |
if err != nil { | ||
return err | ||
} | ||
|
||
// We only support moving and restoring of interface configuration to OVS Bridge for the single physical interface case. | ||
if len(secNetConfig.OVSBridges) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed this one when reviewing the code.
/test-all |
Fixes: #5735