Remove unexpected AltName after rename interface #6321
/test-e2e
@@ -378,6 +384,17 @@ func renameHostInterface(oriName string, newName string) error {
	return nil
}

func removeInterfaceAltName(name string, altName string) error {
	link, err := netlinkUtil.LinkByName(name)
I would prefer to add a check first: if the altname exists and equals the original name, we delete it; otherwise, we ignore it.
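The check suggested here can be sketched as follows. This is a minimal sketch, not the code in this PR: the `altNameStore` interface and the in-memory `fakeStore` are hypothetical stand-ins for the netlink calls, so the example runs without root privileges or a real interface.

```go
package main

import (
	"errors"
	"fmt"
)

// altNameStore is a hypothetical abstraction over the netlink operations
// needed here; it is not part of any real library.
type altNameStore interface {
	AltNames(ifName string) ([]string, error)
	DelAltName(ifName, altName string) error
}

// removeAltNameIfPresent deletes altName from ifName only when the interface
// currently carries it. It reports whether a deletion happened.
func removeAltNameIfPresent(s altNameStore, ifName, altName string) (bool, error) {
	names, err := s.AltNames(ifName)
	if err != nil {
		return false, fmt.Errorf("failed to list altnames of %s: %w", ifName, err)
	}
	for _, n := range names {
		if n == altName {
			if err := s.DelAltName(ifName, altName); err != nil {
				return false, fmt.Errorf("failed to remove altname %s on interface %s: %w", altName, ifName, err)
			}
			return true, nil
		}
	}
	// The altname is not set: nothing to do.
	return false, nil
}

// fakeStore is an in-memory altNameStore used only for illustration.
type fakeStore map[string][]string

func (f fakeStore) AltNames(ifName string) ([]string, error) {
	names, ok := f[ifName]
	if !ok {
		return nil, errors.New("interface not found")
	}
	return names, nil
}

func (f fakeStore) DelAltName(ifName, altName string) error {
	kept := []string{}
	for _, n := range f[ifName] {
		if n != altName {
			kept = append(kept, n)
		}
	}
	f[ifName] = kept
	return nil
}

func main() {
	s := fakeStore{"enp11s1~": {"enp11s1"}}
	removed, err := removeAltNameIfPresent(s, "enp11s1~", "enp11s1")
	fmt.Println(removed, err)
}
```

Calling the function a second time with the same arguments is a no-op, which is what makes the check useful when the altname may or may not have been set.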
Is it possible that there is a delay between our deletion of the target altname and the system adding it? For example, the sequence could be:
- rename uplink
- try to delete altname (introduced in this change)
- system adds the altname.
If so, the change may not work as expected.
Fixed.
Is it possible that there is a delay between our deletion of the target altname and the system adding it? For example, the sequence could be:
- rename uplink
- try to delete altname (introduced in this change)
- system adds the altname.
If so, the change may not work as expected.
AltName will be set very quickly.
If this process gets delayed, antrea-agent will create the internal port and occupy this name first, so this is not an issue.
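One way to guard against the ordering raised in the question is to wait briefly for the altname to appear before deleting it. The sketch below is an assumption about how that could look, not what this PR does: `hasAltName` and `delAltName` are hypothetical callbacks standing in for netlink calls, and the interval and timeout values are arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// Arbitrary polling parameters chosen for this sketch.
const (
	defaultInterval = 10 * time.Millisecond
	defaultTimeout  = 200 * time.Millisecond
)

// removeAltNameWhenSet polls hasAltName until it reports true or the timeout
// expires. If the altname shows up, it is deleted and true is returned. If it
// never shows up within the timeout, the function returns (false, nil) and the
// caller simply proceeds.
func removeAltNameWhenSet(hasAltName func() (bool, error), delAltName func() error,
	interval, timeout time.Duration) (bool, error) {
	deadline := time.Now().Add(timeout)
	for {
		found, err := hasAltName()
		if err != nil {
			return false, fmt.Errorf("failed to check altname: %w", err)
		}
		if found {
			if err := delAltName(); err != nil {
				return false, fmt.Errorf("failed to remove altname: %w", err)
			}
			return true, nil
		}
		if !time.Now().Before(deadline) {
			return false, nil
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulate a system that sets the altname only by the third check.
	checks := 0
	deleted := false
	removed, err := removeAltNameWhenSet(
		func() (bool, error) { checks++; return checks >= 3, nil },
		func() error { deleted = true; return nil },
		defaultInterval, defaultTimeout,
	)
	fmt.Println(removed, deleted, err)
}
```

The reply above argues that the timeout case is harmless, because by then antrea-agent has already claimed the name for the internal port.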
303b114 to 58dbcc3
In this PR we bumped up Do you have any suggestion on this situation?
What is the behavior change?
That's true of a lot of dependency version updates. If we cannot update to a version of the netlink library that includes support for AltName while not introducing bugs that impact us, we will fork the repository and do our own release. (In that case, we should also open issues upstream as appropriate.)
Previously the default route has
Antrea uses
go.mod
Outdated
@@ -236,3 +236,5 @@ require (
	sigs.k8s.io/kustomize/kyaml v0.14.3-0.20230601165947-6ce0bf390ce3 // indirect
	sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
)

replace github.com/vishvananda/netlink => github.com/vishvananda/netlink v0.0.0-20240425164735-856e190dd707
Why doesn't it set the version in require directly?
@hongliangl have you figured out whether the new version really affects the e2e test?
Not yet, still testing it on my forked Antrea repo.
I tried to change the version in require directly, but it was changed back to v1.2.1-beta.2 after GoLand resynced the dependencies.
Hongliang's PR is merged. No need to bump up netlink now.
pkg/agent/util/net_linux.go
Outdated
	// which is also occupied by an interface name, after the interface name changed from generated value to another one,
	// the interface altname will be set to previous name immediately.
	// This altname must be removed to avoid unexpected conflict.
	removeInterfaceAltName(to, from)
Missing error handling.
If this is the only known case, we should adapt the code to it. But I remember @hongliangl mentioned other test failures that may be related to it.
I had started a review yesterday, but didn't submit it...
go.mod
Outdated
@@ -236,3 +236,5 @@ require (
	sigs.k8s.io/kustomize/kyaml v0.14.3-0.20230601165947-6ce0bf390ce3 // indirect
	sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
)

replace github.com/vishvananda/netlink => github.com/vishvananda/netlink v0.0.0-20240425164735-856e190dd707
I don't think you need a replace directive. Can't you just use the desired "version" in the require block directly?
I tried to change the version in require directly, but it was changed back to v1.2.1-beta.2 after GoLand resynced the dependencies.
Hongliang's PR is merged. No need to bump up netlink now.
pkg/agent/util/net_linux.go
Outdated
@@ -290,6 +290,12 @@ func RenameInterface(from, to string) error {
	if pollErr != nil {
		return fmt.Errorf("failed to rename host interface name %s to %s", from, to)
	}
	// Fix for the issue https://github.com/antrea-io/antrea/issues/6301.
	// In some new Linux versions which support AltName, if there is only one valid value in AlternativeNamesPolicy
What do you mean by "one valid value" here? In the original issue, I see that AlternativeNamesPolicy=database onboard slot path. It looks like 4 valid values to me based on https://access.redhat.com/solutions/6964829
Although many values can be set here, not all of them are necessarily valid for a specific interface.
For example, here is the default value in Ubuntu 23.10:
[Link]
NamePolicy=keep kernel database onboard slot path
AlternativeNamesPolicy=database onboard slot path
If slot and path are both valid for this interface (this depends on the hardware information), a new interface gets the name and altname below. The name is slot and the altname is path:
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a6:68:79 brd ff:ff:ff:ff:ff:ff
    altname enp11s0
If only path is valid, a new interface gets the name below. The name is path and there is no altname:
3: enp11s1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a6:68:7A brd ff:ff:ff:ff:ff:ff
If we change the name of the above interface, the altname is immediately set to path, and Antrea then fails to create the OVS internal port:
3: enp11s1~: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a6:68:7A brd ff:ff:ff:ff:ff:ff
    altname enp11s1
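The behavior described in this comment can be modeled with a small simulation. This is only a toy model of what the comment reports, not of how udev actually assigns names; the `iface` type and its `newIface`, `rename`, and `syncAltNames` helpers are hypothetical.

```go
package main

import "fmt"

// iface is a toy model of a link with a primary name and alternative names.
type iface struct {
	Name     string
	AltNames []string
	// candidates holds the policy-derived names that are valid for this
	// hardware, in policy order (for example slot, then path).
	candidates []string
}

// syncAltNames recomputes AltNames as every candidate that is not currently
// used as the primary name.
func (i *iface) syncAltNames() {
	i.AltNames = nil
	for _, c := range i.candidates {
		if c != i.Name {
			i.AltNames = append(i.AltNames, c)
		}
	}
}

// newIface names the interface after its first valid candidate; the remaining
// candidates become altnames.
func newIface(candidates ...string) *iface {
	i := &iface{candidates: candidates}
	if len(candidates) > 0 {
		i.Name = candidates[0]
	}
	i.syncAltNames()
	return i
}

// rename changes the primary name. Any candidate freed by the rename
// reappears as an altname.
func (i *iface) rename(to string) {
	i.Name = to
	i.syncAltNames()
}

func main() {
	both := newIface("ens192", "enp11s0") // slot and path are both valid
	fmt.Println(both.Name, both.AltNames)

	pathOnly := newIface("enp11s1") // only path is valid
	fmt.Println(pathOnly.Name, pathOnly.AltNames)

	pathOnly.rename("enp11s1~") // the freed name comes back as an altname
	fmt.Println(pathOnly.Name, pathOnly.AltNames)
}
```

In the last case the name `enp11s1` is still attached to the renamed interface as an altname, which is why creating an OVS internal port with that name fails.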
Maybe adjust it a little:
In some new Linux versions which support AltName, if the only valid altname of the interface is the same as the interface name, it would be left empty when the name is occupied by the interface name; after we rename the interface name to another value, the altname of the interface would be set to the original interface name by the system. This altname must be removed as we need to reserve the name for an OVS internal port.
Fixed.
Yes, I always got the same failure from the following test after bumping the netlink library.
|
Which version did you bump to, and in which pipeline did you get the failure? In this PR, all finished e2e pipelines passed.
All e2e tests might pass sometimes, but it seems that the failure is always the same. I saw the same failure from the tests in this PR:
This is from another PR I made: |
43a83cd to 41a3bbe
/test-all
pkg/agent/util/net_linux.go
Outdated
klog.ErrorS(err, "Failed to remove AltName after interface renamed")
return fmt.Errorf("failed to remove AltName %s on interface %s", from, to)
- klog.ErrorS(err, "Failed to remove AltName after interface renamed")
- return fmt.Errorf("failed to remove AltName %s on interface %s", from, to)
+ return fmt.Errorf("failed to remove AltName %s on interface %s: %w", from, to, err)
Don't handle the same error twice.
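The point of the suggestion is that an error should be either logged or returned, not both; wrapping with `%w` keeps the cause available to the caller, which logs it once. A minimal sketch with hypothetical names (`errAltNameBusy`, `removeAltName`, `renameAndCleanup`, `causedByBusy`), using the standard `log` package in place of klog:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errAltNameBusy is a hypothetical low-level failure used for illustration.
var errAltNameBusy = errors.New("device or resource busy")

// removeAltName stands in for the real netlink call and always fails here.
func removeAltName(ifName, altName string) error {
	return errAltNameBusy
}

// renameAndCleanup wraps the failure with context and returns it without
// logging, so the error is handled in exactly one place.
func renameAndCleanup(from, to string) error {
	if err := removeAltName(to, from); err != nil {
		return fmt.Errorf("failed to remove AltName %s on interface %s: %w", from, to, err)
	}
	return nil
}

// causedByBusy reports whether err wraps errAltNameBusy.
func causedByBusy(err error) bool {
	return errors.Is(err, errAltNameBusy)
}

func main() {
	// The top-level caller is the single place where the error is logged.
	if err := renameAndCleanup("enp11s1", "enp11s1~"); err != nil {
		log.Printf("rename failed: %v (busy: %t)", err, causedByBusy(err))
	}
}
```

Because the cause is wrapped rather than flattened into a string, callers can still test for it with `errors.Is`.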
Fixed. Thanks.
Fix antrea-io#6301 Signed-off-by: gran <gran@vmware.com> Co-authored-by: Lan <luola@vmware.com>
/test-all
/test-all
LGTM
@gran-vmv could you backport this to 1.14-2.0?
…trea-io#6402) Fix antrea-io#6301 Signed-off-by: gran <gran@vmware.com>
Fix #6301