Add DHCP IP Retries in PrepareHNSNetwork #5819

Merged on Jan 2, 2024 (1 commit)

Conversation

Contributor

@XinShuYang XinShuYang commented Dec 21, 2023

To address a potential race condition in which acquiring a DHCP IP address may fail after CreateHNSNetwork,
this PR adds a retry mechanism that waits for an available IP. If the DHCP IP cannot be acquired within six seconds,
an error is logged.
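
The retry described above can be sketched with the standard library alone. This is a minimal illustration, not the merged code: `pollImmediate` and `errTimeout` are hypothetical stand-ins for `wait.PollImmediate` and `wait.ErrWaitTimeout`, and the condition only simulates an IP that shows up on the third check. The demo uses millisecond intervals so it finishes quickly; the PR polls every second for up to six seconds.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical sentinel standing in for wait.ErrWaitTimeout.
var errTimeout = errors.New("timed out waiting for the condition")

// pollImmediate calls cond right away and then once per interval. It stops
// when cond reports true, when cond returns an error, or when the next
// attempt would fall past the deadline.
func pollImmediate(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	// Simulated check: the IP "appears" on the third attempt.
	err := pollImmediate(10*time.Millisecond, 60*time.Millisecond, func() (bool, error) {
		attempts++
		return attempts >= 3, nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Checking immediately before the first sleep means an adapter that already has its IP adds no startup delay; only the slow path pays for the retries.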

@XinShuYang XinShuYang requested a review from wenyingd December 21, 2023 04:09
@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang XinShuYang force-pushed the windhcp branch 5 times, most recently from a765a81 to 3aafc50 Compare December 21, 2023 06:07
@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

Contributor

@wenyingd wenyingd left a comment

Can we add an e2e test to verify that vNIC migration does not modify the original properties?

pkg/agent/util/net_windows.go (outdated, resolved)
pkg/agent/util/net_windows.go (outdated, resolved)
pkg/agent/util/net_windows.go (outdated, resolved)
@XinShuYang XinShuYang force-pushed the windhcp branch 5 times, most recently from 7c919ee to 801139a Compare December 25, 2023 01:20
@XinShuYang
Copy link
Contributor Author

/test-windows-containerd-e2e

@XinShuYang XinShuYang force-pushed the windhcp branch 3 times, most recently from 4b57901 to e912ea2 Compare December 25, 2023 02:48
pkg/agent/util/net_windows.go (outdated, resolved)
pkg/agent/util/net_windows.go (outdated, resolved)
pkg/agent/util/net_windows.go (outdated, resolved)
pkg/agent/util/net_windows.go (outdated, resolved)
ci/jenkins/test.sh (outdated, resolved)
@XinShuYang XinShuYang force-pushed the windhcp branch 6 times, most recently from dcccd93 to 5973c2f Compare December 26, 2023 06:57
ci/jenkins/test.sh (outdated, resolved)
ci/jenkins/test.sh (outdated, resolved)
pkg/agent/util/net_windows.go (outdated, resolved)
pkg/agent/util/net_windows.go (outdated, resolved)
@XinShuYang XinShuYang force-pushed the windhcp branch 2 times, most recently from f2b4ca2 to 4bf6142 Compare December 26, 2023 22:03
@XinShuYang
Contributor Author

/test-windows-containerd-e2e

wenyingd previously approved these changes Dec 28, 2023
Contributor

@wenyingd wenyingd left a comment

LGTM

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang XinShuYang requested a review from tnqn December 28, 2023 06:13
@XinShuYang
Contributor Author

/test-windows-containerd-e2e

@XinShuYang
Contributor Author

/test-windows-containerd-e2e

ci/jenkins/test.sh (outdated, resolved)
ci/jenkins/test.sh (outdated, resolved)
pkg/agent/util/net_windows.go (resolved)
if err != nil {
	klog.ErrorS(err, "Failed to get Ipv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)
}
klog.Warningf("Timeout acquiring IP for the adapter, DHCP status: %t", dhcpStatus)
Member

  1. Use structured logging for new logs: InfoS if this is expected, ErrorS if it is not expected to happen.
  2. Logging dhcpStatus could be confusing when retrieving the status itself fails.

Contributor Author

@XinShuYang XinShuYang Dec 28, 2023

Updated the logic: we now print dhcpStatus only if its value was successfully retrieved from the adapter. @tnqn
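
The updated behavior can be sketched roughly as follows. This is a hedged illustration rather than the code in the PR: `describeTimeout`, `errStatusUnavailable`, and the adapter name `Ethernet0` are hypothetical, and the function returns the message and key/value pairs instead of calling klog so that the example runs with the standard library only.

```go
package main

import (
	"errors"
	"fmt"
)

// errStatusUnavailable is a hypothetical error used to simulate a failed
// DHCP status query.
var errStatusUnavailable = errors.New("DHCP status unavailable")

// describeTimeout decides what to log after the IP wait has timed out.
// dhcpEnabled stands in for a status query such as InterfaceIPv4DhcpEnabled.
// The DHCP status is included only when the query succeeded; otherwise the
// query error is reported and no status value is attached.
func describeTimeout(adapter string, dhcpEnabled func(string) (bool, error)) (string, []interface{}, error) {
	status, err := dhcpEnabled(adapter)
	if err != nil {
		return "Failed to get IPv4 DHCP status on the network adapter",
			[]interface{}{"adapter", adapter}, err
	}
	return "Timeout acquiring IP for the adapter",
		[]interface{}{"adapter", adapter, "dhcpStatus", status}, nil
}

func main() {
	ok := func(string) (bool, error) { return true, nil }
	failing := func(string) (bool, error) { return false, errStatusUnavailable }
	for _, query := range []func(string) (bool, error){ok, failing} {
		msg, kv, err := describeTimeout("Ethernet0", query)
		fmt.Println(msg, kv, err)
	}
}
```

A caller using klog could pass the three results straight to `klog.ErrorS(err, msg, kv...)`, which keeps the log structured and never prints a zero-value status that was not actually read.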

// Therefore, we set the timeout limit to triple of that value, allowing a maximum wait of 6 seconds here.
err = wait.PollImmediate(1*time.Second, 6*time.Second, func() (bool, error) {
	var checkErr error
	adapter, ipFound, checkErr = adapterIPExists(nodeIPNet.IP, uplinkAdapter.HardwareAddr, ContainerVNICPrefix)
Member

Does Windows support both the DHCP and the static IP case? If so, these retries would add unnecessary initialization delay in the static IP case.
I think it should first check whether DHCP is enabled, and only expect to get an IP from the DHCP server when it is.

Contributor Author

As I understand it, the delay before the IP becomes available occurs when a new uplink interface is created, and it is the same whether or not DHCP is enabled on the interface. In the static IP case, we still expect the adapter to have an IP available promptly; otherwise, it may indicate an issue with the adapter itself. @wenyingd, please correct me if I am wrong.

Contributor

For static IP configurations, we still expect Windows to migrate the IP address from the pNIC to the vNIC automatically. This check verifies that Windows has done so, and logs a warning if it has not.
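
As a rough illustration of the kind of check being discussed, the sketch below looks for a vNIC with the uplink's MAC address and reports whether the node IP has moved onto it. Everything here is assumed for the example: `adapterInfo`, `findAdapterIP`, `demoAdapters`, the adapter names, the `vEthernet` prefix, and the addresses are hypothetical, and the real `adapterIPExists` queries the OS rather than an in-memory list.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"strings"
)

// adapterInfo is a hypothetical in-memory view of one network adapter.
type adapterInfo struct {
	Name string
	MAC  net.HardwareAddr
	IPs  []net.IP
}

// Example values only; they do not come from the PR.
var (
	uplinkMAC = net.HardwareAddr{0x00, 0x50, 0x56, 0xaa, 0xbb, 0xcc}
	nodeIP    = net.IPv4(192, 0, 2, 10)
)

// findAdapterIP returns the first adapter whose name starts with namePrefix
// and whose MAC equals mac, plus whether ip is configured on that adapter.
func findAdapterIP(adapters []adapterInfo, ip net.IP, mac net.HardwareAddr, namePrefix string) (*adapterInfo, bool) {
	for i := range adapters {
		a := &adapters[i]
		if !strings.HasPrefix(a.Name, namePrefix) || !bytes.Equal(a.MAC, mac) {
			continue
		}
		for _, addr := range a.IPs {
			if addr.Equal(ip) {
				return a, true
			}
		}
		return a, false
	}
	return nil, false
}

// demoAdapters simulates the host before and after the OS has migrated the
// IP from the physical NIC to the vNIC.
func demoAdapters(migrated bool) []adapterInfo {
	pnic := adapterInfo{Name: "Ethernet0", MAC: uplinkMAC}
	vnic := adapterInfo{Name: "vEthernet (Ethernet0)", MAC: uplinkMAC}
	if migrated {
		vnic.IPs = []net.IP{nodeIP}
	} else {
		pnic.IPs = []net.IP{nodeIP}
	}
	return []adapterInfo{pnic, vnic}
}

func main() {
	for _, migrated := range []bool{false, true} {
		adapter, found := findAdapterIP(demoAdapters(migrated), nodeIP, uplinkMAC, "vEthernet")
		fmt.Printf("migrated=%t adapter=%q ipFound=%t\n", migrated, adapter.Name, found)
	}
}
```

Because the check looks only at where the IP ends up, it applies equally to DHCP and static configurations, which matches the reasoning in this thread for not gating the retries on DHCP.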

@XinShuYang XinShuYang force-pushed the windhcp branch 3 times, most recently from f2f6c7a to 7d125ac Compare December 28, 2023 10:06
@XinShuYang
Contributor Author

/test-windows-all

@XinShuYang XinShuYang force-pushed the windhcp branch 2 times, most recently from a737bab to ee8222f Compare December 31, 2023 06:44
@XinShuYang
Contributor Author

/test-windows-all

@XinShuYang XinShuYang requested a review from tnqn January 2, 2024 05:42
tnqn previously approved these changes Jan 2, 2024
if err == wait.ErrWaitTimeout {
	dhcpStatus, err := InterfaceIPv4DhcpEnabled(uplinkAdapter.Name)
	if err != nil {
		klog.ErrorS(err, "Failed to get Ipv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)
Member

Suggested change
- klog.ErrorS(err, "Failed to get Ipv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)
+ klog.ErrorS(err, "Failed to get IPv4 DHCP status on the network adapter", "adapter", uplinkAdapter.Name)

@@ -647,6 +670,16 @@ func HostInterfaceExists(ifaceName string) bool {
return true
}

// InterfaceIPv4DhcpEnabled returns the Ipv4 DHCP status on the specified interface.
Member

Ditto: s/Ipv4/IPv4/

@tnqn
Member

tnqn commented Jan 2, 2024

In the commit message: s/a warning message will be returned/an error will be logged/

@tnqn
Member

tnqn commented Jan 2, 2024

No need to rerun tests after addressing the typos

To address the potential race condition issue where acquiring a DHCP IP address may fail after CreateHNSNetwork,
we added a retry mechanism to wait for an available IP. If the DHCP IP cannot be acquired within six seconds,
an error will be logged.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
@XinShuYang
Contributor Author

> No need to rerun tests after addressing the typos

Got it, PR has been updated.

@tnqn
Member

tnqn commented Jan 2, 2024

/skip-all

@tnqn tnqn merged commit 923b429 into antrea-io:main Jan 2, 2024
45 of 52 checks passed
3 participants