[Flaky test] TestVLANNetwork failing #6023

Closed
antoninbas opened this issue Feb 22, 2024 · 6 comments · Fixed by #6041
Assignees
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test.

Comments

@antoninbas
Contributor

Describe the bug
I have observed the following test failure.

=== RUN   TestVLANNetwork
2024/02/22 21:44:33 Waiting for all Antrea DaemonSet Pods
2024/02/22 21:44:34 Checking CoreDNS deployment
    fixtures.go:260: Creating 'testvlannetwork-m9lwaxz2' K8s Namespace
=== RUN   TestVLANNetwork/testCreateTestPodOnNode
=== RUN   TestVLANNetwork/testpingBetweenInterfaces
    secondary_network_test.go:194: Error when pinging between interfaces: interface eth1 not found on vlan-pod2. err: <nil>
=== NAME  TestVLANNetwork
    fixtures.go:333: Exporting test logs to '/home/runner/work/antrea/antrea/log/TestVLANNetwork/beforeTeardown.Feb22-21-44-38'
    fixtures.go:504: Deleting 'testvlannetwork-m9lwaxz2' K8s Namespace
time="2024-02-22T21:44:39Z" level=info msg="Deleting Namespace testvlannetwork-m9lwaxz2 took 3.579738ms"
--- FAIL: TestVLANNetwork (5.36s)
    --- PASS: TestVLANNetwork/testCreateTestPodOnNode (0.02s)
    --- FAIL: TestVLANNetwork/testpingBetweenInterfaces (3.11s)
FAIL
FAIL	antrea.io/antrea/test/e2e-secondary-network	6.942s
FAIL
@antoninbas antoninbas added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Feb 22, 2024
@prakrit55
Contributor

Hey @antoninbas, I'd like to know more about how you reproduced it. When I tried, I got different errors, on --- FAIL: TestSriovNetwork

@antoninbas
Contributor Author

TestSriovNetwork cannot be run locally.

This failure was observed in CI. You should be able to reproduce it locally by running ./ci/kind/test-secondary-network-kind.sh, but it may take a few tries. ./ci/kind/test-secondary-network-kind.sh will create a test Kind cluster with the correct configuration, then run TestVLANNetwork.

@shikharish
Contributor

@antoninbas I tried running it locally a few times and it passes ok.

=== RUN   TestVLANNetwork
2024/02/27 18:37:23 Waiting for all Antrea DaemonSet Pods
2024/02/27 18:37:24 Checking CoreDNS deployment
    fixtures.go:260: Creating 'testvlannetwork-8cl1p588' K8s Namespace
=== RUN   TestVLANNetwork/testCreateTestPodOnNode
=== RUN   TestVLANNetwork/testpingBetweenInterfaces
time="2024-02-27T18:37:30+05:30" level=info msg="ping 'vlan-pod1' -> 'vlan-pod2'( Interface: eth1, IP Address: 148.14.24.3): OK"
time="2024-02-27T18:37:36+05:30" level=info msg="ping 'vlan-pod1' -> 'vlan-pod3'( Interface: eth1, IP Address: 148.14.25.111): OK"
time="2024-02-27T18:37:41+05:30" level=info msg="ping 'vlan-pod2' -> 'vlan-pod1'( Interface: eth1, IP Address: 148.14.24.2): OK"
time="2024-02-27T18:37:50+05:30" level=info msg="ping 'vlan-pod3' -> 'vlan-pod1'( Interface: eth2, IP Address: 148.14.25.112): OK"
=== NAME  TestVLANNetwork
    fixtures.go:504: Deleting 'testvlannetwork-8cl1p588' K8s Namespace
time="2024-02-27T18:37:52+05:30" level=info msg="Deleting Namespace testvlannetwork-8cl1p588 took 13.930763ms"
--- PASS: TestVLANNetwork (28.92s)
    --- PASS: TestVLANNetwork/testCreateTestPodOnNode (0.02s)
    --- PASS: TestVLANNetwork/testpingBetweenInterfaces (27.88s)
PASS
2024/02/27 18:37:52 Removing empty logs directory '/tmp/antrea-test-455031965'
ok  	antrea.io/antrea/test/e2e-secondary-network	36.334s

@prakrit55
Contributor

Hey @shikharish, it's actually failing in ./ci/kind/test-secondary-network-kind.sh

@antoninbas
Contributor Author

@shikharish Looking at the GitHub Actions history for Kind jobs on the main branch, I see that it failed in 2 of the last 3 runs.

However, these failures are not always straightforward to reproduce locally.

antoninbas added a commit to antoninbas/antrea that referenced this issue Mar 1, 2024
For VLAN Secondary Network e2e tests, we need to wait for secondary IPs
to be available. Interface creation can take up to a few seconds after
the Pod is in the Running state, as it happens asynchronously. Otherwise
we can get the following error:

secondary_network_test.go:194: Error when pinging between interfaces: interface eth1 not found on vlan-pod2. err: <nil>

Fixes antrea-io#6023

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas antoninbas self-assigned this Mar 1, 2024
@antoninbas
Contributor Author

I got lucky and was able to reproduce it on my first try :) I submitted a PR to fix it.

antoninbas added a commit that referenced this issue Mar 1, 2024
For VLAN Secondary Network e2e tests, we need to wait for secondary IPs
to be available. Interface creation can take up to a few seconds after
the Pod is in the Running state, as it happens asynchronously. Otherwise
we can get the following error:

secondary_network_test.go:194: Error when pinging between interfaces: interface eth1 not found on vlan-pod2. err: <nil>

Fixes #6023

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
3 participants