Failing lua luv tests in build environments #539

Closed
cryptomilk opened this issue Apr 19, 2021 · 16 comments · Fixed by #540, #541 or #543
@cryptomilk
Contributor

Hi,

A test in tests/test-udp.lua seems to fail with EPERM in build environments with limited network access. The Fedora build hosts don't have internet access, so they don't have any network configuration. As a result, the luv tests are failing.

See e.g.
https://kojipkgs.fedoraproject.org//work/tasks/949/66240949/build.log

I would comment them out for now.

Maybe https://cwrap.org/socket_wrapper.html would help to improve network testing :-)

@squeek502
Member

squeek502 commented Apr 19, 2021

This seems like it might be slightly more than just not having internet access. Those tests pass for me without internet access on Windows, but I haven't tested Linux yet.

The IP used in the failing test is "239.255.0.1", an administratively scoped multicast address, which I don't think relies on external networking (but I'm not very familiar with multicast).

Could you try changing this line to use "224.0.0.255" (a multicast IP in the 'local network control block') instead of "239.255.0.1"?

EDIT: Also worth noting that this test was ported from libuv's test-udp-multicast-join.c. Is that test run on the fedora build hosts?

EDIT#2: On Linux, with no network access, I'm hitting this conditional and the test is being skipped (the ipv6 in the 'skipping' message is erroneous):

luv/tests/test-udp.lua

Lines 180 to 184 in b85b9ef

local _, err, errname = uv.udp_set_membership(server, multicast_addr, interface_addr, "join")
if errname == "ENODEV" then
  print("no ipv6 multicast route, skipping")
  server:close()
  return

@squeek502
Member

One possible solution that would probably work would be to generalize this:

luv/tests/test-udp.lua

Lines 246 to 256 in b85b9ef

local function can_ipv6_external()
  local addresses = assert(uv.interface_addresses())
  for _, vals in pairs(addresses) do
    for _, info in ipairs(vals) do
      if info.family == "inet6" and not info.internal then
        return true
      end
    end
  end
  return false
end

to be able to test for both ipv4 and ipv6, and then skip the test if no external ipv4/ipv6 network interfaces are found.
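A minimal sketch of what such a generalization might look like. The name has_external_interface is hypothetical and not taken from the luv test suite. To keep the example runnable without luv, it takes the address table as an argument; the sample table is made up and only assumes the shape returned by uv.interface_addresses() (interface name mapped to a list of entries with family and internal fields).

```lua
-- Hypothetical helper: not part of luv or its tests.
-- `addresses` is assumed to have the shape returned by uv.interface_addresses():
-- a table mapping interface names to arrays of address-info tables.
-- `family` is "inet" for IPv4 or "inet6" for IPv6.
local function has_external_interface(addresses, family)
  for _, vals in pairs(addresses) do
    for _, info in ipairs(vals) do
      if info.family == family and not info.internal then
        return true
      end
    end
  end
  return false
end

-- Made-up sample data: a host with only a loopback interface.
local loopback_only = {
  lo = {
    { family = "inet", ip = "127.0.0.1", internal = true },
    { family = "inet6", ip = "::1", internal = true },
  },
}

-- Made-up sample data: a host that also has an external IPv4 address.
local with_ipv4 = {
  lo = { { family = "inet", ip = "127.0.0.1", internal = true } },
  eth0 = { { family = "inet", ip = "192.0.2.10", internal = false } },
}

print(has_external_interface(loopback_only, "inet"))  -- prints false
print(has_external_interface(with_ipv4, "inet"))      -- prints true
print(has_external_interface(with_ipv4, "inet6"))     -- prints false
```

In a real test, the first argument would come from assert(uv.interface_addresses()), and the test would be skipped when the check for the relevant family returns false.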

@cryptomilk
Contributor Author

I've tried using "224.0.0.255" instead but the test still failed:

https://koji.fedoraproject.org/koji/taskinfo?taskID=66321473

@squeek502
Member

squeek502 commented Apr 20, 2021

Could you test the branch in #540 and let me know if that fixes it for you?

@cryptomilk
Contributor Author

This doesn't fix it:
https://koji.fedoraproject.org/koji/taskinfo?taskID=66386014 see build.log

@cryptomilk
Contributor Author

This is the network configuration of the build root:

+ /usr/sbin/ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:fc:46:75 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 10.3.169.76/24 brd 10.3.169.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fefc:4675/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:d7:75:d8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:d7:75:d8 brd ff:ff:ff:ff:ff:ff

@squeek502
Member

squeek502 commented Apr 21, 2021

Thanks for testing that out. Unfortunately I'm not familiar enough with this stuff to understand what could be happening or how to debug it.

Strangely, I did notice that some of the other Libuv multicast tests actually do allow for EPERM in their sends:

https://github.com/libuv/libuv/blob/47e0c5c575e92a25e0da10fc25b2732942c929f3/test/test-udp-multicast-interface.c#L47

https://github.com/libuv/libuv/blob/47e0c5c575e92a25e0da10fc25b2732942c929f3/test/test-udp-multicast-ttl.c#L47

But I'm not sure why that would be, or if there's a reason the multicast-join doesn't allow EPERM in the same way. Plus, I'm not sure what's causing EPERM in the first place.

Will need to look into this further. Some way to reproduce EPERM locally would be ideal.

EDIT: Here's the PR that added the allowed EPERM to some of the multicast tests: libuv/libuv#1689

"Because this can happen on some build infrastructures."

Much more cryptic than I'd like. 😞

@squeek502
Member

squeek502 commented Apr 22, 2021

Ok, I finally understand more about what's happening here. From the comment added in #541:

EPERM here likely means that a firewall has denied the send, which can happen in some build/CI environments, e.g. the Fedora build system. Reproducible on Linux with iptables by doing:

iptables --policy OUTPUT DROP
iptables -A OUTPUT -s 127.0.0.1 -j ACCEPT

and for ipv6:

ip6tables --policy OUTPUT DROP
ip6tables -A OUTPUT -s ::1 -j ACCEPT
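
As a rough illustration of the idea (not the actual change in #541), a test could classify the error passed to its send callback and skip on EPERM instead of failing. The helper name classify_send_result is hypothetical, and the sketch assumes the error value is either nil or a string that begins with the libuv error name, such as "EPERM" or "EPERM: operation not permitted".

```lua
-- Hypothetical helper, not taken from luv: decide how a test should react to
-- the error value handed to a send callback.
-- Assumption: err is nil on success, otherwise a string that starts with the
-- libuv error name (e.g. "EPERM" or "EPERM: operation not permitted").
local function classify_send_result(err)
  if err == nil then
    return "ok"
  end
  -- Extract the leading error name (uppercase letters, digits, underscores).
  local name = tostring(err):match("^([%u%d_]+)")
  if name == "EPERM" then
    -- Likely denied by a firewall, so skipping avoids a false failure.
    return "skip"
  end
  return "fail"
end

print(classify_send_result(nil))       -- prints ok
print(classify_send_result("EPERM"))   -- prints skip
print(classify_send_result("ENODEV"))  -- prints fail
```

The trade-off is that a skip on EPERM can also hide a genuine permission problem, so printing a message when skipping keeps the reason visible in the test log.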

@squeek502
Member

#541 should fix this for you. Let us know if you run into any more problems.

@cryptomilk
Contributor Author

Now the test is hanging:

ok 104 udp - udp connect
  send to multicast ip was likely denied by firewall, skipping
ok 105 udp - udp multicast join ipv4

https://koji.fedoraproject.org/koji/taskinfo?taskID=66526614

@squeek502 squeek502 reopened this Apr 23, 2021
@squeek502
Member

squeek502 commented Apr 23, 2021

Hm, I wonder if the firewall is dropping the incoming packets in the IPv6 case, meaning that the send succeeds but the recv never gets any input...? If that's the case, I'm not quite sure how to deal with that.

Is it possible to get the firewall settings for the build server? Or is there an easy way to reproduce the build server's environment locally?

@squeek502
Member

squeek502 commented Apr 28, 2021

Confirmed that this behavior can happen if the firewall is dropping incoming ipv6 packets from outside localhost, so I'm assuming that's what's going on, e.g.

ip6tables --policy INPUT DROP
ip6tables -A INPUT -s ::1 -j ACCEPT

I'm not sure what to do here, since altering the test to be skipped in this scenario feels a bit too special-case-y to me, or, at least, any workaround that I can think of would obscure real failures (a timeout before skipping, etc).

Any suggestions @cryptomilk?

@dibyendumajumdar

dibyendumajumdar commented May 7, 2021

Hi,
I am running the tests on RHEL 8. The run hangs when it gets to the multicast test.
This is not a CI env - just a regular dev machine.

@squeek502
Member

squeek502 commented May 8, 2021

@dibyendumajumdar would you be willing to build Libuv and its tests and then try running:

uv_run_tests udp_multicast_join

and

uv_run_tests udp_multicast_join6

I'm curious if this is specific to luv or if the same thing would happen in the libuv tests that ours are based on.

EDIT: Finally got a Fedora VM set up and am able to reproduce the hang (ipv4 udp multicast test is hanging for me). Will be able to better investigate this myself now.

EDIT#2: The udp_multicast_join Libuv tests also fail with a timeout:

# ./uv_run_tests udp_multicast_join
not ok 1 - udp_multicast_join
# timeout

@squeek502
Member

squeek502 commented May 8, 2021

Ok, the firewalld defaults for Fedora/RHEL seem to be rejecting incoming packets from UDP multicast addresses. I was able to do the following to get the tests to pass (there is probably a better way, I've never worked with firewalld):

firewall-cmd --add-rich-rule 'rule family="ipv4" destination address="239.255.0.1" accept' --permanent
firewall-cmd --add-rich-rule 'rule family="ipv6" destination address="ff02::1" accept' --permanent
firewall-cmd --reload

I think we have to treat this as a real failure, though, so the best fix I can think to do would be to add a timeout to the test and then fail the test if it times out. Any input is welcome; I'm not familiar enough with this stuff to know what all the options are here.
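
One way the timeout idea could be sketched is a small watchdog around a one-shot timer. The helper name start_watchdog is hypothetical and this is not the code from the eventual fix. It assumes the uv argument behaves like the luv module: new_timer() returns a timer with start(timeout_ms, repeat_ms, callback), stop() and close() methods. Passing uv in as an argument is only a convenience so the helper can be exercised with a stub.

```lua
-- Hypothetical helper, not taken from the luv test suite.
-- Starts a one-shot timer that calls on_timeout() unless the returned
-- cancel function is called first.
-- Assumption: `uv` behaves like the luv module (new_timer(), and timer
-- objects with start/stop/close methods).
local function start_watchdog(uv, timeout_ms, on_timeout)
  local timer = uv.new_timer()
  local done = false

  -- Stops and closes the timer exactly once; returns true on the first call.
  local function finish()
    if done then
      return false
    end
    done = true
    timer:stop()
    timer:close()
    return true
  end

  -- A repeat value of 0 makes this a one-shot timer.
  timer:start(timeout_ms, 0, function()
    if finish() then
      on_timeout()
    end
  end)

  -- Call the returned function once the expected reply has arrived.
  return finish
end

-- Intended usage inside a test (requires luv, so shown only as comments):
--   local uv = require("luv")
--   local cancel = start_watchdog(uv, 1000, function()
--     error("multicast test timed out")
--   end)
--   -- ...in the recv callback, once the reply is received:
--   --   cancel()
```

Failing on timeout, rather than skipping, keeps a blocked multicast path visible as a real failure while still preventing the test run from hanging forever.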

squeek502 added a commit to squeek502/luv that referenced this issue May 8, 2021
This is one way to close luvit#539.

With the default settings of firewalld (used by Fedora, RHEL, etc), incoming messages from multicast IPs are dropped, meaning that the multicast test will hang forever. This introduces a 1 second timeout, after which the test will fail.
@dibyendumajumdar

dibyendumajumdar commented May 8, 2021

Hi @squeek502

I tried building libuv and running the test you mention.

not ok 1 - udp_multicast_join
# timeout
# Output from process `udp_multicast_join`: (no output)

So the libuv test times out rather than hanging.

squeek502 added a commit that referenced this issue May 10, 2021
This is one way to close #539.

With the default settings of firewalld (used by Fedora, RHEL, etc), incoming messages from multicast IPs are dropped, meaning that the multicast test will hang forever. This introduces a 1 second timeout, after which the test will fail.