No IPv6 routes from BLE IPSP node (NRF52840DK) #25444

Closed
lindemer opened this issue May 19, 2020 · 17 comments
Labels: area: Bluetooth; area: Networking; bug; priority: medium; Stale

Comments

@lindemer
Collaborator

lindemer commented May 19, 2020

Describe the bug
The sockets/echo_client sample compiled with Bluetooth support for the NRF52840DK is unable to send IPv6 packets to any destination, even the Bluetooth interface it is connected to. However, I am able to assign the device an IPv6 address and ping it from the host by following the guide in the Bluetooth IPSP sample documentation.

Strangely, the device can gain network access only while it is being actively pinged by the host computer (i.e., I can ping the Bluetooth interface and other IPv6 addresses from the Zephyr shell over minicom and connect to the echo server.) Once the ping program on the host has been stopped, the device seems to forget all routes to all IPv6 addresses within 2 seconds.

To reproduce
Steps to reproduce the behavior:

  1. cd zephyr/samples/net/sockets/echo_client
  2. west build -b nrf52840_pca10056 -- -DCONF_FILE="prj.conf overlay-bt.conf"; west flash
  3. modprobe bluetooth_6lowpan
  4. echo "connect <device MAC> 2" > /sys/kernel/debug/bluetooth/6lowpan_control
  5. ping6 -I bt0 ff02::1 (works)
  6. ip address add 2001:db8::2/64 dev bt0 (works)
  7. ping6 2001:db8::3 (works)
  8. minicom -D /dev/ttyACM0 -b 115200
  9. net ping 2001:db8::2 (from Zephyr shell, only works while step 7 is still running)
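The host-side part of these steps (3 to 7) can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function names `run` and `ipsp_host_setup` and the `DRY_RUN` switch are made up for illustration, `-c 3` is added so the pings terminate (the report leaves ping running, which matters for step 9), and the `sleep 2` is a guess at how long `bt0` needs to appear.

```shell
#!/bin/sh
# Host-side steps 3-7 from the report, wrapped in one function.
# DRY_RUN defaults to 1, so the commands are only printed.
# Run with DRY_RUN=0 as root to actually execute them.

CTL=/sys/kernel/debug/bluetooth/6lowpan_control

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ipsp_host_setup() {
    dev_mac="$1"    # BLE address of the Zephyr device (<device MAC> in the report)

    run modprobe bluetooth_6lowpan

    # The redirect cannot go through run(), so handle it separately.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "echo \"connect $dev_mac 2\" > $CTL"
    else
        echo "connect $dev_mac 2" > "$CTL"
        sleep 2     # assumption: give bt0 a moment to appear
    fi

    run ping6 -c 3 -I bt0 ff02::1
    run ip address add 2001:db8::2/64 dev bt0
    run ping6 -c 3 2001:db8::3
}
```

Calling `ipsp_host_setup 00:11:22:33:44:55` (a placeholder address) with the default `DRY_RUN=1` prints the five commands without touching the system.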

Expected behavior
Since there is a Bluetooth overlay configuration file in the echo client source code, it is implied that this should work out-of-the-box, but this does not seem to be the case.

Impact
I am unable to continue working until I can get uninterrupted IPv6 access from my BLE IPSP nodes. They need to act as clients contacting external servers, so they cannot wait for an external connection to get network access.

Environment

  • Ubuntu (Linux 5.3.0-51-generic) on VirtualBox
  • Latest Zephyr SDK
  • Zephyr v2.2

Additional information
Running net nbr from the Zephyr shell does show the link to the Bluetooth interface on the host at 2001:db8::2, but the connection goes stale just a few seconds after ping is stopped on the host computer. However, the device is unable to ping that address even before it goes stale.

@lindemer lindemer added the bug The issue is a bug, or the PR is fixing a bug label May 19, 2020
@rlubos
Contributor

rlubos commented May 19, 2020

Just as quick feedback: I wonder whether the issue is in Zephyr or in the Linux kernel. I recall having issues with TCP connectivity when I was using a newer Linux kernel (5.4). Once I switched back to 4.14 it worked fine again, so there could be some compatibility issue with the newer version. Again, I'm not sure on which side; I'm not that deep into BLE IPSP.

@jukkar
Member

jukkar commented May 19, 2020

Can you set CONFIG_NET_L2_BT_ZEP1656=y and try again? See explanation of this option in subsys/net/l2/Kconfig

@carlescufi carlescufi added the priority: medium Medium impact/importance bug label May 19, 2020
@lindemer
Collaborator Author

Thanks for the quick feedback. I tried again with CONFIG_NET_L2_BT_ZEP1656=y but it doesn't fix the issue.

@jukkar
Member

jukkar commented May 20, 2020

Running net nbr from the Zephyr shell does show the link to the Bluetooth interface on the host at 2001:db8::2, but the connection goes stale just a few seconds after ping is stopped on the host computer

Indeed, the stale status should happen after a timeout and not immediately. I wonder if this is similar to the issue I was seeing with IPv6 address and prefix timeouts in #22733.
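
For reference on what "after a timeout" means here: under RFC 4861, a neighbor stays REACHABLE for ReachableTime, which is drawn uniformly from 0.5x to 1.5x the base reachable time. A minimal sketch of that window follows. The function names are made up for illustration, and the 30000 ms base and 40879 ms reachable time used in the usage note are the values printed by `net iface` further down in this thread.

```shell
#!/bin/sh
# RFC 4861: ReachableTime is picked between MIN_RANDOM_FACTOR (0.5) and
# MAX_RANDOM_FACTOR (1.5) times BaseReachableTime. All values are in ms.

reachable_window() {
    # Print "min max" for the given base reachable time.
    base_ms="$1"
    echo "$((base_ms / 2)) $((base_ms * 3 / 2))"
}

reachable_in_window() {
    # Exit status 0 if value $2 lies inside the window for base $1.
    lo=$(( $1 / 2 ))
    hi=$(( $1 * 3 / 2 ))
    [ "$2" -ge "$lo" ] && [ "$2" -le "$hi" ]
}
```

`reachable_window 30000` prints `15000 45000`, and `reachable_in_window 30000 40879` succeeds, so the roughly 2 second window described in the report is much shorter than this timer alone would suggest.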

@carlescufi
Member

@rlubos @jukkar from Slack:

vudentz @carlesc I guess this is on the network stack; IPSP does not maintain any routes itself, it just adds the link-local IP, but that is not what the reporter is using.
carlesc @vudentz the Zephyr network stack?
vudentz Yep, the IP seems to be considered stale for some reason

@rlubos
Contributor

rlubos commented May 22, 2020

@carlescufi I did some investigation on this case. I've tested with current master (77946fa) and two Linux kernels: 4.14.179-1 and 5.4.39-1.

I was able to reproduce the specific case on both kernels I used. I tracked the ping message down to the link layer (BLE IPSP in this case), where I saw that the packet is forwarded to the bt_l2cap_chan_send function. Even though no error was returned, the packet does not seem to be transmitted. So I suspect this is a BLE issue rather than a networking-layer issue.

It's interesting, though, that the packet is not lost in that case. It seems to be buffered somewhere (at the BLE layer?). After any activity from the host side (not necessarily a ping; I've observed it can also happen on a Neighbor Solicitation message from the host), the packet finally gets transmitted, as in the screenshot below. The ping request from Zephyr was transmitted immediately after the ping request from the host was received.

Screenshot from 2020-05-22 13-43-48

Additionally, and unrelated to this issue, TCP connectivity seems to be broken with 5.4.39-1 (the issue I mentioned before). It can easily be observed in the bluetooth/ipsp example when trying to establish a telnet connection. With 4.14.179-1 it works like a charm, but 5.4.39-1 is not able to establish a TCP connection. Something similar can be observed with echo_client, when Zephyr initiates the connection. In that case, the connection is established, but no further TCP packets are received.

@carlescufi
Member

@Vudentz and @jukkar could you please comment on @rlubos analysis above?

@carlescufi
Member

@Vudentz according to the analysis by @rlubos this seems like an issue in IPSP and not the IP stack.

@Vudentz
Contributor

Vudentz commented May 26, 2020

@carlescufi I did some investigation on this case. I've tested with current master (77946fa) and two Linux kernels: 4.14.179-1 and 5.4.39-1.

I was able to reproduce the specific case on both kernels I used. I tracked the ping message down to the link layer (BLE IPSP in this case), where I saw that the packet is forwarded to the bt_l2cap_chan_send function. Even though no error was returned, the packet does not seem to be transmitted. So I suspect this is a BLE issue rather than a networking-layer issue.

It's interesting, though, that the packet is not lost in that case. It seems to be buffered somewhere (at the BLE layer?). After any activity from the host side (not necessarily a ping; I've observed it can also happen on a Neighbor Solicitation message from the host), the packet finally gets transmitted, as in the screenshot below. The ping request from Zephyr was transmitted immediately after the ping request from the host was received.

Screenshot from 2020-05-22 13-43-48

Additionally, and unrelated to this issue, TCP connectivity seems to be broken with 5.4.39-1 (the issue I mentioned before). It can easily be observed in the bluetooth/ipsp example when trying to establish a telnet connection. With 4.14.179-1 it works like a charm, but 5.4.39-1 is not able to establish a TCP connection. Something similar can be observed with echo_client, when Zephyr initiates the connection. In that case, the connection is established, but no further TCP packets are received.

The packet would be queued if there are no credits to transmit, or perhaps it is waiting for a TX context. Perhaps you should enable debugging so we can check why it is not being transmitted immediately.
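
To make the first possibility concrete, here is a toy model of credit-based flow control in which a send call reports success even though the PDU can only be queued. It is a sketch of the hypothesis, not of Zephyr's L2CAP code: the names `chan_send`, `add_credits` and `flush_queue` are invented for illustration, and the model only counts PDUs.

```shell
#!/bin/sh
# Toy model of credit-based flow control on an L2CAP channel.
# Counters only; the names are illustrative and are not Zephyr APIs.

credits=0   # TX credits currently granted by the peer
queued=0    # PDUs accepted for sending but still waiting
sent=0      # PDUs actually transmitted

flush_queue() {
    # Transmit queued PDUs for as long as credits remain.
    while [ "$credits" -gt 0 ] && [ "$queued" -gt 0 ]; do
        credits=$((credits - 1))
        queued=$((queued - 1))
        sent=$((sent + 1))
    done
}

chan_send() {
    # Accept one PDU. Returns 0 even when it can only be queued.
    queued=$((queued + 1))
    flush_queue
    return 0
}

add_credits() {
    # The peer grants $1 more credits; queued PDUs drain.
    credits=$((credits + $1))
    flush_queue
}
```

With no credits available, `chan_send` leaves `queued=1` and `sent=0` while still returning success. A later `add_credits 2` sends the waiting PDU and leaves one credit over. That is the shape of "no error returned, but nothing on the air until something else happens".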

@carlescufi
Member

@rlubos could you enable CONFIG_BT_DEBUG_L2CAP and CONFIG_BT_DEBUG_CONN and post the log please?
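
One way to turn these options on for a single build is to pass them as CMake arguments after `--`. This is a sketch under assumptions: `west_build_cmd` is a hypothetical helper that only prints the command line, it assumes the build system accepts `-DCONFIG_...=y` on the command line, and it reuses the `prj.conf overlay-bt.conf` setting from the original report. Depending on the Zephyr version, a Bluetooth debug log option such as `CONFIG_BT_DEBUG_LOG` may also need to be enabled.

```shell
#!/bin/sh
# Print a west build command line with extra Kconfig symbols set to y.
# The overlay file is the one used in the original report.

west_build_cmd() {
    board="$1"; shift
    printf 'west build -b %s -- -DCONF_FILE="prj.conf overlay-bt.conf"' "$board"
    for sym in "$@"; do
        printf ' -D%s=y' "$sym"
    done
    printf '\n'
}
```

For example, `west_build_cmd nrf52840dk_nrf52840 CONFIG_BT_DEBUG_L2CAP CONFIG_BT_DEBUG_CONN` prints a command that can be reviewed and then pasted into the sample directory.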

@rlubos
Contributor

rlubos commented May 26, 2020

OK, adding a log file with CONFIG_BT_DEBUG_L2CAP and CONFIG_BT_DEBUG_CONN enabled. Additionally, a Wireshark log from the same run (zipped; apparently GH does not allow pcap).

There is a tremendous amount of log output after I established the connection with my host. Then, after a short pause, I tried to ping the host (with no reply). After a few seconds, a Router Solicitation message was sent from the host, and only after that did the ping show up in Wireshark (along with the response; this can be observed in the pcap). I really hope it helps!
ble_ipsp_ping_issue.log
ble_ipsp_ping_issue.zip

@Vudentz
Contributor

Vudentz commented May 26, 2020

With the ipsp sample and NET_SHELL enabled (BOARD=qemu_x86), it doesn't seem to have this problem:

uart:~$ net ping 2001:db8::2
PING 2001:db8::2
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=0 ttl=64 time=217 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=1 ttl=64 time=81 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=2 ttl=64 time=57 ms
uart:~$ net ping 2001:db8::2
PING 2001:db8::2
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=0 ttl=64 time=98 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=1 ttl=64 time=64 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=2 ttl=64 time=93 ms
uart:~$ net ping 2001:db8::2
PING 2001:db8::2
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=0 ttl=64 time=86 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=1 ttl=64 time=62 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=2 ttl=64 time=93 ms
uart:~$ net nbr 
     Neighbor   Interface        Flags State     Remain  Link              Address
[ 1] 0x14b2e0 0x14bd00     0/1/0/0 static      0  B8:8A:60:D8:17:D7 fe80::b88a:60ff:fed8:17d7
[ 2] 0x14b328 0x14bd00     0/1/1/0 stale          0  B8:8A:60:D8:17:D7 2001:db8::2

Also, it doesn't look like there are any calls to the L2CAP layer on this attempt:

uart:~$ uart:~$ uart:~$ net ping 2001:db8::1 -c 1
PING 2001:db8::1
Ping timeout

When I try with echo-client I get the same result, but with echo-server it does work:

uart:~$ net iface 

Interface 0x1592a0 (Bluetooth) [1]
====================================
Link addr : 34:13:E8:B2:78:C9
MTU       : 1280
IPv6 unicast addresses (max 3):
	fe80::3413:e8ff:feb2:78c9 autoconf preferred infinite
	2001:db8::1 manual preferred infinite
IPv6 multicast addresses (max 4):
	ff02::1
IPv6 prefixes (max 2):
	<none>
IPv6 hop limit           : 64
IPv6 base reachable time : 30000
IPv6 reachable time      : 40879
IPv6 retransmit timer    : 0
uart:~$ net nbr 
     Neighbor   Interface        Flags State     Remain  Link              Address
[ 1] 0x158720 0x1592a0     0/1/0/0 static      0  B8:8A:60:D8:17:D7 fe80::b88a:60ff:fed8:17d7
uart:~$ net ping 2001:db8::2
PING 2001:db8::2
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=0 ttl=64 time=221 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=1 ttl=64 time=82 ms
8 bytes from 2001:db8::2 to 2001:db8::1: icmp_seq=2 ttl=64 time=62 ms
uart:~$ net nbr 
     Neighbor   Interface        Flags State     Remain  Link              Address
[ 1] 0x158720 0x1592a0     0/1/0/0 static      0  B8:8A:60:D8:17:D7 fe80::b88a:60ff:fed8:17d7
[ 2] 0x158768 0x1592a0     0/1/0/0 reachable  28079  B8:8A:60:D8:17:D7 2001:db8::2

Perhaps it is something related to this:

[00:00:09.600,000] <inf> net_echo_client_sample: Run echo client
[00:00:09.600,000] <inf> net_echo_client_sample: Network connected
[00:00:09.600,000] <inf> net_echo_client_sample: Starting...
[00:00:12.690,000] <err> net_echo_client_sample: Cannot connect to TCP remote (IPv6): 60
[00:00:12.690,000] <inf> net_echo_client_sample: Stopping...

@rlubos is there any reason you were using echo_client instead of echo_server? It doesn't look like echo_client should be used to trigger other traffic, since it appears to stop UDP and TCP if it cannot connect, though there could be something else preventing echo_client from pinging, @jukkar.

@Vudentz
Contributor

Vudentz commented May 26, 2020

Something odd: after turning on Wireshark, echo-client just works as well:

echo-client.txt

@rlubos
Contributor

rlubos commented May 27, 2020

@rlubos is there any reason you were using echo_client instead of echo_server?

The only reason is that echo_client was used in the original report

It doesn't look like echo_client should be used to trigger other traffic, since it appears to stop UDP and TCP if it cannot connect,

The only consequence of the TCP connection failure is that the application sockets are closed and the main thread is blocked on a semaphore. It should not affect the network stack in any way.

@rlubos
Contributor

rlubos commented May 27, 2020

@Vudentz Unfortunately I cannot confirm that the issue does not show up in other samples. Reproduced on nrf52840dk_nrf52840 in both bluetooth/ipsp and sockets/echo_server.
Note that this behavior is not 100% reproducible; the ping stalls only after a few seconds of inactivity from the host side.

@Vudentz
Contributor

Vudentz commented May 27, 2020

@Vudentz Unfortunately I cannot confirm that the issue does not show up in other samples. Reproduced on nrf52840dk_nrf52840 in both bluetooth/ipsp and sockets/echo_server.
Note that this behavior is not 100% reproducible; the ping stalls only after a few seconds of inactivity from the host side.

OK, that would explain why it suddenly started working, but it appears the data never reaches the stack when the ping times out, as we didn't get any logs from the L2CAP layer when that happens.

@github-actions

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
