This repository has been archived by the owner on Aug 5, 2022. It is now read-only.

[tcp udp ws] All issues after commit 087938a merged #1521

Closed
cuiyanx opened this issue Sep 6, 2017 · 18 comments

Comments

@cuiyanx
Contributor

cuiyanx commented Sep 6, 2017

Description

I tested the TCP, UDP, and WebSocket (ws) test cases and samples after commit 087938a was merged.

| Test Case                | Arduino 101 with ENC28J60 | Arduino 101 with BLE | FRDM-K64F     |
|--------------------------|---------------------------|----------------------|---------------|
| UDPEchoServ4.js          | disconnect                | N/A                  | bus fault     |
| test-udp4-server.js      | disconnect                | N/A                  | bus fault     |
| UDPEchoServ6.js          | disconnect                | disconnect           | bus fault     |
| test-udp6-server.js      | disconnect                | disconnect           | bus fault     |
| test-tcp4-server.js      | hang                      | N/A                  | hang          |
| test-tcp4-server-DHCP.js | N/A                       | N/A                  | hang          |
| test-tcp6-client.js      | cannot acquire send_buf   | disconnect           | pass          |
| test-tcp6-server.js      | hang                      | hang                 | hang          |
| test-ws4-server-dhcp.js  | N/A                       | N/A                  | payload error |
| test-ws6-server.js       | disconnect                | disconnect           | disconnect    |

Test Code

Steps to Reproduction

Actual Result

Expected Result

All test cases work well.

Test Builds

| Branch | Commit Id | Target Device | Test Date   | Result |
|--------|-----------|---------------|-------------|--------|
| master | 087938a   | Arduino 101   | Sep 6, 2017 | Fail   |
| master | 087938a   | FRDM-K64F     | Sep 6, 2017 | Fail   |

Additional Information

@grgustaf
Contributor

grgustaf commented Sep 7, 2017

That is one very depressing chart. :) Hopefully #1522 will have helped somewhat.

@grgustaf
Contributor

grgustaf commented Sep 8, 2017

OK, I tested test-tcp4-server.js on K64F. Here's what I see, and I suspect it's common across your 'hang' scenarios. When you boot the K64F, it passes most of the initial tests and starts waiting. However, if you try pinging the device at 192.0.2.1, it will not respond immediately. Leave ping running, though, and eventually it starts working. When I start ping just as I hit the reset button on the K64F, I've seen it take 14, 27, 29, 42, and as many as 43 tries before ping starts working.

PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=43 ttl=64 time=0.872 ms

(E.g., above, the first reply came back for icmp_seq=43.)

But once ping starts working, the device is really live, and launching the client Python script WILL work, to some extent at least.

Note: while ping is not working, there's really nothing we can do. We've already made our calls to Zephyr, and we're just waiting for either the Zephyr driver or the hardware to start working. So there could be a bug in Zephyr here, I'm not sure, but I don't think there's anything we can do about it.

So please test this kind of thing when you're using static IPs. I see this delayed-start behavior specifically on K64F and specifically when I use a static IP. With DHCP, it seems to start working as soon as it claims to have a DHCP address. Maybe that's because, in that case, it has by definition already been on the network. I usually see about a 10 s delay before DHCP returns. This is all just my "feeling" after hundreds of development cycles over the last few weeks; I haven't researched the point scientifically.

See if that makes a difference in the results you get. In the case mentioned above, when I wait for ping to succeed, I at least get "Got data: hello" / "Send data: hello", then "time out".

I'll plan to look at more cells of your matrix after lunch.

@grgustaf
Contributor

grgustaf commented Sep 8, 2017

Note: that problem seems specific to K64F. On the A101, ping starts working almost immediately. If I start ping right after hitting the reset button, it takes about 6 seconds before ping responds, and most of that is the 5 s DFU bootloader timeout.

@grgustaf
Contributor

grgustaf commented Sep 8, 2017

I managed to reproduce some of the hangs, and they were deadlocks caused by improper locking of callbacks. At least some of them should be fixed by commit ed1a9de in PR #1523.

@grgustaf
Contributor

grgustaf commented Sep 9, 2017

So at least one deadlock was in test-tcp4-server.js. It would stop working after the first "hello" packet, but with the latest patch in #1523 it now runs to completion. Please retest the other hang scenarios to see whether they improved.

@grgustaf
Contributor

I'm not sure what you meant by "disconnect" on test-ws6-server.js w/ K64F, but when I investigated, it was getting a "payload too large" error there too. The error wasn't showing up outside debug mode because it was happening during the handshake, before the socket had been returned to JS. @jprestwo has fixed that with #1524. That should fix a few of the WS bugs seen here.

@grgustaf
Copy link
Contributor

Confirmed test-ws6-server.js works for me in all three scenarios.

@grgustaf
Contributor

Confirmed test-ws4-server-dhcp.js works for me on K64F.

@grgustaf
Contributor

Confirmed test-tcp6-server.js works for me in all three scenarios. (In each case 54 of 71 subtests passed FWIW.)

@grgustaf
Contributor

Confirmed that test-tcp6-client.js works on K64F and at least some of the time on A101 w/ Ethernet (sometimes the "Got data:" on the Linux side is all blank). It doesn't seem to work on BLE. I'm looking into that now.

@grgustaf
Contributor

For the BLE one, a packet trace shows Zephyr:8484 sending a SYN to Linux:9876. Linux answers with a SYN/ACK, but for some reason Zephyr responds with ICMP port unreachable. It seems like it must be a Zephyr bug, but I have no idea why it would happen only in the BLE case. I will probably back away from this apparently painful one and continue surveying the other bugs. (In case it's not clear, I'm going through the table from bottom to top.)

@cuiyanx
Contributor Author

cuiyanx commented Sep 12, 2017

@grgustaf I tested those issues again after commit 2a71022 was merged. Only one test case still fails. test-tcp6-client.js throws the error "cannot acquire send_buf" on A101 with ENC28J60 and cannot connect to the server on A101 with BLE, but it works well on K64F.

@grgustaf grgustaf self-assigned this Sep 12, 2017
@grgustaf
Contributor

Confirmed that both test-tcp4-server.js and test-tcp4-client.js work for me on both K64F and A101+Eth. (On A101, though, the client first reports a socket error before retrying and working. I think we already have a bug filed for this.)

@grgustaf
Contributor

OK, I tested all the UDP samples, and all the valid cases seem to work fine for me too. So I think we have a reasonable candidate for 0.4. Maybe you could open individual bugs for any remaining issues you see.

@grgustaf
Contributor

Note: I solved the little mystery of why I couldn't connect to the K64F for a while after rebooting w/ a static IP. The board changes its MAC address on every reboot (at least mine does). The Linux box has an ARP cache (or a neighbor cache, for IPv6 neighbor solicitation) that remembers the MAC associated with the address 192.0.2.1. So it keeps sending to the old MAC address until the cache timeout occurs (default 60 seconds). Only then does it broadcast to find the new owner of the IP address.

So that issue is not ZJS's fault. It's caused by the unnatural situation of the MAC address changing frequently. I'm not sure why the K64F does that; maybe it can be configured.
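To make the ARP-cache explanation concrete, here is a toy Python model. It is a simplified sketch and does not reflect how the Linux kernel actually manages its neighbor table, which uses NUD states and several separate timers. It assumes the 60 s timeout quoted above and that a cache entry is not refreshed while replies fail to arrive. The MAC addresses and the `NeighborCache`/`reachable` names are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed timeout, taken from the figure quoted in this thread. Real Linux
# neighbor-table timers (base_reachable_time, gc_stale_time, NUD probing)
# are more involved than this single value.
CACHE_TIMEOUT_S = 60.0


@dataclass
class NeighborCache:
    """Toy model of a host's IP -> MAC cache (ARP for IPv4, ND for IPv6)."""

    timeout: float = CACHE_TIMEOUT_S
    entries: dict = field(default_factory=dict)  # ip -> (mac, time learned)

    def resolve(self, ip: str, now: float, network: dict) -> str:
        """Return the MAC this host would send frames to at time `now`.

        `network` maps each IP to the MAC that currently owns it and stands in
        for a broadcast ARP request / neighbor solicitation.
        """
        cached = self.entries.get(ip)
        if cached is not None and now - cached[1] < self.timeout:
            # Entry still fresh: reuse it, even if the device's MAC has changed.
            return cached[0]
        # Entry missing or expired: "broadcast" and learn the current owner.
        mac = network[ip]
        self.entries[ip] = (mac, now)
        return mac


def reachable(cache: NeighborCache, ip: str, now: float, network: dict) -> bool:
    """Frames only arrive if they are addressed to the device's current MAC."""
    return cache.resolve(ip, now, network) == network[ip]


if __name__ == "__main__":
    ip = "192.0.2.1"
    network = {ip: "02:00:00:00:00:01"}  # hypothetical MAC before reboot
    cache = NeighborCache()
    cache.resolve(ip, 0.0, network)  # Linux host learns the old MAC at t=0
    network[ip] = "02:00:00:00:00:02"  # board reboots with a new MAC
    for t in (1.0, 30.0, 59.0, 60.0, 61.0):
        print(f"t={t}s reachable={reachable(cache, ip, t, network)}")
```

In this model, every lookup before t=60 s returns the stale MAC and reports the device unreachable. The first lookup at or after the timeout re-resolves the address and succeeds. This mirrors the "ping eventually starts working" symptom described earlier in the thread.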

@cuiyanx
Contributor Author

cuiyanx commented Sep 13, 2017

@grgustaf The test-tcp4-client.js issue looks like #1187.

@cuiyanx
Contributor Author

cuiyanx commented Sep 13, 2017

@grgustaf I think it can certainly be configured if it is a virtual MAC address. By the way, I learned a lot; thank you.

@cuiyanx
Contributor Author

cuiyanx commented Sep 13, 2017

@grgustaf The test-tcp6-client.js issue is the same as #1518, so I'm closing this issue and will follow that one.
