Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bsim mesh establish_multi.sh doesn't send data for one of devices #48514

Closed
alxelax opened this issue Aug 1, 2022 · 9 comments
Closed

bsim mesh establish_multi.sh doesn't send data for one of devices #48514

alxelax opened this issue Aug 1, 2022 · 9 comments
Assignees
Labels
area: Bluetooth Mesh bug The issue is a bug, or the PR is fixing a bug platform: nRFx BSIM nrf5x_bsim priority: low Low impact/importance bug

Comments

@alxelax
Copy link
Collaborator

alxelax commented Aug 1, 2022

Describe the bug
The bug is reproduce only if extended advertiser is chosen for mesh (the legacy adv works well). Test scenario includes 6 emulated devices: 1 mesh friend node and 5 mesh low power nodes. Test should start all devices simultaneously. 5 low power nodes should establish friendship with friend node in one turn.
Wrong external behavior: 4th node always cannot establish friendship. It doesn't matter in which sequence emulated low power nodes started their Friend request frames. 4th node may send Friend Request the first one, in the middle or the last one (the randomness is provided by babblesim engine) but result is always the same all emulated nodes can establish friendship but the 4th node cannot.

Test passes only if there are 3 low power nodes if there are 4 or 5 nodes the 4th node is always failed.

if verbosity level is set at 7 (maximum?) the babblesim engine shows that all emulated low power nodes requested sending frame to emulated radio and got access to it. Even more, 4th node behaves itself as another nodes regarding reported statuses from Bluetooth controller and reports successful frame sending. The friend node doesn't see this frame.

If one more additional emulated device that reproduces BLE scanner to run in parallel it cannot see 4th node frames either.
Seems 4th device actually doesn't emit frame in emulated radio channel.

The same issue is reproduced on two different PCs (no influence of PC performance).

Preliminary opinion:
It seems babblesim engine has bug during frame sending in emulated radio channel. Babblesim model and modem are default ones.

To Reproduce
Steps to reproduce the behavior:

  1. checkout Zephyr main
  2. set option CONFIG_BT_EXT_ADV=y in https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/bluetooth/bsim_bt/bsim_test_mesh/prj.conf
  3. build babble sim mesh tests (it is possible to build only this: https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/bluetooth/bsim_bt/compile.sh#L56)
  4. run full set bsim mesh tests or only establish_multi.sh scenario
  5. See error in shown log. Issue looks like the 4th device cannot establish friendship due to absence relevant offer from friend node.

Expected behavior
establish_multi.sh should pass for both advertisers extended and legacy ones.

Impact
Not possible to make extended advertiser as default one and pass Zephyr's CI. The legacy advertiser has very poor performance from mesh relay feature point of view.

Logs and console output
```
alkh-local@alkh-linux:~/ncs/zephyr/tests/bluetooth/bsim_bt/bsim_test_mesh/tests_scripts/friendship$ bash establish_multi.sh
Starting friendship_friend_est_multi as device #0
Starting friendship_lpn_est as device #1
Starting friendship_lpn_est as device #2
Starting friendship_lpn_est as device #3
Starting friendship_lpn_est as device #4
Starting friendship_lpn_est as device #5
Starting phy with 6 devices
d_00: @00:00:00.000000 *** Booting Zephyr OS build zephyr-v3.1.0-2520-g702b6843669d ***
d_01: @00:00:00.000000 *** Booting Zephyr OS build zephyr-v3.1.0-2520-g702b6843669d ***
d_02: @00:00:00.000000 *** Booting Zephyr OS build zephyr-v3.1.0-2520-g702b6843669d ***
d_03: @00:00:00.000000 *** Booting Zephyr OS build zephyr-v3.1.0-2520-g702b6843669d ***
d_04: @00:00:00.000000 *** Booting Zephyr OS build zephyr-v3.1.0-2520-g702b6843669d ***
d_05: @00:00:00.000000 *** Booting Zephyr OS build zephyr-v3.1.0-2520-g702b6843669d ***
d_00: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
d_00: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Variant: nRF52x (0x0002)
d_00: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.1 Build 99
d_00: @00:00:00.005768 [00:00:00.005,767] bt_id: No static addresses stored in controller
d_01: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
d_01: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Variant: nRF52x (0x0002)
d_01: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.1 Build 99
d_01: @00:00:00.005768 [00:00:00.005,767] bt_id: No static addresses stored in controller
d_00: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: Identity: C5:AD:ED:38:00:B2 (random)
d_00: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x0059
d_00: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
d_00: @00:00:00.006488 [00:00:00.006,469] mesh_test: Bluetooth initialized
d_00: @00:00:00.006488 [00:00:00.006,469] mesh_test: Mesh initialized
d_00: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: Primary Element: 0x0001
d_00: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
d_01: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: Identity: FF:21:3D:E5:B4:CC (random)
d_01: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x0059
d_01: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
d_01: @00:00:00.006488 [00:00:00.006,469] mesh_test: Bluetooth initialized
d_01: @00:00:00.006488 [00:00:00.006,469] mesh_test: Mesh initialized
d_01: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: Primary Element: 0x0004
d_01: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
d_02: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
d_02: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Variant: nRF52x (0x0002)
d_02: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.1 Build 99
d_03: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
d_03: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Variant: nRF52x (0x0002)
d_02: @00:00:00.005768 [00:00:00.005,767] bt_id: No static addresses stored in controller
d_03: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.1 Build 99
d_04: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
d_04: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Variant: nRF52x (0x0002)
d_03: @00:00:00.005768 [00:00:00.005,767] bt_id: No static addresses stored in controller
d_04: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.1 Build 99
d_05: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
d_05: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: HW Variant: nRF52x (0x0002)
d_05: @00:00:00.005768 [00:00:00.005,767] bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.1 Build 99
d_04: @00:00:00.005768 [00:00:00.005,767] bt_id: No static addresses stored in controller
d_02: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: Identity: D2:B4:48:CE:BC:8A (random)
d_02: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x0059
d_02: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
d_02: @00:00:00.006488 [00:00:00.006,469] mesh_test: Bluetooth initialized
d_03: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: Identity: C4:C9:B5:FE:AB:8A (random)
d_03: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x0059
d_03: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
d_03: @00:00:00.006488 [00:00:00.006,469] mesh_test: Bluetooth initialized
d_02: @00:00:00.006488 [00:00:00.006,469] mesh_test: Mesh initialized
d_02: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: Primary Element: 0x0005
d_02: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
d_03: @00:00:00.006488 [00:00:00.006,469] mesh_test: Mesh initialized
d_03: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: Primary Element: 0x0006
d_03: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
d_05: @00:00:00.005768 [00:00:00.005,767] bt_id: No static addresses stored in controller
d_04: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: Identity: C6:9F:2E:65:E7:0C (random)
d_04: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x0059
d_04: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
d_04: @00:00:00.006488 [00:00:00.006,469] mesh_test: Bluetooth initialized
d_04: @00:00:00.006488 [00:00:00.006,469] mesh_test: Mesh initialized
d_04: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: Primary Element: 0x0007
d_04: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
d_05: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: Identity: E5:C2:AD:72:9E:F2 (random)
d_05: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x0059
d_05: @00:00:00.006488 [00:00:00.006,469] bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
d_05: @00:00:00.006488 [00:00:00.006,469] mesh_test: Bluetooth initialized
d_05: @00:00:00.006488 [00:00:00.006,469] mesh_test: Mesh initialized
d_05: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: Primary Element: 0x0008
d_05: @00:00:00.006488 [00:00:00.006,469] bt_mesh_main: bt_mesh_provision: net_idx 0x0000 flags 0x00 iv_index 0x0000
d_01: @00:00:00.308898 [00:00:00.308,868] test_friendship: LPN: Polling 0x0001 (initial)
d_00: @00:00:00.311379 [00:00:00.311,370] test_friendship: Friend: Poll from 0x0004
d_00: @00:00:00.311379 [00:00:00.311,370] test_friendship: Friend: established with 0x0004
d_02: @00:00:00.331743 [00:00:00.331,726] test_friendship: LPN: Polling 0x0001 (initial)
d_00: @00:00:00.334581 [00:00:00.334,564] test_friendship: Friend: Poll from 0x0005
d_00: @00:00:00.334581 [00:00:00.334,564] test_friendship: Friend: established with 0x0005
d_03: @00:00:00.354204 [00:00:00.354,187] test_friendship: LPN: Polling 0x0001 (initial)
d_00: @00:00:00.356698 [00:00:00.356,689] test_friendship: Friend: Poll from 0x0006
d_00: @00:00:00.356698 [00:00:00.356,689] test_friendship: Friend: established with 0x0006
d_05: @00:00:00.377048 [00:00:00.377,044] test_friendship: LPN: Polling 0x0001 (initial)
d_00: @00:00:00.379212 [00:00:00.379,211] test_friendship: Friend: Poll from 0x0008
d_00: @00:00:00.379212 [00:00:00.379,211] test_friendship: Friend: established with 0x0008
d_01: @00:00:00.461606 [00:00:00.461,578] bt_mesh_lpn: Friendship established with 0x0001
d_01: @00:00:00.461606 [00:00:00.461,578] test_friendship: LPN: established with 0x0001
d_02: @00:00:00.485043 [00:00:00.485,015] bt_mesh_lpn: Friendship established with 0x0001
d_02: @00:00:00.485043 [00:00:00.485,015] test_friendship: LPN: established with 0x0001
d_03: @00:00:00.508834 [00:00:00.508,819] bt_mesh_lpn: Friendship established with 0x0001
d_03: @00:00:00.508834 [00:00:00.508,819] test_friendship: LPN: established with 0x0001
d_05: @00:00:00.530527 [00:00:00.530,517] bt_mesh_lpn: Friendship established with 0x0001
d_05: @00:00:00.530527 [00:00:00.530,517] test_friendship: LPN: established with 0x0001
d_04: @00:00:01.155274 [00:00:01.155,273] bt_mesh_lpn: No acceptable Friend Offers received
d_04: @00:00:04.891602 ERROR: (WEST_TOPDIR/zephyr/tests/bluetooth/bsim_bt/bsim_test_mesh/src/test_friendship.c:393): evt_wait(LPN_ESTABLISHED, K_SECONDS(5)) failed with error -11
d_04: @00:00:04.891602 main: The TESTCASE FAILED with return code 2
d_00: @00:00:04.869426 main: The TESTCASE FAILED with return code 1
d_01: @00:00:00.715821 main: The TESTCASE FAILED with return code 1
d_03: @00:00:00.760743 main: The TESTCASE FAILED with return code 1
d_02: @00:00:00.738282 main: The TESTCASE FAILED with return code 1
d_05: @00:00:00.784180 main: The TESTCASE FAILED with return code 1
```

Environment (please complete the following information):

  • OS: Linux Ubuntu 20.04.4 LTS 64-bit
  • Platform: nrf52_bsim
  • Toolchain: Zephyr SDK, BabbleSim
  • Commit: main
@alxelax alxelax added the bug The issue is a bug, or the PR is fixing a bug label Aug 1, 2022
@alxelax alxelax changed the title bsim mesh establish_multi.sh doesn't send data for one of devices though it reports data has been sent successfully bsim mesh establish_multi.sh doesn't send data for one of devices Aug 1, 2022
@alxelax alxelax added the platform: nRFx BSIM nrf5x_bsim label Aug 1, 2022
@mmahadevan108 mmahadevan108 added the priority: low Low impact/importance bug label Aug 2, 2022
@aescolar
Copy link
Member

aescolar commented Aug 3, 2022

@alxelax can you please check in the Phy radio logs to see what is actually sent by each device and when each other device tries to receive and what it receives.

@alxelax
Copy link
Collaborator Author

alxelax commented Aug 3, 2022

@alxelax can you please check in the Phy radio logs to see what is actually sent by each device and when each other device tries to receive and what it receives.

Yep, sure. @aescolar could you provide more information how to collect these logs? Do I need to add some options when I build tests or to run it with specific options?

@aescolar
Copy link
Member

aescolar commented Aug 3, 2022

@alxelax
The Phy can dump all radio activity to files. You control this with the -dump and -nodump command line options. This radio activity can be easily imported for analysis into the Ellisys Bluetooth Analyzer SW.
The results for each simulation end up in the results/<sim_id> folder
You don't need to import the simulation activity into the Ellisys, you can just look into the raw csv files if you want. (And the ellisys does not show you Rx activity , that you can only see in the corresponding csv files )

@alxelax
Copy link
Collaborator Author

alxelax commented Aug 5, 2022

logs.zip
These are logs of tests. Archive includes all csv files for establish_multi.sh test. Additionally I put test log as test_log.txt file.
It is possible to see in d_2G4_04.Tx.csv that 4th device sends some frames (this is broadcasted Friend Request).
Also it is possible to see in d_2G4_00.Rx.csv that there are no these frames. Even more it is possible to see from test_log.txt and d_2G4_00.Rx.csv that bt_mesh_scan_cb returns what d_2G4_00.Rx.csv consists.
As a consequence 0th device (friend node) doesn't hear 4th device (lpn's node Friend requests)

@aescolar
Copy link
Member

aescolar commented Aug 5, 2022

Please check in the Rx file if the receiver is open at times when it could possibly receive those device 4 transmissions. Also please check that it is not receiving somebody else's packet at the time.

@alxelax
Copy link
Collaborator Author

alxelax commented Aug 5, 2022

Scanner in Mesh works from the BLE Mesh initialization with duty cycle about 100%. Rx should be open to the time of friendship establishment (according to timestamps from log files it is open).
Regarding receiving somebody else's frame it seems very strange. Looking at Tx log and start stop time it is possible to see Friend Request takes 368 (time units) always.
Looking at Rx log I see time diff (rx_time_stamp - start_time ) for receiving 368 tu length frame it spent 12645 tu !!!
Then Rx spent 16662 tu fro receiving something that seems like nothing (or phy mistake ???)
Rx spends again enormous long time for Rx 6070, 1487, 2369 tu.
If compare with Tx time they are crossover-ed but why rx spends enormous time .

@aescolar
Copy link
Member

aescolar commented Aug 7, 2022

Hi @alxelax , I took a look at the logs:
The first 3 advertisement (transmissions) from device 4 overlap exactly in time and frequency with the first 3 advertisements from device 1. Device 0 happened to receive device 1 instead of 4(^1).
The 4th advertisement from 4 happens while device 0 is listening in another frequency, in which it receives device 1 again.
The 5th advertisement from 4 happens while device 0 is still receiving the previous packet from device 1.
The 6th advertisement from 4 happens while device 0 is listening in another frequency, in which it receives device 5.
Device 4 does not advertise anymore.

There does not seem to be any issues in the Phy.
Device 4 just happens to try transmitting too few times given how busy device 0 is. This seems like a test case issue, not a simulation issues.


(^1) The default Phy modem is the magical one, which can receive any transmission irrespectively of received power level or interference, but will still only receive one packet at a time.

Note that in the Rx logs, the start_time is when the device opens its receiver listening for possible packets. It can be open for very long if so configured by SW. There is nothing wrong with it being open for several tenths of milliseconds.
For a correctly received packet, the rx_time_stamp is the time in which the sync_word/phy_address of the locked received packet ends.
For an SW stopped/aborted reception attempt, the rx_time_stamp is when the receiver was closed.

Time in the logs is all in microseconds.

@alxelax
Copy link
Collaborator Author

alxelax commented Aug 8, 2022

Hi @aescolar , thanks a lot for the detailed clarification. Now I see the cause of the issue. I'll modify the test to shift device starts.

@aescolar
Copy link
Member

aescolar commented Aug 8, 2022

I'll modify the test to shift device starts.

Beware that if the test fails in this way it is probably too tough (meaning the test should probably allow for more time for devices to find each other or so). Note that in simulation tests are repeatable, but still you don't want to have tests that pass in simulation but in real life would be flaky.

Closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Mesh bug The issue is a bug, or the PR is fixing a bug platform: nRFx BSIM nrf5x_bsim priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

3 participants