Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/thread: add thread states for FreeRTOS queue implementation #18454

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

gschorcht
Copy link
Contributor

Contribution description

This PR adds the thread states STATE_QUEUE_FULL_BLOCKED and STATE_QUEUE_EMPTY_BLOCKED which are compiled in depending on whether the FreeRTOS compatibility layer for ESP SoCs is used.

This change fixes a problem that was figured out when investigating a crash in tests/nimble_rpble_gnrc in PR #18439.

The FreeRTOS queue and semaphore implementation of the FreeRTOS compatibility layer previously used STATUS_RECEIVE_BLOCKED and STATE_SEND_BLOCKED to block a thread when the corresponding queue was empty and full, respectively. This was compatible to the blocking mechanism used for msg by gnrc_netif. However, since PR #16748, gnrc_netif no longer uses the msg blocking mechanism, but uses thread flags.

As a consequence, RIOT may crash when using FreeRTOS queues and semaphores together with gnrc_netif for the ESP network interfaces if STATUS_RECEIVE_BLOCKED in same thread and STATUS_RECEIVE_BLOCKED is used by FreeRTOS compatibility layer. If the gnrc_netif thread is blocked because of a FreeRTOS semaphore and is therefore in STATUS_RECEIVE_BLOCKED state, the _msg_send function will cause a crash because it then assumes that target->wait_data contains a pointer to a message of type msg_t, but by using thread flags it contains just the flag mask. This situation can easily occur when the ESP hardware is used by the gnrc_netif thread while another thread is sending something timer controlled to the gnrc_netif thread. See also #18439 (comment).

To solve this problem STATE_QUEUE_FULL_BLOCKED and STATE_QUEUE_EMPTY_BLOCKED are introduced.

Testing procedure

Use command

USEMODULE=esp_wifi CFLAGS='-DESP_WIFI_SSID=\"myssid\" -DESP_WIFI_PASS=\"mypass\" -DDEBUG_ASSERT_VERBOSE' BOARD=esp32-wrover-kit make -j8 -C examples/gnrc_networking flash term

and check that the WiFi network interface still works.

Use command ps, the output should look like

	pid | name                 | state    Q | pri | stack  ( used) ( free) | base addr  | current     
	  1 | sys_evt              | bl qempt _ |   4 |   2616 ( 1188) ( 1428) | 0x3ffbd3cc | 0x3ffbdba0 
	  2 | esp_timer            | sleeping _ |   2 |   3640 (  464) ( 3176) | 0x3ffbde08 | 0x3ffbea70 
	  3 | idle                 | pending  Q |  31 |   2048 (  448) ( 1600) | 0x3ffb2390 | 0x3ffb29d0 
	  4 | main                 | running  Q |  15 |   3584 ( 1424) ( 2160) | 0x3ffb2b90 | 0x3ffb3430 
	  5 | pktdump              | bl rx    _ |  14 |   3584 (  560) ( 3024) | 0x3ffb84f0 | 0x3ffb90c0 
	  6 | ipv6                 | bl rx    _ |  12 |   2048 (  784) ( 1264) | 0x3ffb6110 | 0x3ffb66c0 
	  7 | udp                  | bl rx    _ |  13 |   2048 (  572) ( 1476) | 0x3ffb9d3c | 0x3ffba300 
	  8 | wifi                 | bl qempt _ |   1 |   6200 ( 2172) ( 4028) | 0x3ffbec44 | 0x3ffc0240 
	  9 | netif-esp-wifi       | bl anyfl _ |  10 |   2048 (  844) ( 1204) | 0x3ffb45bc | 0x3ffb4b50 
	 10 | RPL                  | bl rx    _ |  13 |   2048 (  504) ( 1544) | 0x3ffb9538 | 0x3ffb9b40 
	    | SUM                  |            |     |  29864 ( 8960) (20904)

The sys_evt thread should be in state bl qemt.

Issues/PRs references

Prerequisite for PR #18439

Adds the special `STATUS_QUEUE_FULL_BLOCKED` and `STATUS_QUEUE_EMPTY_BLOCKED` thread states for the queue blocking mechanism in the FreeRTOS compatibility layer implementation. These additional thread states are necessary because the previously used thread states `STATUS_RECEIVE_BLOCKED` and `STATUS_SEND_BLOCKED` can no longer be used because they are not compatible with the thread flag blocking mechanism used by `gnrc_netif` since PR 16748, which can cause crashes.
New blobecause the previously used thread states `STATUS_RECEIVE_BLOCKED` and `STATUS_SEND_BLOCKED` can no longer be used because they are not compatible with the thread flag blocking mechanism used by `gnrc_netif` since PR 16748king states have to be used.
@gschorcht gschorcht requested a review from kaspar030 as a code owner August 15, 2022 10:45
@github-actions github-actions bot added Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: cpu Area: CPU/MCU ports Platform: ESP Platform: This PR/issue effects ESP-based platforms labels Aug 15, 2022
@gschorcht gschorcht changed the title core/thread: add thread flags for FreeRTOS queue implementation core/thread: add thread states for FreeRTOS queue implementation Aug 15, 2022
@gschorcht gschorcht requested a review from maribu August 15, 2022 10:46
@gschorcht gschorcht added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Aug 15, 2022
@maribu maribu added the Process: needs >1 ACK Integration Process: This PR requires more than one ACK label Aug 15, 2022
Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Code looks good and test result is provided.

Since this is affecting core, we need a 2nd ACK, though.

@maribu maribu requested a review from benpicco August 15, 2022 11:02
Comment on lines +175 to +178
#if IS_USED(MODULE_ESP_FREERTOS_COMMON)
STATUS_QUEUE_FULL_BLOCKED, /**< FreeRTOS compatibility layer queue full */
STATUS_QUEUE_EMPTY_BLOCKED, /**< FreeRTOS compatibility layer queue empty */
#endif
Copy link
Contributor

@benpicco benpicco Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add those at the end it will not mess up the thread state mapping in OpenOCD.

The use of this enum in OpenOCD also would speak for adding this unconditionally.
We already have STATUS_ZOMBIE which is only used by mynewt-core

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add those at the end it will not mess up the thread state mapping in OpenOCD.

The use of this enum in OpenOCD also would speak for adding this unconditionally. We already have STATUS_ZOMBIE which is only used by mynewt-core

Hm, I got an assert here

assert(thread->status < STATUS_ON_RUNQUEUE);
when I had them at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have STATUS_ZOMBIE which is only used by mynewt-core

I couldn't find that.

Copy link
Contributor

@benpicco benpicco Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's set by thread_zombify() which is only called by that package.

I guess the assert could be replaced by

assert(thread->status == STATUS_RUNNING || thread->status == STATUS_PENDING);

but I'm not sure if this assumption is backed in elsewhere too 😕

Copy link
Contributor Author

@gschorcht gschorcht Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation, thread_add_to_list may only be called for threads that are not in a runqueue. This is ensured by the assert.

Copy link
Contributor Author

@gschorcht gschorcht Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's set by thread_zombify() which is only called by that package.

Ah ok, it is a permanent state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the assert could be replaced by

assert(thread->status == STATUS_RUNNING || thread->status == STATUS_PENDING);

but I'm not sure if this assumption is backed in elsewhere too confused

Hm, >= STATUS_ON_RUNQUEUE is also used several times:

263:        if (!(process->status >= STATUS_ON_RUNQUEUE)) {
268:        if (process->status >= STATUS_ON_RUNQUEUE) {
280:    int on_runqueue = (active_thread->status >= STATUS_ON_RUNQUEUE);
./core/sched.c
146:    if (me->status >= STATUS_ON_RUNQUEUE) {
./core/thread.c

Copy link
Contributor Author

@gschorcht gschorcht Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add those at the end it will not mess up the thread state mapping in OpenOCD.

The use of this enum in OpenOCD also would speak for adding this unconditionally.

I see, it seems not to be a short-term solution. Maybe, we should continue with PR #18439 with state STATUS_MUTEX_BLOCKED for now which also solves the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining them unconditionally would cost 18 additional bytes in ROM.

@benpicco
Copy link
Contributor

benpicco commented Sep 2, 2022

Is this still needed?

@gschorcht
Copy link
Contributor Author

Is this still needed?

I would say yes. In the FreeRTOS customization layer, STATUS_MUTEX_BLOCKED is being misused for FreeRTOS queue states. However, the PR definitely requires a revision, depending on the progress of PR #18460.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ESP Platform: This PR/issue effects ESP-based platforms Process: needs >1 ACK Integration Process: This PR requires more than one ACK Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants