
scheduler: priority inversion problem #7365

Closed
geith opened this issue Jul 14, 2017 · 19 comments
Labels: Area: core (RIOT kernel. Handle PRs marked with this with care!), Type: bug (The issue reports a bug / The PR fixes a bug, including spelling errors)

@geith (Contributor) commented Jul 14, 2017

The RIOT scheduler seems to have no protection against the priority inversion problem, where a task of medium priority can prevent the execution of a higher-priority task while the latter waits for a resource allocated by a low-priority task. The code example below demonstrates this behavior.

A common solution for this is a priority inheritance mechanism, where the owning task temporarily inherits the priority of the waiting task. But there might be other solutions too.
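
To illustrate the idea only: below is a minimal sketch of priority inheritance, not RIOT's actual mutex implementation. All types, fields, and the sched_change_priority() helper are assumptions made up for this sketch.

/* all names hypothetical, simplified for illustration */
typedef struct {
  uint8_t prio;             /* current priority; lower value = higher */
} pi_thread_t;

typedef struct {
  pi_thread_t *owner;       /* thread currently holding the lock */
  uint8_t owner_orig_prio;  /* owner's priority when it took the lock */
} pi_mutex_t;

/* assumed helper: set a thread's priority and update the run queue */
void sched_change_priority(pi_thread_t *t, uint8_t prio);

void pi_mutex_lock(pi_mutex_t *m, pi_thread_t *me)
{
  if (m->owner && (me->prio < m->owner->prio)) {
    /* boost the owner so a medium-priority thread can no longer
     * preempt it while we wait */
    sched_change_priority(m->owner, me->prio);
  }
  /* ... block until the mutex is released, then take it ... */
  m->owner = me;
  m->owner_orig_prio = me->prio;
}

void pi_mutex_unlock(pi_mutex_t *m)
{
  /* restore the owner's original priority before handing the lock on */
  sched_change_priority(m->owner, m->owner_orig_prio);
  m->owner = NULL;
  /* ... wake the highest-priority waiter ... */
}

Note that this naive restore-on-unlock scheme runs into trouble once mutexes are nested, as discussed further down in this thread.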

Even though this problem can mostly be avoided in the application design, such a feature could prevent critical complications and simplify the design of complex applications.
A famous example of this problem is the Mars Pathfinder mission, which got stuck in a reboot loop (triggered by the watchdog timer) and exceeded its energy budget. In that case, the problem could be fixed by remotely enabling the priority inheritance feature.

In the example below, a shared resource, represented by the mutex res_mtx, is used by a low- and a high-priority thread. A medium-priority thread that starts working after 3 s prevents the low-priority thread from freeing the mutex and thus the high-priority thread from being scheduled.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#include "thread.h"
#include "mutex.h"
#include "xtimer.h"

mutex_t res_mtx;

char stack_high[THREAD_STACKSIZE_MAIN];
char stack_mid[THREAD_STACKSIZE_MAIN];
char stack_low[THREAD_STACKSIZE_MAIN];

void *t_low_handler(void *arg)
{
  /* starting working loop immediately */
  while(1){
    printf("t_low: allocating resource...\n");
    mutex_lock(&res_mtx);
    printf("t_low: got resource.\n");
    xtimer_sleep(1);

    printf("t_low: freeing resource...\n");
    mutex_unlock(&res_mtx);
    printf("t_low: freed resource.\n");
    xtimer_sleep(1);
  }
  return NULL;
}

void *t_mid_handler(void *arg)
{
  /* starting working loop after 3s */
  xtimer_sleep(3);

  printf("t_mid: doing some stupid stuff...\n");
  while(1){
    thread_yield_higher();
  }
  return NULL;
}

void *t_high_handler(void *arg)
{
  /* starting working loop after 500 ms */
  xtimer_usleep(500 * 1000);
  while(1){
    printf("t_high: allocating resource...\n");
    mutex_lock(&res_mtx);
    printf("t_high: got resource.\n");
    xtimer_sleep(1);

    printf("t_high: freeing resource...\n");
    mutex_unlock(&res_mtx);
    printf("t_high: freed resource.\n");
  }
  return NULL;
}

kernel_pid_t pid_low;
kernel_pid_t pid_mid;
kernel_pid_t pid_high;

int main(void)
{
  xtimer_init();
  mutex_init(&res_mtx);
  puts("This is a scheduling test for Priority Inversion");

  pid_low = thread_create(stack_low, sizeof(stack_low),
      THREAD_PRIORITY_MAIN - 1,
      THREAD_CREATE_STACKTEST,
      t_low_handler, NULL,
      "t_low");

  pid_mid = thread_create(stack_mid, sizeof(stack_mid),
      THREAD_PRIORITY_MAIN - 2,
      THREAD_CREATE_STACKTEST,
      t_mid_handler, NULL,
      "t_mid");

  pid_high = thread_create(stack_high, sizeof(stack_high),
      THREAD_PRIORITY_MAIN - 3,
      THREAD_CREATE_STACKTEST,
      t_high_handler, NULL,
      "t_high");

  thread_sleep();
  return 0;
}
@jnohlgard (Member)

I understand the statement and I agree that this is a problem.
@geith would you like to create a pull request with your example code in a test application (for example, a new tests/thread_priority_inversion based on tests/thread_basic), to make it easier for others to check out and see the problem for themselves? I guess it makes sense to keep this program in the repo as a demonstration of the priority inversion problem, and to verify that it works once we have created a solution.

@haukepetersen (Contributor)

Thanks for pointing this out, this is indeed a known problem. As you also stated, RIOT has so far shifted this problem onto application developers, offering no means of protection against priority inversion inside the core/scheduler.

I agree with @gebart here: how about we add the code you provided as a test application (maybe with some documentation noting that this test will always fail with RIOT's default kernel configuration)? Then, based on that, we could look into mechanisms we can put into the kernel (as an optional feature) to enable e.g. priority inheritance, priority ceiling, or similar.
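
For comparison, a priority ceiling variant raises the locking thread to a pre-configured ceiling immediately on lock, instead of waiting for contention. A rough sketch, reusing the hypothetical pi_thread_t and sched_change_priority() names from the sketch above (not an actual RIOT API):

typedef struct {
  uint8_t ceiling;     /* highest priority of any thread that may lock this */
  uint8_t saved_prio;  /* locker's priority before the raise */
} pc_mutex_t;

void pc_mutex_lock(pc_mutex_t *m, pi_thread_t *me)
{
  /* ... acquire the underlying lock ... */
  m->saved_prio = me->prio;
  if (m->ceiling < me->prio) {  /* lower value = higher priority */
    sched_change_priority(me, m->ceiling);
  }
}

void pc_mutex_unlock(pc_mutex_t *m, pi_thread_t *me)
{
  sched_change_priority(me, m->saved_prio);
  /* ... release the underlying lock ... */
}

The trade-off: the ceiling must be configured per mutex ahead of time, but no waiter bookkeeping is needed at run time.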

@OlegHahm added the "Area: core" label Jul 31, 2017
@haukepetersen (Contributor)

I gave this some more thought and started to play around a little bit with mutexes, using the example above as a 'benchmark' -> see https://github.com/haukepetersen/RIOT/tree/add_core_prioinheritance

I thought there might be an easy way to integrate priority inheritance into the mutex code, but as I found out, it is not that simple. The problem I am currently looking at is caused by mutexes being nested (e.g., a thread locking more than one at the same time). In the example above this happens implicitly through xtimer_sleep, which uses a second mutex internally. In that case, my first attempt first raises the priority of t_low to the priority of t_high, but this is reset by the first call to mutex_unlock, done inside the xtimer. So it seems we need more precise tracking of which mutex caused a thread's priority change, and I am looking for efficient ways to do this...
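
One conceivable way to do that tracking per mutex rather than per thread: record in each mutex whether, and from which priority, it boosted its owner, and undo only that boost on unlock. Again only a sketch with made-up names (reusing pi_thread_t and sched_change_priority() from above); it handles the nested xtimer case described here, though not every possible interleaving:

#include <stdbool.h>

typedef struct {
  pi_thread_t *owner;
  uint8_t prio_before_boost;  /* only valid while 'boosted' is set */
  bool boosted;               /* did *this* mutex boost its owner? */
} pi_mutex_t;

void pi_mutex_lock(pi_mutex_t *m, pi_thread_t *me)
{
  if (m->owner && (me->prio < m->owner->prio)) {
    if (!m->boosted) {
      m->prio_before_boost = m->owner->prio;
      m->boosted = true;
    }
    sched_change_priority(m->owner, me->prio);
  }
  /* ... block until released, then take ownership ... */
  m->owner = me;
  m->boosted = false;
}

void pi_mutex_unlock(pi_mutex_t *m)
{
  if (m->boosted) {
    /* undo only the boost granted through this mutex; unlocking
     * xtimer's internal mutex (never contended by t_high) is a no-op */
    sched_change_priority(m->owner, m->prio_before_boost);
    m->boosted = false;
  }
  m->owner = NULL;
  /* ... wake the highest-priority waiter ... */
}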

@haukepetersen (Contributor)

@geith: While I was looking into this, I opened a PR with an adapted version of your test application: #7444

@bergzand (Member) commented Aug 4, 2017

Does a thread have a list of all acquired locks? If so, the priority could be reset to the highest lending priority among the still-locked mutexes. Only the original thread priority would have to be stored with the thread.
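
The idea in sketch form, assuming each thread kept such a list and each mutex could report its highest-priority waiter (hypothetical structures, not RIOT's actual ones):

typedef struct held_mutex {
  struct held_mutex *next;  /* linked list of mutexes this thread holds */
  uint8_t top_waiter_prio;  /* best priority among threads blocked on it */
} held_mutex_t;

typedef struct {
  held_mutex_t *held;       /* head of the list of held mutexes */
  uint8_t base_prio;        /* the thread's own, un-boosted priority */
} pi_thread_t;

/* recompute the effective priority on every lock/unlock/wait event:
 * the best of the thread's own priority and all lending waiters */
static uint8_t effective_prio(const pi_thread_t *t)
{
  uint8_t p = t->base_prio;
  for (const held_mutex_t *m = t->held; m; m = m->next) {
    if (m->top_waiter_prio < p) {  /* lower value = higher priority */
      p = m->top_waiter_prio;
    }
  }
  return p;
}

On unlock, the mutex would be removed from the list and the thread's priority reset to effective_prio(), so losing one lender never cancels a boost still justified by another.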

@haukepetersen (Contributor)

Nope, that information is not available. But I think I found another way, a PR will follow in the next few hours...

@kaspar030 (Contributor)

A famous example of this problem is the Mars Pathfinder mission, which got stuck in a reboot loop

Do we have any non-theoretical, real-world problem here? Pathfinder was launched twenty years ago.

IMO this only gets "solved" (e.g., by priority inheritance) because it is actually doable, compared to preventing all deadlocks. Why do we solve this (programming error) but not "thread A locks mutex 1, thread B locks mutex 2, then A locks mutex 2, then B locks mutex 1 -> classic deadlock"?
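
That scenario spelled out with the same RIOT primitives as the test program above (thread setup and includes omitted; the sleeps just force the fatal interleaving):

mutex_t m1, m2;

void *thread_a(void *arg)
{
  (void)arg;
  mutex_lock(&m1);    /* A holds m1 */
  xtimer_sleep(1);    /* give B time to grab m2 */
  mutex_lock(&m2);    /* blocks forever: m2 is held by B ... */
  return NULL;        /* never reached */
}

void *thread_b(void *arg)
{
  (void)arg;
  mutex_lock(&m2);    /* B holds m2 */
  xtimer_sleep(1);
  mutex_lock(&m1);    /* ... and B blocks on m1, held by A */
  return NULL;        /* never reached */
}

Neither priority inheritance nor priority ceiling can resolve this cycle; both threads stay blocked no matter how priorities are shuffled.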
Any solution will not come for free in terms of code size and cycles.

@haukepetersen (Contributor)

Pathfinder was launched twenty years ago.

But that doesn't mean that programming errors like that are not made anymore... And why not give developers something to make their applications potentially more error-proof?!

Any solution will not come for free in terms of code size and cycles

That's for sure! But adding some priority inversion prevention as an optional module would not hurt, right?!

@kaspar030 (Contributor)

But adding some priority inversion prevention as an optional module would not hurt, right?!

Having it not only does not hurt, it would be nice. I'm just asking myself if we're solving a problem that no one actually encountered in the wild. So why invest the time? (unless it's fun! :) )

@haukepetersen (Contributor)

no one actually encountered in the wild

Actually, I believe this is encountered quite a bit in the wild and is not something that was a problem only 100 years ago. I think the reason one does not hear about this very often anymore is that all major real-time OSes do have priority inversion prevention integrated by default or as a module... (-> see FreeRTOS, Zephyr, Mynewt, ...).

@bergzand (Member) commented Aug 4, 2017

I've always been taught priority inheritance as a way to minimize the blocking time of higher-priority tasks for better real-time guarantees.

@kaspar030 (Contributor)

Actually, I believe

Well, I don't. :) Any references to this being a real problem?

I've always been taught priority inheritance as a way to minimize the blocking time of higher-priority tasks for better real-time guarantees.

True that.

Alright, I didn't want to discourage fixing this, carry on guys!

@haukepetersen (Contributor)

Fixed it, just preparing the PR...

@haukepetersen (Contributor)

done #7445

@stale (bot) commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale bot added the "State: stale" label Aug 10, 2019
@stale bot closed this as completed Sep 10, 2019
@aabadie added the "Type: bug" label Sep 21, 2019
@aabadie reopened this Sep 21, 2019
@stale bot removed the "State: stale" label Sep 21, 2019
@kaspar030 removed the "Type: bug" label Sep 21, 2019
@kaspar030 (Contributor)

This is not a bug, but a design choice.

@stale (bot) commented Mar 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale bot added the "State: stale" label Mar 24, 2020
@jia200x (Member) commented Apr 3, 2020

Here's a real world case where priority inversion is indeed a problem: #13573 (comment)

@stale bot removed the "State: stale" label Apr 3, 2020
@miri64 added the "Type: bug" label Jul 1, 2020
@miri64 added this to the Release 2020.07 milestone Jul 1, 2020
@MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@maribu (Member) commented Sep 16, 2022

Fixed by #17040

@maribu closed this as completed Sep 16, 2022