Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

newlib: Initial thread-safe implementation #4529

Closed
wants to merge 2 commits into from

Conversation

DipSwitch
Copy link
Member

This is the initial implementation to make newlib thread safe.

Things that could be changed:

  • We could make the user choose what thread must be thread-safe and what not. (by don't including the struct reent to the tcb_t but a pointer where we set the global pointer if the user doesn't want to make it thread-safe

I haven't tested yet if it compiles.

@DipSwitch DipSwitch added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Area: core Area: RIOT kernel. Handle PRs marked with this with care! Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Impact: major The PR changes a significant part of the code base. It should be reviewed carefully labels Dec 20, 2015
@DipSwitch
Copy link
Member Author

Fixes: #4488

@DipSwitch DipSwitch force-pushed the pr/add_newlib_threadsafe branch from 3f53d58 to 27bcae9 Compare December 20, 2015 22:16
@haukepetersen
Copy link
Contributor

The code looks sane to me. I guess you have not yet looked at code sizes, right? Especially the memory increase for the tcb would be interesting to know...

Though in general I would say that we don't enable this feature per default and try to get the newlib 'thread-safe-enough' with other means (e.g. actively blocking or buffering UART for puts/printf) to safe memory and task switching overheads, but this is also something we would need to benchmark to make some informed decisions...

@Kijewski
Copy link
Contributor

Yep, please update tests/sizeof_tcb/main.c.

@OlegHahm OlegHahm added this to the Release 2016.03 milestone Dec 21, 2015
@DipSwitch DipSwitch removed the State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet label Dec 27, 2015
@DipSwitch
Copy link
Member Author

Fixed compile problems and included it in tests\sizeof_tcb

Please note that it would also be possible to make the reent implementation thread specific by using a pointer to struct _reent where the struct itself will be placed on the stack when enabled and the tcb would only hold the pointer to the location on the stack or use the _global_impure_ptr. This way a simple LED toggle task wouldn't have a thread-safe newlib context. But the consol handler does for example.

@DipSwitch
Copy link
Member Author

@haukepetersen the code in newlib is already called, the only difference now is that newlib doesn't use a global struct but a thread specific struct. So no penalties are to be expected.

@@ -6,7 +6,7 @@ export USEMODULE += cortexm_common
# Export the peripheral drivers to be linked into the final binary:
export USEMODULE += periph
# all cortex MCU's use newlib as libc
export USEMODULE += newlib
export USEMODULE += newlib_thread_safe
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be removed before merging and another method needs to be found to enable thread-safety.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restore this line and make the application include it using the pseudo module?

@haukepetersen
Copy link
Contributor

So no penalties are to be expected

I am not so sure about that. We have this newlib struct the as part of every tcb, so the tcp size increases, right? Did you do some initial measurements on the changes in code/ram usage?

@OlegHahm
Copy link
Member

Any news?

@DipSwitch
Copy link
Member Author

The reentry struct is 96 bytes / 24 words total :)

@jnohlgard
Copy link
Member

@DipSwitch So, that is the cost per thread to implement thread safe errno etc?

@DipSwitch
Copy link
Member Author

Yup :)
On 29 Jan 2016 14:37, "Joakim Nohlgård" notifications@github.com wrote:

@DipSwitch https://github.com/DipSwitch So, that is the cost per thread
to implement thread safe errno etc?


Reply to this email directly or view it on GitHub
#4529 (comment).

@jnohlgard
Copy link
Member

I tried this PR very briefly and I get hard faults on mulle during startup in the xtimer_drift test application and a plain lockup on stm32f4discovery. The stm32 is stuck inside sched_run with an empty runqueue. The program runs up to the first xtimer_usleep_until call before hanging, but there is no output seen at all on the UART.

Below is my quick debug dump from the stm32f4discovery board:

^C
Program received signal SIGINT, Interrupt.
0x08000398 in bitarithm_lsb (v=0) at /home/jgn/work/src/riot/core/bitarithm.c:53
53          r++;
(gdb) bt
#0  bitarithm_lsb (v=0) at /home/jgn/work/src/riot/core/bitarithm.c:53
#1  0x08000220 in sched_run () at /home/jgn/work/src/riot/core/sched.c:72
#2  0x0800047e in isr_svc () at /home/jgn/work/src/riot/cpu/cortexm_common/thread_arch.c:275
(gdb) list
48  {
49      register unsigned r = 0;
50  
51      while ((v & 0x01) == 0) {
52          v >>= 1;
53          r++;
54      };
55  
56      return r;
57  }
(gdb) print v
$1 = 0
(gdb) print r
$2 = 77772978
(gdb) frame 1
#1  0x08000220 in sched_run () at /home/jgn/work/src/riot/core/sched.c:72
72      int nextrq = bitarithm_lsb(runqueue_bitcache);
(gdb) list
67      tcb_t *active_thread = (tcb_t *)sched_active_thread;
68  
69      /* The bitmask in runqueue_bitcache is never empty,
70       * since the threading should not be started before at least the idle thread was started.
71       */
72      int nextrq = bitarithm_lsb(runqueue_bitcache);
73      tcb_t *next_thread = clist_get_container(sched_runqueues[nextrq], tcb_t, rq_entry);
74  
75      DEBUG("sched_run: active thread: %" PRIkernel_pid ", next thread: %" PRIkernel_pid "\n",
76            (active_thread == NULL) ? KERNEL_PID_UNDEF : active_thread->pid,
(gdb) print runqueue_bitcache 
$3 = 0

@OlegHahm
Copy link
Member

Any news?

@OlegHahm OlegHahm assigned jnohlgard and unassigned haukepetersen Feb 27, 2016
@jnohlgard
Copy link
Member

ping @DipSwitch

@DipSwitch
Copy link
Member Author

I looked yesterday and have a fix the crash. But then I noticed the stack
sizes where bigger without optimization. IDLE went from 168 to 1590
somewhere. While with debug enabled and no thread-safe newlib the size
didn't increase so dramatically. Will state findings and numbers in a bit.
First some food :p

Maybe instead of using -O0 this can be solved using -Og although I don't
know if clang supports this?
On 13 Mar 2016 08:40, "Joakim Nohlgård" notifications@github.com wrote:

ping @DipSwitch https://github.com/DipSwitch


Reply to this email directly or view it on GitHub
#4529 (comment).

@jnohlgard
Copy link
Member

@DipSwitch no rush, could you push your updated branch to Github?

@jnohlgard jnohlgard added the State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet label Mar 13, 2016
@DipSwitch
Copy link
Member Author

Hmm false positive...

newlib-thread-safe: -Og
> ps
        pid | name                 | state    Q | pri | stack ( used) | location
          1 | idle                 | pending  Q |  15 |  2048 ( 1200) | 0x200001cc
          2 | main                 | running  Q |   7 |  2560 ( 1784) | 0x200009cc
            | SUM                  |            |     |  4608 ( 2984)

newlib-thread-safe: -Os
> ps
        pid | name                 | state    Q | pri | stack ( used) | location
          1 | idle                 | pending  Q |  15 |  2048 ( 1200) | 0x200001cc
          2 | main                 | running  Q |   7 |  2560 ( 1792) | 0x200009cc
            | SUM                  |            |     |  4608 ( 2992)

newlib: -Os
> ps
        pid | name                 | state    Q | pri | stack ( used) | location
          1 | idle                 | pending  Q |  15 |   256 (  136) | 0x200001cc
          2 | main                 | running  Q |   7 |  1536 (  728) | 0x200002cc
            | SUM                  |            |     |  1792 (  864)


newlib: -Og
> ps
        pid | name                 | state    Q | pri | stack ( used) | location
          1 | idle                 | pending  Q |  15 |   256 (  136) | 0x200001cc
          2 | main                 | running  Q |   7 |  1536 (  720) | 0x200002cc
            | SUM                  |            |     |  1792 (  856)

main(): This is RIOT! (Version: 2016.03-devel-1104-g5e548-dipswitch-K55VD-pr/add_newlib_threadsafe)
        member, sizeof, offsetof
sizeof(thread_t): 1104
        sp            4   0
        status        1   4
        priority      1   5
        pid           2   6
        rq_entry      8   8
        wait_data     4  16
        msg_waiters   4  20
        msg_queue    12  24
        msg_array     4  36
        newlib_reent1064  40
Done.

And sorry about the stray commits, will fix later.

I don't know where the previous statement of 96 bytes is comming from. Maybe it got something to do with an optimalization which came with using -fdata-sections -ffunction-sections -Wl,--gc-sections enabled and having the struct _reent being located in bss...

endef

# compile the tests
__has_reent := $(shell echo "\#include <sys/reent.h>\n" | $(CC) -E $(CFLAGS) $(INCLUDES) - 2> /dev/null > /dev/null; echo $$?)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the \n results in an error: "stdin>:1:23: error: extra tokens at end of #include directive [-Werror]", adding -e to the echo command solves it ( -e: enable interpretation of backslash escapes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

echo is really the worst way to get a text string in shell scripts. There
is no standard specification and the flags and default behaviour differ
between almost every platform (OSX, Linux, BSD, windows etc.)
Also, the \n is redundant since echo will add a newline to the string (on
all platforms that I know of) when called with no flags.

Den 23 aug. 2016 09:52 skrev "Pieter Willemsen" notifications@github.com:

In sys/newlib/thread_safe/Makefile.include
#4529 (comment):

@@ -0,0 +1,22 @@
+# define the compile test file
+define __has_reent_small_test
+#include <sys/reent.h>\n
+#ifndef _REENT_SMALL\n
+#error wasting 1KB\n
+#endif
+endef
+
+# compile the tests
+__has_reent := $(shell echo "#include &lt;sys/reent.h&gt;\n" | $(CC) -E $(CFLAGS) $(INCLUDES) - 2> /dev/null > /dev/null; echo $$?)

the \n results in an error: "stdin>:1:23: error: extra tokens at end of
#include directive [-Werror]", adding -e to the echo command solves it (
-e: enable interpretation of backslash escapes)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/RIOT-OS/RIOT/pull/4529/files/0b380225290d6b1efeea9a7bbb273684b417672d#r75816861,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AATYQu8sumUmNCJsn-omBi-6etbn_ofjks5qiqa9gaJpZM4G43Px
.

Copy link

@pwillem pwillem Aug 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the \n solved it.

@miri64 miri64 removed the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Oct 31, 2016
@miri64
Copy link
Member

miri64 commented Oct 31, 2016

I don't see why this is marked as a bug (except if you see newlib's non-threadsafeness as a bug), so postponed due to feature freeze

@miri64 miri64 modified the milestones: Release 2017.01, Release 2016.10 Oct 31, 2016
@miri64 miri64 removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Oct 31, 2016
@OlegHahm
Copy link
Member

Any progress here?

@PeterKietzmann PeterKietzmann modified the milestones: Release 2017.01, Release 2017.04 Jan 26, 2017
@smlng
Copy link
Member

smlng commented Jan 12, 2018

postponed 4 times in the past, removing milestone label.

@smlng smlng removed this from the Release 2018.01 milestone Jan 12, 2018
@miri64
Copy link
Member

miri64 commented Feb 15, 2018

No significant progress for over a year. Closing as memo for now.

@miri64 miri64 closed this Feb 15, 2018
@miri64 miri64 added the State: archived State: The PR has been archived for possible future re-adaptation label Feb 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Impact: major The PR changes a significant part of the code base. It should be reviewed carefully State: archived State: The PR has been archived for possible future re-adaptation Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.