Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-local memory #506

Closed
wants to merge 5 commits into from

Conversation

haukepetersen
Copy link
Contributor

This PR is a proposal for an implementation to reserve a memory section of the stack memory for thread-local memory. This is especially useful for device drivers that run in their own thread, as there can be any amount of driver instances without the need of reserving memory inside the driver code in advance.

For example if you want to use multiple UARTs on a board, you can just create one thread for each device and each thread will use the same code. The usage could look like this:

// somewhere in the uart code
typdef struct {...} uart_state_t;           // struct containing the memory needed for the uart driver

// during initilization:
char mem1[STACKSIZE], mem2[STACKSIZE];
int uart1_pid = thread_create_with_local_mem(mem1, STACKSIZE, sizeof(uart_state_t), PRIO, 0, &uart_run, "UART 1");
int uart1_pid = thread_create_with_local_mem(mem2, STACKSIZE, sizeof(uart_state_t), PRIO, 0, &uart_run, "UART 2");

[...]

// for accessing the thread local memory inside the uart thread:
uart_state_t *my_state = (uart_state_t*) thread_get_local_mem(thread_getpid());

The advantages of this implementation over using malloc() are:

  • it is independent from any clib implementation
  • no dynamic memory allocation needed
  • its platform independent

The only drawback is the use of sizeof(char*) bytes more in the tcb_t struct (e.g. 80byte when having 20 threads on a 32-bit machine...)

Does this work for you or does anyone have a better idea how to do this? I'm open for discussion!

@mehlis
Copy link
Contributor

mehlis commented Jan 15, 2014

👍

@haukepetersen
Copy link
Contributor Author

I further just found the discussion "Thread-local Storage" from Nov 14th 2013 on the developer mailing-list.
The main difference between the arguments there and my implementation is the static nature of the latter, compared to malloc() and tls_malloc().

In my implementation there is no need to keep track of any pointer, as you can always acces the thread specific memory through thread_get_local_mem(). If you would use malloc() or tls_malloc() they return pointers, but how does a thread instance now which pointer it has to access...

@@ -55,6 +55,8 @@
cib_t msg_queue;
msg_t *msg_array;

char *lcoal_mem;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

@haukepetersen
Copy link
Contributor Author

Yes, it was -> came from copying after my last compilation, shame on me...

@kaspar030
Copy link
Contributor

Why not just

char *mem1[STACKSIZE+my_tls_size] uart_state_t *my_state = sched_threads[getpid()]->stack_start+sched_threads[getpid()]->stack_size

or something similar.

@haukepetersen
Copy link
Contributor Author

would work, but I guess its not really user-friendly -> for this method you really have to know the core and i guess one of the RIOT goals is to make it easily programmable by users without having go know all the OS-details?!

@kaspar030
Copy link
Contributor

well,
thread_get_local_mem() { return sched_threads[getpid()]->stack_start+sched_threads[getpid()]->stack_size;

int thread_create_with_local_mem(char *stack, int stacksize, int localmemsize, char priority, int flags, void (*function) (void), const char *name) { thread_create(stack, stacksize-localmemsize, priority, flags, function); }

No need to extend the tcb.

@haukepetersen
Copy link
Contributor Author

This was acutally my first implementation, but for a reason I changed again... Now I just have to remember the reason: I was trying to put the thread-local memory in after calling thread_create (by altering the already initialized tcb), and that lead to problems depending on how the platform handles it's stack.

But yours looks better, I will adept the PR!

@OlegHahm
Copy link
Member

Setting thread's status to STATUS_LOCAL_MEMORY is missing.

@haukepetersen
Copy link
Contributor Author

Ok, new try, this acutally looks way cleaner (why havn't I thought about this...). I made one further enhancement by 'hijacking' a bit in the tcb->status field to mark if a thread-local memory exists.

- new setting the STATUS_LOCAL_MEMORY flag
- not trying to set the tcb->local_mem field anymore
@kaspar030
Copy link
Contributor

# RIOT/core$ grep '==' * | grep STATUS

status is not a bitfield.

@kaspar030
Copy link
Contributor

what do you need STATUS_LOCAL_MEMORY for anyways?

@haukepetersen
Copy link
Contributor Author

if you don't have a measure to find out if thread-local memory is used, thread_get_local_mem() would return either 1) the correct address to the local memory or 2) something something.
For ease of use and proneness to errors I think it is a good idea to give the caller some feedback if the memory section does actually exist (and I think the overhead is quite negligible)

And what to you mean with the bitfield, status is uint16 and if (status & BLABLA) { should be valid, or not?!

@OlegHahm
Copy link
Member

@kaspar030 means that adding this "flag" to the tcb_t.status would break several condition checks in RIOT. Simply run

git grep -E "==.*STATUS

to see what he mean. ;)

@haukepetersen
Copy link
Contributor Author

Ok, I see :-) I gues there are three possible ways to solve this:

  1. Change 13 conditions and interpret tcb->status as bitfield
  2. Purge the test if local memory was defined and trust the user to use the thread_get_local_mem() with caution
  3. Find a better solution that will not use more memory in the tcb
    Opinions?

@kaspar030
Copy link
Contributor

  1. seems non-trivial
  2. seems non trivial (ideas and brainstorming is welcome

Let's got with well documented 2) for now. Users using get_localmem must have used create_localmem before. Two simple functions for professionals understanding memory in C...

Use-cases are hard to find, anyways. Your uart example can easily be rewritten without this kind of TLS.
Real TLS is something different, anyways (same pointer pointing to different location in different threads. Needs some kind of MMU...

@haukepetersen
Copy link
Contributor Author

Seems sensible to be, so #2 it is for now.

But I would be very interested in the rewritten version of my example (and if that would then work for further usages I have thought of...)!

@kaspar030
Copy link
Contributor

@haukepetersen

Something like

struct uart_state_t[NUM_UARTS] uart_states;
int used_uart_states;

int main() {
struct uart_state_t *my_state = uart_states[used_uart_states++];

@haukepetersen
Copy link
Contributor Author

@kaspar030 Actually that is exactly the code I use in my other projects (where I don't yet use RIOT). In my opinion it has the disadvantage of introducing a lot of configuration values.

For example you have to statically define the number of instances for say high-level uarts, smb380s, cc1100s etc etc. So this means, that your board-config needs to define all these numbers. But especially for some multi-purpose boards as the STM32discoverys, where you connect devices as to your needs, you would always have to edit the board-config or install some other means to set those values in your user-makefile or so.

With the thread local memory you could go completely around this issue, all you need to do is call the init-function of your device and tell it, which pins/periphial to use. For example init_smb380(SPI_1, GPIO_2, GPIO_3) will create an instance of the smb380 driver which uses the given periphials, no further configuration needed...

@kaspar030
Copy link
Contributor

Well,
you can always do

int smb380_main_loop() {
  struct uart_state_t my_state;
  <do stuff and freely use my_state>
}

... and have thread local my_state ...

keep it simple. ;)

@haukepetersen
Copy link
Contributor Author

Yes, but again, in that case each function that is called from the main loop would need a pointer to that struct in the argument list, and that does not work well with my sense for aesthetic :-)

But if thats what we agree on, I could do with it...

@kaspar030
Copy link
Contributor

Otherwise every function that is called would need a function call to "thread_get_localmem"...

If there can be multiple instances of an "uart driver", you should use normal C object orientation techniques. Use a struct as object state and hand that to every "method" you call.

You're using a global variable here. That is against (my sense of) asthetics, even if you put the global out of reach, call it TLS and then make it accessible through stack magic...

@OlegHahm
Copy link
Member

I'm strictly against C OO techniques. C is not meant to be object oriented and there's a good reason for it: performance!

@kaspar030
Copy link
Contributor

Are you serious?

@kaspar030
Copy link
Contributor

I mean, we're using structs as objects and functions with the struct as parameter for the object instance all over the place...

@OlegHahm
Copy link
Member

Do we? Then let's remove this shit!

@kaspar030
Copy link
Contributor

While we're at it, we can also finally remove the C stuff altogether. Performance is king!

@haukepetersen
Copy link
Contributor Author

Agreed. But let's open a new PR for that, are you gonna do it?

@OlegHahm
Copy link
Member

To get this serious again: What I've meant was that I want to avoid constructs like

struct {
    void *device_t;
    int (*write)(char c);
    int (*read)((char *buf);
} char_dev_t;

struct {
   void *device_t;
   int (*write)(char c);
   int (*read)((char *buf);
} uart_t;

and so one, which leads to nice oo interface but a nasty chain of passing around function pointers.

@kaspar030
Copy link
Contributor

@OlegHahm I know and I'm with you there. ;)

@OlegHahm
Copy link
Member

OlegHahm commented Feb 5, 2014

@haukepetersen, can you rebase?
@kaspar030, can you review again?

@mehlis
Copy link
Contributor

mehlis commented Feb 26, 2014

discussed with @haukepetersen close

@mehlis mehlis closed this Feb 26, 2014
@LudwigKnuepfer
Copy link
Member

@haukepetersen @mehlis How about you share the reasons for closing it here?

@OlegHahm
Copy link
Member

I think having thread-local storage would be nice.

@haukepetersen
Copy link
Contributor Author

I came to the conclusion that (at least for the applications I had the thread-local storage in mind) it was not useful. At the moment I can not really think of any application that can't be done (in an easy way) without this PR, so I see no reason to merge it...

@haukepetersen haukepetersen deleted the threadlocal_mem branch March 26, 2014 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants