Enforce usage of K_THREAD_STACK_SIZEOF macro in k_thread_create() #14269
Comments
I agree there's a problem here.
So, some examples for the ARMv7-M MPU case, @andrewboie, to clarify this and show one more implication. Assume guards are used and there is no workaround for power-of-two alignment. Example 1:
Example 2 (same as above but developer does not use K_THREAD_STACK_SIZEOF):
The above example shows that we need a workaround if we are not sure what is passed to k_thread_create(). Assume now that the workaround is used for size adjustment. We repeat the examples: Example 1-updt
This works if the MPU rounds up to the next power of two. Example 2-updt (same as 1-updt but developer does not use K_THREAD_STACK_SIZEOF):
In other words, since the workaround assumes the worst case, and this is the worst case, the result is fine. Conclusion: we need to have a formal and deterministic way of passing the available size to k_thread_create(), so we won't need workarounds.
One other option @andrewboie is to detect in k_thread_create() the difference between the total_size and the one derived by the object meta-data (z_object_find) and add this difference to what we pass into arm's |
Yes, I favor the latter approach. I'd like to see if we can, for all arches, establish an equivalence, where if we have:
For any value of
Either of these calls should produce the same values in If we have this setup correctly, either the common I'm sketching out the particulars and will post a follow-up once I think I have the details worked out. |
Based on what we have seen porting userspace to 3 arches, I think we can establish some invariant properties of thread stack objects. And I think we can make the layouts the same for all arches. The following sizes are important:
Now let's show the memory layout for threads in supervisor and in user mode. In this new layout (currently only followed by ARC), the location of the guard area moves depending on what mode the thread is in: Supervisor mode:
User mode
This layout has the following properties
Now, how do we establish equivalence, such that we work in the same way if we have:
And call either:
We can simply do:
#define K_THREAD_STACK_SIZEOF(stack) (sizeof(stack) - K_THREAD_STACK_RESERVED)
And in k_thread_create, apply
Incidentally, this setup should also allow us to completely define a lot of these stack macros directly in kernel.h without having to define Z_ARCH_THREAD_STACK_.... macros in each arch header.
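A minimal standalone sketch of the SIZEOF definition above, not Zephyr code. The `MODEL_*` names are hypothetical, and the reserved size (a 32-byte guard plus a 512-byte privileged stack) is only an illustrative assumption borrowed from figures quoted elsewhere in this thread. It shows that subtracting the reserved area from `sizeof()` gives back the nominal size used in the definition:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for K_THREAD_STACK_RESERVED (assumed value:
 * 32-byte guard + 512-byte privileged stack). */
#define MODEL_STACK_RESERVED (32u + 512u)

/* Hypothetical stand-in for K_THREAD_STACK_DEFINE: the object holds the
 * nominal size plus the reserved area. */
#define MODEL_STACK_DEFINE(sym, size) \
	static unsigned char sym[(size) + MODEL_STACK_RESERVED]

/* Same shape as the proposed K_THREAD_STACK_SIZEOF. */
#define MODEL_STACK_SIZEOF(sym) (sizeof(sym) - MODEL_STACK_RESERVED)

MODEL_STACK_DEFINE(model_stack, 2048);

/* Size a caller would pass as stack_size when using the SIZEOF macro. */
static size_t model_stack_usable(void)
{
	return MODEL_STACK_SIZEOF(model_stack);
}
```

With these definitions, passing the nominal size and passing the SIZEOF result describe the same buffer, which is the equivalence the comment is after.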
@andrewboie I think this is a nice analysis; we should see if it is possible to implement for the next Zephyr release. I have some high-level comments:
Actually, I am not so sure now. The problem is base address alignment of stacks on power-of-two systems. With the new proposed layout, consider a stack declared with a size of 2K. This generates a stack object of 2K + 32 bytes (guard) + 512 bytes (privileged stack), for a total of 2592 bytes. Now consider several of these stacks defined. Each one has to have its lowest memory address aligned to 2K, so the next stack can start no earlier than offset 4096. I suspect the linker is going to leave 1504 bytes of wasted space between each instance! I have not yet verified this experimentally, but I need to do so before we pursue this route further. I do not believe the linker is smart enough to fill the gaps with other, smaller objects, but I need to test to see what happens on ARC (which is already doing this for MPU v2 systems). With the privileged stacks in separate memory, the 2K stacks can be packed in memory very efficiently by the linker.
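The arithmetic behind the 1504-byte figure can be checked with a short sketch. The helper names below are hypothetical, and the model assumes the linker places identical stack objects back to back with no other objects filling the gaps, which is the unverified suspicion stated above:

```c
#include <assert.h>
#include <stddef.h>

/* Round `value` up to the next multiple of `align`.
 * `align` must be a non-zero power of two. */
static size_t align_up(size_t value, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + align - 1) & ~(align - 1);
}

/* Bytes left unused after a stack object of `object_size` bytes when the
 * next object must start on an `align`-byte boundary. */
static size_t padding_between_stacks(size_t object_size, size_t align)
{
	return align_up(object_size, align) - object_size;
}
```

For the case in the comment, `padding_between_stacks(2048 + 32 + 512, 2048)` rounds 2592 up to 4096 and reports 1504 bytes of padding, while a bare 2048-byte object needs none.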
For supervisor threads, I like the idea of just making a guard MPU region out of whatever memory precedes the thread stack, without a specific guard region. Threads should not be sharing pointers with each other based on locations within stack frames. The only trick is what to do with the very first stack in that region, since it will not be preceded by other stacks, but by some different memory. However, for privilege elevation, in a design where the priv stack is immediately after the user stack, turning the user mode stack into a guard won't work: it's not unreasonable to expect system calls to perform memory operations on memory areas defined in the system caller's stack frame. But as I posted earlier, having the priv and user stack be regions in the same stack object may waste tons of RAM. In a scheme where they are in separate areas of memory, we could make this work, since privilege stacks will be adjacent only to other privilege stacks, with no expectation of data sharing between them.
Agreed. I haven't yet come up with a good scheme to ensure that calls to k_thread_create() that create user threads with one of these supervisor-only stacks fail, but I'm thinking about it.
How about this design, which I think has the best aspects of what we have been thinking about:
This could work pretty well:
I spoke to @jhedberg and @andyross and I think forbidding supervisor-to-supervisor stack access needs more contemplation before we commit to it. Here's a truth table of what is accepted for access when Context A tries to access memory on Context B's stack:
The change is to forbid Supervisor -> Supervisor thread stack access. We can position the ISR/stack guard in such a way as to still allow ISRs to write to stacks.
This is fine. I believe we do this already.
That's fine too.
I agree with the idea. The first stack is not the only problem: we need to make sure the guards are of sufficient size and respect the alignment requirements of the MPU architecture.
I agree with refactoring the code to forbid Supervisor-Supervisor thread stack access.
This is, of course, OK, but it is not the only thing we need to do; we also need to avoid enforcing special alignment requirements for these threads.
ISRs run on the interrupt stack and not the thread stack. Separate memory area.
I am confused here. ISRs run on the interrupt stack, yes, but they might get pointers to objects defined in the stacks of threads, no? If ISRs only access global kernel data, then, I agree, we are good. But otherwise, I don't see how we can make supervisor stacks read-only...
Oh, I see. What I was thinking was, if we can get wider agreement that it's OK to ban supervisor->supervisor thread access, to arrange the IRQ stack in memory so that its guard area would not intersect any thread stack. Treat it as a special case.
This macro is slated for complete removal, as it's not possible on arches with an MPU stack guard to know the true buffer bounds without also knowing the runtime state of its associated thread. As removing this completely would be invasive to where we are in the 1.14 release, demote to a private kernel Z_ API instead. The current way that the macro is being used internally will not cause any undue harm, we just don't want any external code depending on it. The final work to remove this (and overhaul stack specification in general) will take place in 1.15 in the context of zephyrproject-rtos#14269 Fixes: zephyrproject-rtos#14766 Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Downgrading this to LOW, because it only results in some RAM waste on ARM, with no other serious impact. It leads to some inconsistencies, yes, but they are not critical.
Describe the bug
There is some inconsistency in how the macros related to thread stack definition are actually used in Zephyr applications/samples/tests/subsystems. Ideally, we want to define the stack object as stack obj = K_THREAD_STACK_DEFINE(sym, nominal size) and then call k_thread_create with stack_size = K_THREAD_STACK_SIZEOF(stack obj).
In this way we guarantee that k_thread_create is given the amount of stack area (in bytes) that can actually be used by the thread, excluding, for instance, a privileged stack guard (in ARM builds).
However, this is not always the case: in several cases (including even the _main thread) k_thread_create is passed nominal_size instead of K_THREAD_STACK_SIZEOF(sym). If privileged stack guards are to be used on ARM builds with a requirement for power-of-two start/size alignment of stack objects, then this is a bug, because k_thread_create is supplied with a size value that is larger than what the thread can actually use: in that case no extra area is allocated for the guard.
In ARM builds with CONFIG_MPU_REQUIRES_POWER_OF_TWO_ALIGNMENT && USERSPACE we have a workaround for this, but it effectively eats up 32 bytes from the stack objects.
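To make the mismatch concrete, here is a small standalone model, not the Zephyr implementation. It assumes, as described above for the power-of-two case, that the guard is carved out of the stack object itself, and it uses the 32-byte figure quoted in this issue. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Guard size quoted in the issue; treated here as a fixed assumption. */
#define MODEL_GUARD_SIZE 32u

/* Usable bytes when the guard lives inside the stack object, i.e. no
 * extra area was allocated for it. */
static size_t usable_stack_bytes(size_t object_size)
{
	assert(object_size > MODEL_GUARD_SIZE);
	return object_size - MODEL_GUARD_SIZE;
}

/* Number of bytes by which the caller overstates the stack when it passes
 * `passed_size` for an object of `object_size` bytes. */
static size_t overstated_bytes(size_t passed_size, size_t object_size)
{
	size_t usable = usable_stack_bytes(object_size);

	return passed_size > usable ? passed_size - usable : 0;
}
```

In this model, passing the nominal 1024 for a 1024-byte object overstates the stack by 32 bytes, whereas passing the value a SIZEOF-style macro would return overstates it by zero.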
Expected behavior
k_thread_create() is called with argument stack_size = K_THREAD_STACK_SIZEOF(stack object), where stack object = K_THREAD_STACK_DEFINE(nominal size).