-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: further stack reductions for nanox support #45
Conversation
- move UI context to heap - split out key image and ident frames from Engine::update approx 9124 -> 8412 at ring_signing call
I'm curious, how does no-inlining help reduce stack usage? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes all look good to me. I'd still like to understand how no inlining helps here.
Thanks!
thanks for the review!
so LLVM's optimiser really likes to inline functions which makes a lot of sense for performance, but when you inline functions the stack frames get merged. say you have two functions called sequentially that both use 4k of local storage, when inlined that becomes 8k in the caller stack frame whereas non-inlined they'll each have their own 4k frame and when called sequentially will re-use the same (SP+4k) space on the stack. none of this usually matters that much but it's exacerbated by rust not yet having working copy elision / NRVO and the rather severe 8k limit on the stack pointer (vs. ~20k of available memory) in the nanox OS. |
Gotcha, thanks for explaining! |
it turns out the nanox has a more severe stack use limitation, faulting on -memory access- when the stack pointer exceeds 8k, rather than only when invoking syscalls on the nanosplus.
it is expected that this limit will be resolved in future firmware, for now we should be able to avoid the issue by pre-allocating and using out-pointer tricks to avoid rust's lack of working copy elision / NRVO.
following #23, this PR:
Engine::update
this reduces our worst-case stack depth by ~2k down to ~7524 bytes which appears to get us under the limit on the nanox. this seems to be okay in the simulator, however, needs to be validated on an actual device.