Notification/interrupt latency: resetting IRQ notification mask takes hundreds of cycles #517
Comments
At least some of this is due to the linear search in both the IRQ handler and the I've been looking for an excuse to use perfect hashing, so I'm going to poke around and see if I can make this (I'm not sure about declaring interrupts outside of the kernel's ownership, but one of the other engineers may know off-hand) |
Declaring an |
@timblakely - incidentally - partially inspired by your use case here, I've added a sketch of a kernel mechanism for doing pin profiling on a scope or logic analyzer. Might be useful to you. #518 In this screenshot I've got That measurement will be a little imprecise because it's toggling the pin from the kernel, meaning, it will miss the register stacking code on kernel entry and exit. That should add only a few tens of cycles though. A sufficiently motivated individual could instrument the SVCall entry/exit sequence assembly in |
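For readers unfamiliar with the pin-toggling trick used for scope profiling in this thread, here is a minimal host-side sketch of how a BSRR-style write behaves on STM32 GPIO ports. It is a software model, not the mechanism from #518: `MockPort`, `pin_high`, and `pin_low` are hypothetical names invented for illustration, and no real hardware register is touched.

```rust
/// Software model of one GPIO port's output latch, driven through
/// BSRR-style writes. Hypothetical type, for illustration only.
struct MockPort {
    odr: u16,
}

impl MockPort {
    /// Apply a BSRR write: bits 0..=15 set pins, bits 16..=31 reset pins.
    /// If a pin is both set and reset in one write, set takes priority.
    fn write_bsrr(&mut self, value: u32) {
        let set = value as u16;
        let reset = (value >> 16) as u16;
        self.odr = (self.odr & !reset) | set;
    }
}

/// BSRR value that drives `pin` (0..=15) high.
fn pin_high(pin: u8) -> u32 {
    1 << pin
}

/// BSRR value that drives `pin` (0..=15) low.
fn pin_low(pin: u8) -> u32 {
    1 << (pin + 16)
}

fn main() {
    let mut port = MockPort { odr: 0 };
    port.write_bsrr(pin_high(3)); // mark entry to the region being timed
    assert_eq!(port.odr, 0b1000);
    port.write_bsrr(pin_low(3)); // mark exit from the region being timed
    assert_eq!(port.odr, 0);
}
```

Because each edge is a single store with no read-modify-write, the instrumentation itself adds very little to the interval being measured.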
@timblakely - a further thought - you may wish to test whether |
Thanks for the input! And to be clear, if we're talking Thinking this through I can confirm this is running at 520MHz, otherwise my timer reload and compare calculations wouldn't be accurate enough to give me the 40kHz PWM. So minus one variable :) @mkeeter Thanks for looking into this! I've got a repro commit tagged to try it out (zero hurry from my end, of course :). @cbiffle I had it in one of my tasks' And I love the profiling in #518! It's a formalization of my hacky flip-bits-in-arbitrary-GPIOx-bits-at-random-points-in-the-Kernel, and is one less change in the core Hubris repo I need to keep track of. At the risk of yet another Regarding |
Quick update: looks like there were a few changes to |
Okay, have some data! Also, apologies ahead of time; I'm a bit long-winded at times 😅

Initial ISR latency

All experiments done with With
This thread is all very cool and exciting, I don't have much to say other than:
This is #439. |
I would also love to do this, just work that hasn't been prioritized just yet. We started off aggressively trying to cache, and then kept running into problems where new things should cause a rebuild, and eventually said "you know what correctness matters more" and ended up just doing what we're doing now. It can be better though, you're right, it's just not easy. Someday... |
I have a draft of the perfect hashing up in #519 / the You're welcome to rebase onto it and see if it helps! It will depend on how many interrupts are in your app (i.e. defined in the |
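To make the linear-search versus perfect-hashing trade-off discussed in this thread concrete, here is a toy sketch. It is not the implementation from #519: the `Entry` layout, the multiply-and-shift hash, the brute-force multiplier search, and the IRQ numbers in `main` are all assumptions chosen for illustration. The idea is only that a table built ahead of time lets the IRQ handler find the owning task with one probe instead of a scan.

```rust
/// Hypothetical table entry: (IRQ number, owning task index, notification mask).
type Entry = (u32, usize, u32);

/// Baseline: linear scan, O(n) in the number of configured interrupts.
fn lookup_linear(table: &[Entry], irq: u32) -> Option<(usize, u32)> {
    table.iter().find(|e| e.0 == irq).map(|e| (e.1, e.2))
}

/// Slot index for `irq` under multiplier `m` in a table of `len` slots.
fn slot(irq: u32, m: u32, len: usize) -> usize {
    (irq.wrapping_mul(m) >> 16) as usize % len
}

/// Brute-force search for a multiplier that sends every key to its own slot.
/// In a real system this search would run on the host at build time.
fn build_perfect(entries: &[Entry]) -> Option<(u32, Vec<Option<Entry>>)> {
    let len = entries.len();
    if len == 0 {
        return None;
    }
    for m in 1..=(1u32 << 20) {
        let mut slots: Vec<Option<Entry>> = vec![None; len];
        let collision_free = entries.iter().all(|&e| {
            let s = slot(e.0, m, len);
            if slots[s].is_some() {
                false
            } else {
                slots[s] = Some(e);
                true
            }
        });
        if collision_free {
            return Some((m, slots));
        }
    }
    None
}

/// Single-probe lookup: hash, index, then confirm the stored key matches.
fn lookup_perfect(m: u32, slots: &[Option<Entry>], irq: u32) -> Option<(usize, u32)> {
    match slots[slot(irq, m, slots.len())] {
        Some((key, task, mask)) if key == irq => Some((task, mask)),
        _ => None,
    }
}

fn main() {
    // Three made-up interrupts, matching the app size mentioned in the thread.
    let table: [Entry; 3] = [(25, 1, 0b001), (27, 1, 0b010), (50, 2, 0b001)];
    let (m, slots) = build_perfect(&table).expect("no multiplier found");
    for irq in 0..240 {
        assert_eq!(lookup_perfect(m, &slots, irq), lookup_linear(&table, irq));
    }
    println!("multiplier {m} gives a collision-free table of {} slots", slots.len());
}
```

With only a handful of interrupts the linear scan is already short, which is consistent with the comment above that the benefit depends on how many interrupts the app defines.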
@timblakely thanks for doing the parameter sweep of the optimization levels, it seems like 3 really is helping you with latency. That's great. Out of curiosity, is your code in SRAM?
I feel like I discover this on every application as soon as I turn on external event profiling like this. :-)
AFAICT that message never shows up. I've proposed hiding the linker messages and printing our own but at the time, folks were concerned about our estimate being unfaithful. I'm no longer concerned about this, fwiw.
If we get around to #240 this should go away. The current aggressive cleaning is, well, too aggressive. Ironically, given that the PAC crates are the longest things to compile in most applications, the PAC crates are largely responsible for the need for this -- they like to deposit linker scripts into random places in the build tree without properly generating the Incidentally, if you make sure to harmonize features across the cortex-m / cortex-m-rt / PAC crates in all your tasks, you can usually get it down to two PAC builds -- one for userland, one for the kernel. (Current minimum is two in practice because all PACs use features to do
To be clear: definitely appreciate the cycles spent on issues like these, but I'm absolutely aware that the priority of Hubris - and Oxide as a whole - is getting products shipped/out the door; feature requests from out of nowhere are super low priority. Again, zero hurry from my end! Just really enjoying Hubris and its ethos so far and want to contribute where/when I can :)
Can confirm that it definitely helps. I've got three interrupts defined in my
Aye, and then it becomes "well crap, now if only I could remember how I fixed this the last half dozen times this happened...?" 😅
I got it to show up exactly once during my various experiments with Hubris. Unfortunately I'm afraid I can't seem to reproduce the environment that allowed it to happen :( Not an issue at all though, since at least
Ah, thanks for the lead. In adding support for the STM32h72x/3x series I've had to add a few additional base and
I... erm... good point. I assume not...? I'm trying to follow Hubris' safety and memory protection guidelines as closely as possible. The * Is there a Hubris-specific way of annotating "this goes here" in the various Hubris-specific linker scripts?

Overriding ISRs

Side note: let me know if I should open another bug for this discussion, or if there's a more appropriate place for open-ended design discussions! Just trying to limit the "blast radius" of my completely-not-what-Hubris-was-designed-for ramblings :)

One thing that I'm running into with custom ISR overriding is that AFAICT they have to be defined in The former can be hacked around by telling the MPU that there's some arbitrary memory defined in

The latter is a bit more nuanced. It seems like Idolatry servers and APIs are intended to be implemented as tasks, which means if a custom ISR is configured by an external task - i.e. the PWM duty cycles in a BLDC - at the moment it requires coordination between the Idolatry server and the relevant overridden ISR. I don't have a good mental model of how to coordinate between ISRs that are required to be defined in
Thanks again for the discussion and pointers! The PRs already helped quite a bit, and since any additional optimization will probably be implementation-dependent I'm going to close this. That said, one last question: I know Oxide is hard at work and neck-deep in getting the racks out the door and I definitely don't want to add any unnecessary noise. Is there a better place to put more open-ended discussions on longer-term Hubris ideas/feature requests that are likely outside the immediate Oxide dev scope? |
Missed this before. Please feel free to continue filing issues here, (1) we don't really have a better option and (2) I'm interested to hear user reports even if I can't immediately act on them. Something else has occurred to me, which you might want to experiment with. There's this big comment at the top of the arm_m support talking about the use of PendSV. One of the things it alludes to, but doesn't flat-out state, is that our use of PendSV here will likely hurt interrupt response time. It's hard to say by how much, off the top of my head -- thanks to tail-chaining on the fancier M-series cores (like the one you're using) it doesn't mean we have an entire extra exception entry-exit sequence. However, both of the ISRs are going to do some save/restore because they're in Rust and save callee-save registers. So, it would be at least a few cycles cheaper to modify DefaultHandler to do the full context save/restore sequence, instead of poking PENDSVSET and returning. With the timescales you're seeing, a few cycles might not be very interesting, but, I wanted to mention it. |
Interesting. From the code comment it sounds like the only interrupt that isn't very likely to switch contexts is the
Ironically I came across that here and there in the various Cortex-M NVIC app notes, but never really gave it much thought until I saw your shock absorber implementation in m4vga-rs. That's a pretty slick little trick, that is! |
Heh, thanks! There's a discussion of it in prose here in case that's useful. I've merged a collection of kernel cleanups that, among other things, ought to further reduce interrupt/syscall latency. They're really intended for ARMv6-M, but they should show some improvement on your fancier M4. I'm not sure how much of a kernel diff you're carrying but this might be worth looking at. (Also it turns out there was a subtle bug in the kernel entry asm -- fixed now.) |
Greetings!

I'm ~~on an incredibly misguided misadventure~~ experimenting with using Hubris in a BLDC application. Given the high currents and potential for copious amounts of magic smoke, the, erm... "pucker factor" is pretty high, but so far so good! 😅

I've gotten Hubris's codebase to the point where I can depend on it as if it were an external crate (lots of env variable overriding and explicit path setting, but that's for another bug), and am now working through the intricacies of how Hubris handles interrupts as notifications. It appears that there's a Not Small™️ amount of overhead associated with interrupt/notification handling, at least in terms of a 40kHz control loop. I'm building this on an H723 where I've confirmed I can boost the `SYSCLK` to 520MHz (!), with an `AHBx` bus speed of 270MHz, which means I've got just about 13k instructions to do the full commutation + FoC computation, including overhead.

I've been benchmarking the interrupt latency using an Idolatry server via `idol_runtime::dispatch_n` and the `handle_notification` section of the `NotificationHandler` trait. For the benchmark setup I've got a timer going at full tilt, have the OC1 channel drive a pin high, and catch the update interrupt. The moment we enter `DefaultHandler` from `arm_m.rs` I drive a GPIO pin high via the `BSRR` register so I can tell exactly when we enter the interrupt handler. Hubris then detects the task that needed the interrupt, context switches into `PendSV`, and resumes the task. Since this is an `idol` server, the `sys_recv` call in `dispatch_n` returns and in turn calls the `handle_notification` method on the server.

Apologies, this is kinda wordy; here's a probe of what's going on here (this is done at `opt-level = 3`, which is both different than the stock Hubris config and also much larger in terms of flash):

- […] `DefaultHandler` to drive the cyan trace high, indicating the first moment we get into Hubris.
- […] `handle_notification`, I first drive the magenta pin high, do some "work" inside my custom interrupt handler (200x `core::asm::nop()` calls just for this example), then drive it low when I'm effectively done with what I need to do in my interrupt handler.
- […] `sys_irq_control` call, which I've wrapped with a call to drive the blue trace high and low before/after the call.
- […] `handle_notification` I set the cyan trace low.

Side note: I need to re-confirm this is at 520MHz... I had previously done so via probing `MCO`, but the 40ns delay between the magenta drop and the blue raise - two consecutive writes to `BSRR` - admittedly is a bit concerning, considering that translates to ~10 instructions instead of just a single one. Will look at the disassembly after filing this...

If I'm reading this correctly, this takes about 1.3us to get into the `handle_notification` call, or ~680 instructions (!). I know there's a decent amount of overhead in context switching, finding the right handling task, checking masks, setting up the MPU, etc. It still takes quite a while even on `O3`, though judging by the use cases I can find in the repo I suspect interrupt latency has not been a target for optimization. That said, the more surprising part is that simply notifying Hubris that I want to reset and listen again for the interrupt (`sys_irq_control`) takes over 3x as long as the actual interrupt itself (!!!), which translates to multiple hundreds of cycles at 520MHz. This seems... inefficient? Looking through the code it seems to need to call into the kernel to do so, and I haven't been able to trace beyond the kernel context switch in `sys_irq_control_stub`. Any ideas on how to handle notifications like this faster? Am I just missing an optimization flag or three?

As a workaround: how does one go about overriding interrupt handlers to use 'em outside of the Hubris framework? The reference manual suggests they're declared `weak`, but I tried using `cortex-m-rt`'s `#[interrupt]` decorator around a `fn TIM1_UP` and it didn't seem to override the `DefaultHandler` Hubris handler. I don't see any instances of overriding the default interrupt handler within the Hubris codebase, though there's a pretty good chance I just plain missed it :)
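As a quick sanity check on the figures in the post, the timings can be converted to core cycles. This is only a back-of-the-envelope sketch: the constant and function names are made up for illustration, the clock and loop rate are the values quoted above, and it follows the post in treating one core cycle as roughly one instruction.

```rust
/// Core clock quoted in the post (SYSCLK).
const SYSCLK_HZ: u64 = 520_000_000;
/// Control loop rate quoted in the post.
const LOOP_HZ: u64 = 40_000;

/// Convert a duration in nanoseconds to whole clock cycles at `clk_hz`,
/// rounding down.
fn ns_to_cycles(ns: u64, clk_hz: u64) -> u64 {
    ns * clk_hz / 1_000_000_000
}

fn main() {
    // Cycles available in one 40 kHz control period.
    let budget = SYSCLK_HZ / LOOP_HZ;
    // Measured time from DefaultHandler entry to handle_notification (about 1.3 us).
    let entry = ns_to_cycles(1_300, SYSCLK_HZ);
    let share = 100.0 * entry as f64 / budget as f64;

    println!("cycles per control period: {budget}");
    println!("cycles to reach handle_notification: {entry} ({share:.1}% of the period)");
}
```

The same helper can be pointed at any other interval read off the scope traces to see how much of the control period it consumes.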