-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
soc/intel_adsp: ipc: initialize semaphore in driver init #72158
soc/intel_adsp: ipc: initialize semaphore in driver init #72158
Conversation
The ipc driver device data (struct intel_adsp_ipc_data) contains a semaphore. Upon device init, the device data is zeroed out. This is safe for other fields, but the semaphore should be properly initialized before use. This lack of initialization leads to a system crash when CONFIG_POLL is enabled (e.g. to enable CONFIG_SHELL), IPC driver handles an interrupt and executes k_sem_give() on a uninitialized semaphore object. This will eventually lead to null dereference in z_handle_obj_poll_events(). Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
e64bb96
to
f8b1cdd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the change was tested in SOF CI.
@tmleman wrote:
Thanks, that's a good reminder! So not yet. I've tested this locally on cavs25 and ace15 platforms, but let me do a full test round as well just to be sure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this one. I had to go back and check (it's been a while) but the architecture of this code is that the semaphore should be initialized on message send, is taken in a later message_sync() call, and is given only in the resulting ISR.
If someone is touching the semaphore in other contexts that almost certainly means that there's a race somewhere. Initializing the semaphore might prevent a crash but it's not going to save you from the resulting deadlock. Are there more details about what is going on?
Based on other communications from @kv2019i , I suspect CONFIG_SHELL is the key element to reproduce a crash. |
Ah, indeed, I'd seen that discussion go by SOF-side. So I'm guessing what's happening is that the IPC driver's semaphore is being added to a k_poll() list? Yeah, that won't work with the driver as it stands. k_poll() doesn't understand the idea of a semaphore being reset while it's in progress. If we want to do that we should update the driver to use the semaphore more conventionally, merely doing a k_sem_give() at send time and not an init, and increasing the max value to handle the resulting pipelining if someone races on the next send (that worry was why it was originally done like this, so that you recover after one cycle if the code blasts a bunch of racing sends). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 to capture my assumption that this driver is k_poll-incompatible and needs more work.
@andyross wrote:
There's no linkage between the IPC driver and Zephyr poll implementation directly. The problem happens with following sequence:
The SOF FW_READY should probably be changed so it sends the FW_READY via the driver, but I see no reason to leave a possibility to execute above code on uninitialized semaphore object. If the driver is initialized, it should be ready to handle interrupts. If the host is faulty (or modified with bad intentions) and sends unexpected IPC on boot, you can potentially hit this very hard to debug case (Zephyr fails silently here, no exception raised, in ISR context with spinlock held). And given I triggered this by only enabling CONFIG_SHELL=y in the build, not even running any shell code, it took a while to figure out what is happening and how linking (unused!) code can break DSP boot so fundamentally. But alas, enabling shell also impacted the Zephyr semaphore implementation, but it took a while for me to figure this out. I'd hope to spare other poor souls from this experience. CONFIG_POLL is only used for shell's internal implementation (for shell signals). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK. To rephrase my understanding: the first IPC ISR is the result of a boot-time command sent outside this driver. So there's no active command and the semaphore is garbage. In that case I completely agree, just initialize it ahead of time and let the driver wake up a command that isn't waiting. Then the rest on the first "real" command will recover state. And I guess k_sem_give() on a zero-filled semaphore technically doesn't crash, only k_poll's augmented code (to catch the wakeup) does.
OK, I'm sold.
Though actually if you do respin for any reason, maybe augmenting the commit message to be more explicit about the case would be nice: "The driver can receive ISRs from IPC activity performed before and during Zephyr boot, so the semaphore must always be initialized", something like that. |
Extra manual test run with thesofproject/sof#9089 and now got a clean run with PR testing. Removing the DNM. |
The ipc driver device data (struct intel_adsp_ipc_data) contains a semaphore. Upon device init, the device data is zeroed out. This is safe for other fields, but the semaphore should be properly initialized before use.
This lack of initialization leads to a system crash when CONFIG_POLL is enabled (e.g. to enable CONFIG_SHELL), IPC driver handles an interrupt and executes k_sem_give() on a uninitialized semaphore object. This will eventually lead to null deferences in z_handle_obj_poll_events().