[BUG] DSP crash in copier_ipcgtw_create() after Zephyr 26th Nov update #9687

Closed
kv2019i opened this issue Nov 27, 2024 · 5 comments
Labels: bug (Something isn't working as expected), PTL (Intel Panther Lake platform)

kv2019i commented Nov 27, 2024

Describe the bug
A DSP panic is seen with the Zephyr update in #9671. The same pull request without the Zephyr update did not trigger the issue, so it must be related to the Zephyr version update.

To Reproduce
Tests triggered by CI

Reproduction Rate
100%

Expected behavior
No DSP crash

Impact
Blocking Zephyr updates as a test in PR testing is failing.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
    • Topology: Intel test 01_21_TestCopierIpc
  3. Name of the platform(s) on which the bug is observed.
    • Platform: Intel ptl

Screenshots or console output

*** Booting Zephyr OS build v4.0.0-831-ga14ae39e7447 ***
[00:00:01.946,813] <inf> main: sof_app_main: SOF on intel_adsp
[00:00:01.946,936] <inf> main: sof_app_main: SOF initialized
[00:00:02.037,930] <err> os: print_fatal_exception:  ** FATAL EXCEPTION
[00:00:02.038,041] <err> os: print_fatal_exception:  ** CPU 0 EXCCAUSE 29 (store prohibited)
[00:00:02.038,071] <err> os: print_fatal_exception:  **  PC 0xa007f894 VADDR 0xa00303bc
[00:00:02.038,100] <err> os: print_fatal_exception:  **  PS 0x60823
[00:00:02.038,126] <err> os: print_fatal_exception:  **    (INTLEVEL:3 EXCM: 0 UM:1 RING:0 WOE:1 OWB:8 CALLINC:2)
[00:00:02.038,216] <err> os: xtensa_dump_stack:  **  A0 0xa0063640  SP 0xa00fbd70  A2 0xa0149100  A3 0x40148ba8
[00:00:02.038,248] <err> os: xtensa_dump_stack:  **  A4 (nil)  A5 0xa01491b0  A6 0xb8  A7 0x19fff
[00:00:02.038,275] <err> os: xtensa_dump_stack:  **  A8 0x19fff  A9 0xa00fbd40 A10 0x40147000 A11 0x448
[00:00:02.038,301] <err> os: xtensa_dump_stack:  ** A12 0x40149240 A13 0x10006 A14 0xa01491b0 A15 (nil)
[00:00:02.038,328] <err> os: xtensa_dump_stack:  ** LBEG 0xa004f591 LEND 0xa004f59d LCOUNT 0xf
[00:00:02.038,355] <err> os: xtensa_dump_stack:  ** SAR 0x20
[00:00:02.038,381] <err> os: xtensa_dump_stack:  **  THREADPTR (nil)


Backtrace:0xa007f891:0xa00fbd70 0xa006363d:0xa00fbd90 0xa0061dde:0xa00fbdb0 0xa006b2a7:0xa00fbdd0 0xa0070ac1:0xa00fbe00 0xa006d021:0xa00fbe40 0xa006c7f6:0xa00fbe70 0xa003e25c:0xa00fbea0 0xa0060b05:0xa00fbf00 0xa003d555:0xa00fbf30 0xa0061f4a:0xa00fbf60 0xa0061a68:0xa00fbf80 0xa00647f2:0xa00fbfa0 0xa00581c1:0xa00fbfc0 0xa005bcf3:0xa00fbff0 

[00:00:02.039,546] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.039,598] <err> os: z_fatal_error: Current thread: 0x40146998 (unknown)
[00:00:02.058,460] <err> zephyr: k_sys_fatal_error_handler: Halting system

kv2019i added the bug and PTL labels Nov 27, 2024
kv2019i commented Nov 27, 2024

The parsed backtrace shows the following call chain:

comp_new_ipc4

  • copier_init
  • copier_ipcgtw_create
  • comp_buffer_connect
  • buffer_attach
    -> next->prev = item; /* next is NULL */

Code analysis did not reveal any obvious problems, nor how the attached item could be NULL.

This does not appear to be a new bug, however. The upstream Zephyr changes for PTL that make the zero address invalid may simply make it possible to trap stores with a NULL base address (on other platforms these do not trigger a fault).
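
For illustration, a minimal sketch of the failure mode in the call chain above, assuming a circular doubly linked list in the style of SOF's `struct list_item`. All `demo_*` names are hypothetical stand-ins, not the SOF API. A zero-filled list head has `next == NULL`, so the `next->prev = item` store goes through a NULL base address, while a self-pointing head makes the same store safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct demo_list_item {
	struct demo_list_item *next;
	struct demo_list_item *prev;
};

/* An initialized empty list head points to itself. */
static void demo_list_init(struct demo_list_item *list)
{
	list->next = list;
	list->prev = list;
}

/* A zero-filled head has NULL links, so prepending would store via NULL. */
static bool demo_head_is_initialized(const struct demo_list_item *list)
{
	return list->next != NULL && list->prev != NULL;
}

/* Insert item right after the head; requires an initialized head. */
static void demo_list_prepend(struct demo_list_item *item,
			      struct demo_list_item *list)
{
	struct demo_list_item *next = list->next;

	assert(next != NULL);	/* without this: store to NULL plus offset */
	next->prev = item;
	item->next = next;
	item->prev = list;
	list->next = item;
}

/* Returns 0 if the sketch behaves as described, -1 otherwise. */
int demo_buffer_attach_check(void)
{
	struct demo_list_item head, item;

	memset(&head, 0, sizeof(head));	/* as left by a zeroing allocator */
	if (demo_head_is_initialized(&head))
		return -1;

	demo_list_init(&head);
	demo_list_prepend(&item, &head);
	return (head.next == &item && head.prev == &item &&
		item.next == &head && item.prev == &head) ? 0 : -1;
}
```

Calling `demo_buffer_attach_check()` exercises both states: it confirms that the zeroed head is detected as uninitialized, then initializes it and links one item.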

kv2019i changed the title from "[BUG] DSP crash after Zephyr 26th Nov update" to "[BUG] DSP crash in copier_ipcgtw_create() after Zephyr 26th Nov update" Nov 27, 2024
tmleman commented Nov 28, 2024

@kv2019i fix: #9689

tmleman commented Nov 28, 2024

Next fix: #9691

tmleman added a commit to tmleman/sof that referenced this issue Nov 28, 2024
This patch addresses a potential NULL pointer dereference issue in the
`devicelist_reset` function within the Key Phrase Buffer (KPB)
component. The issue was exposed by a recent change in Zephyr's MMU
mapping for Intel ADSP ACE30, which now catches NULL pointer accesses.

The `devicelist_reset` function previously iterated over the entire
`DEVICE_LIST_SIZE` when clearing items and zeroing pointers, which could
lead to dereferencing NULL pointers. The fix involves iterating only up
to `devlist->count` to ensure that only valid pointers are accessed.

This change prevents potential NULL pointer dereference and ensures the
stability of the KPB component.

Link: thesofproject#9687

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
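
A rough sketch of the bounded loop this commit message describes, under stated assumptions: the structure layout, the `int *` element type, the `clear_items` flag, and the reset of `count` are all hypothetical and only mirror the description, not the actual KPB code. The point is that only the first `count` slots hold valid pointers, so the loop must stop there rather than at the array size.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_DEVICE_LIST_SIZE 4

/* Hypothetical device list: only the first count entries are valid. */
struct demo_device_list {
	int *devs[DEMO_DEVICE_LIST_SIZE];
	size_t count;
};

/* Clear the referenced items (optionally) and zero the stored pointers. */
static void demo_devicelist_reset(struct demo_device_list *devlist,
				  bool clear_items)
{
	size_t i;

	/* Bound by count, not DEMO_DEVICE_LIST_SIZE: unused slots are NULL. */
	for (i = 0; i < devlist->count; i++) {
		if (clear_items)
			*devlist->devs[i] = 0;
		devlist->devs[i] = NULL;
	}
	devlist->count = 0;	/* assumption: the list is empty after reset */
}

/* Returns 0 if the reset touched only the valid entries, -1 otherwise. */
int demo_devicelist_check(void)
{
	int a = 5, b = 7;
	struct demo_device_list list = { .devs = { &a, &b }, .count = 2 };

	demo_devicelist_reset(&list, true);
	return (a == 0 && b == 0 && list.count == 0 &&
		list.devs[0] == NULL && list.devs[1] == NULL) ? 0 : -1;
}
```

With the loop bounded by `DEMO_DEVICE_LIST_SIZE` instead, the third and fourth slots (NULL here) would be dereferenced when `clear_items` is set.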
tmleman added a commit to tmleman/sof that referenced this issue Nov 28, 2024
This patch addresses a NULL dereference issue in the SOF firmware that
was exposed by a recent change in Zephyr's MMU mapping for Intel ADSP
ACE30. The change prevents mapping of the 0x0 address, which helps catch
NULL pointer accesses.

The issue was identified during testing, where an exception occurred due
to uninitialized buffer lists in the `comp_dev` structure. The
`list_init` function is called in `comp_new()` (for both IPC3 and IPC4),
but a NULL dereference can happen in the component `ops->create()`
function, which is called before the list is initialized. One affected
component is IPC4 `copier_ipcgtw`.

To fix this, the `bsink_list` and `bsource_list` are now initialized in
the `comp_alloc` function. This ensures that the lists point to
themselves before any use, preventing NULL dereference and subsequent
exceptions.

Link: thesofproject#9687

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
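
The ordering fix described in this commit message can be sketched as follows. This is an illustration only: `demo_comp_dev`, `demo_comp_alloc`, and the node helpers are hypothetical, and `calloc` stands in for whatever zeroing allocator the firmware uses. The idea is that both list heads are made self-pointing at allocation time, so a `create()` callback that attaches a buffer early never sees NULL links.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical circular doubly linked list node. */
struct demo_node {
	struct demo_node *next;
	struct demo_node *prev;
};

/* Hypothetical component device with source and sink buffer lists. */
struct demo_comp_dev {
	struct demo_node bsource_list;
	struct demo_node bsink_list;
};

static void demo_node_init(struct demo_node *list)
{
	list->next = list;
	list->prev = list;
}

/* An empty, initialized list head points to itself in both directions. */
static bool demo_node_is_empty(const struct demo_node *list)
{
	return list->next == list && list->prev == list;
}

/* Allocate zero-filled memory, then initialize both lists before any use. */
struct demo_comp_dev *demo_comp_alloc(void)
{
	struct demo_comp_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;

	demo_node_init(&dev->bsource_list);
	demo_node_init(&dev->bsink_list);
	return dev;
}

/* Returns 0 if a freshly allocated device has usable lists, -1 otherwise. */
int demo_comp_alloc_check(void)
{
	struct demo_comp_dev *dev = demo_comp_alloc();
	bool ok;

	if (!dev)
		return -1;

	ok = demo_node_is_empty(&dev->bsource_list) &&
	     demo_node_is_empty(&dev->bsink_list);
	free(dev);
	return ok ? 0 : -1;
}
```

Initializing in the allocator rather than in the caller means every code path that obtains a device, including component-specific create functions, gets valid list heads without having to remember a separate init step.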
tmleman added a commit to tmleman/sof that referenced this issue Nov 28, 2024
(cherry picked from commit 5f5588c)
tmleman added a commit to tmleman/sof that referenced this issue Nov 28, 2024
kv2019i pushed a commit that referenced this issue Dec 3, 2024
lgirdwood pushed a commit that referenced this issue Dec 3, 2024
@wszypelt

@kv2019i please verify

kv2019i commented Dec 13, 2024

Bug fixes merged, so this can be closed.

@kv2019i kv2019i closed this as completed Dec 13, 2024