Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zephyr: lib: alloc: Use cached memory for L3 Heap #8632

Merged
merged 2 commits into from
Feb 1, 2024

Conversation

jxstelter
Copy link
Contributor

This patch implements recommended hardware flow for Intel ACE platforms.
The L3 heap should be accessed via cached pointers including management data.
This change is fixing D3 save/restore flow on LNL.

Signed-off-by: Jaroslaw Stelter Jaroslaw.Stelter@intel.com

@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch 2 times, most recently from 7278d39 to 310f7c1 Compare December 15, 2023 10:29
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question from me on the code.

Thinking long term I guess a lot of this heap mgmt will end up in Zephyr, but this will need memory native API usage 1st.

zephyr/lib/alloc.c Outdated Show resolved Hide resolved
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch 2 times, most recently from 23dbee2 to dc4585b Compare January 16, 2024 14:27
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
@jxstelter jxstelter changed the title zephyr: lib: alloc: Use cached memory for L3 Heap [DNM] zephyr: lib: alloc: Use cached memory for L3 Heap Jan 19, 2024
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch from dc4585b to de6ed62 Compare January 19, 2024 14:36
@lgirdwood lgirdwood added this to the v2.9 milestone Jan 24, 2024
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some opens, but I think some inline comments could allow everyone to follow. CI Zephyr failures unrelated to this PR.

@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch 3 times, most recently from d3433aa to cfe9e9b Compare January 25, 2024 11:39
@jxstelter jxstelter changed the title [DNM] zephyr: lib: alloc: Use cached memory for L3 Heap zephyr: lib: alloc: Use cached memory for L3 Heap Jan 25, 2024
@jxstelter jxstelter marked this pull request as ready for review January 25, 2024 12:16
@jxstelter jxstelter requested a review from dabekjakub as a code owner January 25, 2024 12:16
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch from cfe9e9b to c383d6a Compare January 25, 2024 12:30
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline

zephyr/lib/alloc.c Outdated Show resolved Hide resolved
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch from c383d6a to 50a8774 Compare January 25, 2024 15:19
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
zephyr/lib/alloc.c Outdated Show resolved Hide resolved
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch from 50a8774 to 235ecaa Compare January 26, 2024 10:08
* l3_heap.
*/
min_align = MAX(PLATFORM_DCACHE_ALIGN, min_align);
bytes = ALIGN_UP(bytes, min_align);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to think about it: now we limit the whole L3 heap to core 0, right? Doesn't this mean that we don't need cache-line alignment any more? I don't think Zephyr's heap API explicitly manipulates the caches either, but please double-check. So I would expect this function to be just an alias to l3_heap_alloc_aligned()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zephyr heap is not manipulating this, but cache line alignment is required at least for library manager functionality.
It allocates buffer from L3_HEAP for loadable library storage. After load the library image to this buffer we need to flush cache to have whole library stored in memory. It could be accessed after wake from D3. On the other hands, the D3 context save/restore buffer is also allocated and flushed before D3 entry. Without the cache line alignment I see failures in tests where D3 entry/exit is introduced between Load Library phase and Init Instance phase (this is real E2E scenario).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache line alignment is required at least for library manager functionality.

then it's the caller who has to specify that alignment, in this case the library manager. That's why we have alignment as a function parameter here.

Copy link
Contributor Author

@jxstelter jxstelter Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is true. Unfortunately alignment parameter is omitted in upper API function
void *rmalloc(enum mem_zone zone, uint32_t flags, uint32_t caps, size_t bytes). I can fulfill your request adding new API function e.g.
void *rmalloc_alligned(enum mem_zone zone, uint32_t flags, uint32_t caps, size_t bytes, uint32_t alignment).
Another alternative would be to extend rballoc_align() functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

zephyr/lib/alloc.c Outdated Show resolved Hide resolved
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch from 235ecaa to 640d630 Compare January 30, 2024 15:58
This patch implements recommended hardware flow for
Intel ACE platforms.
The L3 heap should be accessed via cached pointers
including management data.

Signed-off-by: Jaroslaw Stelter <Jaroslaw.Stelter@intel.com>
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch from 640d630 to ed65fba Compare January 31, 2024 10:26
@jxstelter jxstelter force-pushed the lnl-rvp-fix-d3-save-restore branch from ed65fba to f3fda21 Compare January 31, 2024 10:39
@lgirdwood
Copy link
Member

@lyakh I think your review comments all addressed now.

heap = &l3_heap;
return (__sparse_force void *)l3_heap_alloc_aligned(heap, align, bytes);
#else
k_panic();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is kind of theoretical, but maybe a small follow-up PR to convert this and line 301 above to return NULL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh this isn't theoretical based on the find by @marc-hb in the fuzzer logs

/* If no IMR buffer we can not recover */
if (!global_imr_ram_storage) {
tr_err(&zephyr_tr, "failed to allocate global_imr_ram_storage");
k_panic();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means: the DSP is being powered down and we try to allocate an IMR buffer to safe state, and if we fail, we panic. Would it be possible to just abort saving state and go for a clean power off? Also rather theoretical, but would probably be better not to bother the host with a DSP panic but just to make this a clean shut down? @ujfalusi Can be an incremental PR too

@kv2019i kv2019i merged commit 8cad5be into thesofproject:main Feb 1, 2024
40 of 44 checks passed
@marc-hb
Copy link
Collaborator

marc-hb commented Feb 2, 2024

This PR broke fuzzing, @andyross any idea why?

I fails very quickly and I can reproduce locally without any issue.

https://github.com/thesofproject/sof/actions/runs/7724788048/job/21057626265
https://github.com/thesofproject/sof/actions/runs/7749595113/job/21134436910

It was harder to notice because zmain has been broken for a while but it was not that hard to notice.

@jxstelter please pay attention to test results next time, reviewers have one order of magnitude more PRs to pay attention to.

#119902	NEW    cov: 729 ft: 3099 corp: 335/18Kb lim: 240 exec/s: 7993 rss: 87Mb L: 61/239 MS: 1 EraseBytes-
Exiting due to fatal error
==5292== ERROR: libFuzzer: fuzz target exited
    #0 0x8134c42 in __sanitizer_print_stack_trace (/home/runner/work/sof/sof/workspace/build-fuzz/zephyr/zephyr.exe+0x8134c42)
    #1 0x80aa4d6 in fuzzer::PrintStackTrace() (/home/runner/work/sof/sof/workspace/build-fuzz/zephyr/zephyr.exe+0x80aa4d6)
    #2 0x8090897 in fuzzer::Fuzzer::ExitCallback() (/home/runner/work/sof/sof/workspace/build-fuzz/zephyr/zephyr.exe+0x8090897)
    #3 0x809082e in fuzzer::Fuzzer::StaticExitCallback() (/home/runner/work/sof/sof/workspace/build-fuzz/zephyr/zephyr.exe+0x809082e)
    #4 0xf783a332  (/lib/i386-linux-gnu/libc.so.6+0x3a332) (BuildId: 0598ef3e075d7653ff4d565675d15666ec9b7b31)
    #5 0xf783a486 in exit (/lib/i386-linux-gnu/libc.so.6+0x3a486) (BuildId: 0598ef3e075d7653ff4d565675d15666ec9b7b31)
    #6 0x8194c5c in posix_exit /home/runner/work/sof/sof/workspace/zephyr/boards/posix/native_posix/main.c:51:2

SUMMARY: libFuzzer: fuzz target exited

@btian1
Copy link
Contributor

btian1 commented Feb 2, 2024

@marc-hb ,Thanks for point out, my PR also got FUZZY error, I am looking for the reason, could you help double confirm below FUZZY error also caused by this:
https://github.com/thesofproject/sof/actions/runs/7751869504/job/21140431875?pr=8822
?

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 2, 2024

I debugged this a bit. This PR is probably just the messenger but this fuzzing failure looks like a "good catch" to me.

One of the new k_panic() gets triggered because caps & SOF_MEM_CAPS_L3 is true even when CONFIG_L3_HEAP is false.

I think the reason caps & SOF_MEM_CAPS_L3 is true is because... caps comes directly from untrusted IPC input!? Why would IPCs be able to set caps directly?

The panic happens when ipc_glb_tplg_buffer_new() does this:

ret = ipc_buffer_new(ipc, (struct sof_ipc_buffer *)ipc->comp_data); 

At this point comp_data looks like it came straight from the fuzzer's untrusted input:

(gdb) p /x *desc

$5 = {comp = {hdr = {size = 0x66, cmd = 0x3020ffff}, id = 0xb6b6b600, type = 0xb6b6b6b6, pipeline_id = 0xb6b6b6b6, core = 0xffffb236, 
    ext_data_length = 0xffffffff}, size = 0xffffffff, caps = 0xffffffff, flags = 0xffffffff, reserved = 0xffffffff}
#0  rballoc_align (flags=0, caps=4294903040, bytes=86376703, align=64) at sof/zephyr/lib/alloc.c:398
#1  0x0831d38c in buffer_alloc (size=86376703, caps=4294903040, flags=4294967295, align=64, is_shared=false)
    at sof/src/audio/buffer.c:58
#2  0x082e4d6b in buffer_new (desc=0x8b184c0 <heapmem+1728>, is_shared=false) at sof/src/ipc/ipc-helper.c:48
#3  0x082cc78e in ipc_buffer_new (ipc=0x8b183c0 <heapmem+1472>, desc=0x8b184c0 <heapmem+1728>) at sof/src/ipc/ipc3/helper.c:459
#4  0x082b7249 in ipc_glb_tplg_buffer_new (header=807469055) at sof/src/ipc/ipc3/handler.c:1305
#5  0x082b090c in ipc_glb_tplg_message (header=807469055) at sof/src/ipc/ipc3/handler.c:1416
#6  0x082aff8a in ipc_cmd (_hdr=0x8b184c0 <heapmem+1728>) at sof/src/ipc/ipc3/handler.c:1651
#7  0x082fcb1b in ipc_platform_do_cmd (ipc=0x8b183c0 <heapmem+1472>) at sof/src/platform/posix/ipc.c:162
#8  0x082e2827 in ipc_do_cmd (data=0x8b183c0 <heapmem+1472>) at sof/src/ipc/ipc-common.c:328
#9  0x0836a27a in task_run (task=0x8b183e8 <heapmem+1512>) at sof/zephyr/include/rtos/task.h:94

@kv2019i
Copy link
Collaborator

kv2019i commented Feb 2, 2024

D'oh, my mistake, I did not notice the fail when merging. @jxstelter can you take a quick look so we don't need a revert?

@cujomalainey
Copy link
Contributor

@marc-hb thanks for the find, indeed if we are trusting IPC on hw limits then we are not secure

@andyross
Copy link
Contributor

andyross commented Feb 2, 2024

Nice. Frustrating that k_panic() seems not to appear in the fuzzing backtrace. I'm guessing that's a misfeature in native_posix where it gets "handled" via an exit(), when what we really want is for it to do something (like crash or abort) that is visible to the framework as an error to be dumped.

marc-hb added a commit to marc-hb/zephyr that referenced this pull request Feb 2, 2024
Flush all messages and invoke `abort()` when a k_panic() or k_oops() is
hit in native_posix mode.

One of the main purposes of `native_posix` is to provide debug
convenience. When running in a debugger, `abort()` stops execution which
provides a backtrace and the ability to inspect all variables.

A practical use case is fuzzing failures in SOF, see an example in:
thesofproject/sof#8632

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/zephyr that referenced this pull request Feb 2, 2024
Flush all messages and invoke `abort()` when a k_panic() or k_oops() is
hit in native_posix mode.

One of the main purposes of `native_posix` is to provide debug
convenience. When running in a debugger, `abort()` stops execution which
provides a backtrace and the ability to inspect all variables.

A practical use case is fuzzing failures in SOF, see an example in:
thesofproject/sof#8632

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator

marc-hb commented Feb 2, 2024

A quick fix seems very unlikely so I filed new issue #8832, please move the discussion there.

marc-hb added a commit to marc-hb/zephyr that referenced this pull request Feb 3, 2024
Flush all messages and invoke `abort()` when a k_panic() or k_oops() is
hit in native_posix mode.

One of the main purposes of `native_posix` is to provide debug
convenience. When running in a debugger, `abort()` stops execution which
provides a backtrace and the ability to inspect all variables.

A practical use case is fuzzing failures in SOF, see an example in:
thesofproject/sof#8632

In such a case, this commit adds value even before using a
debugger. Without this commit, confusingly meaningless stack trace:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
Exiting due to fatal error
==314134== ERROR: libFuzzer: fuzz target exited
    #0 0x81d9637 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9637)
    #1 0x80cc42b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc42b)
    zephyrproject-rtos#2 0x80ab79e in fuzzer::Fuzzer::ExitCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab864 in fuzzer::Fuzzer::StaticExitCallback() (zephyr.exe+
    zephyrproject-rtos#4 0xf783dfe8  (/usr/lib32/libc.so.6+0x3dfe8)
    zephyrproject-rtos#5 0xf783e1e6 in exit (/usr/lib32/libc.so.6+0x3e1e6)
    zephyrproject-rtos#6 0x82a5488 in posix_exit boards/posix/native_posix/main.c:51:2

SUMMARY: libFuzzer: fuzz target exited
```

Thanks to this commit the `k_panic()` location is immediately available
in the logs without even running anything:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
==315176== ERROR: libFuzzer: deadly signal
LLVMSymbolizer: error reading file: No such file or directory
    #0 0x81d9647 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9647)
    #1 0x80cc43b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc43b)
    zephyrproject-rtos#2 0x80ab6be in fuzzer::Fuzzer::CrashCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab77b in fuzzer::Fuzzer::StaticCrashSignalCallback()
    zephyrproject-rtos#4 0xf7f3159f  (linux-gate.so.1+0x59f)
    zephyrproject-rtos#5 0xf7f31578  (linux-gate.so.1+0x578)
    zephyrproject-rtos#6 0xf788ea16  (/usr/lib32/libc.so.6+0x8ea16)
    zephyrproject-rtos#7 0xf783b316 in raise (/usr/lib32/libc.so.6+0x3b316)
    zephyrproject-rtos#8 0xf7822120 in abort (/usr/lib32/libc.so.6+0x22120)
    zephyrproject-rtos#9 0x82afbde in ipc_cmd src/ipc/ipc3/handler.c:1623:2

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better
    crash reports.
SUMMARY: libFuzzer: deadly signal
```

Full stack trace When running zephyr.exe in gdb:

```

./scripts/fuzz.sh  -- -DEXTRA_CFLAGS="-O0 -g3"

gdb ./zephyr.exe

backtrace

 zephyrproject-rtos#2  0xf783b317 in raise () from /usr/lib32/libc.so.6
 zephyrproject-rtos#3  0xf7822121 in abort () from /usr/lib32/libc.so.6
 zephyrproject-rtos#4  0x082afbdf in ipc_cmd (_hdr=0x8b...) at src/ipc/ipc3/handler.c:1623
 zephyrproject-rtos#5  0x082fbf4b in ipc_platform_do_cmd (ipc=0x8b161c0)
                                    at src/platform/posix/ipc.c:162
 zephyrproject-rtos#6  0x082e1e07 in ipc_do_cmd (data=0x8b161c0 <heapmem+1472>)
                                    at src/ipc/ipc-common.c:328
 zephyrproject-rtos#7  0x083696aa in task_run (task=0x8b161e8 <heapmem+1512>)
                                    at zephyr/include/rtos/task.h:94
 zephyrproject-rtos#8  0x083682dc in edf_work_handler (work=0x8b1621c <heapmem+1564>)
                                    at zephyr/edf_schedule.c:32
 zephyrproject-rtos#9  0x085245af in work_queue_main (workq_ptr=0x8b15b00 <edf_workq>,...)
                                         at zephyr/kernel/work.c:688
 zephyrproject-rtos#10 0x0823a6bc in z_thread_entry (entry=0x8523be0 <work_queue_main>,..
                                    at zephyr/lib/os/thread_entry.c:48
 zephyrproject-rtos#11 0x0829a6a1 in posix_arch_thread_entry (pa_thread_status=0x8630648 ..
                                  at zephyr/arch/posix/core/thread.c:56
 zephyrproject-rtos#12 0x0829c043 in posix_thread_starter (arg=0x4)
                              at zephyr/arch/posix/core/posix_core.c:293
 zephyrproject-rtos#13 0x080f6041 in asan_thread_start(void*) ()
 zephyrproject-rtos#14 0xf788c73c in ?? () from /usr/lib32/libc.so.6
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/zephyr that referenced this pull request Feb 3, 2024
Flush all messages and invoke `abort()` when a k_panic() or k_oops() is
hit in native_posix mode.

One of the main purposes of `native_posix` is to provide debug
convenience. When running in a debugger, `abort()` stops execution which
provides a backtrace and the ability to inspect all variables.

A practical use case is fuzzing failures in SOF, see an example in:
thesofproject/sof#8632

In such a case, this commit adds value even before using a
debugger. Without this commit, confusingly meaningless stack trace:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
Exiting due to fatal error
==314134== ERROR: libFuzzer: fuzz target exited
    #0 0x81d9637 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9637)
    #1 0x80cc42b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc42b)
    zephyrproject-rtos#2 0x80ab79e in fuzzer::Fuzzer::ExitCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab864 in fuzzer::Fuzzer::StaticExitCallback() (zephyr.exe+
    zephyrproject-rtos#4 0xf783dfe8  (/usr/lib32/libc.so.6+0x3dfe8)
    zephyrproject-rtos#5 0xf783e1e6 in exit (/usr/lib32/libc.so.6+0x3e1e6)
    zephyrproject-rtos#6 0x82a5488 in posix_exit boards/posix/native_posix/main.c:51:2

SUMMARY: libFuzzer: fuzz target exited
```

Thanks to this commit the `k_panic()` location is immediately available
in the logs without even running anything:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
==315176== ERROR: libFuzzer: deadly signal
LLVMSymbolizer: error reading file: No such file or directory
    #0 0x81d9647 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9647)
    #1 0x80cc43b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc43b)
    zephyrproject-rtos#2 0x80ab6be in fuzzer::Fuzzer::CrashCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab77b in fuzzer::Fuzzer::StaticCrashSignalCallback()
    zephyrproject-rtos#4 0xf7f3159f  (linux-gate.so.1+0x59f)
    zephyrproject-rtos#5 0xf7f31578  (linux-gate.so.1+0x578)
    zephyrproject-rtos#6 0xf788ea16  (/usr/lib32/libc.so.6+0x8ea16)
    zephyrproject-rtos#7 0xf783b316 in raise (/usr/lib32/libc.so.6+0x3b316)
    zephyrproject-rtos#8 0xf7822120 in abort (/usr/lib32/libc.so.6+0x22120)
    zephyrproject-rtos#9 0x82afbde in ipc_cmd src/ipc/ipc3/handler.c:1623:2

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better
    crash reports.
SUMMARY: libFuzzer: deadly signal
```

Full stack trace When running zephyr.exe in gdb:

```

./scripts/fuzz.sh  -- -DEXTRA_CFLAGS="-O0 -g3"

gdb ./zephyr.exe

backtrace

 zephyrproject-rtos#2  0xf783b317 in raise () from /usr/lib32/libc.so.6
 zephyrproject-rtos#3  0xf7822121 in abort () from /usr/lib32/libc.so.6
 zephyrproject-rtos#4  0x082afbdf in ipc_cmd (_hdr=0x8b...) at src/ipc/ipc3/handler.c:1623
 zephyrproject-rtos#5  0x082fbf4b in ipc_platform_do_cmd (ipc=0x8b161c0)
                                    at src/platform/posix/ipc.c:162
 zephyrproject-rtos#6  0x082e1e07 in ipc_do_cmd (data=0x8b161c0 <heapmem+1472>)
                                    at src/ipc/ipc-common.c:328
 zephyrproject-rtos#7  0x083696aa in task_run (task=0x8b161e8 <heapmem+1512>)
                                    at zephyr/include/rtos/task.h:94
 zephyrproject-rtos#8  0x083682dc in edf_work_handler (work=0x8b1621c <heapmem+1564>)
                                    at zephyr/edf_schedule.c:32
 zephyrproject-rtos#9  0x085245af in work_queue_main (workq_ptr=0x8b15b00 <edf_workq>,...)
                                         at zephyr/kernel/work.c:688
 zephyrproject-rtos#10 0x0823a6bc in z_thread_entry (entry=0x8523be0 <work_queue_main>,..
                                    at zephyr/lib/os/thread_entry.c:48
 zephyrproject-rtos#11 0x0829a6a1 in posix_arch_thread_entry (pa_thread_status=0x8630648 ..
                                  at zephyr/arch/posix/core/thread.c:56
 zephyrproject-rtos#12 0x0829c043 in posix_thread_starter (arg=0x4)
                              at zephyr/arch/posix/core/posix_core.c:293
 zephyrproject-rtos#13 0x080f6041 in asan_thread_start(void*) ()
 zephyrproject-rtos#14 0xf788c73c in ?? () from /usr/lib32/libc.so.6
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/zephyr that referenced this pull request Feb 3, 2024
Flush all messages and invoke `abort()` when a k_panic() or k_oops() is
hit in native_posix mode.

One of the main purposes of `native_posix` is to provide debug
convenience. When running in a debugger, `abort()` stops execution which
provides a backtrace and the ability to inspect all variables.

A good, sample use case is fuzzing failures in SOF, see an example in:
thesofproject/sof#8632

In such a case, this commit adds value even before using a
debugger. Without this commit, confusingly meaningless stack trace:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
Exiting due to fatal error
==314134== ERROR: libFuzzer: fuzz target exited
    #0 0x81d9637 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9637)
    #1 0x80cc42b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc42b)
    zephyrproject-rtos#2 0x80ab79e in fuzzer::Fuzzer::ExitCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab864 in fuzzer::Fuzzer::StaticExitCallback() (zephyr.exe+
    zephyrproject-rtos#4 0xf783dfe8  (/usr/lib32/libc.so.6+0x3dfe8)
    zephyrproject-rtos#5 0xf783e1e6 in exit (/usr/lib32/libc.so.6+0x3e1e6)
    zephyrproject-rtos#6 0x82a5488 in posix_exit boards/posix/native_posix/main.c:51:2

SUMMARY: libFuzzer: fuzz target exited
```

Thanks to this commit the `k_panic()` location is now immediately
available in test logs without even running anything locally:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
@ WEST_TOPDIR/sof/src/ipc/ipc3/handler.c:1623
ZEPHYR FATAL ERROR: 3
==315176== ERROR: libFuzzer: deadly signal
LLVMSymbolizer: error reading file: No such file or directory
    #0 0x81d9647 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9647)
    #1 0x80cc43b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc43b)
    zephyrproject-rtos#2 0x80ab6be in fuzzer::Fuzzer::CrashCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab77b in fuzzer::Fuzzer::StaticCrashSignalCallback()
    zephyrproject-rtos#4 0xf7f3159f  (linux-gate.so.1+0x59f)
    zephyrproject-rtos#5 0xf7f31578  (linux-gate.so.1+0x578)
    zephyrproject-rtos#6 0xf788ea16  (/usr/lib32/libc.so.6+0x8ea16)
    zephyrproject-rtos#7 0xf783b316 in raise (/usr/lib32/libc.so.6+0x3b316)
    zephyrproject-rtos#8 0xf7822120 in abort (/usr/lib32/libc.so.6+0x22120)
    zephyrproject-rtos#9 0x82afbde in ipc_cmd src/ipc/ipc3/handler.c:1623:2

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better
    crash reports.
SUMMARY: libFuzzer: deadly signal
```

The full stack trace is now immediately available when running
zephyr.exe in gdb:

```
./scripts/fuzz.sh  -- -DEXTRA_CFLAGS="-O0 -g3"

gdb build-fuzz/zephyr/zephyr.exe

run
backtrace

 zephyrproject-rtos#2  0xf783b317 in raise () from /usr/lib32/libc.so.6
 zephyrproject-rtos#3  0xf7822121 in abort () from /usr/lib32/libc.so.6
 zephyrproject-rtos#4  0x082afbdf in ipc_cmd (_hdr=0x8b...) at src/ipc/ipc3/handler.c:1623
 zephyrproject-rtos#5  0x082fbf4b in ipc_platform_do_cmd (ipc=0x8b161c0)
                                    at src/platform/posix/ipc.c:162
 zephyrproject-rtos#6  0x082e1e07 in ipc_do_cmd (data=0x8b161c0 <heapmem+1472>)
                                    at src/ipc/ipc-common.c:328
 zephyrproject-rtos#7  0x083696aa in task_run (task=0x8b161e8 <heapmem+1512>)
                                    at zephyr/include/rtos/task.h:94
 zephyrproject-rtos#8  0x083682dc in edf_work_handler (work=0x8b1621c <heapmem+1564>)
                                    at zephyr/edf_schedule.c:32
 zephyrproject-rtos#9  0x085245af in work_queue_main (workq_ptr=0x8b15b00 <edf_workq>,...)
                                         at zephyr/kernel/work.c:688
 zephyrproject-rtos#10 0x0823a6bc in z_thread_entry (entry=0x8523be0 <work_queue_main>,..
                                    at zephyr/lib/os/thread_entry.c:48
 zephyrproject-rtos#11 0x0829a6a1 in posix_arch_thread_entry (pa_thread_status=0x8630648 ..
                                  at zephyr/arch/posix/core/thread.c:56
 zephyrproject-rtos#12 0x0829c043 in posix_thread_starter (arg=0x4)
                              at zephyr/arch/posix/core/posix_core.c:293
 zephyrproject-rtos#13 0x080f6041 in asan_thread_start(void*) ()
 zephyrproject-rtos#14 0xf788c73c in ?? () from /usr/lib32/libc.so.6
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/zephyr that referenced this pull request Feb 5, 2024
Flush all messages and invoke `abort()` when a k_panic() or k_oops() is
hit in native_posix mode.

One of the main purposes of `native_posix` is to provide debug
convenience. When running in a debugger, `abort()` stops execution which
provides a backtrace and the ability to inspect all variables.

A good, sample use case is fuzzing failures in SOF, see an example in:
thesofproject/sof#8632

In such a case, this commit adds value even before using a
debugger. Without this commit, confusingly meaningless stack trace:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
Exiting due to fatal error
==314134== ERROR: libFuzzer: fuzz target exited
    #0 0x81d9637 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9637)
    #1 0x80cc42b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc42b)
    zephyrproject-rtos#2 0x80ab79e in fuzzer::Fuzzer::ExitCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab864 in fuzzer::Fuzzer::StaticExitCallback() (zephyr.exe+
    zephyrproject-rtos#4 0xf783dfe8  (/usr/lib32/libc.so.6+0x3dfe8)
    zephyrproject-rtos#5 0xf783e1e6 in exit (/usr/lib32/libc.so.6+0x3e1e6)
    zephyrproject-rtos#6 0x82a5488 in posix_exit boards/posix/native_posix/main.c:51:2

SUMMARY: libFuzzer: fuzz target exited
```

Thanks to this commit the `k_panic()` location is now immediately
available in test logs without even running anything locally:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
@ WEST_TOPDIR/sof/src/ipc/ipc3/handler.c:1623
ZEPHYR FATAL ERROR: 3
==315176== ERROR: libFuzzer: deadly signal
LLVMSymbolizer: error reading file: No such file or directory
    #0 0x81d9647 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9647)
    #1 0x80cc43b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc43b)
    zephyrproject-rtos#2 0x80ab6be in fuzzer::Fuzzer::CrashCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab77b in fuzzer::Fuzzer::StaticCrashSignalCallback()
    zephyrproject-rtos#4 0xf7f3159f  (linux-gate.so.1+0x59f)
    zephyrproject-rtos#5 0xf7f31578  (linux-gate.so.1+0x578)
    zephyrproject-rtos#6 0xf788ea16  (/usr/lib32/libc.so.6+0x8ea16)
    zephyrproject-rtos#7 0xf783b316 in raise (/usr/lib32/libc.so.6+0x3b316)
    zephyrproject-rtos#8 0xf7822120 in abort (/usr/lib32/libc.so.6+0x22120)
    zephyrproject-rtos#9 0x82afbde in ipc_cmd src/ipc/ipc3/handler.c:1623:2

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better
    crash reports.
SUMMARY: libFuzzer: deadly signal
```

The full stack trace is now immediately available when running
zephyr.exe in gdb:

```
./scripts/fuzz.sh  -- -DEXTRA_CFLAGS="-O0 -g3"

gdb build-fuzz/zephyr/zephyr.exe

run
backtrace

 zephyrproject-rtos#2  0xf783b317 in raise () from /usr/lib32/libc.so.6
 zephyrproject-rtos#3  0xf7822121 in abort () from /usr/lib32/libc.so.6
 zephyrproject-rtos#4  0x082afbdf in ipc_cmd (_hdr=0x8b...) at src/ipc/ipc3/handler.c:1623
 zephyrproject-rtos#5  0x082fbf4b in ipc_platform_do_cmd (ipc=0x8b161c0)
                                    at src/platform/posix/ipc.c:162
 zephyrproject-rtos#6  0x082e1e07 in ipc_do_cmd (data=0x8b161c0 <heapmem+1472>)
                                    at src/ipc/ipc-common.c:328
 zephyrproject-rtos#7  0x083696aa in task_run (task=0x8b161e8 <heapmem+1512>)
                                    at zephyr/include/rtos/task.h:94
 zephyrproject-rtos#8  0x083682dc in edf_work_handler (work=0x8b1621c <heapmem+1564>)
                                    at zephyr/edf_schedule.c:32
 zephyrproject-rtos#9  0x085245af in work_queue_main (workq_ptr=0x8b15b00 <edf_workq>,...)
                                         at zephyr/kernel/work.c:688
 zephyrproject-rtos#10 0x0823a6bc in z_thread_entry (entry=0x8523be0 <work_queue_main>,..
                                    at zephyr/lib/os/thread_entry.c:48
 zephyrproject-rtos#11 0x0829a6a1 in posix_arch_thread_entry (pa_thread_status=0x8630648 ..
                                  at zephyr/arch/posix/core/thread.c:56
 zephyrproject-rtos#12 0x0829c043 in posix_thread_starter (arg=0x4)
                              at zephyr/arch/posix/core/posix_core.c:293
 zephyrproject-rtos#13 0x080f6041 in asan_thread_start(void*) ()
 zephyrproject-rtos#14 0xf788c73c in ?? () from /usr/lib32/libc.so.6
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/zephyr that referenced this pull request Feb 5, 2024
Flush all messages and invoke `abort()` when a k_panic() or k_oops() is
hit in native_posix mode.

One of the main purposes of `native_posix` is to provide debug
convenience. When running in a debugger, `abort()` stops execution which
provides a backtrace and the ability to inspect all variables.

A good, sample use case is fuzzing failures in SOF, see an example in:
thesofproject/sof#8632

In such a case, this commit adds value even before using a
debugger. Without this commit, confusingly meaningless stack trace:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
Exiting due to fatal error
==314134== ERROR: libFuzzer: fuzz target exited
    #0 0x81d9637 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9637)
    #1 0x80cc42b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc42b)
    zephyrproject-rtos#2 0x80ab79e in fuzzer::Fuzzer::ExitCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab864 in fuzzer::Fuzzer::StaticExitCallback() (zephyr.exe+
    zephyrproject-rtos#4 0xf783dfe8  (/usr/lib32/libc.so.6+0x3dfe8)
    zephyrproject-rtos#5 0xf783e1e6 in exit (/usr/lib32/libc.so.6+0x3e1e6)
    zephyrproject-rtos#6 0x82a5488 in posix_exit boards/posix/native_posix/main.c:51:2

SUMMARY: libFuzzer: fuzz target exited
```

Thanks to this commit the `k_panic()` location is now immediately
available in test logs without even running anything locally:

```
INFO: seed corpus: files: 1097 min: 1b max: 428b total: 90853b rss: 58Mb
@ WEST_TOPDIR/sof/src/ipc/ipc3/handler.c:1623
ZEPHYR FATAL ERROR: 3
==315176== ERROR: libFuzzer: deadly signal
LLVMSymbolizer: error reading file: No such file or directory
    #0 0x81d9647 in __sanitizer_print_stack_trace (zephyr.exe+0x81d9647)
    #1 0x80cc43b in fuzzer::PrintStackTrace() (zephyr.exe+0x80cc43b)
    zephyrproject-rtos#2 0x80ab6be in fuzzer::Fuzzer::CrashCallback() FuzzerLoop.cpp.o
    zephyrproject-rtos#3 0x80ab77b in fuzzer::Fuzzer::StaticCrashSignalCallback()
    zephyrproject-rtos#4 0xf7f3159f  (linux-gate.so.1+0x59f)
    zephyrproject-rtos#5 0xf7f31578  (linux-gate.so.1+0x578)
    zephyrproject-rtos#6 0xf788ea16  (/usr/lib32/libc.so.6+0x8ea16)
    zephyrproject-rtos#7 0xf783b316 in raise (/usr/lib32/libc.so.6+0x3b316)
    zephyrproject-rtos#8 0xf7822120 in abort (/usr/lib32/libc.so.6+0x22120)
    zephyrproject-rtos#9 0x82afbde in ipc_cmd src/ipc/ipc3/handler.c:1623:2

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better
    crash reports.
SUMMARY: libFuzzer: deadly signal
```

The full stack trace is now immediately available when running
zephyr.exe in gdb:

```
./scripts/fuzz.sh  -- -DEXTRA_CFLAGS="-O0 -g3"

gdb build-fuzz/zephyr/zephyr.exe

run
backtrace

 zephyrproject-rtos#2  0xf783b317 in raise () from /usr/lib32/libc.so.6
 zephyrproject-rtos#3  0xf7822121 in abort () from /usr/lib32/libc.so.6
 zephyrproject-rtos#4  0x082afbdf in ipc_cmd (_hdr=0x8b...) at src/ipc/ipc3/handler.c:1623
 zephyrproject-rtos#5  0x082fbf4b in ipc_platform_do_cmd (ipc=0x8b161c0)
                                    at src/platform/posix/ipc.c:162
 zephyrproject-rtos#6  0x082e1e07 in ipc_do_cmd (data=0x8b161c0 <heapmem+1472>)
                                    at src/ipc/ipc-common.c:328
 zephyrproject-rtos#7  0x083696aa in task_run (task=0x8b161e8 <heapmem+1512>)
                                    at zephyr/include/rtos/task.h:94
 zephyrproject-rtos#8  0x083682dc in edf_work_handler (work=0x8b1621c <heapmem+1564>)
                                    at zephyr/edf_schedule.c:32
 zephyrproject-rtos#9  0x085245af in work_queue_main (workq_ptr=0x8b15b00 <edf_workq>,...)
                                         at zephyr/kernel/work.c:688
 zephyrproject-rtos#10 0x0823a6bc in z_thread_entry (entry=0x8523be0 <work_queue_main>,..
                                    at zephyr/lib/os/thread_entry.c:48
 zephyrproject-rtos#11 0x0829a6a1 in posix_arch_thread_entry (pa_thread_status=0x8630648 ..
                                  at zephyr/arch/posix/core/thread.c:56
 zephyrproject-rtos#12 0x0829c043 in posix_thread_starter (arg=0x4)
                              at zephyr/arch/posix/core/posix_core.c:293
 zephyrproject-rtos#13 0x080f6041 in asan_thread_start(void*) ()
 zephyrproject-rtos#14 0xf788c73c in ?? () from /usr/lib32/libc.so.6
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this pull request Feb 27, 2024
They shouldn't do much but they can't hurt.

See thesofproject#8632 for more details.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this pull request Feb 27, 2024
They shouldn't do much but they can't hurt.

See thesofproject#8632 for more details.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
lgirdwood pushed a commit that referenced this pull request Feb 28, 2024
They shouldn't do much but they can't hurt.

See #8632 for more details.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.