Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsanitized IPC input is used for memory flags #8832

Closed
marc-hb opened this issue Feb 2, 2024 · 11 comments
Closed

Unsanitized IPC input is used for memory flags #8832

marc-hb opened this issue Feb 2, 2024 · 11 comments
Assignees
Labels
bug Something isn't working as expected P3 Low-impact bugs or features
Milestone

Comments

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 2, 2024

./scripts/fuzz.sh -o fuzz-stdout.txt -t 600 fails systematically after L3_HEAP commit 58a42e5

Example: https://github.com/thesofproject/sof/actions/runs/7763189541/job/21174929628

This may take a few minutes but never much longer.

P1 because this is affecting daily tests and our ability to fuzz.

Originally posted by @marc-hb in #8632 (comment)

PR #8632 is probably just the messenger but this fuzzing failure looks like a "good catch" to me.

Since #8632 was merged, one of the new k_panic() gets triggered because caps & SOF_MEM_CAPS_L3 is true even when CONFIG_L3_HEAP is false.

I think the reason caps & SOF_MEM_CAPS_L3 is true is because... caps comes directly from untrusted IPC input!? Why would IPCs be able to set caps directly?

The panic happens when ipc_glb_tplg_buffer_new() does this:

ret = ipc_buffer_new(ipc, (struct sof_ipc_buffer *)ipc->comp_data); 

At this point comp_data looks like it came straight from the fuzzer's untrusted input:

(gdb) p /x *desc

$5 = {comp = {hdr = {size = 0x66, cmd = 0x3020ffff}, id = 0xb6b6b600, type = 0xb6b6b6b6, pipeline_id = 0xb6b6b6b6, core = 0xffffb236, 
    ext_data_length = 0xffffffff}, size = 0xffffffff, caps = 0xffffffff, flags = 0xffffffff, reserved = 0xffffffff}
#0  rballoc_align (flags=0, caps=4294903040, bytes=86376703, align=64) at sof/zephyr/lib/alloc.c:398
#1  0x0831d38c in buffer_alloc (size=86376703, caps=4294903040, flags=4294967295, align=64, is_shared=false)
    at sof/src/audio/buffer.c:58
#2  0x082e4d6b in buffer_new (desc=0x8b184c0 <heapmem+1728>, is_shared=false) at sof/src/ipc/ipc-helper.c:48
#3  0x082cc78e in ipc_buffer_new (ipc=0x8b183c0 <heapmem+1472>, desc=0x8b184c0 <heapmem+1728>) at sof/src/ipc/ipc3/helper.c:459
#4  0x082b7249 in ipc_glb_tplg_buffer_new (header=807469055) at sof/src/ipc/ipc3/handler.c:1305
#5  0x082b090c in ipc_glb_tplg_message (header=807469055) at sof/src/ipc/ipc3/handler.c:1416
#6  0x082aff8a in ipc_cmd (_hdr=0x8b184c0 <heapmem+1728>) at sof/src/ipc/ipc3/handler.c:1651
#7  0x082fcb1b in ipc_platform_do_cmd (ipc=0x8b183c0 <heapmem+1472>) at sof/src/platform/posix/ipc.c:162
#8  0x082e2827 in ipc_do_cmd (data=0x8b183c0 <heapmem+1472>) at sof/src/ipc/ipc-common.c:328
#9  0x0836a27a in task_run (task=0x8b183e8 <heapmem+1512>) at sof/zephyr/include/rtos/task.h:94
@marc-hb
Copy link
Collaborator Author

marc-hb commented Feb 2, 2024

Debugging this is much easier with zephyrproject-rtos/zephyr#68494 and a bit easier with #8831

@marc-hb marc-hb changed the title I debugged this a bit. This PR is probably just the messenger but this fuzzing failure looks like a "good catch" to me. Unsanitized IPC input is used for memory flags Feb 2, 2024
@marc-hb marc-hb added the bug Something isn't working as expected label Feb 2, 2024
@marc-hb marc-hb added the P1 Blocker bugs or important features label Feb 3, 2024
@cujomalainey
Copy link
Contributor

What is the reason we are just not reverting the change?

@marc-hb
Copy link
Collaborator Author

marc-hb commented Feb 5, 2024

The change looks like just the messenger to me. Commit 58a42e5 adds a k_panic() which stops the fuzzer now but I believe the unsanitized IPC input has always been used for memory flags even before that k_panic().

Fuzzing is great but it's not a silver bullet. In this case, fuzzing never seemed to notice the unsanitized flags because they never caused any obvious corruption?

@cujomalainey
Copy link
Contributor

Agreed, I thought the commit added the fallthrough case not just expressing already broken logic, I was under a bad assumption.

@lgirdwood
Copy link
Member

@marc-hb can you fix and add validation checks around the memory types passed to IPC. Thanks.

marc-hb added a commit to marc-hb/sof that referenced this issue Feb 9, 2024
Fixes lack of SOF_MEM_CAPS_* input validation found in thesofproject#8832

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 9, 2024
Fixes lack of SOF_MEM_CAPS_* input validation found in thesofproject#8832

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 10, 2024
Fixes lack of SOF_MEM_CAPS_* input validation found in thesofproject#8832

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Feb 10, 2024

I have a quick and dirty hack that is passing all the tests in #8850 but it will break again whenever we add a new MEM_CAPS bit.

What is the reason we are just not reverting the change?

I still don't think this L3 heap should be reverted but @jxstelter I think it should be reworked to better handle invalid inputs. k_panic() is simply too extreme to handle invalid inputs (and fuzzing is just the messenger).

In the meantime I submitted a major rework of fuzz.sh because it was good enough for CI but really too inconvenient and too slow for interactive use. With #8851 it's great for both, please review.

jxstelter added a commit to jxstelter/sof that referenced this issue Feb 12, 2024
For rballoc_allign() call when caps are not correct it is
enough to return error. k_panic() call is not required here.
Previous change exposed this issue: thesofproject#8832,
but it is sufficient to log error and return NULL at this point.

Signed-off-by: Jaroslaw Stelter <Jaroslaw.Stelter@intel.com>
jxstelter added a commit to jxstelter/sof that referenced this issue Feb 12, 2024
For rballoc_allign() call when caps are not correct it is
enough to return error. k_panic() call is not required here.
Previous change exposed this issue:
thesofproject#8832,
but it is sufficient to log error and return NULL at this point.

Signed-off-by: Jaroslaw Stelter <Jaroslaw.Stelter@intel.com>
lgirdwood pushed a commit that referenced this issue Feb 14, 2024
For rballoc_allign() call when caps are not correct it is
enough to return error. k_panic() call is not required here.
Previous change exposed this issue:
#8832,
but it is sufficient to log error and return NULL at this point.

Signed-off-by: Jaroslaw Stelter <Jaroslaw.Stelter@intel.com>
@lgirdwood lgirdwood added this to the v2.9 milestone Feb 23, 2024
@marc-hb marc-hb added P3 Low-impact bugs or features and removed P1 Blocker bugs or important features labels Feb 27, 2024
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 27, 2024
As discussed in the alternative approach
zephyrproject-rtos/zephyr#68494, k_panic() in
POSIX mode has various shortcomings that do not provide a useful
trace. Useless pointers to signal handlers or other cleanup routines are
printed instead. Leverage our already existing
k_sys_fatal_error_handler() and dereference a NULL pointer there when in
POSIX mode. This "fails fast" and provides a complete and relevant stack
trace in CI when fuzzing or when using some other static
analyzer. Example of how fuzzing failure thesofproject#8832 would have looked like in
CI results thanks to this commit:

```
./build-fuzz/zephyr/zephyr.exe: Running 1 inputs 1 time(s) each.
Running: ./rballoc_align_fuzz_crash
*** Booting Zephyr OS build zephyr-v3.5.0-3971-ge07de4e0a167 ***
[00:00:00.000,000] <inf> main: SOF on native_posix
[00:00:00.000,000] <inf> main: SOF initialized
@ WEST_TOPDIR/sof/zephyr/lib/alloc.c:391
[00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
[00:00:00.000,000] <err> os: Current thread: 0x891f8a0 (unknown)
[00:00:00.000,000] <err> zephyr: Halting emulation
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1784402==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000
==1784402==The signal is caused by a WRITE memory access.
==1784402==Hint: address points to the zero page.
    #0 0x829a77d in k_sys_fatal_error_handler zephyr/wrapper.c:352:19
    #1 0x829b8c0 in rballoc_align zephyr/lib/alloc.c:391:3
    thesofproject#2 0x8281438 in buffer_alloc src/audio/buffer.c:58:16
    thesofproject#3 0x826a60a in buffer_new src/ipc/ipc-helper.c:48:11
    thesofproject#4 0x8262107 in ipc_buffer_new src/ipc/ipc3/helper.c:459:11
    thesofproject#5 0x825944d in ipc_glb_tplg_buffer_new src/ipc/ipc3/handler.c:1305:8
    thesofproject#6 0x8257036 in ipc_cmd src/ipc/ipc3/handler.c:1651:9
    thesofproject#7 0x8272e59 in ipc_platform_do_cmd src/platform/posix/ipc.c:162:2
    thesofproject#8 0x826a2ac in ipc_do_cmd src/ipc/ipc-common.c:328:9
    thesofproject#9 0x829b0ab in task_run zephyr/include/rtos/task.h:94:9
    thesofproject#10 0x829abd8 in edf_work_handler zephyr/edf_schedule.c:32:16
    thesofproject#11 0x83560f7 in work_queue_main zephyr/kernel/work.c:688:3
    thesofproject#12 0x82244c2 in z_thread_entry zephyr/lib/os/thread_entry.c:48:2
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 27, 2024
Fixes lack of SOF_MEM_CAPS_* input validation found in thesofproject#8832

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 27, 2024
As discussed in the alternative approach
zephyrproject-rtos/zephyr#68494, k_panic() in
POSIX mode has various shortcomings that do not provide a useful
trace. Useless pointers to signal handlers or other cleanup routines are
printed instead. Leverage our already existing
k_sys_fatal_error_handler() and dereference a NULL pointer there when in
POSIX mode. This "fails fast" and provides a complete and relevant stack
trace in CI when fuzzing or when using some other static
analyzer. Example of how fuzzing failure thesofproject#8832 would have looked like in
CI results thanks to this commit:

```
./build-fuzz/zephyr/zephyr.exe: Running 1 inputs 1 time(s) each.
Running: ./rballoc_align_fuzz_crash
*** Booting Zephyr OS build zephyr-v3.5.0-3971-ge07de4e0a167 ***
[00:00:00.000,000] <inf> main: SOF on native_posix
[00:00:00.000,000] <inf> main: SOF initialized
@ WEST_TOPDIR/sof/zephyr/lib/alloc.c:391
[00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
[00:00:00.000,000] <err> os: Current thread: 0x891f8a0 (unknown)
[00:00:00.000,000] <err> zephyr: Halting emulation
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1784402==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000
==1784402==The signal is caused by a WRITE memory access.
==1784402==Hint: address points to the zero page.
    #0 0x829a77d in k_sys_fatal_error_handler zephyr/wrapper.c:352:19
    #1 0x829b8c0 in rballoc_align zephyr/lib/alloc.c:391:3
    thesofproject#2 0x8281438 in buffer_alloc src/audio/buffer.c:58:16
    thesofproject#3 0x826a60a in buffer_new src/ipc/ipc-helper.c:48:11
    thesofproject#4 0x8262107 in ipc_buffer_new src/ipc/ipc3/helper.c:459:11
    thesofproject#5 0x825944d in ipc_glb_tplg_buffer_new src/ipc/ipc3/handler.c:1305:8
    thesofproject#6 0x8257036 in ipc_cmd src/ipc/ipc3/handler.c:1651:9
    thesofproject#7 0x8272e59 in ipc_platform_do_cmd src/platform/posix/ipc.c:162:2
    thesofproject#8 0x826a2ac in ipc_do_cmd src/ipc/ipc-common.c:328:9
    thesofproject#9 0x829b0ab in task_run zephyr/include/rtos/task.h:94:9
    thesofproject#10 0x829abd8 in edf_work_handler zephyr/edf_schedule.c:32:16
    thesofproject#11 0x83560f7 in work_queue_main zephyr/kernel/work.c:688:3
    thesofproject#12 0x82244c2 in z_thread_entry zephyr/lib/os/thread_entry.c:48:2
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
lgirdwood pushed a commit that referenced this issue Feb 28, 2024
As discussed in the alternative approach
zephyrproject-rtos/zephyr#68494, k_panic() in
POSIX mode has various shortcomings that do not provide a useful
trace. Useless pointers to signal handlers or other cleanup routines are
printed instead. Leverage our already existing
k_sys_fatal_error_handler() and dereference a NULL pointer there when in
POSIX mode. This "fails fast" and provides a complete and relevant stack
trace in CI when fuzzing or when using some other static
analyzer. Example of how fuzzing failure #8832 would have looked like in
CI results thanks to this commit:

```
./build-fuzz/zephyr/zephyr.exe: Running 1 inputs 1 time(s) each.
Running: ./rballoc_align_fuzz_crash
*** Booting Zephyr OS build zephyr-v3.5.0-3971-ge07de4e0a167 ***
[00:00:00.000,000] <inf> main: SOF on native_posix
[00:00:00.000,000] <inf> main: SOF initialized
@ WEST_TOPDIR/sof/zephyr/lib/alloc.c:391
[00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
[00:00:00.000,000] <err> os: Current thread: 0x891f8a0 (unknown)
[00:00:00.000,000] <err> zephyr: Halting emulation
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1784402==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000
==1784402==The signal is caused by a WRITE memory access.
==1784402==Hint: address points to the zero page.
    #0 0x829a77d in k_sys_fatal_error_handler zephyr/wrapper.c:352:19
    #1 0x829b8c0 in rballoc_align zephyr/lib/alloc.c:391:3
    #2 0x8281438 in buffer_alloc src/audio/buffer.c:58:16
    #3 0x826a60a in buffer_new src/ipc/ipc-helper.c:48:11
    #4 0x8262107 in ipc_buffer_new src/ipc/ipc3/helper.c:459:11
    #5 0x825944d in ipc_glb_tplg_buffer_new src/ipc/ipc3/handler.c:1305:8
    #6 0x8257036 in ipc_cmd src/ipc/ipc3/handler.c:1651:9
    #7 0x8272e59 in ipc_platform_do_cmd src/platform/posix/ipc.c:162:2
    #8 0x826a2ac in ipc_do_cmd src/ipc/ipc-common.c:328:9
    #9 0x829b0ab in task_run zephyr/include/rtos/task.h:94:9
    #10 0x829abd8 in edf_work_handler zephyr/edf_schedule.c:32:16
    #11 0x83560f7 in work_queue_main zephyr/kernel/work.c:688:3
    #12 0x82244c2 in z_thread_entry zephyr/lib/os/thread_entry.c:48:2
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 28, 2024
Fixes lack of SOF_MEM_CAPS_* input validation found in thesofproject#8832

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Feb 29, 2024

One last open PR on this topic and then we can close:

@kv2019i
Copy link
Collaborator

kv2019i commented Mar 4, 2024

Stable-v2.9 branched, this didn't make the cut, bumping to 2.10.

@kv2019i kv2019i modified the milestones: v2.9, v2.10 Mar 4, 2024
kv2019i pushed a commit that referenced this issue Mar 5, 2024
Fixes lack of SOF_MEM_CAPS_* input validation found in #8832

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@cujomalainey
Copy link
Contributor

@kv2019i given this is a security issue, can we not hotfix?

@kv2019i
Copy link
Collaborator

kv2019i commented Mar 6, 2024

@cujomalainey wrote:

@kv2019i given this is a security issue, can we not hotfix?

I actually thought this was a follow-up and the primary issue was already fixed (and thus the P3 priority).

But if not, let's indeed backport. The main PR is merged yesterday, @marc-hb can you submit a backport to stable-v2.9?

I think we can then close this issue, right? Or anything else pending?

@marc-hb
Copy link
Collaborator Author

marc-hb commented Mar 7, 2024

I think this was a potential security issue. But it's most likely not as long as unknown flags are ignored or rejected. #8853 is already in stable-v2.9 so I think it's enough.

@marc-hb marc-hb closed this as completed Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected P3 Low-impact bugs or features
Projects
None yet
Development

No branches or pull requests

4 participants