RK3588S Mali-G610 GPU (Orange Pi 5) video playback segfault #356

Open
martivo opened this issue Feb 11, 2023 · 13 comments

Comments

@martivo

martivo commented Feb 11, 2023

Hardware: Orange Pi 5 RK3588S Mali-G610 GPU
OS: Armbian Linux (5.10.110-rockchip-rk3588 #trunk.0248 SMP Fri Feb 10 05:25:40 UTC 2023 aarch64 aarch64 aarch64)

After updating to the latest version of librockchip_mpp.so.0, video playback no longer works. I have traced the cause to commit 1cc1af1.

When I remove the code changes of commit 1cc1af1 from the latest commit a05b01d and build librockchip_mpp.so.0, video playback is normal.

Syslog messages when the segfault happens (with commit 1cc1af1 included):

Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: Assertion fd > 0 failed at heap_fd_open:136
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: os_allocator_dma_heap_open open dma heap type 0 failed!
Feb 10 21:13:09 loovsys mpp[69636]: mpp_allocator: mpp_allocator_get type 1 failed
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: Assertion fd > 0 failed at heap_fd_open:136
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: os_allocator_dma_heap_open open dma heap type 0 failed!
Feb 10 21:13:09 loovsys mpp[69636]: mpp_allocator: mpp_allocator_get type 3 failed
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: Assertion fd > 0 failed at heap_fd_open:136
Feb 10 21:13:09 loovsys mpp[69636]: mpp_dma_heap: os_allocator_dma_heap_open open dma heap type 0 failed!
Feb 10 21:13:09 loovsys mpp[69636]: mpp_allocator: mpp_allocator_get type 4 failed
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->allocator failed at get_group:902
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->alloc_api failed at get_group:903
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->allocator failed at get_group:902
Feb 10 21:13:09 loovsys mpp[69636]: mpp_buffer: Assertion p->alloc_api failed at get_group:903

@amazingfate

You need to set the right permissions. Here are the udev rules I use:
https://github.com/amazingfate/rockchip-multimedia-config/blob/main/99-rk-device-permissions.rules

@martivo
Author

martivo commented Feb 13, 2023

My previous udev rules:

KERNEL=="mpp_service", MODE="0660", GROUP="video"
KERNEL=="rga", MODE="0660", GROUP="video"
KERNEL=="system-dma32", MODE="0666", GROUP="video"
KERNEL=="system-uncached-dma32", MODE="0666", GROUP="video" RUN+="/usr/bin/chmod a+rw /dev/dma_heap"

After replacing my existing udev rules with https://github.com/amazingfate/rockchip-multimedia-config/blob/main/99-rk-device-permissions.rules:

root@loovsys:~# ls -l /dev/dma_heap/
total 0
crw------- 1 root root  251, 4 Feb 13 09:06 cma
crw------- 1 root root  251, 5 Feb 13 09:06 cma-uncached
crw------- 1 root root  251, 0 Feb 13 09:06 system
crw-rw-rw- 1 root video 251, 1 Feb 13 09:06 system-dma32
crw-rw-rw- 1 root video 251, 2 Feb 13 09:06 system-uncached
crw-rw-rw- 1 root video 251, 3 Feb 13 09:06 system-uncached-dma32
root@loovsys:~# ls -l /sys/class/dma_heap/
total 0
lrwxrwxrwx 1 root root 0 Feb 13 08:56 cma -> ../../devices/virtual/dma_heap/cma
lrwxrwxrwx 1 root root 0 Feb 13 08:56 cma-uncached -> ../../devices/virtual/dma_heap/cma-uncached
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system -> ../../devices/virtual/dma_heap/system
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system-dma32 -> ../../devices/virtual/dma_heap/system-dma32
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system-uncached -> ../../devices/virtual/dma_heap/system-uncached
lrwxrwxrwx 1 root root 0 Feb 13 08:56 system-uncached-dma32 -> ../../devices/virtual/dma_heap/system-uncached-dma32

The segfault goes away, but playback is horribly choppy and the video barely plays. Before, playback of the same video file was perfect.

During playback dmesg shows:

[  173.764467] rga: request[727] submit failed!
[  173.824646] rga_mm: RGA_MMU unsupported Memory larger than 4G!
[  173.824668] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[  173.824674] rga_mm: rga_mm_map_buffer map dma_buf error!
[  173.824679] rga_mm: job buffer map failed!
[  173.824682] rga_mm: src channel map job buffer failed!
[  173.824687] rga_mm: failed to map buffer
[  173.824691] rga_job: rga_job_commit: failed to map job info
[  173.824703] rga_job: request[728] task[0] job_commit failed.
[  173.824707] rga_job: rga request commit failed!
[  173.824710] rga: request[728] submit failed!
[  173.883175] rga_mm: RGA_MMU unsupported Memory larger than 4G!
[  173.883192] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[  173.883196] rga_mm: rga_mm_map_buffer map dma_buf error!
[  173.883199] rga_mm: job buffer map failed!
[  173.883203] rga_mm: src channel map job buffer failed!
[  173.883206] rga_mm: failed to map buffer
[  173.883212] rga_job: rga_job_commit: failed to map job info
[  173.883222] rga_job: request[729] task[0] job_commit failed.
[  173.883226] rga_job: rga request commit failed!
[  173.883229] rga: request[729] submit failed!
[  173.939808] rga_mm: RGA_MMU unsupported Memory larger than 4G!
[  173.939823] rga_mm: scheduler core[4] unsupported mm_flag[0x0]!
[  173.939827] rga_mm: rga_mm_map_buffer map dma_buf error!
[  173.939830] rga_mm: job buffer map failed!
[  173.939833] rga_mm: src channel map job buffer failed!
[  173.939837] rga_mm: failed to map buffer
[  173.939842] rga_job: rga_job_commit: failed to map job info
[  173.939852] rga_job: request[730] task[0] job_commit failed.
[  173.939857] rga_job: rga request commit failed!
[  173.939859] rga: request[730] submit failed!

The board has 16 GB of memory (only 2-3 GB is in use).

mpv log during the slow playback:

 (+) Video --vid=1 (*) (h264 800x600 8.000fps)
 (+) Audio --aid=1 (*) (aac 6ch 44100Hz)
[vo/gpu/wayland] GNOME's wayland compositor lacks support for the idle inhibit protocol. This means the screen can blank during playback.
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
No video PTS! Making something up. Using 8.000000 FPS.
AO: [pulse] 44100Hz 5.1 6ch float
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
VO: [gpu] 800x600 yuv420p
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
[ffmpeg/video] h264_rkmpp: Doing slow software conversion
[ffmpeg/video] h264_rkmpp: Doing slow software conversion

When I remove commit 1cc1af1 from the latest build, playback is normal.

dmesg.log

martivo referenced this issue Feb 13, 2023
Change-Id: Ifd2286ecf97fb4477693c24cdaec10c1df15eacf
Signed-off-by: Rimon Xu <rimon.xu@rock-chips.com>

@amazingfate

I just did some research on it. The mpp code update triggers this rga MMU check in the kernel: https://github.com/radxa/kernel/blob/linux-5.10-gen-rkr3.4/drivers/video/rockchip/rga3/rga_mm.c#L409.
You can limit the board's memory to 4G to get rid of it, or ask the developer to fix it.
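
The "RGA_MMU unsupported Memory larger than 4G!" message suggests that the driver rejects buffers reaching above the 4G boundary when the selected core cannot address them. As a rough illustration only (this is not the kernel's code; the function name and the plain address/length model are assumptions made for the sketch), such a boundary test can be thought of as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LIMIT_4G (UINT64_C(1) << 32)

/* Hypothetical model, not taken from rga_mm.c:
 * true if the byte range [addr, addr + len) lies entirely below 4 GiB. */
static bool fits_below_4g(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    /* Compare against the remaining room instead of computing addr + len,
     * so the check cannot overflow. */
    return addr < LIMIT_4G && len <= LIMIT_4G - addr;
}
```

On a 16 GB board, a buffer taken from a heap without an address restriction may fail a test of this kind even when only a few GB are in use, because what matters is where the pages sit, not how much memory is occupied.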

@martivo
Author

martivo commented Feb 13, 2023

Is it possible to make mpp detect that this will not work on boards with more than 4 GB of memory and not use the feature that commit 1cc1af1 adds, perhaps until this is fixed in the kernel?

Playback is perfect with 16 GB of memory without this commit. A user-defined setting would also solve the issue IMHO, or perhaps a permission check: if mpp has no access to "system-uncached", then don't use it? (That seems to be the situation I had before changing the udev rules, which only ended in a segfault.)

I need more than 4 GB of memory, so limiting it to 4 GB is not an option. I am sure there are others who want to use the hardware to the fullest.
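
The permission-check idea could be sketched roughly as follows: walk a preference list of heap names and use the first one the process can actually open, instead of asserting on the first failure. This is a minimal sketch, not MPP code. The helper name open_first_usable_heap is hypothetical, and opening the nodes with O_RDONLY | O_CLOEXEC is an assumption about how a client would open /dev/dma_heap devices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open the first entry of `names` that this process can
 * open under `dir`. Returns the fd (the caller closes it) and stores the
 * index of the chosen name in *chosen, or returns -1 if none can be opened. */
static int open_first_usable_heap(const char *dir, const char *const *names,
                                  size_t count, size_t *chosen)
{
    char path[256];
    size_t i;

    assert(dir != NULL && names != NULL && chosen != NULL);
    for (i = 0; i < count; i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;   /* path did not fit, try the next name */

        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd >= 0) {
            *chosen = i;
            return fd;
        }
        /* EACCES, ENOENT, ...: fall through to the next candidate */
    }
    return -1;
}
```

Called with "/dev/dma_heap" and a list such as { "system-uncached", "system-uncached-dma32" }, a process that lacks access to the first node would end up on the second one rather than failing outright. Whether such a fallback is acceptable for every MPP user is a separate question.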

@amazingfate

A quick fix (just revert a part of the commit):

diff --git a/osal/allocator/allocator_dma_heap.c b/osal/allocator/allocator_dma_heap.c
index 7e3a637..fd0eff4 100644
--- a/osal/allocator/allocator_dma_heap.c
+++ b/osal/allocator/allocator_dma_heap.c
@@ -74,14 +74,14 @@ typedef enum DmaHeapType_e {
 } DmaHeapType;

 static const char *heap_names[] = {
-    "system-uncached",          /* 0 - default */
+    "system-uncached-dma32",    /* 0 - default */
     "cma-uncached",             /* 1 -                                      DMA_HEAP_CMA */
-    "system",                   /* 2 -                  DMA_HEAP_CACHABLE                */
+    "system-dma32",             /* 2 -                  DMA_HEAP_CACHABLE                */
     "cma",                      /* 3 -                  DMA_HEAP_CACHABLE | DMA_HEAP_CMA */
-    "system-uncached-dma32",    /* 4 - DMA_HEAP_DMA32                                    */
-    "cma-uncached",             /* 5 - DMA_HEAP_DMA32                     | DMA_HEAP_CMA */
-    "system-dma32",             /* 6 - DMA_HEAP_DMA32 | DMA_HEAP_CACHABLE                */
-    "cma",                      /* 7 - DMA_HEAP_DMA32 | DMA_HEAP_CACHABLE | DMA_HEAP_CMA */
+    "system-uncached",          /* 4 - DMA_HEAP_DMA64                                    */
+    "cma-uncached",             /* 5 - DMA_HEAP_DMA64                     | DMA_HEAP_CMA */
+    "system",                   /* 6 - DMA_HEAP_DMA64 | DMA_HEAP_CACHABLE                */
+    "cma",                      /* 7 - DMA_HEAP_DMA64 | DMA_HEAP_CACHABLE | DMA_HEAP_CMA */
 };

 static int heap_fds[DMA_HEAP_TYPE_NB];

I think this issue is hard to solve because ffmpeg needs to convert YUV420SP to YUV420P, and only the rga2 kernel driver can do this. And rga2 is limited by hardware to memory below 4G.
I don't know why the developer wanted to change the priority from 32-bit to 64-bit, but it presumably solved some issues on other hardware.
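
The index comments in the diff suggest that the table is indexed by a bitmask of three flags. A minimal sketch of that lookup follows, assuming the bit values CMA = 1, CACHABLE = 2 and DMA32 = 4 as read off those comments. The FLAG_* names and both helper functions are hypothetical, and the table is the version after commit 1cc1af1 (the "-" lines of the diff).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit values inferred from the index comments in the diff; the FLAG_* names
 * are hypothetical stand-ins for the DMA_HEAP_* flags. */
enum {
    FLAG_CMA      = 1 << 0,
    FLAG_CACHABLE = 1 << 1,
    FLAG_DMA32    = 1 << 2,
    FLAG_MASK     = FLAG_CMA | FLAG_CACHABLE | FLAG_DMA32
};

/* Heap names as they stand after commit 1cc1af1. */
static const char *const heap_names[] = {
    "system-uncached",          /* 0 - default                  */
    "cma-uncached",             /* 1 - CMA                      */
    "system",                   /* 2 - CACHABLE                 */
    "cma",                      /* 3 - CACHABLE | CMA           */
    "system-uncached-dma32",    /* 4 - DMA32                    */
    "cma-uncached",             /* 5 - DMA32 | CMA              */
    "system-dma32",             /* 6 - DMA32 | CACHABLE         */
    "cma",                      /* 7 - DMA32 | CACHABLE | CMA   */
};

/* Look up the heap name for a combination of flags. */
static const char *heap_name_for(unsigned flags)
{
    assert((flags & ~(unsigned)FLAG_MASK) == 0);
    return heap_names[flags];
}

/* True if the selected heap name contains "-dma32". */
static int heap_is_dma32(unsigned flags)
{
    return strstr(heap_name_for(flags), "-dma32") != NULL;
}
```

Under these assumptions, a caller that passes no flags gets "system-uncached" after the commit, where it previously got "system-uncached-dma32". That would be consistent with the RGA2 errors appearing only on boards with more than 4G of memory.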

@rimonxu
Contributor

rimonxu commented Feb 13, 2023

You can also call the rga api
"imconfig(IM_CONFIG_SCHEDULER_CORE, IM_SCHEDULER_RGA3_CORE0 | IM_SCHEDULER_RGA3_CORE1);"
to lock onto the rga3 cores. rga3 is not limited to memory below 4G.

@amazingfate

FFmpeg wants to convert YCbCr_420_SP to YCbCr_420_P, which the kernel driver does not support on rga3: https://github.com/radxa/kernel/blob/linux-5.10-gen-rkr3.4/drivers/video/rockchip/rga3/rga_hw_config.c#L37

@rimonxu
Contributor

rimonxu commented Feb 13, 2023

I know the limitations of RGA3. So when 420P is needed, I suggest using the GPU instead of RGA2 on the RK3588S; RGA2 has low performance and other limitations.

martivo pushed a commit to martivo/mpp that referenced this issue Feb 13, 2023
@martivo
Author

martivo commented Feb 13, 2023

I can confirm that with the quick fix above (the partial revert) applied to the latest "develop" branch, the problem is not present.
I created a fork with the quick fix: martivo@38afa76

How could this be solved permanently? Where does the change have to take place?

@jjm2473

jjm2473 commented Jun 19, 2023

If you are using the latest mpp library, there is a better way to fix the RGA2 4GB issue: jjm2473/ffmpeg-rk@7e350f9

diff --git a/libavcodec/rkmppdec.c b/libavcodec/rkmppdec.c
index ca7a824ac1bd..e2078c089936 100644
--- a/libavcodec/rkmppdec.c
+++ b/libavcodec/rkmppdec.c
@@ -249,7 +249,7 @@ static int rkmpp_init_decoder(AVCodecContext *avctx)
         goto fail;
     }
 
-    ret = mpp_buffer_group_get_internal(&decoder->frame_group, MPP_BUFFER_TYPE_DRM);
+    ret = mpp_buffer_group_get_internal(&decoder->frame_group, MPP_BUFFER_TYPE_DRM | MPP_BUFFER_FLAGS_DMA32);
     if (ret) {
        av_log(avctx, AV_LOG_ERROR, "Failed to get buffer group (code = %d)\n", ret);
        ret = AVERROR_UNKNOWN;

This patch forces MPP to output frames at dma32 addresses, so that RGA2 can handle them.

@shivabohemian

shivabohemian commented Mar 9, 2024

jjm2473/ffmpeg-rk@7e350f9

Hello, does gstreamer-rockchip also need to change this part of the code? The rk3568 also reports an "RGA_MMU unsupported memory larger than 4G" error. As shown in the figure below, does the mpp_buffer_group_get_external function also need to add this parameter?
[Screenshot 2024-03-09 14 29 24]

@jjm2473

jjm2473 commented Mar 9, 2024

@shivabohemian They should all be the same as ffmpeg, but you need to confirm whether group or ext_group is used as the output buffer of MPP decoding. (Or just test it)

@shivabohemian

shivabohemian commented Mar 9, 2024

@jjm2473
Thank you for your answer. I found some uploaded gstreamer-rockchip code on GitHub here.
It seems that both the group and ext_group buffers are used. However, I'm not proficient in C. I will go and test it out; your help is greatly appreciated.
