fix(server): use hw decoding for rkmpp w/o OpenCL if possible #13848

Merged
10 commits merged on Nov 22, 2024

zhujunsan
Contributor

Set the hardware decoding options for RKMPP when hardware decoding is enabled without OpenCL and the file is not HDR.

fix #13579

mertalev
mertalev previously approved these changes Oct 31, 2024
Contributor

@mertalev mertalev left a comment

Thanks for working on this!

server/src/utils/media.ts (outdated)
@mertalev mertalev dismissed their stale review October 31, 2024 13:11

The command isn’t valid

@mertalev
Contributor

When it needs to tone-map and OpenCL isn't available, the command won't work. This is because the HW config's input options will keep the frames in the GPU, but the SW config's filters assume they're on CPU.

The best option here is probably to do scaling on the GPU, then add hwdownload and the SW tonemap filters followed by hwupload. This will still do decoding and scaling on the GPU and only tone-mapping will be on CPU.
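A minimal sketch of why the original command fails and why the suggested chain is consistent. It tracks where frames live (GPU or CPU) as they pass through each filter. The `FilterStep` type, the `findMismatch` helper, and the GPU/CPU classification of each filter are assumptions made for illustration, not code from this PR.

```typescript
// Where a filter expects its input frames and where it leaves its output.
type FrameLocation = 'gpu' | 'cpu';

interface FilterStep {
  name: string; // filter name as it would appear in -vf
  input: FrameLocation;
  output: FrameLocation;
}

// Assumed classification of the filters discussed in this thread.
const scale: FilterStep = { name: 'scale_rkrga', input: 'gpu', output: 'gpu' };
const download: FilterStep = { name: 'hwdownload', input: 'gpu', output: 'cpu' };
const tonemap: FilterStep = { name: 'tonemapx', input: 'cpu', output: 'cpu' };
const upload: FilterStep = { name: 'hwupload', input: 'cpu', output: 'gpu' };

// Returns a description of the first mismatch, or null when every filter
// receives frames in the location it expects.
function findMismatch(start: FrameLocation, chain: FilterStep[]): string | null {
  let current = start;
  for (const step of chain) {
    if (step.input !== current) {
      return `${step.name} expects ${step.input} frames but receives ${current} frames`;
    }
    current = step.output;
  }
  return null;
}

// HW decoding leaves frames on the GPU, so a SW-only tone-mapping chain is invalid.
const broken = findMismatch('gpu', [tonemap]);

// Suggested chain: scale on the GPU, download, tone-map on the CPU, upload again.
const suggested = findMismatch('gpu', [scale, download, tonemap, upload]);
```

Here `broken` describes the mismatch at `tonemapx`, while `suggested` is `null` because each filter hands frames to the next in the expected location.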

@zhujunsan
Contributor Author

Thank you for the suggestion; you are correct. I'm not very familiar with ffmpeg and video processing, so let me dig into it.

`scale_rkrga=${this.getScaling(videoStream)}:format=nv12`,
'hwdownload',
'format=nv12',
`tonemapx=tonemap=${this.config.tonemap}:desat=0:p=${primaries}:t=${transfer}:m=${matrix}:r=pc:peak=100:format=nv12`,
Contributor Author

@zhujunsan zhujunsan Nov 1, 2024

I don't actually know why, but only the nv12 format works.

I tried:

All yuv420p:

scale_rkrga=-2:720:format=yuv420p,
hwdownload,
format=yuv420p,
tonemapx=tonemap=hable:desat=0:p=bt709:t=bt709:m=bt709:r=pc:peak=100:format=yuv420p,
hwupload

nv12 in RKRGA, converted to yuv420p on the CPU:

scale_rkrga=-2:720:format=nv12,
hwdownload,
format=nv12,
format=yuv420p,
tonemapx=tonemap=hable:desat=0:p=bt709:t=bt709:m=bt709:r=pc:peak=100:format=yuv420p,
hwupload

Both produce color stripes in the video, like this:
[Image: WeChat screenshot 20241101141726]

(I also tried some other formats that aren't supported at all.)

Contributor

That makes sense since the GPU doesn't understand yuv420p, only the (very similar) nv12.
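To illustrate how similar the two formats are, here is a rough sketch that converts one 8-bit 4:2:0 frame from planar yuv420p to semi-planar nv12. The function name and the tiny test frame are made up for illustration; in practice ffmpeg performs this conversion itself.

```typescript
// Convert one 8-bit 4:2:0 frame from planar yuv420p (Y plane, then U plane,
// then V plane) to semi-planar nv12 (Y plane, then interleaved U/V pairs).
// Assumes tightly packed planes and even width and height.
function yuv420pToNv12(src: Uint8Array, width: number, height: number): Uint8Array {
  if (width % 2 !== 0 || height % 2 !== 0) {
    throw new Error('width and height must be even');
  }
  const ySize = width * height;
  const chromaSize = (width / 2) * (height / 2);
  if (src.length !== ySize + 2 * chromaSize) {
    throw new Error('unexpected frame size');
  }

  const dst = new Uint8Array(src.length);
  dst.set(src.subarray(0, ySize)); // the luma plane is identical in both layouts

  const u = src.subarray(ySize, ySize + chromaSize);
  const v = src.subarray(ySize + chromaSize);
  for (let i = 0; i < chromaSize; i++) {
    dst[ySize + 2 * i] = u[i]; // U sample
    dst[ySize + 2 * i + 1] = v[i]; // V sample
  }
  return dst;
}

// 4x2 frame: eight luma samples, then U = [100, 101], then V = [200, 201].
const i420Frame = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8, 100, 101, 200, 201]);
const nv12Frame = yuv420pToNv12(i420Frame, 4, 2);
```

Both layouts carry the same samples and the same total size; only the arrangement of the chroma data differs.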

@zhujunsan zhujunsan changed the title from "fix(server): use hw decoding for rkmpp w/o OpenCL on non HDR file" to "fix(server): use hw decoding for rkmpp w/o OpenCL" on Nov 1, 2024
return [ // use RKMPP for scaling, CPU for tone mapping
`scale_rkrga=${this.getScaling(videoStream)}:format=nv12`,
'hwdownload',
'format=nv12',
Contributor

You can't set the format to nv12 before tone-mapping - it will alter the colors before tone-mapping has a chance to map them to SDR. Does it not work without setting a format here? If not, use yuv420p10le instead to keep it 10-bit.
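One part of this concern can be sketched numerically: converting to an 8-bit format such as nv12 before tone-mapping discards precision that the tone-mapping step would otherwise have. The snippet below only illustrates that bit-depth loss using simple truncation, which is an assumption; it does not model how ffmpeg actually converts between formats.

```typescript
// Reduce a 10-bit code value (0..1023) to 8 bits (0..255) by dropping the
// two least significant bits. Real converters may round or dither instead.
function to8Bit(value10: number): number {
  return value10 >> 2;
}

// Count how many distinct 8-bit values remain for a range of 10-bit inputs.
function distinctAfter8Bit(from: number, to: number): number {
  const seen = new Set<number>();
  for (let v = from; v <= to; v++) {
    seen.add(to8Bit(v));
  }
  return seen.size;
}

const inputLevels = 1024; // all 10-bit code values
const outputLevels = distinctAfter8Bit(0, 1023); // levels that survive
```

Every group of four adjacent 10-bit values collapses into one 8-bit value, so the tone-mapper would see a quarter of the original levels.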

];
}
return [ // use RKMPP for scaling, CPU for tone mapping
`scale_rkrga=${this.getScaling(videoStream)}:format=nv12`,
Contributor

Set the format to p010 here.

Contributor Author

Unfortunately, after some digging, I think the 2D graphics module on my chip (RK3566) doesn't support 10-bit output, so the p010 format (and other 10-bit formats like nv15) is not supported. Full software decoding might be needed here, at least for the RK3566.

Contributor

Could you share the logs? It needs to support 10-bit decoding to even get to scale_rkrga since the decoding happens before this filter.

Contributor Author

Sure, here's the log:

p010
root@178a1f2d2a09:/usr/src/app/upload/library# /usr/bin/ffmpeg \
> -hwaccel rkmpp -hwaccel_output_format drm_prime -noautorotate \
> -i hdr.mp4 \
> -y -c:a copy -movflags faststart -fps_mode passthrough \
> -map 0:0 -map 0:1 -strict unofficial -g 256 -v verbose \
> -vf "scale_rkrga=-2:720:format=p010,\
> hwdownload,\
> format=yuv420p10le,\
> tonemapx=tonemap=hable:desat=0:p=bt709:t=bt709:m=bt709:r=pc:peak=100:format=nv12,\
> hwupload" \
> -c:v h264_rkmpp -level 51 -rc_mode CQP -qp_init 23 output.mp4
ffmpeg version 7.0.2-Jellyfin Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-ptx-compression --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto=auto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-opencl --enable-libdrm --enable-libxml2 --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libharfbuzz --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libsvtav1 --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=arm64 --cross-prefix=/usr/bin/aarch64-linux-gnu- --toolchain=hardened --enable-cross-compile --enable-rkmpp --enable-rkrga
  libavutil      59.  8.100 / 59.  8.100
  libavcodec     61.  3.100 / 61.  3.100
  libavformat    61.  1.100 / 61.  1.100
  libavdevice    61.  1.100 / 61.  1.100
  libavfilter    10.  1.100 / 10.  1.100
  libswscale      8.  1.100 /  8.  1.100
  libswresample   5.  1.100 /  5.  1.100
  libpostproc    58.  1.100 / 58.  1.100
Routing option strict to both codec and muxer layer
Selecting decoder 'hevc_rkmpp' because of requested hwaccel method rkmpp
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'hdr.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf61.7.100
  Duration: 00:00:20.52, start: 0.000000, bitrate: 9795 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main 10), 1 reference frame (hev1 / 0x31766568), yuv420p10le(tv, bt2020nc/unknown/unknown, left), 2160x3840 [SAR 1:1 DAR 9:16], 9600 kb/s, 30 fps, 30 tbr, 16k tbn (default)
      Metadata:
        handler_name    : VideoHandler
        vendor_id       : [0][0][0][0]
        encoder         : Lavc61.19.100 libx265
      Side data:
        Content Light Level Metadata, MaxCLL=2000, MaxFALL=100
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 194 kb/s (default)
      Metadata:
        handler_name    : SoundHandler
        vendor_id       : [0][0][0][0]
[out#0/mp4 @ 0xaaaad5f89880] Adding streams from explicit maps...
[vost#0:0/h264_rkmpp @ 0xaaaad5f47a10] Created video stream from input stream 0:0
[tonemapx @ 0xaaaad5f2d2f0] Using CPU capability: NEON
[hevc_rkmpp @ 0xaaaad5f41640] Picked up an existing RKMPP hardware device
[aost#0:1/copy @ 0xaaaad5f96850] Created audio stream from input stream 0:1
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (hevc_rkmpp) -> h264 (h264_rkmpp))
  Stream #0:1 -> #0:1 (copy)
[vost#0:0/h264_rkmpp @ 0xaaaad5f47a10] Starting thread...
[vf#0:0 @ 0xaaaad5f3d600] Starting thread...
[vist#0:0/hevc @ 0xaaaad5f3e000] [dec:hevc_rkmpp @ 0xaaaad5f89d30] Starting thread...
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaad5f2a900] Starting thread...
Press [q] to stop, [?] for help
[hevc_rkmpp @ 0xaaaad5f41640] Noticed an info change
[hevc_rkmpp @ 0xaaaad5f41640] Configured with size: 2160x3840 | pix_fmt: drm_prime | sw_pix_fmt: nv15
[tonemapx @ 0xffff9c002d00] Using CPU capability: NEON
[graph 0 input from stream 0:0 @ 0xffff9c006840] w:2160 h:3840 pixfmt:drm_prime tb:1/16000 fr:30/1 sar:1/1 csp:bt2020nc range:tv
[auto_scale_0 @ 0xffff9c008970] w:iw h:ih flags:'' interl:0
[Parsed_tonemapx_3 @ 0xffff9c002c50] auto-inserting filter 'auto_scale_0' between the filter 'Parsed_format_2' and the filter 'Parsed_tonemapx_3'
rga_api version 1.10.1_[3]
[Parsed_scale_rkrga_0 @ 0xffff9c0025e0] 'p010le' is only supported by RGA3
[Parsed_scale_rkrga_0 @ 0xffff9c0025e0] Failed to configure output pad on Parsed_scale_rkrga_0
[vf#0:0 @ 0xaaaad5f3d600] Error reinitializing filters!
[vf#0:0 @ 0xaaaad5f3d600] Task finished with error code: -38 (Function not implemented)
[vost#0:0/h264_rkmpp @ 0xaaaad5f47a10] Encoder thread received EOF
[vost#0:0/h264_rkmpp @ 0xaaaad5f47a10] Could not open encoder before EOF
[vost#0:0/h264_rkmpp @ 0xaaaad5f47a10] Task finished with error code: -22 (Invalid argument)
[vost#0:0/h264_rkmpp @ 0xaaaad5f47a10] Terminating thread with return code -22 (Invalid argument)
[vf#0:0 @ 0xaaaad5f3d600] Terminating thread with return code -38 (Function not implemented)
[vist#0:0/hevc @ 0xaaaad5f3e000] [dec:hevc_rkmpp @ 0xaaaad5f89d30] Decoder returned EOF, finishing
[vist#0:0/hevc @ 0xaaaad5f3e000] [dec:hevc_rkmpp @ 0xaaaad5f89d30] Terminating thread with return code 0 (success)
[vist#0:0/hevc @ 0xaaaad5f3e000] All consumers of this stream are done
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaad5f2a900] EOF while reading input
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaad5f2a900] Terminating thread with return code 0 (success)
[out#0/mp4 @ 0xaaaad5f89880] Nothing was written into output file, because at least one of its streams received no packets.
frame=    0 fps=0.0 q=0.0 Lsize=       0KiB time=N/A bitrate=N/A speed=N/A    
[AVIOContext @ 0xaaaad5f973a0] Statistics: 0 bytes written, 0 seeks, 0 writeouts
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaad5f2a900] Input file #0 (hdr.mp4):
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaad5f2a900]   Input stream #0:0 (video): 17 packets read (762470 bytes); 2 frames decoded; 0 decode errors; 
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaad5f2a900]   Input stream #0:1 (audio): 961 packets read (497398 bytes); 
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaad5f2a900]   Total: 978 packets (1259868 bytes) demuxed
[AVIOContext @ 0xaaaad5f33020] Statistics: 25126543 bytes read, 0 seeks
Conversion failed!

It says: 'p010le' is only supported by RGA3

I don't know if this is the cause, but according to some documents, the RK3588 is the only chip that supports 10-bit output. Other chips (like my RK3566) can decode a 10-bit stream but cannot output it.

Also, from the chip datasheet:

[Image: datasheet excerpt]

Contributor

@mertalev mertalev Nov 3, 2024

Hmm, is it possible to not set the format in scale_rkrga and just have hwdownload,format=yuv420p10le (or possibly hwmap=mode=read,format=yuv420p10le) after it?

Contributor Author

Tested these three:

/usr/bin/ffmpeg \
-hwaccel rkmpp -hwaccel_output_format drm_prime -noautorotate \
-i hdr.mp4 \
-y -c:a copy -movflags faststart -fps_mode passthrough \
-map 0:0 -map 0:1 -strict unofficial -g 256 -v verbose \
-vf "scale_rkrga=-2:720,\
hwdownload,\
format=yuv420p10le,\
tonemapx=tonemap=hable:desat=0:p=bt709:t=bt709:m=bt709:r=pc:peak=100:format=nv12,\
hwupload" \
-c:v h264_rkmpp -level 51 -rc_mode CQP -qp_init 23 output.mp4

/usr/bin/ffmpeg \
-hwaccel rkmpp -hwaccel_output_format drm_prime -noautorotate \
-i hdr.mp4 \
-y -c:a copy -movflags faststart -fps_mode passthrough \
-map 0:0 -map 0:1 -strict unofficial -g 256 -v verbose \
-vf "scale_rkrga=-2:720,\
hwmap=mode=read,\
hwdownload,\
format=yuv420p10le,\
tonemapx=tonemap=hable:desat=0:p=bt709:t=bt709:m=bt709:r=pc:peak=100:format=nv12,\
hwupload" \
-c:v h264_rkmpp -level 51 -rc_mode CQP -qp_init 23 output.mp4

/usr/bin/ffmpeg \
-hwaccel rkmpp -hwaccel_output_format drm_prime -noautorotate \
-i hdr.mp4 \
-y -c:a copy -movflags faststart -fps_mode passthrough \
-map 0:0 -map 0:1 -strict unofficial -g 256 -v verbose \
-vf "scale_rkrga=-2:720,\
hwmap=mode=read,\
format=yuv420p10le,\
tonemapx=tonemap=hable:desat=0:p=bt709:t=bt709:m=bt709:r=pc:peak=100:format=nv12,\
hwupload" \
-c:v h264_rkmpp -level 51 -rc_mode CQP -qp_init 23 output.mp4

None of them work. All report the same error: 'nv15' as output is not supported if RGA2 is requested, where nv15 is the original pixel format of the video. If no pixel format is specified, the original video's pixel format is used, but 10-bit output is still not supported, so the result is the same error :(
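For anyone scripting similar tests, here is a small sketch that checks ffmpeg's stderr for the two RGA messages quoted in this thread. The helper name is hypothetical and the matched substrings come only from the output above, so other ffmpeg builds may word these errors differently.

```typescript
// Substrings taken from the ffmpeg output quoted in this thread.
const RGA_FORMAT_ERRORS: string[] = [
  'is only supported by RGA3',
  'as output is not supported if RGA2 is requested',
];

// Hypothetical helper: true when stderr suggests the RGA cannot produce the
// requested (or inherited) output pixel format.
function isRgaFormatUnsupported(stderr: string): boolean {
  return RGA_FORMAT_ERRORS.some((needle) => stderr.includes(needle));
}

const sampleLine =
  "[Parsed_scale_rkrga_0 @ 0xffff9c0025e0] 'p010le' is only supported by RGA3";
const detected = isRgaFormatUnsupported(sampleLine);
```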

Contributor

Okay, let's try something else instead. You can change the CPU fallback in this section to check if accelDecode is enabled. If it is, it can try with hardware decoding disabled and hardware encoding enabled before falling back to full CPU.

With this, it's okay if the no-OpenCL command here doesn't work on RK3566 because it will just fall back to SW decoding. I can do testing on an RK3588 to make sure this command works on compatible hardware.
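The fallback order described here could be sketched roughly as follows. The `Attempt` type, `fallbackAttempts`, and `transcodeWithFallback` are hypothetical names for illustration and do not reflect the actual code in the server.

```typescript
// One transcoding configuration to try.
interface Attempt {
  hwDecode: boolean;
  hwEncode: boolean;
}

// Ordered list of configurations: full hardware first (only when accelDecode
// is enabled), then SW decoding with HW encoding, then full CPU.
function fallbackAttempts(accelDecode: boolean): Attempt[] {
  const attempts: Attempt[] = [];
  if (accelDecode) {
    attempts.push({ hwDecode: true, hwEncode: true });
  }
  attempts.push({ hwDecode: false, hwEncode: true });
  attempts.push({ hwDecode: false, hwEncode: false });
  return attempts;
}

// Try each configuration in order and return the first one that succeeds.
// `run` stands in for whatever actually invokes ffmpeg.
async function transcodeWithFallback(
  accelDecode: boolean,
  run: (attempt: Attempt) => Promise<void>,
): Promise<Attempt> {
  let lastError: unknown = new Error('no attempts were made');
  for (const attempt of fallbackAttempts(accelDecode)) {
    try {
      await run(attempt);
      return attempt;
    } catch (error) {
      lastError = error; // remember the failure and try the next configuration
    }
  }
  throw lastError;
}
```

With this ordering, a chip like the RK3566 that rejects the no-OpenCL tone-mapping command would still get hardware encoding on the second attempt.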

@zhujunsan zhujunsan changed the title from "fix(server): use hw decoding for rkmpp w/o OpenCL" to "fix(server): use hw decoding for rkmpp w/o OpenCL on non-HDR video" on Nov 3, 2024
@zhujunsan zhujunsan changed the title from "fix(server): use hw decoding for rkmpp w/o OpenCL on non-HDR video" to "fix(server): use hw decoding for rkmpp w/o OpenCL if possible" on Nov 4, 2024
@zhujunsan
Contributor Author

@mertalev, am I doing this right? Any more advice? Thanks.

@mertalev
Contributor

Sorry, I haven't tested it yet. I might have time to take a look at it tomorrow.

@mertalev
Contributor

I had to change format=yuv420p10le to format=p010 for the tone-mapping command without OpenCL to work, but it otherwise works pretty well. I also added async_depth=4 to all the scaling commands as I found that it improves speed.
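Putting these changes together, the no-OpenCL tone-mapping filter list might look something like the sketch below. The `ToneMapOptions` type and the function name are hypothetical, and the exact placement of `format=p010` and `async_depth=4` on `scale_rkrga` is an assumption based on this thread rather than a copy of the merged code.

```typescript
// Hypothetical options object; the server derives these values from its
// config and the video stream.
interface ToneMapOptions {
  scaling: string; // e.g. '-2:720', as in the commands above
  tonemap: string; // e.g. 'hable'
  primaries: string;
  transfer: string;
  matrix: string;
}

// Scale on the RGA, download to the CPU as p010, tone-map with tonemapx,
// then upload the nv12 result for the hardware encoder.
function rkmppCpuTonemapFilters(o: ToneMapOptions): string[] {
  return [
    `scale_rkrga=${o.scaling}:format=p010:async_depth=4`, // assumed option placement
    'hwdownload',
    'format=p010',
    `tonemapx=tonemap=${o.tonemap}:desat=0:p=${o.primaries}:t=${o.transfer}:m=${o.matrix}:r=pc:peak=100:format=nv12`,
    'hwupload',
  ];
}

// The list is joined with commas to form the value passed to -vf.
const vf = rkmppCpuTonemapFilters({
  scaling: '-2:720',
  tonemap: 'hable',
  primaries: 'bt709',
  transfer: 'bt709',
  matrix: 'bt709',
}).join(',');
```

As the earlier log shows, a chain like this is only expected to work on hardware whose RGA supports 10-bit output, such as the RK3588.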

@mertalev mertalev enabled auto-merge (squash) November 22, 2024 06:53
@zhujunsan
Contributor Author

wow great!

@mertalev mertalev merged commit 1c82804 into immich-app:main Nov 22, 2024
33 checks passed

Successfully merging this pull request may close these issues.

RKMPP not using hardware decoding when no OpenCL even when tone-mapping is not needed
2 participants