Native video decoder via ffmpeg CLI #7607

Open
emilk opened this issue Oct 7, 2024 · 0 comments
Labels
enhancement New feature or request 🎞️ video

Send video samples to an ffmpeg process via its CLI (over stdin), read back the decoded frames, and show them to the user.

A basic demo of this is in this internal repository: https://github.com/rerun-io/video-experiments

The main approach is fairly simple:

    let mut ffmpeg = FfmpegCommand::new()
        .hide_banner()
        // Note: all arguments that apply to the input must come before `.input(...)`.
        .format("h264")
        .input("-") // Read the input from stdin.
        //.args(&["-c", "copy", "out.mp4"]) // For testing.
        .rawvideo() // Output rgb24 on stdout. (TODO: any format that re_renderer can read directly would be better!)
        .spawn()
        .expect("failed to spawn ffmpeg");

    let mut stdin = ffmpeg.take_stdin().unwrap();
    std::thread::spawn(move || {
        // Wait for samples to arrive, then write them to ffmpeg's stdin.
        // See `write_sample_to_nalu_stream`.
    });

    // On the main thread, run the output instance to completion
    ffmpeg.iter().unwrap().for_each(|e| match e {
        FfmpegEvent::Log(LogLevel::Error, e) => println!("Error: {}", e),
        FfmpegEvent::Progress(p) => println!("Progress: {} / 00:00:15", p.time),
        FfmpegEvent::OutputFrame(frame) => println!(
            "Received frame: time {:?} fmt {:?} size {}x{}",
            frame.timestamp, frame.pix_fmt, frame.width, frame.height
        ),
        evt => println!("Event: {evt:?}"),
    });
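
The writer thread in the snippet above is only outlined in comments. Below is a minimal sketch of one way to fill it in, assuming the samples arrive as already-converted Annex B byte chunks over a `std::sync::mpsc` channel. `spawn_writer` is a hypothetical helper made up for this illustration; it is generic over the sink so that it can be given the handle returned by `take_stdin()` or, for testing, a plain `Vec<u8>`.

```rust
use std::io::Write;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: spawns a thread that forwards byte chunks from a
/// channel to `sink`. Returns the sending half of the channel and a join
/// handle that yields the sink back once all senders have been dropped.
fn spawn_writer<W: Write + Send + 'static>(
    mut sink: W,
) -> (mpsc::Sender<Vec<u8>>, thread::JoinHandle<std::io::Result<W>>) {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let handle = thread::spawn(move || -> std::io::Result<W> {
        // Iterating the receiver blocks until a chunk arrives and
        // ends once every sender has been dropped.
        for chunk in rx {
            sink.write_all(&chunk)?;
        }
        sink.flush()?;
        Ok(sink)
    });
    (tx, handle)
}
```

With a pipe as the sink, the reading process only sees end-of-input once the sink itself is dropped, so the caller would drop the value returned from `join()` when no more samples will follow.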

The difficult part is providing ffmpeg with a format that it can stream in on stdin. In the snippet above that spawns ffmpeg, this is done with format("h264"), so ffmpeg expects a raw .h264 stream: a sequence of NAL units in Annex B format (i.e. each NAL length prefix is replaced with a start code). In addition, at every IDR frame the sequence parameter sets (SPS) and picture parameter sets (PPS) need to be inserted:

fn write_sample_to_nalu_stream(
    avc_box: &re_mp4::Avc1Box,
    nalu_stream: &mut dyn std::io::Write,
    sample: &re_mp4::Sample,
    video_track_data: &[u8],
    state: &mut NaluStreamState,
) -> Result<(), Box<dyn std::error::Error>> {
    let avcc = &avc_box.avcc;

    // Write the SPS & PPS NAL units before each IDR frame, unless the previous frame was also an IDR frame.
    // TODO(andreas): Should we detect this from the NALU stream rather than from the samples?
    if sample.is_sync && !state.previous_frame_was_idr {
        for sps in (&avcc.sequence_parameter_sets).iter() {
            nalu_stream.write_all(&NAL_START_CODE)?;
            nalu_stream.write_all(&sps.bytes)?;
        }
        for pps in (&avcc.picture_parameter_sets).iter() {
            nalu_stream.write_all(&NAL_START_CODE)?;
            nalu_stream.write_all(&pps.bytes)?;
        }
        state.previous_frame_was_idr = true;
    } else {
        state.previous_frame_was_idr = false;
    }

    // A single sample may consist of multiple NAL units, each of which needs the same treatment.
    // (Most of the time it's 1:1, but there might be extra NAL units for info, especially at the start.)
    let mut buffer_offset = sample.offset as usize;
    let sample_end = buffer_offset + sample.size as usize;
    while buffer_offset < sample_end {
        // Each NAL unit in mp4 is prefixed with a length prefix.
        // In Annex B this doesn't exist.
        let length_prefix_size = avcc.length_size_minus_one as usize + 1;

        // TODO: improve the error handling here.
        let nal_unit_size = match length_prefix_size {
            4 => u32::from_be_bytes(
                video_track_data[buffer_offset..(buffer_offset + 4)]
                    .try_into()
                    .unwrap(),
            ) as usize,
            2 => u16::from_be_bytes(
                video_track_data[buffer_offset..(buffer_offset + 2)]
                    .try_into()
                    .unwrap(),
            ) as usize,
            1 => video_track_data[buffer_offset] as usize,
            _ => panic!("invalid length prefix size"),
        };
        //println!("nal unit size: {}", nal_unit_size);

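        // The length prefix plus the NAL unit must fit within what is left of the sample.
        if sample_end < buffer_offset + length_prefix_size + nal_unit_size {
            panic!(
                "NAL unit of size {nal_unit_size} extends past the end of the sample (sample size: {})",
                sample.size
            );
        }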

        nalu_stream.write_all(&NAL_START_CODE)?;
        let data_start = buffer_offset + length_prefix_size; // Skip the size.
        let data_end = buffer_offset + nal_unit_size + length_prefix_size;
        let data = &video_track_data[data_start..data_end];

        // Note that we don't have to insert "emulation prevention bytes", since the NAL units in mp4 already contain them.
        // (Unlike the NAL start code, the prevention bytes are part of the NAL spec!)

        nalu_stream.write_all(data)?;

        buffer_offset = data_end;
    }

    Ok(())
}
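
The function above depends on `re_mp4` types and on `NAL_START_CODE`, which the snippet does not define. For experimenting with the conversion in isolation, here is a self-contained sketch of the same length-prefix to Annex B rewrite that returns errors instead of panicking. The names `avcc_sample_to_annex_b` and `NAL_TYPE_IDR` are made up for this illustration, and the 4-byte start code is an assumption about what `NAL_START_CODE` contains. The sketch also reports whether the sample held an IDR slice, which is one possible way to approach the TODO about detecting IDR frames from the NALU stream.

```rust
use std::io::{self, Write};

/// Annex B start code. Assumed to match `NAL_START_CODE` above;
/// a 3-byte `00 00 01` form exists as well.
const NAL_START_CODE: [u8; 4] = [0, 0, 0, 1];

/// H.264 NAL unit type of a coded slice of an IDR picture.
const NAL_TYPE_IDR: u8 = 5;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_owned())
}

/// Rewrites one length-prefixed sample as Annex B into `out`.
/// Returns `true` if the sample contained an IDR slice.
fn avcc_sample_to_annex_b(
    sample: &[u8],
    length_prefix_size: usize,
    out: &mut dyn Write,
) -> io::Result<bool> {
    if !matches!(length_prefix_size, 1 | 2 | 4) {
        return Err(invalid("unsupported length prefix size"));
    }
    let mut contains_idr = false;
    let mut rest = sample;
    while !rest.is_empty() {
        if rest.len() < length_prefix_size {
            return Err(invalid("truncated length prefix"));
        }
        let (prefix, tail) = rest.split_at(length_prefix_size);
        // The prefix is the big-endian size of the NAL unit that follows.
        let nal_size = prefix.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
        if nal_size > tail.len() {
            return Err(invalid("NAL unit extends past the end of the sample"));
        }
        let (nal, next) = tail.split_at(nal_size);
        // The low 5 bits of the first NAL byte hold the H.264 NAL unit type.
        if nal.first().map(|b| b & 0x1F) == Some(NAL_TYPE_IDR) {
            contains_idr = true;
        }
        out.write_all(&NAL_START_CODE)?;
        out.write_all(nal)?;
        rest = next;
    }
    Ok(contains_idr)
}
```

Here `sample` is only the bytes of a single sample, i.e. `video_track_data[sample.offset..sample.offset + sample.size]` in terms of the function above. The NAL type mask is specific to H.264; H.265 lays out its NAL header differently.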

Open questions:

  • Does this work with H.265 as well? It uses the same formats overall (NAL units, Annex B).
  • What about other codecs?
  • Does this approach scale well with other formats?
  • ffmpeg allows arbitrary output formats. Can we be clever about which one to pick? E.g. if the decoder internally produces YUV420 (which is the case most of the time, but not always!), we should pass YUV420 through instead of converting to rgb24, and process it directly.
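
To put a rough number on the last question: the sketch below compares the bytes per raw frame for the two layouts mentioned, assuming tightly packed frames without row padding. `PixFmt` and `frame_size_bytes` are made-up names for this illustration; the variants mirror ffmpeg's `rgb24` and `yuv420p` pixel format names.

```rust
/// Raw pixel layouts under discussion (illustrative enum, not an ffmpeg type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PixFmt {
    Rgb24,
    Yuv420p,
}

/// Bytes needed for one tightly packed frame (assumes no row padding).
fn frame_size_bytes(fmt: PixFmt, width: usize, height: usize) -> usize {
    match fmt {
        // Three bytes (R, G, B) per pixel.
        PixFmt::Rgb24 => width * height * 3,
        PixFmt::Yuv420p => {
            // One full-resolution luma plane plus two chroma planes that are
            // halved in both dimensions (rounded up for odd sizes).
            let chroma_width = (width + 1) / 2;
            let chroma_height = (height + 1) / 2;
            width * height + 2 * chroma_width * chroma_height
        }
    }
}
```

For a 1920x1080 frame this gives 6,220,800 bytes as rgb24 versus 3,110,400 bytes as yuv420p, so passing YUV420 through would halve the data read from stdout per frame, on top of skipping a color conversion inside ffmpeg.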