In shot-based encoding, a video sequence is split into relatively short segments that are typically few seconds long and that have uniform spatio-temporal characteristics. Such segments are referred to as shots. Each segment is encoded on its own using a closed GoP configuration with only one key frame at the beginning of the shot. When shots are encoded using a multi-pass encoding approach, data from earlier passes are used to make decisions about the encoding settings for subsequent passes. An example of such decisions is the prediction structure to consider in subsequent passes. In SVT-AV1, the shot-based adaptive prediction structure algorithm makes use of information received from the first encoding pass to decide on a suitable prediction structure for the shot being encoded, as illustrated in Figure 1. Moreover, the first encoding pass is configured to use an IPP flat prediction structure.
Input to set_max_mini_gop_size: Generated statistics from previous encoding pass.
Output of set_max_mini_gop_size: Prediction structure for the shot.
Table 1 below summarises the invoked functions when Adaptive Prediction Structure is enabled. The process where each function is called is also indicated as well as a brief description of each function.
File | Function | Purpose |
---|---|---|
EbEncHandle | set_mini_gop_size_controls | Set the mini gop size controls |
EbEncHandle | set_max_mini_gop_size | Select the suitable prediction structure for the input video shot |
The Adaptive Prediction Structure algorithm is a threshold-based algorithm. All
used thresholds are grouped under one control function called
set_mini_gop_size_controls
. Table 2 below summarizes the control
parameters for this feature.
Flag | Level (Sequence/Picture) | Description |
---|---|---|
adaptive_enable | Sequence | 0: Off, 1: Switch at clip level, 2: Switch at GOP level |
short_shot_th | Sequence | Threshold to determine short scene |
animation_type_th | Sequence | Threshold to determine animation scene |
lm_th | Sequence | Threshold to determine low motion scene |
hm_th | Sequence | Threshold to determine high motion scene |
lfr_th | Sequence | Threshold to determine low frame rate scene |
hsa_th | Sequence | Threshold to determine high static area scene |
hmv_di_th | Sequence | Threshold to determine high mv direction scene |
lmv_di_th | Sequence | Threshold to determine low mv direction scene |
The adaptive prediction structure is invoked at the entry point of the final encoding pass as depicted in Figure 2.
Figure 2. Diagram showing where the Adaptive Prediction Structure algorithm is used in the encoder pipeline for the shot-based final encoding pass.
The Adaptive Prediction Structure algorithm aims at selecting the maximum mini-GoP-size based on the average collected statistics (generated by the IPP pass) over all input frames in the shot. The algorithm consists of the following steps:
- Run the IPP pass over the whole shot and collect statistics for each frame.
- Generate average statistics over the whole shot.
- Select the best mini-GoP size based on the collected statistics.
The algorithm tends to select a large mini-GoP size (32 frames) for low motion clips, a small mini-GoP size (8 frames) for high motion clips and the default mini-gop-size (16 frames) otherwise.
The selection of the mini-gop-size is performed as follows:
if ((low_motion_clip > lm_th) && !avoid_long_gop) then max-min_gop_size = 32
elseif (low_motion_clip > hm_th) then max-min_gop_size = 16
else max-min_gop_size = 8
where:
- low_motion_clip is the percentage of non-moving area in the shot and is determined as follows:
where:
low_motion_clip = (pcnt_inter - pcnt_motion) / (number_of_frames)
pcnt_inter
is the sum over all frames in the shot of the frame-based area percentage of blocks with inter prediction error smaller than intra prediction error.pcnt_motion
is the sum over all frames in the shot of the frame-based area percentage of blocks with inter prediction error smaller than intra prediction error and with non-zero motion vectors. The prediction errors are generated in the IPP pass.lm_th
andhm_th
are defined in Table 2 above.avoid_long_gop
is a flag to indicate corner cases where a large mini-gop-size may hurt the quality although it might be a very low motion shot as is the case with short shots or animation shots.
The Adaptive Prediction Structure is enabled by default only in the case of multi-pass encoding
(i.e., when the TWO_PASS_IPP_FINAL
multi-pass encoding mode is used).
The feature settings that are described in this document were compiled at v1.3.0 of the code and may not reflect the current status of the code. The description in this document represents an example showing how features would interact with the SVT architecture. For the most up-to-date settings, it's recommended to review the section of the code implementing this feature.