diff --git a/index.bs b/index.bs index e5a0df35..7d2afc20 100644 --- a/index.bs +++ b/index.bs @@ -1236,25 +1236,19 @@ A [=Mix Presentation=] MAY contain one or more sub-mixes. Common use cases MAY s class MixPresentationOBU() { leb128() mix_presentation_id; leb128() count_label; - for (i = 0; i < count_label; i++) { - string language_label; - } - for (i = 0; i < count_label; i++) { - MixPresentationAnnotations mix_presentation_annotations; - } + string language_label[count_label]; + string localized_presentation_annotations[count_label]; leb128() num_sub_mixes; for (i = 0; i < num_sub_mixes; i++) { leb128() num_audio_elements; for (j = 0; j < num_audio_elements; j++) { leb128() audio_element_id; - for (i = 0; i < count_label; i++) { - MixPresentationElementAnnotations mix_presentation_element_annotations; - } + string localized_element_annotations[count_label]; RenderingConfig rendering_config; - ElementMixConfig element_mix_config; + MixGainParamDefinition element_mix_gain; } - OutputMixConfig output_mix_config; + MixGainParamDefinition output_mix_gain; leb128() num_layouts; for (j = 0; j < num_layouts; j++) { @@ -1273,10 +1267,10 @@ class MixPresentationOBU() { count_label indicates the number of labels in different languages. -language_label specifies the language which both [=mix_presentation_friendly_label=] and [=audio_element_friendly_label=] are written in. It SHALL conform to [[!BCP-47]]. The same language SHALL NOT be duplicated in this loop. -- The labels in the i-th [=mix_presentation_annotations=] and [=mix_presentation_element_annotations=] SHALL be written in the language indicated by the i-th [=language_label=], where i = 0, 1, ..., [=count_label=] -1. +language_label specifies the language which both [=localized_presentation_annotations=] and [=localized_element_annotations=] are written in. It SHALL conform to [[!BCP-47]]. The same language SHALL NOT be duplicated in this loop. 
+- The labels of the i-th [=localized_presentation_annotations=] and [=localized_element_annotations=] SHALL be written in the language indicated by the i-th [=language_label=], where i = 0, 1, ..., [=count_label=] -1. -mix_presentation_annotations is an instance of the [=MixPresentationAnnotations()=] class, which provides informational metadata that an IA parser SHOULD refer to when selecting the [=Mix Presentation=] to use. The metadata MAY also be used by the playback system to display information to the user but is not used in the rendering or mixing process to generate the final output audio signal. +localized_presentation_annotations specifies a human-friendly label to describe this [=Mix Presentation=], which provides informational metadata that an IA parser SHOULD refer to when selecting the [=Mix Presentation=] to use. The metadata MAY also be used by the playback system to display information to the user but is not used in the rendering or mixing process to generate the final output audio signal. num_sub_mixes specifies the number of sub-mixes. It SHALL NOT be set to 0. @@ -1284,13 +1278,13 @@ class MixPresentationOBU() { audio_element_id indicates the identifier for an [=Audio Element=] which this [=Mix Presentation=] refers to. Parsers SHOULD ignore the [=Mix Presentation OBU=] with an [=Audio Element=] that they don't recognize. -mix_presentation_element_annotations is an instance of the [=MixPresentationElementAnnotations()=] class, which provides informational metadata that the playback system MAY use to display information to the user. It is not used in the rendering or mixing process to generate the final output audio signal. +localized_element_annotations specifies a human-friendly label to describe the referenced [=Audio Element=], which provides informational metadata that the playback system MAY use to display information to the user. It is not used in the rendering or mixing process to generate the final output audio signal. 
rendering_config is an instance of the [=RenderingConfig()=] class, which provides the metadata required for rendering the referenced [=Audio Element=]. -element_mix_config is an instance of the [=ElementMixConfig()=] class, which provides the metadata required for applying any processing to the referenced and rendered [=Audio Element=] before being summed with other processed [=Audio Element=]s. +element_mix_gain is an instance of the [=MixGainParamDefinition()=] class, which provides the metadata required for applying any processing to the referenced and rendered [=Audio Element=] before being summed with other processed [=Audio Element=]s. The [=MixGainParamDefinition()=] provides the parameter definition for the gain value that is applied to all channels of the rendered [=Audio Element=] signal. The corresponding parameter data to be provided in [=Parameter Block OBU=]s with the same [=parameter_block_obu/parameter_id=] is specified in the [=MixGainParamDefinition()=] class. -output_mix_config is an instance of the [=OutputMixConfig()=] class, which provides the metadata required for post-processing the mixed audio signal to generate the audio signal for playback. +output_mix_gain is an instance of the [=MixGainParamDefinition()=] class, which provides the metadata required for post-processing the mixed audio signal to generate the audio signal for playback. The [=MixGainParamDefinition()=] provides the parameter definition for the gain value that is applied to all channels of the mixed audio signal. The corresponding parameter data to be provided in [=Parameter Block OBU=]s with the same [=parameter_block_obu/parameter_id=] is specified in the [=MixGainParamDefinition()=] class. num_layouts specifies the number of layouts for this sub-mix on which the [=loudness=] information was measured. 
@@ -1311,40 +1305,10 @@ If a sub-mix in a [=Mix Presentation OBU=] includes only one single scalable cha - The highest [=loudness_layout=] specified in one sub-mix is the layout that was used for authoring the sub-mix. The exception is when the [=Audio Element=] is a zero-order Ambisonics or Mono channel. - The highest [=loudness_layout=] for a zero-order Ambisonics or Mono channel [=Audio Element=] is Stereo. -mix_presentation_tags is an instance of the [=MixPresentationTags()=] class, which provides informational metadata about a Mix Presentation, in addition to [=mix_presentation_annotations=]. +mix_presentation_tags is an instance of the [=MixPresentationTags()=] class, which provides informational metadata about a Mix Presentation, in addition to [=localized_presentation_annotations=]. The [=MixPresentationTags()=] class MAY or MAY NOT be present in a [=Mix Presentation OBU=]. If the [=obu_size=] of a [=Mix Presentation OBU=] is greater than the size up to the end of [=num_sub_mixes=] loop, the [=MixPresentationTags()=] SHALL be present in the [=Mix Presentation OBU=]. For a given [=IA Sequence=] with multiple [=Mix Presentation OBU=]s, the [=MixPresentationTags()=] MAY be present in some [=Mix Presentation OBU=]s and MAY NOT be present in the other [=Mix Presentation OBU=]s. -### Mix Presentation Annotations Syntax and Semantics ### {#obu-mixpresentation-annotation} - -The MixPresentationAnnotations() class provides informational metadata about a [=Mix Presentation=]. This section specifies the syntax structure of the [=MixPresentationAnnotations()=] class. - -Syntax -``` -class MixPresentationAnnotations() { - string mix_presentation_friendly_label; -} -``` - -Semantics - -mix_presentation_friendly_label specifies a human-friendly label to describe this [=Mix Presentation=]. 
- - -### Mix Presentation Element Annotations Syntax and Semantics ### {#obu-mixpresentation-elementannotation} - -The MixPresentationElementAnnotations() class provides informational metadata about an [=Audio Element=] referred to a [=Mix Presentation=]. This section specifies the syntax structure of the [=MixPresentationElementAnnotations()=] class. - -Syntax -``` -class MixPresentationElementAnnotations() { - string audio_element_friendly_label; -} -``` - -Semantics - -audio_element_friendly_label specifies a human-friendly label to describe the referenced [=Audio Element=]. ### Rendering Config Syntax and Semantics ### {#syntax-rendering-config} @@ -1380,22 +1344,14 @@ Parsers encountering a reserved value of [=headphones_rendering_mode=] SHALL ign rendering_config_extension_bytes represents reserved bytes for future use. Parsers that don't understand these bytes SHOULD ignore them. -### Element Mix Config Syntax and Semantics ### {#syntax-element-mix-config} - -The ElementMixConfig() class provides metadata for any processing that needs to be applied to the rendered [=Audio Element=] signal. +### Mix Gain Parameter Definition Syntax and Semantics ### {#syntax-mixgain-parameter-definition} The MixGainParamDefinition() class provides the parameter definition for any mix gains that need to be applied to a signal. -This section specifies the syntax structures of the [=ElementMixConfig()=] and the [=MixGainParamDefinition()=] classes. +This section specifies the syntax structure of the [=MixGainParamDefinition()=] class. 
Syntax -``` -class ElementMixConfig() { - MixGainParamDefinition mix_gain; -} -``` - ``` class MixGainParamDefinition() extends ParamDefinition() { signed int (16) default_mix_gain; @@ -1404,27 +1360,7 @@ class MixGainParamDefinition() extends ParamDefinition() { Semantics -mix_gain is an instance of the [=MixGainParamDefinition()=] class, which provides the parameter definition for the gain value that is applied to all channels of the rendered [=Audio Element=] signal. The corresponding parameter data to be provided in [=Parameter Block OBU=]s with the same [=parameter_block_obu/parameter_id=] is specified in the [=MixGainParamDefinition()=] class. - -default_mix_gain specifies the default mix gain value to apply when there are no [=Parameter Block OBU=]s with the same [=parameter_block_obu/parameter_id=] provided. This value is expressed in dB and SHALL be applied to all channels in the rendered [=Audio Element=]. It is stored as a 16-bit, signed, two’s complement fixed-point value with 8 fractional bits (i.e., Q7.8)([[Q-Format]]). - - -### Output Mix Config Syntax and Semantics ### {#obu-mixpresentation-outputmix} - -The OutputMixConfig() class provides metadata for any processing that needs to be applied to the mixed audio signal. This section specifies the syntax structure of the [=OutputMixConfig()=] class. - -Syntax - -``` -class OutputMixConfig() { - MixGainParamDefinition output_mix_gain; -} -``` - -Semantics - -output_mix_gain is an instance of the [=MixGainParamDefinition()=] class, which provides the parameter definition for the gain value that is applied to all channels of the mixed audio signal. The corresponding parameter data to be provided in [=Parameter Block OBU=]s with the same [=parameter_block_obu/parameter_id=] is specified in the [=MixGainParameterData()=] class. - +default_mix_gain specifies the default mix gain value to apply when there are no [=Parameter Block OBU=]s with the same [=parameter_block_obu/parameter_id=] provided. 
This value is expressed in dB and SHALL be applied to all channels in the rendered [=Audio Element=] or the mixed audio signal. It is stored as a 16-bit, signed, two’s complement fixed-point value with 8 fractional bits (i.e., Q7.8)([[Q-Format]]). ### Layout Syntax and Semantics ### {#syntax-layout} @@ -2457,7 +2393,7 @@ The RECOMMENDED values for specific codecs are as follows: An [=IA Sequence=] MAY contain more than one [=Mix Presentation=]. [[#processing-mixpresentation-selection]] details how a [=Mix Presentation=] SHOULD be selected from multiple of them. -A [=Mix Presentation=] specifies how to render, process and mix one or more [=Audio Element=]s. Each [=Audio Element=] SHALL first be individually rendered and processed before mixing. Then, any additional processing specified by [=output_mix_config=] SHALL be applied to the mixed audio signal in order to generate the final output audio for playback. [[#processing-mixpresentation-rendering]] details how each [=Audio Element=] SHOULD be rendered, while [[#processing-mixpresentation-mixing]] details how the [=Audio Element=]s SHALL be processed and mixed. +A [=Mix Presentation=] specifies how to render, process and mix one or more [=Audio Element=]s. Each [=Audio Element=] SHALL first be individually rendered and processed before mixing. Then, any additional processing specified by [=output_mix_gain=] SHALL be applied to the mixed audio signal in order to generate the final output audio for playback. [[#processing-mixpresentation-rendering]] details how each [=Audio Element=] SHOULD be rendered, while [[#processing-mixpresentation-mixing]] details how the [=Audio Element=]s SHALL be processed and mixed. As stated in [[#architecture]], specific renderers are out of scope. The examples provided are informative. @@ -2465,7 +2401,7 @@ As stated in [[#architecture]], specific renderers are out of scope. 
The example When an [=IA Sequence=] contains multiple [=Mix Presentation=]s, the IA parser SHOULD select the appropriate [=Mix Presentation=] in the following order. -1. If there are any user-selectable mixes, the IA parser SHOULD select the mix, or mixes, that match the user's preferences. An example might be a mix with a specific language. [=Mix Presentation=]s MAY use [=mix_presentation_friendly_label=] to describe such mixes. +1. If there are any user-selectable mixes, the IA parser SHOULD select the mix, or mixes, that match the user's preferences. An example might be a mix with a specific language. [=Mix Presentation=]s MAY use [=localized_presentation_annotations=] to describe such mixes. 2. If there is more than one valid mix remaining, the IA parser SHOULD select an appropriate mix for rendering, in the following order. 1. If the playback device is headphones: 1. Select the mix with [=mix_presentation_obu/audio_element_id=] whose [=loudspeaker_layout=] is BINAURAL. @@ -2593,13 +2529,13 @@ After rendering all [=Audio Element=]s to a common playback layout, each [=Audio 1. If all [=Audio Element=]s do not have a common sample rate, re-sample them to a common sample rate. This specification RECOMMENDs 48 kHz. 2. If all [=Audio Element=]s do not have a common bit-depth, convert them to a common bit-depth. This specification RECOMMENDs using 16 bits. -3. Apply the per-element gain using the gain value specified in [=element_mix_config=]. +3. Apply the per-element gain using the gain value specified in [=element_mix_gain=]. - If there are no element mix gain [=Parameter Substream=]s associated with the [=Audio Element=], use the [=default_mix_gain=] value. - Else, use the [=param_data=] value provided in [=mix_gain_parameter_data=]. The rendered and processed [=Audio Element=]s SHALL then be summed. -Finally, the output mix gain SHALL be applied using the value specified in [=output_mix_config=] to generate one sub-mixed audio signal. 
+Finally, the output mix gain SHALL be applied using the value specified in [=output_mix_gain=] to generate one sub-mixed audio signal. - If there are no [=Parameter Block OBU=]s for the [=Parameter Substream=]s associated with the [=Mix Presentation=], use the [=default_mix_gain=] value. - Else, use the [=param_data=] value provided in [=mix_gain_parameter_data=]. @@ -3311,8 +3247,8 @@ where The [=Mix Presentation OBU=] for one single channel-based [=Audio Element=] is set as follows: - [=num_sub_mixes=]: set to 1. - [=num_audio_elements=]: set to 1. -- [=element_mix_config=]: No [=Parameter Block OBU=]s for [=element_mix_config=] and [=default_mix_gain=] = 0 dB. -- [=output_mix_config=]: No [=Parameter Block OBU=]s for [=output_mix_config=] and [=default_mix_gain=] = 0 dB. +- [=element_mix_gain=]: No [=Parameter Block OBU=]s for [=element_mix_gain=] and [=default_mix_gain=] = 0 dB. +- [=output_mix_gain=]: No [=Parameter Block OBU=]s for [=output_mix_gain=] and [=default_mix_gain=] = 0 dB. - [=num_layouts=]: set to N, where N is the number of input channel layouts. - [=loudness_layout=]: set to L(1), L(2), ..., L(N), where L(i) is the measured layout for the i-th layer and i = 1, 2, ..., N. - [=LoudnessInfo()=] for L(1), [=LoudnessInfo()=] for L(2), ..., [=LoudnessInfo()=] for L(N): loudness information of the audio rendered to to the measured layout L(i). @@ -3323,8 +3259,8 @@ NOTE: If the input channel layouts do not include Stereo, then [=num_layers=] is The [=Mix Presentation OBU=] for one single scene-based [=Audio Element=] is set as follows: - [=num_sub_mixes=]: set to 1 - [=num_audio_elements=]: set to 1 -- [=element_mix_config=]: set to [=mix_gain=] -- [=output_mix_config=]: set to [=output_mix_gain=] +- [=element_mix_gain=]: set to [=element_mix_gain=] +- [=output_mix_gain=]: set to [=output_mix_gain=] - [=num_layouts=]: set to M1, the number of layouts for which loudness information is provided. 
- [=loudness_layout=]: set to L(1), L(2), ..., L(M1), where L(i) is the measured layout for the i-th loudness information and i = 1, 2, ..., M1. - One of them is Stereo. @@ -3334,19 +3270,19 @@ The [=Mix Presentation OBU=] for one single scene-based [=Audio Element=] is set The [=Mix Presentation OBU=] for 2 [=Audio Element=]s is set as follows: - [=num_sub_mixes=]: set to 1 - [=num_audio_elements=]: set to 2 -- [=element_mix_config=] for each [=Audio Element=]: set to [=mix_gain=] -- [=output_mix_config=]: set to [=output_mix_gain=] +- [=element_mix_gain=] for each [=Audio Element=]: set to [=element_mix_gain=] +- [=output_mix_gain=]: set to [=output_mix_gain=] - [=num_layouts=]: set to M2, the number of layouts for which loudness information is provided. - [=loudness_layout=]: set to L(1), L(2), ..., L(M2), where L(i) is the measured layout for the i-th loudness information and i = 1, 2, ..., M2. - One of them is Stereo. - [=LoudnessInfo()=] on L(1), [=LoudnessInfo()=] on L(2), ..., [=LoudnessInfo()=] on L(M2): loudness information of the audio rendered to the measured layout L(i). - This [=Mix Presentation=] is authored using the highest [=loudness_layout=]. -#### Annex A3.1:Element Mix Config (Informative) #### {#iamfgeneration-mixpresentation-mix} +#### Annex A3.1:Element Mix Gain (Informative) #### {#iamfgeneration-mixpresentation-mix} -This section provides a guideline to generate [=element_mix_config=]. +This section provides a guideline to generate [=element_mix_gain=]. -An IA multiplexer may merge two [=IA Sequence=]s (or two [=Audio Element=]s). In this case, it adjusts the gain values for [=element_mix_config=]s as necessary to describe the desired relative gains between the [=IA Sequence=]s (or two [=Audio Element=]s) when they are summed to generate the final mix. It also ensures that the gains selected do not result in clipping when the final mix is generated. +An IA multiplexer may merge two [=IA Sequence=]s (or two [=Audio Element=]s). 
In this case, it adjusts the gain values for [=element_mix_gain=]s as necessary to describe the desired relative gains between the [=IA Sequence=]s (or two [=Audio Element=]s) when they are summed to generate the final mix. It also ensures that the gains selected do not result in clipping when the final mix is generated. ### Annex A4: Two Audio Elements Encoding with One Codec Config (Informative) ### {#iamfgeneration-multipleaudioelements-onecodec} @@ -3381,9 +3317,9 @@ This section provides a way to generate metadata for post-processing. This section provides a way to generate [=LoudnessInfo()=], given a [=Mix Presentation OBU=] and a [=loudness_layout=]. 1. Each [=Audio Element=] specified in the given [=Mix Presentation OBU=] is rendered to the given [=loudness_layout=]. -2. Each rendered [=Audio Element=] specified in the given [=Mix Presentation OBU=] has a gain applied using the value from [=mix_gain=] specified in its [=element_mix_config=]. +2. Each rendered [=Audio Element=] specified in the given [=Mix Presentation OBU=] has a gain applied using the value specified in [=element_mix_gain=]. 3. All rendered and processed [=Audio Element=]s specified in the given [=Mix Presentation OBU=] are summed. -3. The summed audio (i.e., [=Rendered Mix Presentation=]) has a gain applied using the value from [=mix_gain=] specified in [=output_mix_config=]. +3. The summed audio (i.e., [=Rendered Mix Presentation=]) has a gain applied using the value specified in [=output_mix_gain=]. 4. Generate [=LoudnessInfo()=] for the [=Rendered Mix Presentation=] according to [[#obu-mixpresentation-loudness]]. ## Annex B: ID Linking Scheme (Informative) ## {#idlinkingscheme}