
Add algorithms to the audio spec #23

Draft: youennf wants to merge 2 commits into main

Conversation

youennf (Contributor) commented on Oct 15, 2024

As discussed at TPAC, this PR introduces algorithms that define the behavior of AudioSession: in particular, how state is handled and how it interacts with existing audio sources/sinks and with Media Session.


Preview | Diff

index.bs Outdated
* A <dfn data-lt="default type" for="audio session">default type</dfn>, which is used to compute the [=audio session=] [=audio session/type=] in the case of "{{AudioSessionType/auto}}".
* An <dfn>audible flag</dfn>, which is `true` if the element is playing or recording audio and `false` otherwise.

An audio session [=audio session/element=] is an <dfn>audible element</dfn> if its [=audible flag=] is `true`.

Review comment:

I believe we need to clarify how to set the audible flag to true. Does this flag accurately reflect the actual audio output? We haven't yet discussed how to bind a media element (or other audible sources) to an audio session, which makes it somewhat confusing to define what constitutes an audible session.

Perhaps we should start by defining the criteria for determining if a session is audible and outline which types of elements can be used as sources.

youennf (Contributor, Author) replied:

This is defined later on for each audio source/sink, in the Audio source and sink integration section.
Three audio elements are defined: media element, AudioContext, and microphone MediaStreamTrack.

Review comment:

Regarding audio sources, I believe it would be clearer to briefly mention them here and link to the section below. This approach will help readers understand the audible behavior associated with the actual source, rather than just the AudioSession element.

For the audible flag, I recommend creating a separate section to explain it comprehensively. Even in Section 6, "Audio Source and Sink Integration," the definition of "audible" is not adequately addressed.

For example, if a non-muted media element is playing a silent audio track, or has its audio track volume set to a near-inaudible level, should this element be considered audible?
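
To make the question concrete, here is a hypothetical audibility predicate for a media element (TypeScript; the specific checks and the volume threshold are illustrative assumptions, not spec behavior):

```ts
// Hypothetical sketch: one possible audibility check for a media element.
// Whether a silent-but-playing track counts is exactly the open question
// above; treating near-zero volume as inaudible is an assumption here.
function isMediaElementAudible(el: HTMLMediaElement): boolean {
  const hasSource = el.srcObject !== null || el.currentSrc !== "";
  return hasSource && !el.paused && !el.muted && el.volume > 0.001;
}
```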

Additionally, how should we define "audible" for non-media elements, such as AudioContext and Web Speech?

If you agree with this approach, we could file another issue and create a separate PR for it.

youennf (Contributor, Author) replied:

I filed #27 and #28 to keep track of this.

index.bs Outdated
A [=top-level browsing context=] is said to have <dfn lt="audio focus" for="audio session">audio focus</dfn> if its [=selected audio session=] is not `null` and its state is {{AudioSessionState/active}}.

<div class=note>
User agents can decide whether to allow several [=top-level browsing contexts=] to have [=audio focus=], or to enforce that only a single [=top-level browsing context=] has [=audio focus=] at any given time.</div>
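
Transcribed into code, the quoted definition amounts to the following check (a sketch; `AudioSessionLike` is a stand-in shape, not the spec's IDL):

```ts
// A top-level browsing context has audio focus iff its selected audio
// session is non-null and in the "active" state.
interface AudioSessionLike {
  state: "active" | "interrupted" | "inactive";
}

function hasAudioFocus(selected: AudioSessionLike | null): boolean {
  return selected !== null && selected.state === "active";
}
```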

Review comment:

Could you provide an example of a situation where multiple contexts having audio focus is necessary? Is it premature to mention audio focus at this stage? It might be more effective to include a full paragraph defining what audio focus is; perhaps we should explain it thoroughly after introducing the different types.

For instance, does an audio session with the ambient type qualify as having audio focus, given that it can play and mix with other types?

Additionally, should we consider adding an attribute to the AudioSession to indicate whether it is a selected audio session?

youennf (Contributor, Author) replied:

On desktop, UAs tend to allow multiple tabs to render audio, similar to several native applications playing audio at the same time (without being ambient).

> For instance, does an audio session with the ambient type qualify as having audio focus, given that it can play and mix with other types?

It does not qualify.

> should we consider adding an attribute to the AudioSession to indicate whether it is a selected audio session?

We could consider it, but I am a bit reluctant to do this until we have a good use case.

Review comment:

Thanks for the clarification. I think it would be worth providing some examples here, but we could do that in follow-up PRs.

youennf (Contributor, Author) replied:

I filed #29.

index.bs Outdated
@@ -40,7 +72,33 @@ The {{AudioSession}} is the main interface for this API, which is accessed throu
};
</pre>

## Audio session states ## {#audio-session-types}
To create an {{AudioSession}} object in |realm|, run the following steps:

Review comment:

How about moving "Extensions to the Navigator Interface" to this section to improve the clarity of the specification? First, we could establish that the AudioSession is a singleton per window, followed by the steps for creating an AudioSession.
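
A sketch of the per-window singleton behavior this would establish up front (the `declare global` block is only there so the snippet type-checks; it is not the spec's IDL):

```ts
// navigator.audioSession always returns the same object per window.
declare global {
  interface Navigator {
    readonly audioSession: EventTarget;
  }
}

console.assert(
  navigator.audioSession === navigator.audioSession,
  "one AudioSession per window"
);

export {};
```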

Additionally, I would like to understand the purpose of isTypeBeingApplied. For instance, if we first set the type to A and then immediately change it to B, the current specification suggests that B won't be applied, because isTypeBeingApplied will still be true for A if the queued task hasn't executed yet. What is the rationale behind this mechanism? Shouldn't we allow a new type to be set on an AudioSession at any time?

youennf (Contributor, Author) replied:

I am fine with reshuffling the spec a bit to make it clearer. I'd like to minimize the rebasing of this PR, though.

Wrt isTypeBeingApplied, the principle is that we only update the audio session type once per event loop.
The AudioSession object's .type can be changed many times, but only one task will run to update its audio session type, using the last type set on the AudioSession object.
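
A minimal TypeScript sketch of this coalescing behavior, using `queueMicrotask` as a stand-in for the spec's queued task; the slot names mirror the PR text, but the implementation details are illustrative:

```ts
type AudioSessionType =
  | "auto" | "playback" | "transient" | "transient-solo"
  | "ambient" | "play-and-record";

class AudioSessionSketch {
  private pendingType: AudioSessionType = "auto";
  private isTypeBeingApplied = false;

  set type(value: AudioSessionType) {
    this.pendingType = value;            // remember the latest value
    if (this.isTypeBeingApplied) return; // an update task is already queued
    this.isTypeBeingApplied = true;
    queueMicrotask(() => {
      this.isTypeBeingApplied = false;
      this.apply(this.pendingType);      // last write wins
    });
  }

  private apply(type: AudioSessionType) {
    console.log("updating audio session type to", type);
  }
}

// Setting the type twice in the same turn runs a single update task:
const session = new AudioSessionSketch();
session.type = "playback";
session.type = "ambient"; // => one "updating audio session type to ambient"
```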

Review comment:

> I am fine with reshuffling the spec a bit to make it clearer. I'd like to minimize the rebasing of this PR, though.

Sounds good, thanks!

> Wrt isTypeBeingApplied, the principle is that we only update the audio session type once per event loop.

It sounds similar to updating an MSE SourceBuffer? If so, should we add an `updating` attribute to indicate that the update is still ongoing? In addition, is there a particular reason we only want to update the type once per event loop?
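
For reference, the MSE pattern alluded to here (assuming `sourceBuffer` and `chunk` exist in scope):

```ts
// SourceBuffer exposes an `updating` flag and fires `updateend`
// once the asynchronous operation settles.
declare const sourceBuffer: SourceBuffer;
declare const chunk: ArrayBuffer;

sourceBuffer.appendBuffer(chunk);
console.log(sourceBuffer.updating); // true while the append is in flight
sourceBuffer.addEventListener("updateend", () => {
  // safe to issue the next appendBuffer() here
});
```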

youennf (Contributor, Author) replied:

Let's keep this one open until we discuss the exact algorithms.

youennf (Contributor, Author) commented on Oct 18, 2024

Converting to draft until the other sections are upstreamed.
