
Implement AudioRecord support for audio track #229

Merged
merged 1 commit on Dec 21, 2022

Conversation

IanDBird
Contributor

I'm hoping to introduce camera support as an input into LiTr. This would allow scenarios where the raw audio/video is not sourced from a static file, but instead comes from a "live" source. I am planning the following separate pull requests:

  • Audio Record Support: Ability to pull raw PCM from an audio source (e.g. microphone) to mux with video from another source
  • External Surface Support: Ability to expose the raw surface that is used by the filter pipeline, allowing some other component (e.g. Camera2, CameraX, 3rd party library) to render directly into this. This will essentially bypass the Demux/Decode steps of the pipeline.
  • Camera2/CameraX Support: Helper classes/source for most common camera source scenarios.

This first PR covers Audio Record Support. It introduces a new AudioRecordMediaSource class, which is responsible for taking raw PCM from AudioRecord, computing the appropriate PTS, and feeding it into the rest of the pipeline via the MediaSource interface. For now, the demo app simply muxes this together with a blank/red video stream. In future updates, this will be combined with a Camera2 integration that merges recorded video and audio into the same file.
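Since AudioRecord delivers raw PCM with no timestamps, the source has to derive each buffer's PTS from how many bytes it has read so far. A minimal sketch of that bookkeeping, with illustrative names and assuming 16-bit PCM (this is not the actual AudioRecordMediaSource code):

```kotlin
// Hypothetical sketch of PTS bookkeeping for raw PCM, not the actual PR code.
// 16-bit PCM carries 2 bytes per sample, so bytes-per-second is
// sampleRate * channelCount * 2; the PTS is total bytes scaled into microseconds.
const val BYTES_PER_SAMPLE = 2

fun presentationTimeUs(totalBytesRead: Long, sampleRate: Int, channelCount: Int): Long {
    val bytesPerSecond = sampleRate.toLong() * channelCount * BYTES_PER_SAMPLE
    return totalBytesRead * 1_000_000L / bytesPerSecond
}
```

For example, 88,200 bytes of 44.1 kHz mono 16-bit audio maps to exactly one second (1,000,000 µs).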

// This demo fragment requires Android M or newer, in order to support reading data from
// AudioRecord in a non-blocking way. Let's double check that the current device supports
// this.
if (Build.VERSION.SDK_INT < Build.VERSION_CODES.M) {
Contributor Author

This requirement could potentially be removed in the future, but that would require additional changes in AudioRecordMediaSource. We would need something like a ring buffer, with a separate thread reading via the older, blocking API and updating the buffer; we would then read from this buffer in a non-blocking way. The downside is obviously additional complexity along with increased memory usage. For now, I thought requiring Android M or newer was the better option.
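For illustration only, the ring-buffer fallback described above could look roughly like this generic sketch (pure Kotlin; the producer thread calling AudioRecord's blocking read() is left out, and all names are hypothetical):

```kotlin
// Hypothetical sketch of the pre-M fallback: a producer thread would call the
// blocking AudioRecord.read() and write bytes in, while the pipeline drains
// whatever is available without blocking. Oldest data is dropped when full.
class PcmBuffer(private val capacity: Int) {
    private val deque = ArrayDeque<Byte>()

    @Synchronized
    fun write(bytes: ByteArray, length: Int) {
        for (i in 0 until length) {
            if (deque.size == capacity) deque.removeFirst() // overwrite oldest
            deque.addLast(bytes[i])
        }
    }

    // Non-blocking read: returns however many bytes are buffered, up to dest.size.
    @Synchronized
    fun read(dest: ByteArray): Int {
        val n = minOf(deque.size, dest.size)
        for (i in 0 until n) dest[i] = deque.removeFirst()
        return n
    }
}
```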

Contributor

No need for anything that complicated, just leave things as is. The demo app is supposed to be a space for easy experimentation and sample code, not a prod-ready app. What you are doing here is exactly what it is for.

// Create a single (synthetic) video track to ensure our output is playable by the demo
// app. We only use 1 second of video (a solid color) so the duration of the video will
// likely depend on the length of the audio recording.
VideoTrackFormat videoTrackFormat = new VideoTrackFormat(0, MimeType.VIDEO_AVC);
Contributor Author

Playing a video-less file in the demo app would have required additional changes, so I went with the simple approach of adding a short, synthetic video stream.

Contributor

That is totally fine.

// need to leave some buffer for the AudioRecord to continue to queue too. We also don't
// want to block waiting for more data, we'll take whatever is available.
val targetSize = (bufferSize / 2).coerceAtMost(buffer.capacity())
val readBytes = audioRecord?.read(buffer, targetSize, AudioRecord.READ_NON_BLOCKING) ?: -1
Contributor
To correctly satisfy the MediaSource contract we should use the offset value here and read into the buffer at that offset: https://developer.android.com/reference/android/media/MediaExtractor#readSampleData(java.nio.ByteBuffer,%20int)
This works because the offset is 0, but it might not always be. This can be done in a follow-up PR.

Contributor

Also, how is using AudioRecord.READ_NON_BLOCKING working? readSampleData is supposed to be a blocking call. 😅

Contributor Author

To correctly satisfy MediaSource contract we should use offset value here and read into buffer at that value

Nice catch. I could probably do something around slicing the buffer at the offset before passing in. Will also need to make sure the targetSize parameter takes into account the potentially non-zero start.
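The slicing idea could look something like this sketch, where a plain fill stands in for audioRecord.read() and the helper name is hypothetical:

```kotlin
import java.nio.ByteBuffer

// Hypothetical sketch of honoring the offset contract: position the buffer at
// `offset` and slice, so writes land at the right place and the target size is
// capped by the remaining capacity rather than the full capacity.
fun fillAtOffset(buffer: ByteBuffer, offset: Int, source: ByteArray): Int {
    buffer.position(offset)
    val target = buffer.slice() // shares storage, starting at `offset`
    val toCopy = minOf(source.size, target.remaining())
    target.put(source, 0, toCopy)
    return toCopy
}
```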

Also, how is using AudioRecord.READ_NON_BLOCKING working?

So, my understanding is that the two parameters mean the following:

  • AudioRecord.READ_NON_BLOCKING: Read as much of the existing recorded data as is available right now.
  • AudioRecord.READ_BLOCKING: Read the recorded data once "enough" is available, where "enough" is some internal target buffer size.

In this primitive scenario, where we have a synthetic video stream, or where you are stitching together with a previously recorded video, it won't really make a difference. However, when using another "live" media source (like the future external surface), we need to avoid blocking for "too long". Since it appears a shared thread is used to read all MediaSources, if one blocks while waiting internally for some buffer to be filled, it could cause the other live source to skip a frame.

Another option could be to introduce separate threading for media sources, but that seems overly complex compared to this.

Contributor

Thank you for the explanation, now this makes sense. So basically we are telling AudioRecord not to wait until it fills its internal buffer, and instead to fill ours with whatever it has at the moment, synchronously. This makes a lot of sense and should work: as long as size is not -1, AudioTrackTranscoder should keep going.

Contributor Author

Yup, exactly.

companion object {
private val TAG = AudioRecordMediaSource::class.simpleName

private const val MEDIA_FORMAT_PCM_KEY = "pcm-encoding"
Contributor

private vals don't really have to be inside a companion object

Contributor Author

Will do. I checked a few other files and noted that you move them before the class definition (e.g. AudioOverlayFilter). I'll copy that convention.

Contributor

Yup. Kotlin doesn't really like static/companion things. Better to use top-level const vals.
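The suggested layout, sketched with placeholder names (the real class has many more members):

```kotlin
// Sketch of the convention discussed above: private constants at file level
// rather than in a companion object. The class name here is illustrative.
private const val MEDIA_FORMAT_PCM_KEY = "pcm-encoding"
private const val BYTES_PER_SAMPLE = 2

class AudioRecordMediaSourceSketch {
    // Top-level private constants are visible anywhere in the same file.
    fun pcmKey(): String = MEDIA_FORMAT_PCM_KEY
}
```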

private const val AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT
private const val BYTES_PER_SAMPLE = 2

const val DEFAULT_AUDIO_SOURCE = MediaRecorder.AudioSource.MIC
Contributor

Can you make these public explicitly? They usually should be for libraries. (I should add that check.)

Contributor Author

I'm happy to add public, but Android Studio will complain (since that's the default visibility). I don't see any other public const val in the code base, but I have no objection to adding it if preferred.

Contributor

OK, then you can keep it as is. I will perhaps add library-style enforcement later.

class AudioRecordMediaSource(
private val audioSource: Int = DEFAULT_AUDIO_SOURCE,
private val sampleRate: Int = DEFAULT_SAMPLE_RATE,
private val channelConfig: Int = DEFAULT_CHANNEL_CONFIG,
Contributor

It feels like it is possible to easily mismatch channelConfig and channelCount, which would cause problems, I presume. Should we just accept channelConfig and derive channelCount from it?

Contributor Author

@izzytwosheds - What if we change it to force either mono or stereo (either an enum, or a boolean like isStereo)? That way, we can internally manage the config vs. count and only allow consumers to configure known/supported configurations.

Contributor Author

Something like this:

class AudioRecordMediaSource(
        private val audioSource: Int = DEFAULT_AUDIO_SOURCE,
        private val sampleRate: Int = DEFAULT_SAMPLE_RATE,
        private val isStereo: Boolean = false
) : MediaSource {

    private val channelCount: Int by lazy { 
        return@lazy if (isStereo) 2 else 1
    }
    
    private val channelConfig: Int by lazy {
        return@lazy if (isStereo) AudioFormat.CHANNEL_IN_STEREO else AudioFormat.CHANNEL_IN_MONO
    }

Contributor

In that case I would probably keep the parameter as an integer channelCount: it masks the internal AudioConfig implementation and is more future-proof (who knows, maybe someday Android devices will record 6-channel sound?)

Contributor Author

Sure, will tweak 👍
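Sketched out, the agreed shape exposes only channelCount and derives the config internally. Strings stand in for the AudioFormat constants so this snippet stays self-contained; the class name is hypothetical:

```kotlin
// Hypothetical sketch: channelCount is the public parameter and channelConfig
// is derived from it. Strings stand in for AudioFormat.CHANNEL_IN_MONO and
// AudioFormat.CHANNEL_IN_STEREO; unsupported counts fail fast at construction.
class ChannelConfigSketch(channelCount: Int = 1) {
    val channelConfig: String = when (channelCount) {
        1 -> "CHANNEL_IN_MONO"
        2 -> "CHANNEL_IN_STEREO"
        else -> throw IllegalArgumentException("Unsupported channel count: $channelCount")
    }
}
```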


private val sampleRate: Int = DEFAULT_SAMPLE_RATE,
private val isStereo: Boolean = false
) : MediaSource {
private val channelCount: Int by lazy {
Contributor

Not sure how much we are saving by using lazy here. There is no complex class initialization going on. I think we can remove all lazy initializations here and below.

Contributor Author

Fair enough, I think I just got carried away :)

Contributor

Been there, done that... many times... :)

@IanDBird IanDBird force-pushed the audio-record branch 3 times, most recently from b160e8c to d93e7e5 Compare December 20, 2022 19:02
}

companion object {
private const val REQUEST_AUDIO_RECORD_PERMISSION = 14
Contributor
nit: we can probably move this out of the companion object as well

}
}

private val channelConfig =
Contributor

we could probably do something like this here:

    private val channelConfig = when (channelCount) {
        1 -> AudioFormat.CHANNEL_IN_MONO
        2 -> AudioFormat.CHANNEL_IN_STEREO
        else -> throw IllegalArgumentException("Unsupported channel count configuration")
    }

This would get rid of init and make AudioFormat selection more explicit and readable.

if (channelCount == 1) AudioFormat.CHANNEL_IN_MONO else AudioFormat.CHANNEL_IN_STEREO

// Compute the appropriate buffer size based upon the audio configuration.
private val bufferSize: Int by lazy {
Contributor

No need for lazy

}

// Compute the number of bytes per microsecond of audio.
private val bytesPerUs: Double by lazy {
Contributor

no need for lazy

const val DEFAULT_SAMPLE_RATE = 44100
const val DEFAULT_CHANNEL_COUNT = 1
}
}
Contributor

nit: add an empty last line

@IanDBird IanDBird force-pushed the audio-record branch 2 times, most recently from ba03025 to 3d130b2 Compare December 20, 2022 21:10