Implement AudioRecord support for audio track #229
Conversation
// This demo fragment requires Android M or newer, in order to support reading data from
// AudioRecord in a non-blocking way. Let's double check that the current device supports
// this.
if (Build.VERSION.SDK_INT < Build.VERSION_CODES.M) {
This requirement could potentially be removed in the future, but that would require additional changes in AudioRecordMediaSource. We would need something like a ring buffer, with a separate thread reading via the older/blocking API and updating the buffer. Then we would read from this buffer in a non-blocking way. The downside is obviously additional complexity along with increased memory usage. For now, I thought requiring Android M or newer was the better option.
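For anyone curious, the ring-buffer fallback described above could be sketched roughly like this. This is a hypothetical helper, not part of this PR: a dedicated recorder thread would push the results of blocking AudioRecord.read() calls into it, while the MediaSource drains whatever is available without blocking.

```java
// Hypothetical sketch of a pre-M fallback buffer. A recorder thread fills it
// via blocking reads; the transcoder thread drains it without blocking.
final class PcmRingBuffer {
    private final byte[] data;
    private int readPos = 0;
    private int size = 0; // bytes currently buffered

    PcmRingBuffer(int capacity) {
        data = new byte[capacity];
    }

    // Called from the (blocking) recorder thread; drops the oldest data on overflow.
    synchronized void write(byte[] src, int len) {
        for (int i = 0; i < len; i++) {
            int writePos = (readPos + size) % data.length;
            data[writePos] = src[i];
            if (size < data.length) {
                size++;
            } else {
                readPos = (readPos + 1) % data.length; // overwrite oldest byte
            }
        }
    }

    // Called from the transcoder thread; returns immediately with whatever is buffered.
    synchronized int readNonBlocking(byte[] dst, int maxLen) {
        int count = Math.min(maxLen, size);
        for (int i = 0; i < count; i++) {
            dst[i] = data[readPos];
            readPos = (readPos + 1) % data.length;
        }
        size -= count;
        return count;
    }
}
```

The synchronized methods make the buffer safe for the two-thread producer/consumer pattern; overwriting the oldest data on overflow favors staying "live" over completeness, which is one possible policy among several.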
No need for anything that complicated, just leave things as is. The demo app is supposed to be a space for easy experimentation and sample code, not a production-ready app. What you are doing here is exactly what it is for.
// Create a single (synthetic) video track to ensure our output is playable by the demo
// app. We only use 1 second of video (a solid color) so the duration of the video will
// likely depend on the length of the audio recording.
VideoTrackFormat videoTrackFormat = new VideoTrackFormat(0, MimeType.VIDEO_AVC);
Playing a video-less file in the demo app would have required additional changes. Therefore, I went with the simple approach of adding a short, synthetic video stream.
That is totally fine.
// need to leave some buffer for the AudioRecord to continue to queue too. We also don't
// want to block waiting for more data, we'll take whatever is available.
val targetSize = (bufferSize / 2).coerceAtMost(buffer.capacity())
val readBytes = audioRecord?.read(buffer, targetSize, AudioRecord.READ_NON_BLOCKING) ?: -1
To correctly satisfy the MediaSource contract we should use the offset value here and read into buffer at that offset: https://developer.android.com/reference/android/media/MediaExtractor#readSampleData(java.nio.ByteBuffer,%20int)
This is working because offset is 0, but it might not always be. This can be in a follow-up PR.
Also, how is using AudioRecord.READ_NON_BLOCKING working? readSampleData is supposed to be a blocking call. 😅
To correctly satisfy MediaSource contract we should use offset value here and read into buffer at that value

Nice catch. I could probably do something around slicing the buffer at the offset before passing it in. Will also need to make sure the targetSize parameter takes the potentially non-zero start into account.
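Since AudioRecord.read(ByteBuffer, ...) writes starting at the buffer's current position, one way to honor a non-zero offset is to position a duplicate of the buffer before reading. This is a sketch in plain java.nio, not LiTr code; readSomePcm is a hypothetical stand-in for the actual AudioRecord call:

```java
import java.nio.ByteBuffer;

public class OffsetReadSketch {
    // Hypothetical stand-in for AudioRecord.read(...): fills the buffer from
    // its current position with up to maxBytes of fake PCM and returns the count.
    static int readSomePcm(ByteBuffer dst, int maxBytes) {
        int count = Math.min(maxBytes, dst.remaining());
        for (int i = 0; i < count; i++) {
            dst.put((byte) i);
        }
        return count;
    }

    // Reads into `buffer` starting at `offset`, capping the request so the
    // write cannot run past the buffer's capacity.
    static int readAt(ByteBuffer buffer, int offset, int targetSize) {
        ByteBuffer view = buffer.duplicate(); // shares content, independent position
        view.position(offset);
        int cappedSize = Math.min(targetSize, view.remaining());
        return readSomePcm(view, cappedSize);
    }
}
```

Using duplicate() keeps the caller's buffer position untouched while the data still lands at the requested offset, which matches the readSampleData contract being discussed.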
Also, how is using AudioRecord.READ_NON_BLOCKING working?

So, my understanding is that the two parameters mean the following:
- AudioRecord.READ_NON_BLOCKING: Read as much of the existing recorded data as is available right now.
- AudioRecord.READ_BLOCKING: Read the recorded data once "enough" is available. The definition of "enough" is some internal target buffer size.
In this primitive scenario, where we have a synthetic video stream, or you are stitching together with a previously recorded video, it won't really make a difference. However, when using another "live" media source (like the future external surface), we need to avoid blocking for "too long". Since it appears a shared thread is used to read all MediaSources, if one blocks and waits internally for some buffer to be filled, it could result in the other live source skipping a frame.
Another option could be to introduce separate threading for media sources, but that seems overly complex compared to this.
Thank you for the explanation, now this makes sense. So basically we are telling AudioRecord not to wait until it fills its internal buffer, and instead to fill ours with whatever it has at the moment, synchronously. This makes a lot of sense and should work - as long as the size is not -1, AudioTrackTranscoder should keep going.
Yup, exactly.
companion object {
    private val TAG = AudioRecordMediaSource::class.simpleName

    private const val MEDIA_FORMAT_PCM_KEY = "pcm-encoding"
private vals don't really have to be inside the companion object.
Will do. I checked a few other files and noticed that you move them before the class definition (e.g. AudioOverlayFilter). Will copy that approach.
Yup. Kotlin doesn't really like static/companion things. Better to use const val in the class.
private const val AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT
private const val BYTES_PER_SAMPLE = 2

const val DEFAULT_AUDIO_SOURCE = MediaRecorder.AudioSource.MIC
Can you make these public explicitly? They usually should be for libraries. (I should add that check.)
I'm happy to add public, but Android Studio will complain (since that's the default visibility). I don't see any other public const val in the code base, but no objections to adding it if preferred.
OK, then you can keep it as is. I will add library style enforcement later, perhaps.
class AudioRecordMediaSource(
    private val audioSource: Int = DEFAULT_AUDIO_SOURCE,
    private val sampleRate: Int = DEFAULT_SAMPLE_RATE,
    private val channelConfig: Int = DEFAULT_CHANNEL_CONFIG,
It feels like it would be easy to mismatch channelConfig and channelCount, which would cause problems, I presume. Should we just accept channelConfig and derive channelCount from it?
@izzytwosheds - What if we change it to force either mono or stereo (either an enum, or a boolean like isStereo)? That way, we can internally manage the config vs. count and only allow consumers to configure with known/supported configurations.
Something like this:

class AudioRecordMediaSource(
    private val audioSource: Int = DEFAULT_AUDIO_SOURCE,
    private val sampleRate: Int = DEFAULT_SAMPLE_RATE,
    private val isStereo: Boolean = false
) : MediaSource {

    private val channelCount: Int by lazy {
        return@lazy if (isStereo) 2 else 1
    }

    private val channelConfig: Int by lazy {
        return@lazy if (isStereo) AudioFormat.CHANNEL_IN_STEREO else AudioFormat.CHANNEL_IN_MONO
    }
In that case I would probably keep the parameter an integer channelCount - it masks the internal AudioConfig implementation and is more future-proof (who knows, maybe someday Android devices will record 6-channel sound?).
Sure, will tweak 👍
    private val sampleRate: Int = DEFAULT_SAMPLE_RATE,
    private val isStereo: Boolean = false
) : MediaSource {
    private val channelCount: Int by lazy {
Not sure how much we are saving by using lazy here. There is no complex class initialization going on. I think we can remove all lazy initializations here and below.
Fair enough, I think I just got carried away :)
Been there, done that... many times... :)
}

companion object {
    private const val REQUEST_AUDIO_RECORD_PERMISSION = 14
nit: we can probably move this out of the companion object as well
    }
}

private val channelConfig =
We could probably do something like this here:

private val channelConfig = when (channelCount) {
    1 -> AudioFormat.CHANNEL_IN_MONO
    2 -> AudioFormat.CHANNEL_IN_STEREO
    else -> throw IllegalArgumentException("Unsupported channel count configuration")
}

This would get rid of the init block and make the AudioFormat selection more explicit and readable.
    if (channelCount == 1) AudioFormat.CHANNEL_IN_MONO else AudioFormat.CHANNEL_IN_STEREO

// Compute the appropriate buffer size based upon the audio configuration.
private val bufferSize: Int by lazy {
No need for lazy
}

// Compute the number of bytes per microsecond of audio.
private val bytesPerUs: Double by lazy {
no need for lazy
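For context, the bytes-per-microsecond figure these fields feed into is simple arithmetic over the audio configuration, and it is what lets the source turn a running byte count into a presentation timestamp. A sketch of that math (PtsMath is a hypothetical helper; the constant mirrors the 16-bit PCM assumption from the snippet above):

```java
public class PtsMath {
    static final int BYTES_PER_SAMPLE = 2; // 16-bit PCM

    // Bytes of PCM produced per microsecond of recorded audio.
    static double bytesPerUs(int sampleRate, int channelCount) {
        return sampleRate * BYTES_PER_SAMPLE * channelCount / 1_000_000.0;
    }

    // Presentation timestamp (in microseconds) after totalBytesRead bytes of PCM.
    static long presentationTimeUs(long totalBytesRead, int sampleRate, int channelCount) {
        return (long) (totalBytesRead / bytesPerUs(sampleRate, channelCount));
    }
}
```

For example, 44100 Hz 16-bit mono produces 88200 bytes per second, i.e. 0.0882 bytes per microsecond, so one second's worth of bytes maps back to roughly one million microseconds of PTS.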
        const val DEFAULT_SAMPLE_RATE = 44100
        const val DEFAULT_CHANNEL_COUNT = 1
    }
}
nit: add an empty last line
I'm hoping to introduce Camera support as an input into LiTr. This would allow scenarios where the raw audio/video is not sourced from a static file, but instead comes from a "live" source. I am planning the following separate pull requests:
This first PR covers AudioRecord support. It introduces a new AudioRecordMediaSource class which is responsible for taking raw PCM from AudioRecord, computing the appropriate PTS, and feeding it into the rest of the pipeline via the MediaSource interface. For now, the demo app simply muxes this together with a blank/red video stream. In future updates, this will be combined with a Camera2 integration that will merge recorded video and audio into the same file.