Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datasets: Handle converting int16 audio data in VoiceSample. #26

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

shaper
Copy link
Contributor

@shaper shaper commented Jun 12, 2024

We saw VoiceSample failing the assert on float32 audio data when playing around with the Gradio infer app and submitting an mp3 file. We didn't dig deeper into Gradio (I'm sure it's possible to alter/convert there as well), but it seems potentially useful for VoiceSample to handle int16 audio data on top of what it already handles.

Copy link
Contributor

@farzadab farzadab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

ultravox/data/datasets.py Outdated Show resolved Hide resolved
ultravox/data/datasets.py Show resolved Hide resolved
Copy link
Contributor

@juberti juberti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's easy, suggest adding a test to datasets_test.py, a la the existing create_sample.

ultravox/data/datasets.py Outdated Show resolved Hide resolved
ultravox/data/datasets.py Show resolved Hide resolved
@shaper
Copy link
Contributor Author

shaper commented Jun 13, 2024

If it's easy, suggest adding a test to datasets_test.py, a la the existing create_sample.

Added tests, using some type hints in the test resulted in the need to enhance VoiceSample.audio type from audio: Optional[np.ndarray] = None to audio: Optional[NDArray[np.float32]] = None which I think is beneficial (and didn't create any other type issues elsewhere).

Will wait a bit to merge in case there's feedback on the tests.

We saw `VoiceSample` failing the assert on `float32` audio data when
playing around with the Gradio infer app and submitting an `mp3`
file. We didn't dig deeper into Gradio (I'm sure it's possible to
alter/convert there as well), but it seems potentially useful for
`VoiceSample` to handle `int16` and `int32` audio data on top of what
it already handles.
@shaper
Copy link
Contributor Author

shaper commented Jun 13, 2024

@juberti need you or @farzadab to land as I don't have write access. Excitement! Thanks for the onboarding guidance.

@farzadab farzadab merged commit 1aa11bf into fixie-ai:main Jun 13, 2024
1 check passed
@farzadab
Copy link
Contributor

Merged and gave you write access. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants