-
You should preprocess the audio with noise reduction or suppression before passing it to Whisper. No pipeline is perfect, but these tools are worth trying: Demucs (https://github.com/facebookresearch/demucs) for separating tracks, noisereduce (https://github.com/timsainb/noisereduce) for noise suppression, and VoiceFixer (https://github.com/haoheliu/voicefixer) for speech enhancement.
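A rough sketch of how the first two tools could be chained in front of Whisper is below. It is an illustration under assumptions, not a tested recipe: it assumes `demucs`, `noisereduce`, and `openai-whisper` are installed, that Demucs is run with its `htdemucs` model so the vocal stem lands in `<out_dir>/htdemucs/<track>/vocals.wav`, and the helper names (`demucs_command`, `vocals_path`, `transcribe_clean`) are made up for this example. Note that Demucs separates voice from music and background, so it will not by itself remove other speakers.

```python
import subprocess
import sys
from pathlib import Path

MODEL_NAME = "htdemucs"  # assumption: the Demucs model used for separation
SAMPLE_RATE = 16000      # whisper.load_audio resamples to 16 kHz mono


def demucs_command(audio_path, out_dir="separated"):
    """Build the Demucs CLI call that splits a file into vocals / no_vocals."""
    return [
        sys.executable, "-m", "demucs",
        "--two-stems=vocals",   # only produce vocals and the remainder
        "-n", MODEL_NAME,
        "-o", str(out_dir),
        str(audio_path),
    ]


def vocals_path(audio_path, out_dir="separated"):
    """Where Demucs is expected to write the vocal stem (assumed layout)."""
    return Path(out_dir) / MODEL_NAME / Path(audio_path).stem / "vocals.wav"


def transcribe_clean(audio_path, whisper_model="base", out_dir="separated"):
    """Separate vocals, suppress residual noise, then transcribe."""
    # Heavy third-party imports are kept local so the helpers above
    # can be used without them installed.
    import noisereduce as nr
    import numpy as np
    import whisper

    # 1. Source separation: keep only the vocal stem.
    subprocess.run(demucs_command(audio_path, out_dir), check=True)

    # 2. Load the stem as a float32 mono array at 16 kHz.
    audio = whisper.load_audio(str(vocals_path(audio_path, out_dir)))

    # 3. Noise suppression on the separated vocals.
    cleaned = nr.reduce_noise(y=audio, sr=SAMPLE_RATE)

    # 4. Transcription; Whisper accepts a float32 NumPy array directly.
    model = whisper.load_model(whisper_model)
    result = model.transcribe(cleaned.astype(np.float32))
    return result["text"]
```

Calling `transcribe_clean("talk.mp3")` would then return the transcript as a string. It is worth comparing the result against running Whisper on the raw file, since aggressive denoising can sometimes hurt recognition rather than help.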
-
Dear
I am working on converting the speech of a specific individual into text from audio that contains both background noise and speech from other people. Could you advise on the appropriate pipeline to achieve clear and accurate transcription? Should I perform steps such as noise reduction and speaker isolation, or can advanced models handle the raw audio data effectively?
Best,
Payam