A pipeline for converting speech to text from noisy environment #2356

payamsash · 2024-09-26T19:53:26Z

Dear

I am working on converting the speech of a specific individual into text from audio that contains both background noise and speech from other people. Could you advise on the appropriate pipeline to achieve clear and accurate transcription? Should I perform steps such as noise reduction and speaker isolation, or can advanced models handle the raw audio data effectively?

Best,
Payam

samarasimhapeyala · 2024-10-29T06:05:03Z

You should preprocess the audio files with noise reduction or suppression before passing them to Whisper. There is no perfect pipeline, but useful tools are demucs https://github.com/facebookresearch/demucs for separating tracks, noisereduce https://github.com/timsainb/noisereduce for noise suppression, and voicefixer https://github.com/haoheliu/voicefixer for speech enhancement.
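
A minimal sketch of such a preprocess-then-transcribe pipeline, assuming the `noisereduce` and `openai-whisper` packages are installed. The helper names (`noisereduce_step`, `whisper_transcribe`, `run_pipeline`) and the file name in the usage note are hypothetical and only for illustration. The steps are passed in as a list so the same audio can be run with and without preprocessing.

```python
from typing import Callable, Sequence

# Each preprocessing step maps (samples, sample_rate) -> samples.
Step = Callable[[object, int], object]


def noisereduce_step(samples, sample_rate):
    """Noise suppression with the noisereduce package (spectral gating)."""
    import noisereduce as nr  # assumption: pip install noisereduce

    return nr.reduce_noise(y=samples, sr=sample_rate)


def whisper_transcribe(samples, sample_rate, model_name="base"):
    """Transcribe mono audio samples with openai-whisper."""
    import numpy as np
    import whisper  # assumption: pip install openai-whisper

    if sample_rate != 16000:
        raise ValueError("Whisper expects 16 kHz audio; resample first")
    model = whisper.load_model(model_name)
    # Cast to float32, the dtype Whisper works with for in-memory audio.
    audio = np.asarray(samples, dtype=np.float32)
    return model.transcribe(audio)["text"]


def run_pipeline(samples, sample_rate, steps: Sequence[Step], transcribe):
    """Apply the preprocessing steps in order, then transcribe the result."""
    for step in steps:
        samples = step(samples, sample_rate)
    return transcribe(samples, sample_rate)


# Usage (hypothetical file name):
#   import whisper
#   samples = whisper.load_audio("meeting.wav")  # mono float32 at 16 kHz
#   raw_text = run_pipeline(samples, 16000, [], whisper_transcribe)
#   clean_text = run_pipeline(samples, 16000, [noisereduce_step], whisper_transcribe)
```

Running both variants on the same recording makes it possible to check whether the preprocessing actually helps for a given kind of noise.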

2 replies

gongouveia Oct 31, 2024

@samarasimhapeyala @payamsash In my experience, I don't agree with this comment. Preprocessing audio with demucs heavily degrades the original signal: although the result sounds great to the ear, the signal is degraded and contains phase artifacts.
I recommend fine-tuning the model on noisy audio instead.
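
The thread does not say where the noisy training audio would come from. One common option, shown here only as an assumption, is to mix clean speech with recorded noise at a chosen signal-to-noise ratio (SNR). The sketch below uses plain Python lists of samples, and the function name `mix_at_snr` is hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # SNR_dB = 20 * log10(speech_rms / (gain * noise_rms)), solved for gain.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Mixing the same utterances at several SNR values would give training examples with a range of noise levels.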

samarasimhapeyala Nov 1, 2024

@gongouveia I agree that the preprocessing steps cause signal degradation and artifacts, but I am hoping this may not affect the transcription much.
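
Whether preprocessing hurts or helps transcription can be measured rather than guessed, for example by comparing the word error rate (WER) of transcripts from raw and preprocessed audio against a hand-made reference transcript. A minimal sketch in pure Python, with a hypothetical function name and simple whitespace tokenization as an assumption:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A lower WER on the preprocessed audio than on the raw audio would support preprocessing for that data, and a higher one would support transcribing the raw audio or fine-tuning instead.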
