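For audio of a multi-speaker conversation, I want to tell the speakers apart and find out who said what in which time interval. Should I use audio classification or speaker verification (voiceprint recognition)? And how would I implement it? Thanks.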
SV (speaker verification) + ASR (speech recognition)
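Thanks a lot! I gave it a try, and my approach is: 1. split the audio into segments --> 2. run speaker verification (compare the voiceprints of the segments to tell the speakers apart, though this is fairly slow) --> 3. combine the segments with the ASR result for the whole audio.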
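Step 2 above can be sketched roughly as follows. This is only a minimal illustration, assuming each segment already has a fixed-length speaker embedding from some speaker verification model. The function names (`cosine`, `assign_speakers`) and the `threshold` value are hypothetical and are not part of PaddleSpeech. Comparing each segment against one running centroid per speaker, rather than against every other segment, is one way to reduce the number of comparisons.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_speakers(embeddings, threshold=0.7):
    """Greedy clustering: one label per segment embedding.

    `threshold` is an assumed value and has to be tuned on real data.
    """
    centroids = []  # per-speaker sum of embeddings (cosine ignores scale)
    labels = []
    for emb in embeddings:
        best, best_sim = -1, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best < 0:
            # No existing speaker is similar enough: start a new one.
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            # Fold the segment into the matched speaker's centroid.
            centroids[best] = [s + x for s, x in zip(centroids[best], emb)]
            labels.append(best)
    return labels
```

With toy 2-D vectors, `assign_speakers([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])` puts the first two segments in one speaker and the last two in another. Real embeddings are much higher-dimensional, and a greedy pass like this is sensitive to segment order.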
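Now I have some new problems:
1. I can't split the audio cleanly into segments (that is, so that each segment contains only one person's speech).
2. For phone recordings, speaker verification doesn't seem to work very well (and conversations with more than two people are also slow to process).
3. The ASR result (no timestamps) can't be matched up well with the audio segments.
Could you point out which parts of my approach are unreasonable, or describe Paddle's approach in more detail? Many thanks!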
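https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ami/sd0 is the speaker diarization example, which distinguishes the different speakers. Also take a look at PR #1850, where timestamps have already been added. So combine the two. P.S. For the phone recordings, it depends on what the specific problem is: is there a lot of noise, and where is the time being spent? Feel free to share the details.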
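To make "combine the two" concrete, here is a rough sketch of one way to do it. It assumes diarization yields `(speaker, start, end)` segments and the ASR yields `(text, start, end)` words, with times in seconds. These tuple layouts and the helper names (`overlap`, `label_words`, `group_turns`) are illustrative assumptions, not the actual output format of the sd0 example or of PR #1850.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, segments):
    """Give each word the speaker whose segment overlaps it the most.

    words:    list of (text, start, end)     -- assumed ASR output
    segments: list of (speaker, start, end)  -- assumed diarization output
    A word that overlaps no segment gets speaker None.
    """
    labeled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = None, 0.0
        for speaker, s_start, s_end in segments:
            ov = overlap(w_start, w_end, s_start, s_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labeled.append((best_speaker, text, w_start, w_end))
    return labeled


def group_turns(labeled):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, text, w_start, w_end in labeled:
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = w_end
        else:
            turns.append({"speaker": speaker, "start": w_start,
                          "end": w_end, "text": text})
    return turns


segments = [("A", 0.0, 2.0), ("B", 2.0, 4.0)]
words = [("hello", 0.1, 0.5), ("there", 0.6, 1.0), ("hi", 2.2, 2.6)]
turns = group_turns(label_words(words, segments))
```

Here `turns` holds one entry for speaker A ("hello there", 0.1 to 1.0) and one for speaker B ("hi", 2.2 to 2.6). For Chinese text you would probably join tokens without the space.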
zh794390558
SmileGoat