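Hello, I spot-checked a small sample of the audio and annotation files in the WenetSpeech data and found that many of the annotations are wrong:
Y0000000768_10jLYDtPEpg_S00000.wav
Labeled: 中国工商银行在国账市场上
Correct: 中国工商银行在国际市场上
Y0000000768_10jLYDtPEpg_S00004.wav
Labeled: 我们整个的银行体系已经从技术角皮续产了
Correct: 我们整个的银行体系从技术角度已经续产了
Note: the audio files above are the already-segmented clips, named by their sid.
How should we handle cases like this? Filtering them out by hand is rather too costly.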
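Based on your spot check, roughly what error rate are you seeing? The data was labeled automatically, so it inherently carries some error rate. We used an automated algorithm to select the high-confidence segments, but a few errors always slip through.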
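One way to put a number on the spot check is a character error rate (CER) between the dataset label and the manually corrected transcript. The following is only a minimal sketch, not part of the WenetSpeech tooling: the helper names `edit_distance` and `spot_check_cer` are hypothetical, the corrected transcript is treated as the reference, and the two samples are the ones reported in this issue.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def spot_check_cer(samples):
    """Aggregate CER over (sid, labeled, corrected) triples.

    Assumption: the manually corrected text is the reference and the
    dataset label is the hypothesis.
    """
    errors = sum(edit_distance(corrected, labeled) for _, labeled, corrected in samples)
    ref_chars = sum(len(corrected) for _, _, corrected in samples)
    return errors / ref_chars


# The two segments reported in this issue.
samples = [
    ("Y0000000768_10jLYDtPEpg_S00000",
     "中国工商银行在国账市场上",
     "中国工商银行在国际市场上"),
    ("Y0000000768_10jLYDtPEpg_S00004",
     "我们整个的银行体系已经从技术角皮续产了",
     "我们整个的银行体系从技术角度已经续产了"),
]

for sid, labeled, corrected in samples:
    print(sid, edit_distance(corrected, labeled))
print(f"aggregate CER: {spot_check_cer(samples):.3f}")
```

Two hand-picked bad segments say nothing about the corpus as a whole; the figure only becomes meaningful if the checked segments are drawn at random and the correct ones are counted as well.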