Annotation errors in the data #48

Open
xyx361100238 opened this issue Mar 29, 2023 · 1 comment

Comments

@xyx361100238

xyx361100238 commented Mar 29, 2023

Hello, I spot-checked a small portion of the wenetspeech audio and annotation files and found that many of the annotations are wrong:
Y0000000768_10jLYDtPEpg_S00000.wav
Original: 中国工商银行在国账市场上
Corrected: 中国工商银行在国际市场上
Y0000000768_10jLYDtPEpg_S00004.wav
Original: 我们整个的银行体系已经从技术角皮续产了
Corrected: 我们整个的银行体系从技术角度已经续产了

Note: the audio files above are the already-segmented clips, named by their segment ID (sid).

How should we handle cases like this? Screening everything manually would be far too costly.
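One common way to put a number on how wrong a transcript is: the character error rate (CER), i.e. the character-level Levenshtein edit distance between a reference and a hypothesis, divided by the reference length. The sketch below is a generic implementation, not part of wenetspeech or its tooling; it assumes the reporter's corrected text is the reference and the shipped annotation is the hypothesis, and the helper names `edit_distance` and `corpus_cer` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            )
        prev = cur
    return prev[-1]


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Total edits divided by total reference characters over (ref, hyp) pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_chars = sum(len(ref) for ref, _ in pairs)
    return errors / ref_chars


# Assumption: reference = reporter's correction, hypothesis = shipped annotation.
spot_check = [
    ("中国工商银行在国际市场上", "中国工商银行在国账市场上"),
    ("我们整个的银行体系从技术角度已经续产了", "我们整个的银行体系已经从技术角皮续产了"),
]
print(f"CER = {corpus_cer(spot_check):.3f}")  # 6 edits / 31 reference chars ≈ 0.194
```

For these two clips the first transcript needs one substitution, while the second needs five edits: the misplaced 已经 costs two deletions plus two insertions, and 度→皮 is one substitution. Two clips are far too few to characterize the dataset as a whole.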

@robin1001
Contributor

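Based on your spot check, roughly what proportion of the annotations is wrong? The data was labeled automatically, so it has an inherent error rate. We used an automated algorithm to select the high-confidence segments, but some errors always slip through.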
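The error proportion asked about here is the number of spot-checked segments with a wrong transcript divided by the number checked. Because a spot check is a small sample, an interval says more than a single percentage. The sketch below uses the Wilson score interval for a binomial proportion; it is a generic statistical tool, not part of the wenetspeech pipeline, it assumes the checked segments were sampled at random, and the counts in the usage line are hypothetical placeholders rather than figures from this thread.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p_hat = errors / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical spot check: 15 of 100 randomly sampled segments are mislabeled.
errors, checked = 15, 100
low, high = wilson_interval(errors, checked)
print(f"observed {errors / checked:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when few or no errors are found, which is common in small spot checks.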
