Model output #5

Open
Jmallins opened this issue Jan 23, 2021 · 2 comments

@Jmallins

Jmallins commented Jan 23, 2021

Hi,

I was wondering if it would be possible to get the model's transcription output on the test set?

Thank you!

@DewiBrynJones
Collaborator

Hello!

The models we've just released at https://github.com/techiaith/docker-deepspeech-cy/releases/tag/21.01 give the following output on the test set:

tf-docker /DeepSpeech > ./bin/bangor_welsh/evalutate.sh -t /data/bangor/testsets/data/trawsgrifio/arddweud_200617/deepspeech.csv -s /data/bangor/lm/trawsgrifiwr/kenlm.scorer

....

Test on /data/bangor/testsets/data/trawsgrifio/arddweud_200617/deepspeech.csv - WER: 0.127551, CER: 0.054299, loss: 28.399717
--------------------------------------------------------------------------------
Best WER:
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 47.123161
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/3ab609ce009994539322f5598a11d42ae1c74d89b9b2b3434d7413b3f298238e.wav
 - src: "cyfathrach fiolegol rhwng gwryw a benyw neu anifail gwrywaidd a benywaidd yw rhyw"
 - res: "cyfathrach fiolegol rhwng gwryw a benyw neu anifail gwrywaidd a benywaidd yw rhyw"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 45.556278
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/16bd9a4454c26f8330d3a5f4f6b77c442529d5516f0d32de7ad1eb873c3f2f48.wav
 - src: "ei unig addysg yn blentyn ifanc yng nghymru oedd yn yr ysgol sul"
 - res: "ei unig addysg yn blentyn ifanc yng nghymru oedd yn yr ysgol sul"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 38.195763
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/4cd002f71a534443da15f727b1beaf4ae11ae50a08fe830fd448dfd1918f3ae6.wav
 - src: "bywgraffiad john ac alun gan glyn roberts yw john ac alun"
 - res: "bywgraffiad john ac alun gan glyn roberts yw john ac alun"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 35.653397
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/afad548b16f1f2d204ec9c3d4d49971d4b5385f4ef7b7c722ad6b0ab82b51c73.wav
 - src: "llyfryn cryno a gyhoeddwyd dan nawdd sefydliad cwricwlwm prydain"
 - res: "llyfryn cryno a gyhoeddwyd dan nawdd sefydliad cwricwlwm prydain"
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 34.879398
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/65cc0ffbd87d34f9f401fd5046b5b989c4c66d1caaaf0fe0d3041c4605034a15.wav
 - src: "cafodd y porthladd gweledol ymateb cadarnhaol gan adolygwyr"
 - res: "cafodd y porthladd gweledol ymateb cadarnhaol gan adolygwyr"
--------------------------------------------------------------------------------
Median WER:
--------------------------------------------------------------------------------
WER: 0.000000, CER: 0.000000, loss: 3.886697
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/e38724b206df5856ff8c93ccf4f4526ae886679238c86aedbf699993401ef30a.wav
 - src: "mae'r pedwar cefnder yn hoff iawn ohoni"
 - res: "mae'r pedwar cefnder yn hoff iawn ohoni"
--------------------------------------------------------------------------------
WER: 0.076923, CER: 0.047619, loss: 44.489361
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/8d19582b46e5a64afc3a731842442fb8359fcca180f754810bf4ec94adf0c0b6.wav
 - src: "er gwaethaf yr uchelgeisiau hyn daeth llenyddiaeth i'r amlwg fel ei phrif ddiddordeb"
 - res: "er gwaethaf yr uchelgeisiau hyn daeth llenyddiaeth i'r amlwg ei phrif ddiddordeb"
--------------------------------------------------------------------------------
WER: 0.090909, CER: 0.054545, loss: 26.258184
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/577b39baf6202d4b64b8a541c7db18805fd86e37ca5002c56ef03ccd773dbe94.wav
 - src: "treuliodd y cwpl hefyd lawer o amser yn genefa a pharis"
 - res: "treuliodd y bobl hefyd lawer o amser yn genefa a pharis"
--------------------------------------------------------------------------------
WER: 0.090909, CER: 0.019231, loss: 23.268820
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/5a9121a3cc11d92d6d6ea88c2a1a512e9836c202020cd5d345e2f200b90f948d.wav
 - src: "rhan o gyfres o lyfrau sy'n rhoi sylw i flodau cymru"
 - res: "rhan o gyfres o lyfrau sy'n rhoi sylw i fodau cymru"
--------------------------------------------------------------------------------
WER: 0.100000, CER: 0.037736, loss: 35.211090
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/9309735ac2992394fac1208d71a7809fcdb0f949c4f1e3994d8aa453e71fe75e.wav
 - src: "cyfrol o gerddi gan donald evans yw y cyntefig cyfoes"
 - res: "cyfrol o gerddi gan donald evans yw cyntefig cyfoes"
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 0.333333, CER: 0.163265, loss: 27.433905
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/a180fdd8ac1254273c0d5e5aaa875cca6d62485c9130744d17b55b168ab77b03.wav
 - src: "mae'n fwy amlwg mewn merched nag ydyw mewn dynion"
 - res: "mae'n fwy amlwg fel merched cadw mewn dynion"
--------------------------------------------------------------------------------
WER: 0.333333, CER: 0.106383, loss: 14.075192
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/8e2d53b7e82a5f4d0abc696a196667d3b9952cfcc3f1b3bc1d3ffbe93302bb19.wav
 - src: "mae'r nerf genol yn sicrhau teimlad yn y daflod"
 - res: "mae'r nerf dynol yn sicrhau teimlad yn dafod"
--------------------------------------------------------------------------------
WER: 0.384615, CER: 0.271186, loss: 80.174637
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/9ca9b602defd00b1bb9ffc37deba0efe2b9d7e598fbc8fc01a0fb844eb229e7c.wav
 - src: "bu i fab john lloyd ac un o ferched trefor briodi ei gilydd"
 - res: "bu ei fab droi ac un o ferched trefor roi ei diwedd"
--------------------------------------------------------------------------------
WER: 0.444444, CER: 0.211538, loss: 61.126965
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/97a1f39d89dff072f548c51349250cef1300bf29bef910b0a08fa0b8eea2ba40.wav
 - src: "gellir ei atal rhag digwydd drwy wisgo dillad cynnes"
 - res: "gellir atal ac digwydd gwisgo dillad cynnes"
--------------------------------------------------------------------------------
WER: 0.500000, CER: 0.162162, loss: 53.349140
 - wav: file:///data/bangor/testsets/data/trawsgrifio/arddweud_200617/clips/a3f4579aad73733b705f2e8d1cfd07bfe34d3b9c51502acbedb61bd02d8055a8.wav
 - src: "gelwir hyn yn 'ddeddf gymudol lluosi'"
 - res: "gelwir hyn yn ddeddf gydol lloi"
--------------------------------------------------------------------------------
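The per-utterance WER and CER figures in the report can be reproduced from the `src`/`res` pairs. Below is a minimal sketch, assuming both metrics are plain Levenshtein distances (over words, and over characters with spaces included) divided by the length of the reference `src`. That assumption reproduces the rows checked here, e.g. "cwpl" read as "bobl" gives WER 1/11 and CER 3/55, but the code is an illustration, not DeepSpeech's own implementation. The `corpus_wer` helper is a hypothetical example of a pooled overall score (total edits over total reference words), which is not the same as averaging the per-utterance WERs.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute each cost 1)."""
    # prev[j] holds the distance between the first i-1 items of a and the first j of b
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when equal)
            )
        prev = curr
    return prev[len(b)]


def wer(src, res):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = src.split()
    return levenshtein(ref_words, res.split()) / len(ref_words)


def cer(src, res):
    """Character error rate: character-level edit distance / reference length (spaces count)."""
    return levenshtein(src, res) / len(src)


def corpus_wer(pairs):
    """Hypothetical pooled WER over (src, res) pairs: total word edits / total reference words."""
    edits = sum(levenshtein(s.split(), r.split()) for s, r in pairs)
    words = sum(len(s.split()) for s, _ in pairs)
    return edits / words


if __name__ == "__main__":
    src = "treuliodd y cwpl hefyd lawer o amser yn genefa a pharis"
    res = "treuliodd y bobl hefyd lawer o amser yn genefa a pharis"
    print(f"WER: {wer(src, res):.6f}, CER: {cer(src, res):.6f}")
```

Running it prints `WER: 0.090909, CER: 0.054545`, matching the corresponding "Median WER" row above.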

However, treat this low WER with caution: at the moment the test set is unfortunately too similar to the training data, so these figures probably overstate how well the models would do on unseen speech. We hope to have more comprehensive and distinct test sets in place for future releases.
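One rough way to quantify that similarity is to count how many test sentences also appear verbatim in the training data. The sketch below is hypothetical: it assumes DeepSpeech-style CSV files with a header row and a `transcript` column (the usual `wav_filename,wav_filesize,transcript` layout), and the in-memory CSVs are toy placeholders, not the real Bangor data. Exact sentence overlap is only one kind of similarity; shared speakers or recording conditions would not show up here.

```python
import csv
import io


def transcripts(csv_file):
    """Collect normalised transcripts from a DeepSpeech-style CSV (assumes a `transcript` column)."""
    return {row["transcript"].strip().lower() for row in csv.DictReader(csv_file)}


def overlap_ratio(test_csv, train_csv):
    """Fraction of distinct test transcripts that also occur verbatim in the training CSV."""
    test = transcripts(test_csv)
    train = transcripts(train_csv)
    if not test:
        return 0.0
    return len(test & train) / len(test)


if __name__ == "__main__":
    # Toy in-memory stand-ins for hypothetical train/test CSV files.
    train = io.StringIO(
        "wav_filename,wav_filesize,transcript\n"
        "a.wav,1000,sentence one\n"
        "b.wav,1000,sentence two\n"
    )
    test = io.StringIO(
        "wav_filename,wav_filesize,transcript\n"
        "c.wav,1000,sentence one\n"
        "d.wav,1000,sentence three\n"
    )
    print(f"{overlap_ratio(test, train):.0%} of test sentences also appear in the training data")
```

With the toy data this prints `50% of test sentences also appear in the training data`.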

Thanks!

@Jmallins
Author

Thank you! This is a bit lazy of me, but would it be possible for you to upload /data/bangor/testsets/data/trawsgrifio/arddweud_200617/deepspeech.csv?
