Hi @kalle07, what TTSDiff does is use the Whisper model to transcribe the generated TTS WAV files and then check whether that transcription matches the original text you input. Anything that doesn't match up is flagged within the interface. The documentation on this is here: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-analyzing-generated-tts-for-errors

RE: "so if I have a text answer of, say, 20 words, the generated sound file can't be longer than approximately 20 seconds."
RE: "or can you check the sound file for silence longer than 3 seconds?"

Does this cover what you asked? Thanks
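As a rough illustration of the comparison step described above (this is not AllTalk's actual TTSDiff code), here is a minimal Python sketch. It assumes the Whisper transcript has already been produced, for example with the openai-whisper package's `whisper.load_model("base").transcribe(path)["text"]`, and uses the standard library's `difflib` to list the word spans that differ. The names `normalize` and `find_mismatches` are hypothetical.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    # Hypothetical helper: lowercase and keep only word characters,
    # so punctuation and casing differences are not reported as errors
    return re.findall(r"[a-z0-9']+", text.lower())


def find_mismatches(original: str, transcript: str) -> list[tuple[str, str, str]]:
    """Return (opcode, expected_words, heard_words) for every non-matching span.

    Hypothetical sketch, not part of AllTalk.
    """
    expected = normalize(original)
    heard = normalize(transcript)
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            mismatches.append((tag, " ".join(expected[i1:i2]), " ".join(heard[j1:j2])))
    return mismatches


if __name__ == "__main__":
    print(find_mismatches("The quick brown fox jumps.", "the quick brown box jumps"))
    # [('replace', 'fox', 'box')]
```

Normalizing before diffing matters because Whisper's punctuation and capitalization rarely match the input text exactly, even when the speech is correct.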
"TTS Generator - TTSDiff now scans generated text and TTS for errors."
What does that mean?
And if the answer is something totally different, here is maybe an idea.
I know it may not be possible to check this, but it is becoming more important:
So if I have a text answer of, say, 20 words, the generated sound file can't be longer than approximately 20 seconds.
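The word-count rule of thumb above could be sketched like this in Python. The one-second-per-word budget comes from the suggestion itself; everything else (the function names and the in-memory demo WAV) is a hypothetical illustration, not an AllTalk feature. It only uses the standard library `wave` module, so it assumes uncompressed PCM WAV output.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    # source may be a file path or a binary file-like object (PCM WAV only)
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suspiciously_long(text: str, duration_s: float, seconds_per_word: float = 1.0) -> bool:
    # Hypothetical check: flag audio longer than ~1 second per word of input text
    word_count = len(text.split())
    return duration_s > word_count * seconds_per_word


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    # Demo helper: build an in-memory 16-bit mono WAV of digital silence
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    answer = " ".join(["word"] * 20)
    duration = wav_duration_seconds(make_silent_wav(25))
    print(duration, is_suspiciously_long(answer, duration))
```

Most speech runs faster than one word per second, so a one-second-per-word budget leaves headroom; the exact value would need tuning per voice and language.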
Or can you check the sound file for silence longer than 3 seconds?
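A silence check could be sketched in a similar way: scan the samples in short windows and report any run of quiet windows lasting 3 seconds or more. This is a hypothetical standard-library sketch, not something AllTalk provides. It assumes 16-bit mono PCM WAV, and the amplitude threshold of 500 and the 50 ms window are arbitrary guesses that would need tuning, since generated "silence" is not always exact digital zero.

```python
import array
import io
import sys
import wave


def find_long_silences(source, min_silence_s=3.0, threshold=500, window_s=0.05):
    """Return (start_s, end_s) spans where the peak amplitude stays below
    `threshold` for at least `min_silence_s`. Hypothetical sketch: 16-bit mono only."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    window = max(1, int(rate * window_s))
    spans, start = [], None
    for i in range(0, len(samples), window):
        quiet = max(abs(s) for s in samples[i:i + window]) < threshold
        t = i / rate
        if quiet and start is None:
            start = t  # a quiet run begins
        elif not quiet and start is not None:
            if t - start >= min_silence_s:
                spans.append((start, t))
            start = None  # the quiet run ended
    end = len(samples) / rate
    if start is not None and end - start >= min_silence_s:
        spans.append((start, end))  # file ends in silence
    return spans


def build_test_wav(samples, rate=8000):
    # Demo helper: pack an array('h') of samples into an in-memory mono WAV
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(data.tobytes())
    buf.seek(0)
    return buf


if __name__ == "__main__":
    tone = [10000, -10000] * 4000  # 1 s of loud signal at 8 kHz
    print(find_long_silences(build_test_wav(tone + [0] * 32000 + tone)))
```

Checking the peak amplitude per window keeps the logic simple; an RMS measure would be less sensitive to isolated clicks.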
Or internally, I'm sure you could play that sound file (via whisper_stt) and compare it to the written words (but then it would take much more time), maybe for some fine-tuning or learning?
You know what I mean?