Speech stops/cuts mid-sentence with GTTSEngine and EdgeEngine #223

ArsenicBismuth · 2024-11-29T00:07:53Z

Hi, I wanted to try some free TTS APIs, which is why I'm here. But for some reason the TTS always cuts partway through the input, in the strangest places.

Example, with this input:
Oh! Let me tell you about something that happened to me at exactly at 12pm on November 29th, 2024! I was in professor's robotics lab, working late on a project.

It'd say something like this (-- to indicate long pause):
Oh! Let me tell you about something that happened to me at exactly at -- 12pm on November 29th, 2024! -- I was in pro--fessor's robotics lab, working late on a project.

I'm using play_async(), but similar behavior appears when using play() (though not as bad).

This behavior shows up on both engines I was able to test (GTTSEngine and EdgeEngine).

ArsenicBismuth · 2024-11-29T00:11:34Z

Is there maybe a sentence length limit that I'm unaware of? For example, should I have split the input into multiple valid sentences myself, so that the TTS won't automatically cut it into improper ones?
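
As a minimal sketch of that pre-splitting idea (not how the library itself splits text): the helper name split_sentences and the regex are assumptions made for illustration, and the rule is deliberately naive, so abbreviations such as "e.g." would be split incorrectly.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


text = (
    "Oh! Let me tell you about something that happened to me at exactly at 12pm "
    "on November 29th, 2024! I was in professor's robotics lab, working late on a project."
)
sentences = split_sentences(text)
for s in sentences:
    print(repr(s))

# Hypothetical usage: each sentence could then be passed to the stream's
# feed() one at a time instead of feeding the whole text at once.
```

Whether feeding sentence by sentence avoids the pauses depends on how the stream splits text internally, which this sketch does not address.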

KoljaB · 2024-11-29T08:26:55Z

Will look into that, thank you for pointing this out.

ArsenicBismuth · 2024-11-29T15:22:34Z

@KoljaB So I tested again with SystemEngine, and I got similar random cuts. It sometimes cuts mid-word, like "excite--d", "trad--ing". As if I were streaming incomplete words (I'm not; I gave it the full prompt result without streaming).

For reference, before using your package, I tested the underlying engines directly through other Python modules, namely pyttsx3 (SystemEngine) and edge_tts (EdgeEngine); neither showed this behavior.

ArsenicBismuth · 2024-11-29T17:29:07Z

I set up an Azure account to check AzureEngine, and it turns out this engine has no such issue. It speaks flawlessly to the end with no strange cuts at all.

KoljaB · 2024-11-29T19:27:54Z

Tried to reproduce, but couldn't. This one leaves me puzzled tbh. The fact that it cuts mid-word makes me think it must have something to do with pyaudio and the way we stream the generated bytes to the output device. But why would it not also occur with Azure then?
I'm sorry, this is not obvious to me. It's like a riddle; I may need to think about it for a while.

KoljaB · 2024-11-29T19:34:51Z

It might have to do with this commit.

Do you have the same problems when using version v0.4.10?

pip install realtimetts[all]==0.4.10

ArsenicBismuth · 2024-11-29T19:48:11Z

@KoljaB No, the way it "cuts midway" is NOT like the audio being cut abruptly (like when you pause a video). You can check the audio below (sorry for not doing that earlier):

2024-11-29.20-44-09.mp4

Here you can hear that "check" is split into "chec" and "k", with the "k" pronounced as the letter name "kay". So it's more like the text input is split into chunks. Of course, you can also hear the long pauses after certain words.

KoljaB · 2024-11-29T19:59:13Z

You're right, this has clearly nothing to do with audio streaming.
Can I see the parameters for the calls to the TextToAudioStream constructor (tokenizer, language, etc.) and the play method? Lots of parameters there influence the sentence splitting.

ArsenicBismuth · 2024-11-29T20:01:47Z

This is the code, nothing else:

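from RealtimeTTS import EdgeEngine, TextToAudioStream


class SpeechEngine:

    def __init__(self, voice: int):
        # Create the engine and select the voice before building the stream.
        self.voice_engine = EdgeEngine()
        self.voice_engine.set_voice(voice)
        self.voice_stream = TextToAudioStream(self.voice_engine, language="en")

    def speak_text_async(self, text: str):
        # Feed the complete text at once and play it without blocking.
        self.voice_stream.feed(text).play_async()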

KoljaB · 2024-11-29T20:11:32Z

Ahh, now I can reproduce. There's something deeply wrong in the sentence splitting.

Still unsure what, but now I will find it. Thank you so much for providing all the information, that was a great help!

KoljaB · 2024-11-29T21:49:13Z

Should be solved in v0.4.14 now. Thanks again for the best support possible!
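
A rough sketch for checking whether the installed package is at least v0.4.14: the helpers parse_version and has_fix are hypothetical names, the parser only handles plain dotted numeric versions, and the distribution name "realtimetts" is assumed from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple[int, ...]:
    """Naive parser for plain dotted numeric versions such as '0.4.14'."""
    return tuple(int(part) for part in v.split("."))


def has_fix(installed: str, fixed_in: str = "0.4.14") -> bool:
    """Return True if the installed version is at or above the fixed version."""
    return parse_version(installed) >= parse_version(fixed_in)


try:
    # Assumed distribution name, taken from "pip install realtimetts[all]".
    installed = version("realtimetts")
    print(installed, "includes the fix:", has_fix(installed))
except PackageNotFoundError:
    print("realtimetts is not installed")
except ValueError:
    # Raised by the naive parser for suffixes like '.post1' or 'rc1'.
    print("could not parse the installed version string")
```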

ArsenicBismuth · 2024-11-29T22:52:48Z

Ok yeah, this fixed it! Thank you. It's now pausing properly at the end of sentences.

ArsenicBismuth closed this as completed Nov 29, 2024
