Speech stops/cuts mid-sentence with GTTSEngine and EdgeEngine #223

Closed
ArsenicBismuth opened this issue Nov 29, 2024 · 12 comments

Comments

@ArsenicBismuth

ArsenicBismuth commented Nov 29, 2024

Hi, I wanted to try some free TTS APIs, which is why I'm here. For some reason the TTS always cuts off partway through the input, in the strangest places.

Example, with this input:
Oh! Let me tell you about something that happened to me at exactly at 12pm on November 29th, 2024! I was in professor's robotics lab, working late on a project.

It'd say something like this (-- to indicate long pause):
Oh! Let me tell you about something that happened to me at exactly at -- 12pm on November 29th, 2024! -- I was in pro--fessor's robotics lab, working late on a project.

I'm using play_async(), but similar behavior appears when using play() (though not as bad).

This behavior shows up on both engines I'm able to test (GTTSEngine and EdgeEngine).

@ArsenicBismuth
Author

Is there perhaps a sentence length limit that I'm unaware of? For example, should I have split the input into multiple valid sentences myself, so that the TTS won't automatically cut it into improper ones?
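
A rough sketch of the pre-splitting idea raised above, using only the standard library. The helper name `split_sentences` and the regex are assumptions made for illustration; this is not how RealtimeTTS splits text internally.

```python
import re

# Hypothetical helper, not part of RealtimeTTS: split after '.', '!' or '?'
# when followed by whitespace, so every piece ends on a sentence boundary.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list:
    """Return the non-empty sentences found in text, stripped of padding."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


text = (
    "Oh! Let me tell you about something that happened to me at exactly at 12pm "
    "on November 29th, 2024! I was in professor's robotics lab, working late on a project."
)
sentences = split_sentences(text)
for sentence in sentences:
    print(sentence)
```

Each element could then be passed to feed() one at a time. Whether that would have avoided the cuts was not tested in this thread.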

@KoljaB
Owner

KoljaB commented Nov 29, 2024

Will look into that, thank you for pointing this out.

@ArsenicBismuth
Author

ArsenicBismuth commented Nov 29, 2024

@KoljaB I tested again with SystemEngine and got similar random cuts. It sometimes cuts mid-word, like "excite--d" or "trad--ing", as if I were streaming incomplete words (I'm not; I gave it the full prompt result without streaming).

For reference, before using your package I tested the same engines through other Python modules, pyttsx3 (SystemEngine) and edge_tts (EdgeEngine), and neither has this behavior.

@ArsenicBismuth
Author

I set up an Azure account to check AzureEngine, and it turns out this engine has no such issue. It speaks flawlessly to the end with no strange cuts at all.

@KoljaB
Owner

KoljaB commented Nov 29, 2024

Tried to reproduce, but couldn't. This one leaves me puzzled, tbh. The fact that it cuts mid-word makes me think it must have something to do with pyaudio and the way we stream the generated bytes to the output device. But why would it not also occur with Azure then?
I'm sorry, this is not obvious to me. It's like a riddle; I may need to think about it for a while.

@KoljaB
Owner

KoljaB commented Nov 29, 2024

It might have to do with this commit.

Do you have the same problems when using version v0.4.10?

pip install realtimetts[all]==0.4.10

@ArsenicBismuth
Author

@KoljaB No, the way it "cuts midway" is NOT like the audio being cut off abruptly (like when you pause a video). You can check the audio below (sorry for not providing it earlier):

2024-11-29.20-44-09.mp4

Here you can hear that "check" is split into "chec" and "k", with the "k" pronounced as the letter name "kay". So it's more like the text input is being split into chunks. Of course, you can also hear the long pauses after certain words.
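
One way to confirm this kind of symptom, sketched under assumptions: if the text pieces handed to the engine can be collected somehow, a small check can flag boundaries that fall inside a word. The function `mid_word_breaks` and the sample chunks are hypothetical and only mimic what is audible in the recording; they are not output captured from RealtimeTTS.

```python
def mid_word_breaks(chunks):
    """Return (left, right) pairs whose boundary falls inside a word.

    A boundary counts as mid-word when the last character of the left
    chunk and the first character of the right chunk are both alphanumeric.
    """
    breaks = []
    for left, right in zip(chunks, chunks[1:]):
        if left and right and left[-1].isalnum() and right[0].isalnum():
            breaks.append((left, right))
    return breaks


# Hypothetical chunks: "check" is split as "chec" + "k", the second
# boundary falls cleanly after a sentence.
chunks = ["Let me chec", "k the robotics lab. ", "I was working late."]
print(mid_word_breaks(chunks))
```

A clean sentence splitter should produce an empty list here; any pair returned points at a split like the "chec" / "k" one described above.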

@KoljaB
Owner

KoljaB commented Nov 29, 2024

You're right, this clearly has nothing to do with audio streaming.
Can I see the parameters for your calls to the TextToAudioStream constructor (tokenizer, language, etc.) and the play method? Many of the parameters there influence sentence splitting.

@ArsenicBismuth
Author

This is the code, nothing else:

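from RealtimeTTS import EdgeEngine, TextToAudioStream

class SpeechEngine:

    def __init__(self, voice: int):
        # Engine and stream are created once and reused for every utterance.
        self.voice_engine = EdgeEngine()
        self.voice_engine.set_voice(voice)
        self.voice_stream = TextToAudioStream(self.voice_engine, language="en")

    def speak_text_async(self, text: str):
        # Feed the complete text in one call, then play without blocking.
        self.voice_stream.feed(text).play_async()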

@KoljaB
Owner

KoljaB commented Nov 29, 2024

Ahh, now I can reproduce. There's something deeply wrong in the sentence splitting.

Still unsure what, but now I will find it. Thank you so much for providing all the information, that was a great help!

@KoljaB
Owner

KoljaB commented Nov 29, 2024

Should be solved in v0.4.14 now. Thanks again for the best support possible!
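
To check whether a local install already includes that release, a small standard-library sketch follows. The helper names are made up for illustration, and the version parsing is deliberately simple (it only compares the first three numeric components).

```python
import re
from importlib import metadata

FIXED_IN = (0, 4, 14)  # release named in this thread as containing the fix


def parse_version(version):
    """Return the first three numeric components, e.g. '0.4.14' -> (0, 4, 14)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_fix(version):
    """True when version is at least the release that fixed the splitting."""
    return parse_version(version) >= FIXED_IN


found = installed_version("RealtimeTTS")
if found is None:
    print("RealtimeTTS is not installed")
else:
    print(found, "includes the fix" if has_fix(found) else "predates the fix")
```

If the check reports an older version, upgrading with pip to 0.4.14 or later should pick up the fix.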

@ArsenicBismuth
Author

Ok yeah this fixed it! Thank you. It's now pausing properly at the end of sentences.
