Streaming mode not working on Firefox #143
Comments
Hi @yamatazen, I've spent a good 2+ hours looking at this. I've tested Firefox on Windows and Linux and the same issue persists. Looking in the developer console (typically F12 in a web browser) you can see:

Media resource http://127.0.0.1:7851/api/tts-generate-streaming?text=This+is+a+test+of+streaming+audio&voice=arnold.wav&language=en&output_file=demo_output.wav&streaming=true could not be decoded, error: Error Code: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)

I've hunted around the internet, and the typical response from developers who have looked into this is along the lines of: this is a Firefox-specific issue, and either Mozilla needs to fix it or you have to transcode all your media. Best I can tell, Firefox either isn't good at handling different bit depths of audio or is too strict. There are no settings in the backend of Firefox that can be changed to resolve this (and I tried quite a few).

The strange thing is that the WAV audio produced by the TTS scripts for streaming is exactly the same WAV format they use to generate a WAV file when not streaming. So the bit depth, encoding, etc. are all the same; it's just that Firefox doesn't want to handle it.

So, I don't see any way I can resolve this. As such, unless someone else has any bright ideas for solutions, or Mozilla changes something, I will have to leave this as "Firefox doesn't support streaming", and using Chrome, Edge, or basically any other browser is the solution for now. Sorry and thanks.
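The claim that the streamed audio has the same format as a saved file can be checked by reading the header fields with Python's standard `wave` module. The following is only a minimal sketch, assuming mono 16-bit 24 kHz audio; the helper names `describe_wav` and `make_wav` are hypothetical and not part of the project.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return the basic format fields stored in a WAV byte string."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }


def make_wav(frames: bytes) -> bytes:
    """Wrap raw PCM frames in a mono, 16-bit, 24 kHz WAV (assumed values)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(24000)
        wav.writeframes(frames)
    return buf.getvalue()


# One second of silence written as a complete file
complete = describe_wav(make_wav(b"\x00\x00" * 24000))
# A header written before any audio frames exist
header_only = describe_wav(make_wav(b""))

print(complete)
print(header_only)
```

In this sketch the channel count, bit depth, and sample rate match between the two results, and only the frame count recorded in the header differs.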
So this is a browser issue. I see.
Sorry to butt in on this closed issue. I have been looking into making my own little server for Piper and used these code snippets as a reference. I have since learned that the way the code generates the audio for streaming creates a malformed WAV file. According to Wikipedia, the WAV header contains a 4-byte field giving the size of the sample data. The code below creates the header and streams it to the browser first, before streaming the samples. However, this means the size field in the header is zero.

    file_chunks = []
    wav_buf = io.BytesIO()
    # Write a WAV header with no audio frames
    with wave.open(wav_buf, "wb") as vfout:
        vfout.setnchannels(1)      # mono
        vfout.setsampwidth(2)      # 2 bytes per sample (16-bit)
        vfout.setframerate(24000)  # 24 kHz
        vfout.writeframes(b"")     # no frames, so the recorded data size is 0
    wav_buf.seek(0)
    yield wav_buf.read()  # the header is sent before any audio exists

WAV is not a suitable format for on-the-fly generation and streaming, since it requires knowing the sample size ahead of time, and that size is not known in advance when the audio is generated and streamed this way. In my testing, the WAV file generated via streaming works in VLC and MPV, but those players have to guess the length of the audio, since no sample size is provided. The malformed WAV can also crash Audacity.
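The zero size field can be seen by inspecting the bytes that a header-only write produces. This is a rough sketch rather than the project's code: the helper names `header_only_wav` and `read_sizes` are hypothetical, and the byte offsets assume the canonical 44-byte PCM header that Python's `wave` module writes.

```python
import io
import struct
import wave


def header_only_wav(channels=1, sample_width=2, frame_rate=24000) -> bytes:
    """Write a WAV with no frames, as the streaming snippet does."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(b"")
    return buf.getvalue()


def read_sizes(header: bytes) -> tuple[int, int]:
    """Return (RIFF chunk size, data chunk size) from a 44-byte PCM header."""
    # Bytes 4..7 hold the RIFF chunk size (file size minus 8).
    riff_size = struct.unpack_from("<I", header, 4)[0]
    # Bytes 40..43 hold the size of the sample data in the "data" chunk.
    data_size = struct.unpack_from("<I", header, 40)[0]
    return riff_size, data_size


header = header_only_wav()
riff_size, data_size = read_sizes(header)
print(len(header), riff_size, data_size)
```

Here the data size comes out as 0 and the RIFF size covers only the header itself, so a strict decoder has no indication that audio will follow the header.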
@TheBill2001 You are correct in the case of Piper. However, the Piper code I have does NOT support streaming, and streaming is disabled elsewhere in the Piper model's settings, which blocks that code from being used. Because of the way the model is called, there has to be some pseudo (placeholder) streaming code; otherwise, calling it as an async function causes errors elsewhere. I would suggest referring to the Piper site for documentation and code on setting up Piper for streaming. The code I have there is only correct for streaming the XTTS AI model and, as mentioned, uses pseudo code to keep Python happy about the function being called as an async process. Thanks
Streaming mode didn't play sound on Firefox, but it worked on Chrome.