Encoding problem: Lack of UTF-8 Support for JSON POST Requests in Transcription Module #505

Open
VewMet opened this issue Oct 9, 2023 · 0 comments


Description

Transcribed text in languages other than English (for example, Hindi) appears as '?' at the endpoints configured through SEND_JSON_REMOTE_URLS in the Jigasi transcription module. The content must be sent and received using the UTF-8 character encoding so that non-ASCII characters are not misinterpreted.

Current behavior

Whenever I speak in Hindi, the non-ASCII characters are not preserved, and '?' is posted to my streams instead.


When the JSON data is sent to the server, the character encoding is not set explicitly. By default, the system's default character encoding may be used, and that may not be UTF-8.
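To illustrate why a non-UTF-8 charset produces '?', here is a minimal standalone sketch. It is not Jigasi code: the class name DefaultCharsetDemo and the helper roundTrip are made up for this example, and US-ASCII stands in for whatever non-UTF-8 default a system might have. In Java, String.getBytes(Charset) silently substitutes the charset's replacement byte for characters it cannot encode, and for US-ASCII that replacement is '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jigasi.
public class DefaultCharsetDemo {

    // Encodes the text to bytes with the given charset, then decodes it again,
    // showing what would survive a trip over the wire in that encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "namaste" written in Devanagari, given as Unicode escapes so the
        // source file compiles regardless of its own encoding.
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947";

        // Assumed stand-in for a non-UTF-8 system default: characters that
        // cannot be mapped are replaced with '?'.
        System.out.println(roundTrip(hindi, StandardCharsets.US_ASCII));

        // UTF-8 can represent every character, so the text comes back intact.
        System.out.println(roundTrip(hindi, StandardCharsets.UTF_8).equals(hindi));
    }
}
```

The actual default charset on a given machine can be checked with Charset.defaultCharset().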

Expected Behavior

The transcription text should correctly represent the spoken content in any supported language without encoding issues.

Possible Solution

I've created a pull request that addresses this issue by ensuring the Content-Type header for JSON POST requests is explicitly set to application/json; charset=UTF-8. Additionally, I've ensured that the JSON string is converted to bytes using UTF-8 encoding before sending.
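The general shape of such a fix can be sketched as follows. This is not the code from the pull request: it assumes the POST is made with java.net.HttpURLConnection, and the class name Utf8JsonPost and the methods toUtf8 and post are hypothetical names chosen for illustration. It shows the two points described above: declaring the charset in the Content-Type header and encoding the body as UTF-8 explicitly.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the actual Jigasi implementation.
public class Utf8JsonPost {

    // Converts the JSON string to bytes using UTF-8 instead of the
    // platform default charset.
    static byte[] toUtf8(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Posts the JSON to the given address and returns the HTTP status code.
    static int post(String address, String json) throws IOException {
        byte[] body = toUtf8(json);

        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            // Tell the receiver explicitly how the body is encoded.
            conn.setRequestProperty("Content-Type",
                "application/json; charset=UTF-8");
            conn.setDoOutput(true);
            conn.setFixedLengthStreamingMode(body.length);

            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

The receiving server also has to decode the request body as UTF-8; a correct header does not help if the receiver ignores it.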

PR Link: #504

Steps to reproduce

  1. Set up Jigasi with a transcription service.
  2. Configure the remote endpoint: org.jitsi.jigasi.transcription.SEND_JSON_REMOTE_URLS=<remote json accepting url>
  3. Use the transcription feature with a non-ASCII language, e.g., Hindi.
  4. Observe that the transcription text received at the remote URL contains unexpected characters or question marks.