added utf-8 support for SEND JSON POST Request #504

Open: wants to merge 1 commit into base: master

VewMet commented Oct 8, 2023

Description

This PR adds UTF-8 encoding support for SEND JSON POST requests in the transcription module of Jigasi.

org.jitsi.jigasi.transcription.SEND_JSON_REMOTE_URLS=https://ts.meet.jit.si/transcriptions

This ensures proper handling of non-ASCII characters, especially for languages like Hindi, Tamil, Japanese, etc.

Changes:

Explicitly set the Content-Type header to application/json; charset=UTF-8 to indicate that the JSON data is UTF-8 encoded.
Modified the byte conversion of the JSON string to use UTF-8 encoding.

Change-1:

conn.setRequestProperty("Content-Type", "application/json");

To:

conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");

Change-2:

os.write(json.toString().getBytes());

To:

os.write(json.toString().getBytes("UTF-8"));
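A minimal sketch of the two changes taken together, assuming a plain HttpURLConnection sender. The class and method names (Utf8JsonPostSketch, encodeBody, post) are hypothetical and are not Jigasi's actual code. The sketch uses StandardCharsets.UTF_8 instead of the string "UTF-8"; this is a variant, not what the PR does, and it avoids the checked UnsupportedEncodingException that getBytes(String) declares.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class Utf8JsonPostSketch {
    // Hypothetical helper: encodes the JSON text as UTF-8,
    // independent of the JVM's default charset.
    static byte[] encodeBody(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: POSTs the JSON text and returns the HTTP status code.
    static int post(String url, String json) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        // Change-1: declare the charset so the receiver decodes the body as UTF-8.
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            // Change-2: write UTF-8 bytes explicitly.
            os.write(encodeBody(json));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        // A Hindi word written with Unicode escapes so the source file stays ASCII.
        String json = "{\"text\":\"\u0928\u092E\u0938\u094D\u0924\u0947\"}";
        System.out.println(encodeBody(json).length + " bytes in the request body");
    }
}
```

Keeping the encoding step in one place means the header and the bytes cannot drift apart if the charset is ever changed.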

Motivation:

While the transcriptions worked well in English, issues arose when changing the language to Hindi or others. The received text contained numerous question marks, indicating an encoding issue. By ensuring the data is sent using UTF-8 encoding, this PR aims to resolve such issues and ensure the correct interpretation of non-ASCII characters.
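One plausible explanation for the question marks: getBytes() without arguments uses the JVM's default charset, which before Java 18 depended on the OS locale. When that charset cannot represent a character, the encoder substitutes a replacement byte, which is '?' for ASCII. The sketch below reproduces the symptom; the class name QuestionMarkDemo is hypothetical, and US_ASCII only stands in for whatever non-UTF-8 default the affected machine had, which is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encodes the text with the given charset and decodes it again with the same one.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // A Hindi word written with Unicode escapes so the source file stays ASCII.
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947";

        // Assumption: US_ASCII simulates a JVM whose default charset is not UTF-8.
        System.out.println("ASCII round trip: " + roundTrip(hindi, StandardCharsets.US_ASCII));

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println("UTF-8 round trip intact: "
            + roundTrip(hindi, StandardCharsets.UTF_8).equals(hindi));
    }
}
```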

Testing:

Tested the transcription feature with multiple languages, including Hindi, Tamil, and Japanese.
Verified that the JSON POST requests to the URL configured in Jigasi's sip-communicator.properties are sent with the correct UTF-8 encoding:
org.jitsi.jigasi.transcription.SEND_JSON_REMOTE_URLS=https://ts.meet.jit.si/transcriptions
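The verification could also be repeated without a remote endpoint. The sketch below is one way to do that, assuming the JDK's built-in com.sun.net.httpserver is available: it starts a throwaway server on the loopback interface, posts a JSON string the same way the patched code does, and returns what the server decoded. The class name LoopbackUtf8Check, the method postAndCapture, and the local /transcriptions path are hypothetical and are not part of Jigasi.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

public class LoopbackUtf8Check {
    // Posts the JSON text to a temporary local server and returns the body
    // exactly as the server decoded it.
    static String postAndCapture(String json) throws IOException {
        AtomicReference<String> received = new AtomicReference<>();
        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/transcriptions", exchange -> {
            byte[] raw = exchange.getRequestBody().readAllBytes();
            received.set(new String(raw, StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(204, -1); // 204 with no response body
            exchange.close();
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:"
                + server.getAddress().getPort() + "/transcriptions");
            HttpURLConnection conn =
                (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(json.getBytes(StandardCharsets.UTF_8));
            }
            conn.getResponseCode(); // blocks until the server has answered
            return received.get();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"text\":\"\u0928\u092E\u0938\u094D\u0924\u0947\"}";
        System.out.println("Body intact: " + json.equals(postAndCapture(json)));
    }
}
```

Replacing StandardCharsets.UTF_8 on the sending side with a charset that cannot encode the text should make the comparison fail, which mirrors the behavior seen before this change.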

Impact:

This change ensures that Jigasi can handle transcription for a wide variety of languages without any encoding-related issues, enhancing its versatility and robustness.

VewMet (Author) commented Oct 9, 2023

Kindly check the issue we raised for the same problem; it can be closed once this PR gets merged. Thanks to the Jitsi team for an awesome project.
#505

bharath-naik left a comment

I have tested this in Mandarin and Hindi. With this change I can see the characters; without it, I cannot see characters from any language other than English. So I approve it.

damencho (Member)

Sorry for the late attention, @VewMet:
Hi, thanks for your contribution!
If you haven't already done so, could you please make sure you sign our CLA (https://jitsi.org/icla for individuals and https://jitsi.org/ccla for corporations)? We would, unfortunately, be unable to merge your patch unless we have that piece :(.
