A Python library to convert text from Indian scripts to Pakistani scripts and vice versa.
- Rule-based conversion
  - Faster, but does not support short vowels
  - Not accurate, especially for Arabic-to-Indic conversion
- Online conversion through the Sangam API endpoint
  - Produces much better results, but is much slower
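To see why a purely rule-based mapping loses short vowels, here is a toy character-level converter. The tables are a tiny hypothetical subset written for illustration only; they are not the mapping tables used by this library, which cover far more letters and context. The example also shows why Arabic-to-Indic is harder: the Urdu spelling does not record the short vowel, so the reverse mapping cannot restore it.

```python
# Toy tables for illustration only; NOT the library's real mapping.
# Devanagari letter or vowel sign -> Urdu letter (Urdu written as escapes).
DEVA_TO_URDU = {
    "क": "\u06a9",  # ک keheh
    "त": "\u062a",  # ت teh
    "ब": "\u0628",  # ب beh
    "म": "\u0645",  # م meem
    "ल": "\u0644",  # ل lam
    "ा": "\u0627",  # ا alif, for the long-a vowel sign
}
# Short-vowel signs (i, u) have no base letter in Urdu; a mapper that
# does not emit diacritics simply drops them.
SHORT_VOWEL_SIGNS = {"ि", "ु"}
# Reverse table for consonants only (alif is handled separately below).
URDU_TO_DEVA = {u: d for d, u in DEVA_TO_URDU.items() if d != "ा"}


def deva_to_urdu(text: str) -> str:
    """Map character by character, discarding short-vowel signs."""
    return "".join(
        DEVA_TO_URDU.get(ch, ch) for ch in text if ch not in SHORT_VOWEL_SIGNS
    )


def urdu_to_deva(text: str) -> str:
    """Map back; the dropped short vowels cannot be recovered."""
    out = []
    for ch in text:
        if ch == "\u0627":
            # Alif after a consonant reads as the long-a sign, otherwise as अ.
            out.append("ा" if out and out[-1] in URDU_TO_DEVA.values() else "अ")
        else:
            out.append(URDU_TO_DEVA.get(ch, ch))
    return "".join(out)


word = "किताब"  # kitāb, "book"
urdu = deva_to_urdu(word)  # کتاب: the usual Urdu spelling, short i unwritten
back = urdu_to_deva(urdu)  # कताब: the short i is lost on the way back
print(urdu, back)
```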
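Prerequisites:
- Python 3.7+
- The `indic_nlp_library` package from GokulNC's GitHub repository:

```bash
pip install git+https://github.com/GokulNC/indic_nlp_library
```

Installation:

```bash
pip install indo-arabic-transliteration
```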
Usage:

```python
# Rule-based conversion (offline)
from indo_arabic_transliteration.mapper import script_convert
# script_convert(text: str, from_script: str, to_script: str)

# Online conversion through the Sangam API (needs network access)
from indo_arabic_transliteration.sangam_api import online_transliterate
# online_transliterate(text: str, from_script: str, to_script: str)
```
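Because the online converter is slower and needs network access, while the rule-based one works offline, a caller may want to try the online API first and fall back to the rule-based converter when it fails. The helper below is a hypothetical sketch, not part of this library. It takes both converters as arguments, relying only on the shared `(text, from_script, to_script)` interface. It assumes a failed online call raises an exception; the exact exception types are not documented here, so it catches `Exception` broadly.

```python
from typing import Callable

# Both converters share the (text, from_script, to_script) interface.
Converter = Callable[[str, str, str], str]


def transliterate_with_fallback(
    text: str,
    from_script: str,
    to_script: str,
    online: Converter,
    offline: Converter,
) -> str:
    """Hypothetical helper: try the online converter first and fall back
    to the offline one if the online call raises (e.g. a network error)."""
    try:
        return online(text, from_script, to_script)
    except Exception:
        # Assumption: any failure of the online call should trigger the fallback.
        return offline(text, from_script, to_script)
```

With the library installed, pass `online=online_transliterate` and `offline=script_convert`; in tests, simple stub functions work just as well.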
We use standard BCP 47 language tags, made of a language subtag and a region subtag (for example, `hi-IN`), to identify each language-script combination.
Language | Script | Code |
---|---|---|
Hindi | Devanagari | hi-IN |
Urdu | Perso-Arabic | ur-PK |
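To make the tag convention concrete, the sketch below records the three pairs documented in this README (Hindustani, Punjabi and Sindhi) and checks a pair in either direction. The names `DOCUMENTED_PAIRS`, `split_tag` and `is_documented_pair` are hypothetical; the library may accept or reject combinations on its own terms, and the ML-based and lossless converters support fewer pairs.

```python
from typing import Tuple

# Hypothetical record of the language-script pairs documented in this README.
DOCUMENTED_PAIRS = {
    ("hi-IN", "ur-PK"),  # Hindustani: Devanagari <-> Perso-Arabic
    ("pa-IN", "pa-PK"),  # Punjabi: Gurmukhi <-> Shahmukhi
    ("sd-IN", "sd-PK"),  # Sindhi: Devanagari <-> Perso-Arabic
}


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a language-region tag such as 'hi-IN' into ('hi', 'IN').
    Only the two-subtag form used here is handled; other shapes raise ValueError."""
    language, region = tag.split("-")
    return language, region


def is_documented_pair(from_script: str, to_script: str) -> bool:
    """True if the pair appears in this README's tables, in either direction."""
    return (from_script, to_script) in DOCUMENTED_PAIRS or (
        to_script,
        from_script,
    ) in DOCUMENTED_PAIRS
```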
Example:

```python
# Rule-based
script_convert("हैदराबाद", 'hi-IN', 'ur-PK') # حیدرآباد
script_convert("حيدرآباد", 'ur-PK', 'hi-IN') # हीदराबाद
# Online-API
online_transliterate("حيدرآباد", 'ur-PK', 'hi-IN') # हैदराबाद
online_transliterate("हैदराबाद", 'hi-IN', 'ur-PK') # حیدرآباد
```
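Urdu text from Arabic keyboards or web pages often uses Arabic-block code points that render the same as, or very close to, the letters Urdu orthography prefers: for example ي (U+064A) instead of ی (U+06CC), and ك (U+0643) instead of ک (U+06A9). Whether this library normalizes such input itself is not documented here, so a pre-processing step like the following sketch may help keep input consistent. The table is an assumption that covers only these two common cases.

```python
# Assumed normalization table: Arabic-block letters -> Urdu-preferred letters.
ARABIC_TO_URDU_LETTERS = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_urdu(text: str) -> str:
    """Replace a few Arabic-block look-alikes with the code points Urdu uses."""
    return text.translate(ARABIC_TO_URDU_LETTERS)


# حيدرآباد spelled with Arabic yeh, written as escapes to make it unambiguous.
print(normalize_urdu("\u062d\u064a\u062f\u0631\u0622\u0628\u0627\u062f"))  # حیدرآباد
```

For example, `script_convert(normalize_urdu(text), 'ur-PK', 'hi-IN')` would pass the normalized text to the rule-based converter.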
Notes & Resources:
- Both nations share a common national language (Hindustani), but it is written in different scripts and registered as two different languages (Hindi and Urdu).
- Official Tools
- Devanagari to PersoArabic mapping
Language | Script | Code |
---|---|---|
East Punjabi | Gurmukhi | pa-IN |
West Punjabi | Shahmukhi | pa-PK |
Example:

```python
# Rule-based
script_convert("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سںگھ
script_convert("سںگھ", 'pa-PK', 'pa-IN') # ਸਂਘ
# Online-API
online_transliterate("سنگھ", 'pa-PK', 'pa-IN') # ਸਿੰਘ
online_transliterate("ਸਿੰਘ", 'pa-IN', 'pa-PK') # سِنگھ
```
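The rule-based round trip turns ਸਿੰਘ into ਸਂਘ. Listing the code points of the original word with Python's standard `unicodedata` module shows what is at stake: ਸਿੰਘ is typically encoded as four code points, one of which is the short-vowel sign I that Shahmukhi spelling normally leaves unwritten. The `describe` helper is a hypothetical name used only for this illustration.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# ਸਿੰਘ spelled out with explicit escapes so the code points are unambiguous.
singh = "\u0a38\u0a3f\u0a70\u0a18"
for code_point, name in describe(singh):
    print(code_point, name)
# U+0A38 GURMUKHI LETTER SA
# U+0A3F GURMUKHI VOWEL SIGN I
# U+0A70 GURMUKHI TIPPI
# U+0A18 GURMUKHI LETTER GHA
```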
Notes & Resources:
- You can also use these JavaScript libraries:
- Gurmukhi to Shahmukhi mapping
Language | Script | Code |
---|---|---|
Indian Sindhi | Devanagari | sd-IN |
Pakistani Sindhi | Perso-Arabic | sd-PK |
Example:

```python
# Rule-based
script_convert("हैदराबाद", 'sd-IN', 'sd-PK') # حیدرآباد
script_convert("حيدرآباد", 'sd-PK', 'sd-IN') # हीदराबाद
# Online-API
online_transliterate("حيدرآباد", 'sd-PK', 'sd-IN') # हैदराबाद
online_transliterate("हैदराबाद", 'sd-IN', 'sd-PK') # حیدرآباد
```
Notes & Resources:
- Before Devanagari was standardized for Sindhi, the language was written in Landa scripts such as Khojki, Khudawadi, Multani and Gurmukhi, depending on the region.
- To convert from Devanagari to these legacy scripts, use the Aksharamukha Python library.
- You can also use this JavaScript library or online converter.
- Sindhi-PersoArabic to Devanagari mapping
- The ML-based converter (`ml_transliterate`) uses models from the LibIndicTrans library. Install it with:

```bash
pip install git+https://github.com/libindic/indic-trans
```
- Currently supports only the Hindi-Urdu pair
API:

```python
from indo_arabic_transliteration.ml_based import ml_transliterate
# Same interface as script_convert():
# ml_transliterate(text: str, from_script: str, to_script: str)
```
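Because the ML-based converter depends on an extra package installed from GitHub, a caller may want to check for it before use. The sketch below uses the standard `importlib.util.find_spec`, which locates a top-level module without importing it. The module name `indictrans` is an assumption about what the libindic/indic-trans package installs, and `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


# Assumption: libindic/indic-trans installs a module named "indictrans".
if module_available("indictrans"):
    print("ML-based transliteration can be used")
else:
    print("Install indic-trans first: pip install git+https://github.com/libindic/indic-trans")
```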
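- Indic scripts are mostly phonetic, so the source text carries short-vowel information that Perso-Arabic spelling usually leaves out. Use the lossless converter (`convert_with_diacritics`) to retain those vowels as diacritics in the Perso-Arabic output
- Currently supports only Hindustani (Hindi to Urdu) and Punjabi (Gurmukhi to Shahmukhi)
- Uses the Aksharamukha library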
API:

```python
from indo_arabic_transliteration.lossless_converter import convert_with_diacritics
# Same interface as script_convert():
# convert_with_diacritics(text: str, from_script: str, to_script: str)
```
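Diacritized output is useful as a reading aid, but everyday Urdu and Shahmukhi text usually omits short-vowel marks. If you need both forms, one option is to strip the Arabic harakat afterwards; for instance, the online Punjabi output سِنگھ shown earlier carries a kasra, and removing it gives سنگھ. The sketch below removes the range U+064B to U+0652 (fathatan through sukun, which also includes shadda); treating exactly that range as "marks to drop" is an assumption, and `strip_harakat` is a hypothetical helper name.

```python
import re

# U+064B ARABIC FATHATAN through U+0652 ARABIC SUKUN: the short vowels
# (fatha, damma, kasra), their tanwin forms, shadda and sukun.
HARAKAT = re.compile("[\u064b-\u0652]")


def strip_harakat(text: str) -> str:
    """Remove short-vowel and related marks, keeping the base letters."""
    return HARAKAT.sub("", text)


print(strip_harakat("\u0633\u0650\u0646\u06af\u06be"))  # سنگھ (kasra removed)
```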
- For help using the library, please open an issue in the GitHub Issues section.
- For script conversion errors from the online API, please write directly to the Sangam team. We are not affiliated with them in any way, and this is not an official library.