Languages/Scripts supported in different versions of Tesseract

Languages

LangCode Language 3.02 3.04 4.00 (Nov. 2016) 4.0.0 (tessdata) 4.0.0 (tessdata_best) 4.0.0 (tessdata_fast)
afr Afrikaans x x x x x x
amh Amharic x x x x x
ara Arabic x x x x x x
asm Assamese x x x x x
aze Azerbaijani x x x x x
aze_cyrl Azerbaijani - Cyrillic x x x x x x
bel Belarusian x x x x x x
ben Bengali x x x x x x
bod Tibetan x x x x x
bos Bosnian x x x x x
bre Breton x x x x
bul Bulgarian x x x x x x
cat Catalan; Valencian x x x x x x
ceb Cebuano x x x x x
ces Czech x x x x x x
chi_sim Chinese - Simplified x x x x x x
chi_tra Chinese - Traditional x x x x x x
chr Cherokee x x x x x x
cos Corsican x x x
cym Welsh x x x x x
dan Danish x x x x x x
dan_frak Danish - Fraktur (contrib) x x
deu German x x x x x x
deu_frak German - Fraktur (contrib) x x
deu_latf German (Fraktur Latin) x x x x
dzo Dzongkha x x x x x
ell Greek, Modern (1453-) x x x x x x
eng English x x x x x x
enm English, Middle (1100-1500) x x x x x x
epo Esperanto x x x x x x
equ Math / equation detection module x x x x x
est Estonian x x x x x x
eus Basque x x x x x x
fao Faroese x x x
fas Persian x x x x x
fil Filipino (old - Tagalog) x x x
fin Finnish x x x x x x
fra French x x x x x x
frk German - Fraktur (now deu_latf) x x x x x x
frm French, Middle (ca.1400-1600) x x x x x x
fry Western Frisian x x x
gla Scottish Gaelic x x x
gle Irish x x x x x
glg Galician x x x x x x
grc Greek, Ancient (to 1453) (contrib) x x x x x x
guj Gujarati x x x x x
hat Haitian; Haitian Creole x x x x x
heb Hebrew x x x x x x
hin Hindi x x x x x x
hrv Croatian x x x x x x
hun Hungarian x x x x x x
hye Armenian x x x
iku Inuktitut x x x x x
ind Indonesian x x x x x x
isl Icelandic x x x x x x
ita Italian x x x x x x
ita_old Italian - Old x x x x x x
jav Javanese x x x x x
jpn Japanese x x x x x x
kan Kannada x x x x x x
kat Georgian x x x x x
kat_old Georgian - Old x x x x x
kaz Kazakh x x x x x
khm Central Khmer x x x x x
kir Kirghiz; Kyrgyz x x x x x
kmr Kurmanji (Kurdish - Latin Script) x x x x
kor Korean x x x x x x
kor_vert Korean (vertical) x x x x
kur Kurdish (Arabic Script) x
lao Lao x x x x x
lat Latin x x x x x
lav Latvian x x x x x x
lit Lithuanian x x x x x x
ltz Luxembourgish x x x x
mal Malayalam x x x x x x
mar Marathi x x x x x
mkd Macedonian x x x x x x
mlt Maltese x x x x x x
mon Mongolian x x x x
mri Maori x x x x
msa Malay x x x x x x
mya Burmese x x x x x
nep Nepali x x x x x
nld Dutch; Flemish x x x x x x
nor Norwegian x x x x x
oci Occitan (post 1500) x x x x x
ori Oriya x x x x x
osd Orientation and script detection module x x x x x x
pan Panjabi; Punjabi x x x x x
pol Polish x x x x x x
por Portuguese x x x x x x
pus Pushto; Pashto x x x x x
que Quechua x x x x
ron Romanian; Moldavian; Moldovan x x x x x x
rus Russian x x x x x x
san Sanskrit x x x x x
sin Sinhala; Sinhalese x x x x x
slk Slovak x x x x x x
slk_frak Slovak - Fraktur (contrib) x x
slv Slovenian x x x x x x
snd Sindhi x x x x
spa Spanish; Castilian x x x x x x
spa_old Spanish; Castilian - Old x x x x x x
sqi Albanian x x x x x x
srp Serbian x x x x x x
srp_latn Serbian - Latin x x x x x
sun Sundanese x x x x
swa Swahili x x x x x x
swe Swedish x x x x x x
syr Syriac x x x x x
tam Tamil x x x x x x
tat Tatar x x x x
tel Telugu x x x x x x
tgk Tajik x x x x x
tgl Tagalog (new - Filipino) x x x
tha Thai x x x x x x
tir Tigrinya x x x x x
ton Tonga x x x x
tur Turkish x x x x x x
uig Uighur; Uyghur x x x x x
ukr Ukrainian x x x x x x
urd Urdu x x x x x
uzb Uzbek x x x x x
uzb_cyrl Uzbek - Cyrillic x x x x x
vie Vietnamese x x x x x x
yid Yiddish x x x x x
yor Yoruba x x x x

Scripts

Script 3.02 3.04 4.00 (Nov. 2016) 4.0.0 (tessdata) 4.0.0 (tessdata_best) 4.0.0 (tessdata_fast)
arab Arabic x x x
armn Armenian x x x
beng Bengali x x x
cans Canadian_Aboriginal x x x
cher Cherokee x x x
cyrl Cyrillic x x x
deva Devanagari x x x
ethi Ethiopic x x x
frak Fraktur x x x
geor Georgian x x x
grek Greek x x x
gujr Gujarati x x x
guru Gurmukhi x x x
hans HanS (Han simplified) x x x
hans-vert HanS_vert (Han simplified vertical) x x x
hant HanT (Han traditional) x x x
hant-vert HanT_vert (Han traditional vertical) x x x
hang Hangul x x x
hang-vert Hangul_vert (Hangul vertical) x x x
hebr Hebrew x x x
jpan Japanese x x x
jpan-vert Japanese_vert (Japanese vertical) x x x
knda Kannada x x x
khmr Khmer x x x
laoo Lao x x x
latn Latin x x x
mlym Malayalam x x x
mymr Myanmar x x x
orya Oriya (Odia) x x x
sinh Sinhala x x x
syrc Syriac x x x
taml Tamil x x x
telu Telugu x x x
thaa Thaana x x x
thai Thai x x x
tibt Tibetan x x x
viet Vietnamese x x x

For details about the languages that each Script.traineddata file supports, see the files that end with langs.txt (e.g. Latin.langs.txt) here.
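
Because availability differs between versions and data sets, it can be useful to check which codes from the tables above actually have a data file installed. The following is a minimal sketch, assuming each language is stored as `<LangCode>.traineddata` directly inside a single tessdata directory. The function name `missing_languages` and the directory you pass in are hypothetical and not part of Tesseract itself.

```python
from pathlib import Path


def missing_languages(tessdata_dir, lang_codes):
    """Return the codes in lang_codes that have no <code>.traineddata file.

    Assumption: data files sit directly in tessdata_dir and are named
    <code>.traineddata (e.g. eng.traineddata). Subdirectories are not searched.
    """
    root = Path(tessdata_dir)
    return [
        code
        for code in lang_codes
        if not (root / f"{code}.traineddata").is_file()
    ]
```

For example, calling `missing_languages(path_to_tessdata, ["eng", "deu_latf"])` returns the subset of those two codes whose files are absent from that directory, in the order given.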