MaxentLanguageIdentification

This is the language identification module used in CorpusCollie, as described in: http://www.clips.ua.ac.be/sites/default/files/coco.pdf

If you use this model, please cite the paper as follows:

@inproceedings{hoogeveen2011,
  title = {CorpusCollie - A Web Corpus Mining Tool for Resource-Scarce Languages},
  booktitle = {Proceedings of Conference on Human Language Technology for Development},
  year = {2011},
  pages = {44--49},
  publisher = {Bibliotheca Alexandrina},
  organization = {Bibliotheca Alexandrina},
  address = {Alexandria, Egypt},
  attachments = {http://www.clips.ua.ac.be/sites/default/files/coco.pdf},
  author = {Hoogeveen, Doris and De Pauw, Guy}
}

The script can be used to perform language identification using a maxent classifier and a set of language models.

USAGE: language_identification.py <inputfile> <maxent_dir>

<inputfile> is the file whose language you want to identify. It should be a plain text file encoded in UTF-8.
<maxent_dir> is the path to the directory that contains maxent.exe.

The script outputs the ISO name of the guessed language.

Example: language_identification.py inputfile /home/hoogeveen/maxent/

Prerequisite: Wine needs to be installed to be able to run the classifier.
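
The requirements above (a UTF-8 input file, a directory containing maxent.exe, and Wine) can be checked before running the script. The following is a minimal sketch and not part of this repository: the helper names `is_utf8`, `preflight`, and `build_command` are hypothetical, and it assumes that the interpreter is called `python` and that Wine is available on the PATH as `wine`.

```python
import shutil
from pathlib import Path


def is_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def preflight(input_file, maxent_dir):
    """Return a list of problems found with the documented requirements.

    An empty list means every check passed. The checks mirror the README:
    UTF-8 input file, maxent.exe inside maxent_dir, Wine installed.
    """
    problems = []
    if not Path(input_file).is_file():
        problems.append("input file not found: %s" % input_file)
    elif not is_utf8(input_file):
        problems.append("input file is not valid UTF-8: %s" % input_file)
    if not (Path(maxent_dir) / "maxent.exe").is_file():
        problems.append("maxent.exe not found in: %s" % maxent_dir)
    # Assumption: Wine is invoked through an executable named "wine".
    if shutil.which("wine") is None:
        problems.append("wine not found on PATH")
    return problems


def build_command(input_file, maxent_dir, python="python"):
    """Build the argument list for the documented USAGE line.

    Assumption: the script is started through an interpreter named
    `python`; adjust this if your setup differs.
    """
    return [python, "language_identification.py",
            str(input_file), str(maxent_dir)]
```

If `preflight("inputfile", "/home/hoogeveen/maxent/")` returns an empty list, the list from `build_command("inputfile", "/home/hoogeveen/maxent/")` can be passed to `subprocess.run` from the repository directory to start the script.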

The test text in 'inputfile' is a Dutch story by Toon Tellegen, taken from http://www.dbnl.org/tekst/tell003lang01_01/tell003lang01_01_0001.php
I do not own the rights to this text in any way.
