TamilCorpus

Open source Tamil corpus of 58M words.

Source: Wikipedia, The Hindu (Tamil)

Usage

Run extract.sh to extract the compressed files.

P.S.: The extracted text may need a little cleaning before use.
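
The README does not say what kind of noise the corpus contains, so the following is only a rough sketch of one possible cleaning pass, not the author's method. It assumes the noise is mostly stray markup and non-Tamil characters, and that keeping Tamil letters (Unicode block U+0B80 to U+0BFF), ASCII digits, whitespace, and basic punctuation is acceptable for your task. The function name `clean_line` and the allowed character set are hypothetical choices made for illustration.

```python
import re
import unicodedata

# Assumption: anything that is not a Tamil character (U+0B80-U+0BFF),
# an ASCII digit, whitespace, or basic punctuation counts as noise.
_NOISE = re.compile(r"[^\u0B80-\u0BFF0-9\s.,!?;:'\"()\-]")
_SPACES = re.compile(r"\s+")


def clean_line(line: str) -> str:
    """Return a normalized copy of one corpus line (hypothetical helper)."""
    # Compose split vowel signs so equivalent spellings compare equal.
    line = unicodedata.normalize("NFC", line)
    # Replace every disallowed character with a space.
    line = _NOISE.sub(" ", line)
    # Collapse runs of whitespace and trim the ends.
    return _SPACES.sub(" ", line).strip()
```

A line that becomes empty after cleaning can simply be dropped. If your task needs English words or other scripts that appear in the source articles, widen the allowed set before running this over the extracted files.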