Word Embedding Benchmark project by the Shahid Beheshti University NLP Lab
Please read our wiki page for more information.
Folder structure:
- data/corpus: Must be empty, as the code downloads the corpus from an external repository into this folder.
- data/analogy: Contains the analogy dataset(s).
- data/wordsim: Contains the word similarity dataset(s).
- data/categories: Contains the categories dataset(s).
- code: Contains the code used to run all evaluation-related tasks, plus utilities to download the corpus files.
- scripts: Contains scripts for cleansing, crawling, and any other one-off tasks.
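
A quick way to confirm that a checkout matches this layout is a small check like the one below. This is only a sketch: the folder names come from the list above, but the `missing_dirs` helper and the `EXPECTED_DIRS` list are hypothetical and not part of the project's code.

```python
from pathlib import Path

# Folder names taken from the list above; this helper itself is hypothetical.
EXPECTED_DIRS = [
    "data/corpus",
    "data/analogy",
    "data/wordsim",
    "data/categories",
    "code",
    "scripts",
]


def missing_dirs(root):
    """Return the expected folders that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Folder structure looks complete.")
```

Run it from the repository root. An empty result means all six folders are present. It does not check whether data/corpus is empty.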