A word similarity dataset with a high proportion of multi-sense words, designed to facilitate more reliable evaluation of sense embeddings.
Download MSD-1030.zip for the data.
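As a rough illustration of how a word similarity dataset like this is typically used, the sketch below scores word pairs with a sense embedding model and compares the scores to human ratings using Spearman rank correlation. Everything in it is an assumption made for illustration: the file layout of MSD-1030.zip is not described here, so the pairs, ratings, and sense vectors are made-up toy values rather than dataset content, the function names are hypothetical, and the MaxSim-style scoring (taking the best-matching pair of senses) is one common choice that may differ from the protocol used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_sim(senses_a, senses_b):
    """MaxSim-style score: similarity of the closest pair of senses."""
    return max(cosine(u, v) for u in senses_a for v in senses_b)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data, NOT taken from MSD-1030.
# Each word maps to a list of sense vectors (one vector per sense).
sense_vectors = {
    "bank": [[1.0, 0.0], [0.0, 1.0]],  # two senses
    "money": [[1.0, 0.0]],
    "river": [[0.1, 0.9]],
    "apple": [[-1.0, 0.2]],
}
# (word1, word2, human similarity rating) with invented ratings.
pairs = [
    ("bank", "river", 7.0),
    ("bank", "money", 8.0),
    ("river", "apple", 1.0),
]

model_scores = [max_sim(sense_vectors[a], sense_vectors[b]) for a, b, _ in pairs]
human_scores = [rating for _, _, rating in pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```

To run this against the real data, the toy `pairs` list would be replaced by whatever pairs and ratings the downloaded files contain, and `sense_vectors` by the output of the sense embedding model under evaluation.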
Please cite the following paper when referring to MSD-1030 in academic publications.
Ting-Yu Yen, Yang-Yin Lee, Yow-Ting Shiue, Hen-Hsen Huang, and Hsin-Hsi Chen. 2020. MSD-1030: A Well-built Multi-Sense Evaluation Dataset for Sense Representation Models. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), May 11-16, 2020, Palais du Pharo, Marseille, France.