SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment

ghdi6758/SemAxis


Authors: Jisun An, Haewoon Kwak, and Yong-Yeol Ahn

Abstract

Because word semantics can substantially change across communities and contexts, capturing domain-specific word semantics is an important challenge. Here, we propose SemAxis, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SemAxis can capture nuanced semantic representations in multiple online communities. We also show that, when the sentiment axis is examined, SemAxis outperforms the state-of-the-art approaches in building domain-specific sentiment lexicons.

Highlights

Building a lexicon for various semantic axes (including and beyond sentiment)

Content analysis with SemAxis

The /r/The_Donald community perceives "guns" as safer than the /r/SandersForPresident community does.

Citing SemAxis

If you make use of this work in your research, please cite the following paper:

Jisun An, Haewoon Kwak, and Yong-Yeol Ahn. 2018. SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL'18)

Bibtex

@InProceedings{P18-1228,
author = "An, Jisun
and Kwak, Haewoon
and Ahn, Yong-Yeol",
title = "SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "2450--2461",
location = "Melbourne, Australia",
url = "http://aclweb.org/anthology/P18-1228"
}

Using the code

To use SemAxis, you will need to download pre-trained word embeddings. Once this is done, specify the path to these embeddings (variable: EMBEDDING_PATH) in semaxis.py. The file semaxis.py contains implementations for computing a semantic axis from two pole words and for projecting a target word onto that axis, along with comments and documentation on how to use them.
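The two operations described above can be sketched as follows. This is a minimal illustration, not the code in semaxis.py: the function names `build_axis` and `project_word` and the toy three-dimensional vectors are hypothetical. It assumes the axis is the difference between the two pole word vectors and that the projection is the cosine similarity between the target word vector and the axis, as described in the paper.

```python
import numpy as np


def build_axis(embeddings, positive_pole, negative_pole):
    """Return the axis vector pointing from the negative pole to the positive pole."""
    positive = np.asarray(embeddings[positive_pole], dtype=float)
    negative = np.asarray(embeddings[negative_pole], dtype=float)
    return positive - negative


def project_word(embeddings, word, axis):
    """Return the cosine similarity between a word vector and the axis."""
    vector = np.asarray(embeddings[word], dtype=float)
    denominator = np.linalg.norm(vector) * np.linalg.norm(axis)
    if denominator == 0.0:
        # A zero vector has no direction, so report a neutral score.
        return 0.0
    return float(np.dot(vector, axis) / denominator)


# Toy vectors standing in for real pre-trained embeddings (made up for illustration).
toy_embeddings = {
    "safe": np.array([1.0, 0.0, 0.2]),
    "dangerous": np.array([-1.0, 0.0, 0.2]),
    "shelter": np.array([0.8, 0.3, 0.1]),
    "hazard": np.array([-0.7, 0.4, 0.1]),
}

axis = build_axis(toy_embeddings, "safe", "dangerous")
scores = {word: project_word(toy_embeddings, word, axis) for word in ("shelter", "hazard")}

for word, score in scores.items():
    # Positive scores lean toward "safe", negative scores toward "dangerous".
    print(word, round(score, 3))
```

Because the functions only index `embeddings` by word, any mapping from words to vectors should work in place of the toy dictionary, for example embeddings loaded with gensim.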

Pre-trained word embeddings used in the study

We make the pre-trained word embeddings used in this study available for download.

732 Pre-defined Semantic Axes for download

We systematically induce 732 semantic axes based on the antonym pairs from ConceptNet. You can download them here: 732 Pre-defined Semantic Axes. The file contains 732 antonym word pairs and is tab-separated.
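A tab-separated file of antonym pairs could be read along these lines. This is a sketch under assumptions: the exact column layout of the downloadable file is not documented here, so the code assumes one pair per line with the two pole words in the first two columns and no header row. The helper name `read_axis_pairs` and the sample rows are hypothetical.

```python
import csv
import io


def read_axis_pairs(handle):
    """Read (pole_a, pole_b) word pairs from a tab-separated text stream."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        fields = [field.strip() for field in row]
        # Skip blank or incomplete lines instead of failing on them.
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue
        pairs.append((fields[0], fields[1]))
    return pairs


# In-memory sample standing in for the downloaded file (rows are made up).
sample = io.StringIO("safe\tdangerous\nclean\tdirty\n\nrich\tpoor\n")
axis_pairs = read_axis_pairs(sample)
print(axis_pairs)
```

For the real file, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the download was saved.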

Dependencies

A Python 3.5 distribution with the standard packages provided by the Anaconda distribution is required.

In particular, the code was tested with:

numpy (1.14.0)
gensim (3.4.0)
scipy (1.0.0)
