
This repository contains data and code to replicate author profiling experiments that use a new type of feature.


Litvinova1984/author_profiling_contexts


This repository contains data and code to reproduce the experiments described in the paper "Individual Differences in the Most Frequent Content Word Usage as a New Type of Features in the Authorship Profiling Task" by Litvinova T.A. et al., presented at ICRES 2024. The paper introduces a new type of feature for the authorship profiling task.

Authorship profiling (AP) is the task of inferring characteristics of a text's author (gender, age, personality traits, etc.) from an analysis of the text's linguistic features. The task is of practical as well as theoretical interest: identifying the characteristics of text authors matters in marketing, sociology, forensics, and other fields.

AP is often approached as a text classification or clustering problem. Many types of features have been introduced, including lexical, morphological, and syntactic ones. In recent years, deep learning (DL) architectures have frequently been applied alongside traditional machine learning methods. However, DL approaches regularly underperform classical machine learning approaches on AP, which indicates the complexity of the task. Interpretability is also a key requirement for AP methods, which limits the use of DL in real-life applications. There is therefore a need for more sophisticated but still interpretable features.

In this paper, we propose a completely new type of feature for this task: semantic characteristics of the contexts of the most frequent content words, extracted using a word embedding model, a semantic relation extraction method, and a hand-crafted set of variables reflecting different aspects of word meaning. We present the results of experiments in which we applied these features to texts of two genres from the RusIdiolect dataset in order to detect gender and Big-5 personality traits.
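The general idea of describing an author through the contexts of their most frequent content words can be illustrated with a minimal sketch. This is not the code from the paper or this repository: the stop-word list, the function names (`top_content_words`, `context_windows`, `mean_vector`), the window size, and the toy embedding dictionary are all assumptions made for illustration, and a real pipeline would presumably use a part-of-speech tagger and a pretrained word embedding model instead.

```python
from collections import Counter

# Hypothetical stop-word list (assumption); a real experiment would use a
# POS tagger or a full function-word list for the target language.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is", "my", "i", "at"}


def top_content_words(tokens, k=2):
    """Return the k most frequent tokens that are not stop words."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]


def context_windows(tokens, target, size=2):
    """Collect the tokens within `size` positions of each occurrence of target."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            windows.append(left + right)
    return windows


def mean_vector(words, embeddings):
    """Average the embedding vectors of the words that have one."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
```

For a tokenized text, `top_content_words` picks the target words, `context_windows` gathers their neighbours, and `mean_vector` turns each set of neighbours into a single vector that could serve as one interpretable feature per target word.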
