GitHub - Litvinova1984/author_profiling_contexts: This repository contains data and code to replicate author profiling experiments using a new type of features

This repository contains data and code to reproduce the experiments described in the paper "Individual Differences in the Most Frequent Content Word Usage as a New Type of Features in the Authorship Profiling Task" by Litvinova T.A. et al., presented at ICRES 2024. The paper introduces a new type of features for the authorship profiling (AP) task.

Authorship profiling is the task of inferring characteristics of a text's author (gender, age, personality traits, etc.) from an analysis of the text's linguistic features. The task is of practical as well as theoretical interest: identifying the characteristics of text authors matters in marketing, sociology, forensics, and other fields. AP is often approached as a text classification or clustering problem, and different types of features have been introduced for it: lexical, morphological, syntactic, etc. In recent years, deep learning (DL) architectures have frequently been applied alongside traditional machine learning methods. However, DL approaches to AP regularly underperform classical machine learning approaches, which indicates the complexity of the task. In addition, interpretability is a key requirement for authorship profiling methods, which limits the use of DL methods for AP in real-life settings. There is therefore a need for more sophisticated but still interpretable features for AP.

In this paper, we propose a completely new type of features for this task: semantic characteristics of the contexts of the most frequent content words, extracted using a word embedding model, a semantic relation extraction method, and a hand-crafted set of variables reflecting different aspects of word meaning. We present the results of experiments in which we applied these features to texts of two genres from the RusIdiolect dataset to detect gender and Big-5 personality traits.
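To make the notion of "contexts of the most frequent content words" more concrete, here is a minimal Python sketch of one way such contexts could be collected. It is an illustration under stated assumptions, not the authors' pipeline (the repository's own code is in `contexts_code.R`): the stop-word list, the window size, and the function names `tokenize`, `top_content_words`, and `context_windows` are all hypothetical, and the embedding-based semantic features described above are not computed here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real experiment would
# use a proper function-word list for the language of the corpus.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was",
              "i", "my", "it", "on", "at"}


def tokenize(text):
    """Lowercase the text and keep runs of letters (works for Cyrillic too)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_content_words(tokens, n):
    """Return the n most frequent tokens that are not stop words."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def context_windows(tokens, targets, size):
    """Collect up to `size` tokens on each side of every target occurrence."""
    targets = set(targets)
    contexts = {t: [] for t in targets}
    for i, tok in enumerate(tokens):
        if tok in targets:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            contexts[tok].append(left + right)
    return contexts


# Toy text standing in for one author's document (assumption for illustration).
text = ("My dog likes the park. The dog runs in the park "
        "and my dog sleeps at home.")
tokens = tokenize(text)
targets = top_content_words(tokens, 2)
contexts = context_windows(tokens, targets, 2)
```

In a fuller pipeline, each collected context would then be mapped to semantic variables (for example via a word embedding model) and aggregated per author; that step depends on resources not described in this text and is left out of the sketch.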

Folders and files:

Feature_set
Big_5_letter.xlsx
Big_5_picture.xlsx
Litvinova_ICRES1 2.docx
README.md
contexts_code.R
