ICS02: 10. Text analysis with R

Gabriel Bodard edited this page Mar 14, 2019 · 17 revisions

Sunoikisis Digital Classics, Spring 2019

Session 10. Text analysis, stylometry and visualisation using R

Thursday Mar 14, 16:00 UK = 18:00 EET

Convenors: Maciej Eder (Kraków), Robert Gorman (University of Nebraska–Lincoln) & Christopher Ohge (University of London)

YouTube link: https://youtu.be/2Fo4HxGZ5o4

Notebooks:

  1. Recap: C Ohge's notebook from previous session (HTML+visualisations)
  2. R Gorman's notebook (HTML) and XML files (5 files).
  3. M Eder's notebook (HTML) and small corpus (22 files)

Slides: tba

Session outline

This session will examine some specialist libraries in R for text analysis. We will review the tidytext package from the previous session, then examine in depth two crucial (and complementary) forms of text analysis. The first uses the stylo package to work with larger datasets, and the second shows how to analyse and visualise XML-encoded texts.

  1. Review tidytext from previous session (Ohge).
  2. Stylometry: intro to the stylo package (Eder). Stylometry, the application of statistical methods to trace stylistic differences between (literary) texts, is usually associated with the question of authorship attribution. It relies on the assumption that each author has their own distinct lexical profile, reflected for example in idiosyncrasies of word frequencies. The R package ‘stylo’ provides a set of functions, conveniently supplemented by a graphical user interface for high-level exploratory analyses, which makes it especially well suited to novice users without programming skills.
  3. XML library: treebanking and linguistic analyses of encoded texts (Gorman).
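
To make the idea of a "lexical profile" in item 2 more concrete, here is a minimal sketch in base R. It is not how the stylo package is implemented, and it is not taken from the session notebooks. The three toy texts, the names `tokenise`, `rel_freq`, `n_mfw` and `profile`, and the choice of a plain Manhattan distance are all assumptions made for illustration. The sketch computes relative word frequencies for each text, keeps the most frequent words across the corpus, and compares the resulting profiles.

```r
# Toy corpus (invented for illustration): a1 and a2 imitate one "author",
# b1 imitates another.
texts <- c(
  a1 = "the cat sat on the mat and the dog sat on the rug",
  a2 = "the cat lay on the mat and the cat sat on the bed",
  b1 = "a dog ran to a park and a bird flew to a tree"
)

# Lower-case a text and split it on anything that is not a letter.
tokenise <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]
}
tokens <- lapply(texts, tokenise)

# Relative frequency of every word in every text (rows = texts, cols = words).
vocab <- sort(unique(unlist(tokens)))
rel_freq <- t(sapply(tokens, function(w) {
  as.numeric(table(factor(w, levels = vocab))) / length(w)
}))
colnames(rel_freq) <- vocab

# Keep only the most frequent words across the whole corpus.
n_mfw <- 5  # hypothetical cut-off; real studies use far more words
mfw <- names(sort(colMeans(rel_freq), decreasing = TRUE))[seq_len(n_mfw)]
profile <- rel_freq[, mfw]

# Compare the profiles: smaller distance = more similar word usage.
d <- as.matrix(dist(profile, method = "manhattan"))
print(round(d, 3))
```

In this toy example the two texts that share a vocabulary (`a1`, `a2`) end up closer to each other than either is to `b1`. Real stylometric work relies on much longer texts, many more frequent words, and more refined distance measures, which is where a dedicated package becomes useful.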

Seminar readings

Further reading

Essay title

  • tba

Exercise

  1. tba