Title: Investigating the Alignment of Quantitative and Qualitative Literary Analysis in American Literature
Author: Emma Angela Montecchiari
Course: Università di Trento - Computational Linguistics 2022/23
Date: September 7, 2023
In this repository, you will find:
- Code: The source code and scripts used in the project.
- Data: Corpora relevant to the research.
- Results: Graphs of the results [Appendix 2].
This project explores how well quantitative investigative techniques align with qualitative analysis in literary studies. Specifically, it investigates whether the characteristics traditionally attributed to literary movements are consistent with what computational methods measure. The central hypothesis is that distant reading can complement and validate close reading, revealing a potential synergy between the two approaches.
To achieve these objectives, I have undertaken the following steps:
- Corpus Selection: I compiled a diverse corpus of literary movements and selected one for in-depth analysis.
- Stylistic Characterization: Within the chosen corpus, I used stylistic characterization techniques to describe the literary movements and assess how well they align with traditional categorizations.
- Genre Analysis: I conducted a finer-grained analysis of a specific genre within the chosen literary movement and compared it to its traditionally assigned characteristics. The genre chosen for detailed comparison is American Gothic.
- Computational Linguistics Techniques: I then applied computational linguistics techniques to identify similarities and differences among literary movements, broadening the scope of the analysis.
- Incorporating External Material: Throughout the research, I incorporated external material from classical qualitative analyses of the chosen movements, allowing for a dual perspective while predominantly applying quantitative methods.
- Corpus Movements:
  - Plain-text data selected and downloaded from Project Gutenberg, sorted into sub-folders by literary movement. Each movement folder is subdivided by author.
- TF-IDF:
  - Retrieval of the highest-scoring words under the TF-IDF metric.
  - Tables for plotting and visualization.
- Complexity Indices:
  - Metrics for measuring the readability and complexity of the texts.
  - Plotting and comparison of the results.
- Cosine Similarity:
  - Code and results for computing cosine similarity matrices.
  - Comparison between and within the literary movements.
- Appendix 2:
  - Graphs of the results for the complexity indices.
  - Graphs of the results for the cosine similarity measures.
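
For readers who want a feel for the three measures listed above, here is a minimal, standard-library-only sketch. It is not the project's actual code: the function names, the plain `log(N / df)` weighting, the choice of average sentence length as a complexity index, and the three toy sentences (which are not taken from the corpus) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: relative term frequency times log(N / df),
    so a term that occurs in every document gets weight 0.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def top_terms(vector, k=3):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    return sorted(vector, key=lambda term: (-vector[term], term))[:k]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities as a list of rows."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]


def average_sentence_length(text):
    """Words per sentence, used here as a stand-in complexity index."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(tokenize(text)) / len(sentences) if sentences else 0.0


# Hypothetical toy texts, one per movement label; NOT corpus data.
samples = {
    "gothic": "The raven croaked in the dark chamber. "
              "A shadow fell upon the silent woods.",
    "transcendental": "The woods were bright. "
                      "The soul walks free in the morning light.",
    "realist": "The town was small. "
               "The men worked in the mill and kept the shop.",
}

labels = list(samples)
vectors = tfidf_vectors(list(samples.values()))
matrix = similarity_matrix(vectors)

for label, vector, row in zip(labels, vectors, matrix):
    print(label, top_terms(vector), [round(value, 3) for value in row])
    print("  average sentence length:", average_sentence_length(samples[label]))
```

On a real corpus the same structure applies with one document (or one concatenated author file) per entry. A library implementation of TF-IDF would normally replace the hand-written weighting, and its smoothing may differ from the formula assumed here.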