This project collects customer review data from the web using web scraping techniques, builds a database from scratch out of the scraped reviews, and then performs text mining on that data.
The project is structured in three parts.
The first part (Part A) covers constructing and handling text data. It applies the principles of text mining: normalizing and cleaning textual corpora, the bag-of-words model, and the development of metrics for analyzing structural elements of text. The core of the project is translating these insights into actionable features that can be used to predict an outcome variable of business interest.
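As a rough illustration of the cleaning and bag-of-words steps described for Part A, here is a minimal sketch. It is written in Python for illustration only and is not the project's own code (the project was carried out in RStudio). The stop-word list, the sample reviews, and the function names `normalize` and `bag_of_words` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "but", "to", "of"}


def normalize(text):
    """Lowercase, replace non-letters with spaces, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(documents):
    """Return one term-frequency Counter per document (word order is discarded)."""
    return [Counter(normalize(doc)) for doc in documents]


# Made-up reviews, used only to exercise the functions.
reviews = [
    "The delivery was fast, and the product is great!",
    "Great price but the delivery was slow.",
]

bags = bag_of_words(reviews)

# Corpus-level frequencies: the basis for top-word lists and word clouds.
corpus_counts = sum(bags, Counter())
top_words = corpus_counts.most_common(2)
```

The per-document counters can be used as feature vectors for a predictive model, while the corpus-level counts feed frequency summaries such as top-word tables.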
The second and third parts (Part B and Part C) are concerned with the identification of features, in particular: (a) polarity, whether the text under consideration is positive or negative; (b) sentiment, the extraction of affective states from the text; and (c) topics, the evaluation and extraction of the important topics covered in the constructed corpus (Part C).
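Polarity scoring can be sketched with a simple lexicon lookup: count positive and negative words and compare. The example below is a Python illustration under stated assumptions, not the method used in the project. The word lists are a hypothetical toy lexicon, the `polarity` function name is invented here, and the "neutral" label for tied scores is an addition of this sketch.

```python
import re

# Hypothetical toy lexicon (an assumption); real analyses use established lexicons.
POSITIVE = {"great", "fast", "good", "excellent", "love"}
NEGATIVE = {"slow", "bad", "poor", "broken", "terrible"}


def polarity(text):
    """Return (score, label), where score = positive hits minus negative hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    score = pos - neg
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        # Assumption of this sketch: ties and no-hit texts are labeled neutral.
        label = "neutral"
    return score, label


result_pos = polarity("Great product, fast delivery")
result_neg = polarity("Terrible, it arrived broken and slow")
```

A lookup like this ignores negation ("not good") and context, which is one reason lexicon scores are usually treated as a baseline feature rather than a final judgment.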
- RStudio
- Bag-of-Words Analysis
- Word Clouds
- Top words and frequent words analysis
- Sentiment Analysis
- Topic Mining
- LDA (Latent Dirichlet Allocation)
Project is: complete