Text Analysis through Web Scraping

This project collects customer review data from the web using appropriate web scraping techniques, constructs a database from scratch, and then performs text mining on the data.

General Information

The project is structured in three parts.

The first part (Part A) covers constructing and handling text data. It applies the principles of text mining, including normalizing and cleaning textual corpora, the bag-of-words model, and the development of metrics for analyzing structural elements of text. The core of this project is translating these insights into actionable features that can be used to predict an outcome variable of business interest.
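As a rough illustration of the cleaning and bag-of-words steps described for Part A, the sketch below builds a small document-term matrix in base R. It is not the code from Text_Mining.Rmd: the sample reviews, the short stop-word list, and the helper name `clean_tokens` are all assumptions made for this example.

```r
# Hypothetical sample reviews (assumption); the project's real data is in the .rds files
reviews <- c(
  "Great product, works great!",
  "Terrible battery. Would not buy again.",
  "Good value and great support."
)

# Illustrative stop-word list (assumption), far shorter than a real one
stop_words <- c("and", "not", "would", "the", "a")

# Hypothetical helper: normalize and tokenize a single review
clean_tokens <- function(text) {
  text <- tolower(text)                 # normalize case
  text <- gsub("[^a-z ]", " ", text)    # replace punctuation and digits with spaces
  tokens <- strsplit(text, " +")[[1]]   # split on runs of spaces
  tokens <- tokens[nzchar(tokens)]      # drop empty strings
  tokens[!tokens %in% stop_words]       # remove stop words
}

tokens_per_doc <- lapply(reviews, clean_tokens)
vocab <- sort(unique(unlist(tokens_per_doc)))

# Document-term matrix: one row per review, one column per vocabulary term
dtm <- t(vapply(
  tokens_per_doc,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
rownames(dtm) <- paste0("doc", seq_along(reviews))
colnames(dtm) <- vocab

# Corpus-wide term frequencies, most frequent first
term_freq <- sort(colSums(dtm), decreasing = TRUE)
print(head(term_freq, 3))
```

Each row of `dtm` is the bag-of-words vector for one review, so word order is discarded and only counts remain. Term frequencies such as `term_freq` are the kind of input that top-word analyses and word clouds are typically built from.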

The second and third parts (Part B and Part C) are concerned with identifying features, in particular (a) polarity: whether the text under consideration is positive or negative, (b) sentiment: the extraction of affective states from the text, and (c) topics: the evaluation and extraction of the important topics covered in the constructed corpus (Part C).
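To make the idea of polarity concrete, here is a minimal lexicon-based sketch in base R. It only illustrates the general technique and is not the method used in the project: the tiny `lexicon`, the function name `score_polarity`, and the sample reviews are assumptions, and a real analysis would rely on a published sentiment lexicon.

```r
# Hypothetical mini-lexicon (assumption): +1 for positive words, -1 for negative words
lexicon <- c(great = 1, good = 1, love = 1, terrible = -1, bad = -1, poor = -1)

# Hypothetical helper: label a review by the sum of its matched word scores
score_polarity <- function(text) {
  tokens <- strsplit(gsub("[^a-z ]", " ", tolower(text)), " +")[[1]]
  hits <- lexicon[tokens[tokens %in% names(lexicon)]]  # scores of matched words
  total <- sum(hits)                                   # 0 when nothing matches
  if (total > 0) "positive" else if (total < 0) "negative" else "neutral"
}

# Hypothetical sample reviews (assumption)
reviews <- c(
  "Great product, good price.",
  "Terrible battery and poor support.",
  "It arrived on Tuesday."
)

polarity <- vapply(reviews, score_polarity, character(1), USE.NAMES = FALSE)
print(polarity)
```

This simple word-count approach ignores negation ("not good") and intensity, which is one reason fuller sentiment lexicons and models are used in practice.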

Technologies Used

RStudio

Features

Bag-of-Words Analysis
Word Clouds
Top words and frequent words analysis
Sentiment Analysis
Topic Mining
LDA (Latent Dirichlet Allocation)

Project Status

Project is: complete
