This folder contains the code and part of the sample data used for the Big Data AT&T Fall Case Competition. The project ranked in the top 5 of the competition.
The goal of the customer insights project is to identify top customer concerns, analyze customer sentiment toward AT&T, and recommend strategies for the CRM system. The project consumes documents from several social media sources and applies a range of natural language processing techniques and models. It is written in R and Python.
- Collect and preprocess 50,000+ reviews and tweets using APIs and Python.
- Identify top customer concerns from social media feeds (LDA).
- Analyze and predict customer tweet sentiment (SVM, TF-IDF).
- Apply a custom ranking algorithm to measure the overall service quality of retail stores in the Dallas area.
- Present these findings visually in a CRM recommendation engine built on the Tableau platform.

![alt text][logo]

[logo]: https://github.com/fairypp/ATT_Fall_Case_Competition_Code/blob/master/overall_rank.png
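As a rough illustration of the TF-IDF weighting named in the sentiment step above, the Python sketch below computes term weights for a few tokenized tweets. It uses one common variant (term frequency as count divided by document length, inverse document frequency as `log(N / df)`); which variant the project's R code uses is not stated here, and the `tfidf` function and the example tweets are hypothetical. A classifier such as an SVM would take vectors like these as input features.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical example tweets, already lowercased and tokenized.
tweets = [
    ["great", "service", "today"],
    ["terrible", "service", "again"],
    ["great", "coverage"],
]
vectors = tfidf(tweets)
```

Terms that appear in fewer documents (such as "coverage") receive a higher weight than terms shared across documents (such as "great").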
/-------R Code
| |--------Sample Data
| |--------ATT_LDA.R
| |--------Corr.R
| |--------Preprocess.R
| |--------Sentiment.R
| |--------TwitterPublicData.R
| |--------TwitterStreamData.R
| |--------mystopwords.txt
|
|-----Python Code
| |--------ReadMeForPython.docx
| |--------fetch_google.py
| |--------fetch_yelp.py
| |--------top 100 populated cities in US.txt
- ATT_LDA.R : extract customer service topics with LDA.
- Corr.R : compute the correlation matrix of different demographic factors.
- Preprocess.R : normalize all collected review ratings and prepare the training corpus for sentiment prediction.
- Sentiment.R : predict tweet sentiment with Maximum Entropy and SVM.
- TwitterPublicData.R : fetch historical Twitter data through the Twitter APIs.
- TwitterStreamData.R : fetch real-time Twitter streaming data through the Twitter APIs.

File "mystopwords.txt" is used for text preprocessing.

- fetch_google.py : fetch Google reviews through the Google Search APIs.
- fetch_yelp.py : fetch part of the Yelp reviews through the Yelp APIs.

File "top 100 populated cities in US.txt" stores geographic information about the 100 most populated US cities for fetch_google.py.
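
To show how a stop-word file such as "mystopwords.txt" can be applied during text preprocessing, here is a minimal Python sketch. It assumes the file holds one stop word per line and uses a simple regular-expression tokenizer; both are assumptions, the helper names are hypothetical, and the repository's own preprocessing may differ.

```python
import re


def load_stopwords(path):
    """Read stop words from a file, assuming one word per line."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stopwords]


# Inline set standing in for the contents of a stop-word file (made-up values).
sample_stopwords = {"the", "is", "at", "my"}
cleaned = remove_stopwords("The service at my store is slow", sample_stopwords)
print(cleaned)
```

In practice the set would come from `load_stopwords("mystopwords.txt")` rather than the inline example.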
- ATT_dallas_rank_YGF.csv : overall ranks of AT&T retail stores in the Dallas area from 3 main social media platforms (Yelp, Google and Facebook), plus other information such as zip code, store address, latitude and longitude.
- ATT_dallas_reviews.csv : sample review data for AT&T retail stores in the Dallas area, collected from Yelp, Google and Facebook.
- ATT_US_reviews.csv : sample review data for AT&T retail stores across the US, collected from Google reviews.
- Demographic.csv : the demographic information we collected for the Dallas area.
- realtime_twitter.csv : sample Twitter streaming data.
- TMobileHelp_twitter_users.csv : sample Twitter data for related users.
- LDA 15 TopicsToTerms.xlsx : customer service topics extracted by LDA.
Because of limitations in the Yelp APIs, a Chrome tool called Web Scraper was used to fetch all Yelp reviews from webpages. For the same reason, all Facebook reviews were fetched with this tool.
Due to the time limits of the competition, a custom web scraping implementation was not developed.
Input file paths are hardcoded. They can easily be changed to command-line parameters.
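
One possible way to replace the hardcoded paths in the Python scripts is the standard-library `argparse` module. The sketch below is only an illustration: the `--input` and `--output` option names and the `output.csv` default are assumptions, not options the current scripts accept.

```python
import argparse


def parse_args(argv=None):
    """Parse input and output file paths from the command line."""
    parser = argparse.ArgumentParser(description="Fetch or process review data.")
    # Hypothetical option names; the existing scripts hardcode these paths.
    parser.add_argument("--input", required=True, help="path to the input file")
    parser.add_argument("--output", default="output.csv", help="path for the results")
    return parser.parse_args(argv)


# Passing an explicit list makes the example runnable without a real command line.
args = parse_args(["--input", "top 100 populated cities in US.txt"])
print(args.input, args.output)
```

Calling `parse_args()` with no argument reads from `sys.argv`, so a script adapted this way could take its paths at launch instead of requiring a source edit.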