Twitter Research Code

This code was written for my Master's thesis in Data Journalism at the Higher School of Economics in Moscow.

Here you can find:

How to scrape tweets from Twitter through the Twitter API

How to clean the scraped Twitter data and load it into a dataframe

How to tokenize, lemmatize and remove stopwords in order to analyze the content of the tweets

How to count the most common words in the collection of tweets

How to create a WordCloud based on the most common words in the collection of tweets

How to create a bigram network and save it as a .gexf file for use in Gephi

How to run an LDA algorithm to find the predominant topics in the collection of tweets

How to generate an LDA visualization to explore the results of the LDA algorithm more easily

How to run a series of evaluation measures, such as UMass coherence, c_v coherence and perplexity, to check the quality of the LDA model
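To give a rough idea of the text-processing steps listed above, here is a minimal sketch that uses only the Python standard library. It is not the code from this repository: the sample tweets, the `STOPWORDS` set and the `tokenize` helper are made up for illustration, lemmatization is left out, and bigrams are built from the tokens that remain after stopword removal.

```python
import re
from collections import Counter

# Hypothetical sample tweets; real data would come from the Twitter API.
tweets = [
    "Data journalism is changing how newsrooms work #ddj",
    "Newsrooms use data journalism to explain elections https://example.com",
    "@reader Data journalism tools help newsrooms check facts",
]

# Tiny illustrative stopword list (an assumption, not the list used in this repository).
STOPWORDS = {"is", "how", "to", "use", "the", "a", "and", "of", "in"}


def tokenize(text):
    """Lowercase, drop URLs, mentions and hashtags, then remove stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[@#]\w+", " ", text)
    return [tok for tok in re.findall(r"[a-z]+", text) if tok not in STOPWORDS]


token_lists = [tokenize(tweet) for tweet in tweets]

# Word frequencies across the whole collection.
word_counts = Counter(tok for toks in token_lists for tok in toks)

# Bigram frequencies: pairs of adjacent tokens within each tweet.
bigram_counts = Counter(
    pair for toks in token_lists for pair in zip(toks, toks[1:])
)

print(word_counts.most_common(3))
print(bigram_counts.most_common(1))
```

The word counts are the kind of input a word cloud needs, and the bigram counts can serve as weighted edges when building a network for Gephi.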