This code was written for my Master's thesis in Data Journalism at the Higher School of Economics in Moscow.
Here you can find:
How to scrape tweets from Twitter through the Twitter API
How to clean the scraped Twitter data and load it into a dataframe
How to tokenize, lemmatize and remove stopwords in order to analyze the content of the tweets
How to count the common words in the collection of tweets
How to create a WordCloud based on the common words in the collection of tweets
How to create a bigram network as a .gexf file to be used in Gephi
How to run an LDA algorithm to find the most predominant topics in the collection of tweets
How to generate an LDA visualization to explore the results of the LDA algorithm more easily
How to run a series of evaluation measures, such as UMass coherence, c_v coherence and perplexity, to check the quality of the LDA model
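
As a rough illustration of the text-processing steps listed above (tokenizing, removing stopwords, counting common words and counting bigrams), here is a minimal sketch using only the Python standard library. It is not the thesis code: the function names, the tiny stopword list and the sample tweets are all hypothetical, and lemmatization is left out because it needs an external library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real analysis would use a much fuller list.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "for", "on", "rt"}


def tokenize(text):
    """Lowercase a tweet, drop URLs and @mentions, return non-stopword tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]


def common_words(tweets, n=10):
    """Return the n most frequent tokens across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(n)


def bigram_edges(tweets):
    """Count adjacent token pairs per tweet; each pair is a weighted edge."""
    edges = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        # Pairs are formed after stopword removal, so they can span removed words.
        edges.update(zip(tokens, tokens[1:]))
    return edges


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "RT @user: Data journalism is the future of journalism https://example.com",
    "Learning data journalism in Moscow",
    "The future of data journalism",
]

if __name__ == "__main__":
    print(common_words(sample_tweets, 3))
    print(bigram_edges(sample_tweets).most_common(3))
```

The weighted pairs returned by `bigram_edges` could then be loaded into a graph library (for example networkx, which provides `write_gexf`) to produce a file that Gephi can open.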