Text Analysis in Arabic Tweets

6,000 tweets were downloaded from Twitter using Tweepy in three batches of 2,000, one batch for each of three Egyptian Arabic keywords:

  • Man/guy - ragil - راجل
  • Yes - aywa - أيوا
  • Well - kuweiss - كويس

In this notebook, we clean Arabic tweets, extract n-grams, compute term frequencies and TF-IDF scores, segment common words found together, and cluster words by topic. Tools: NLTK, googletrans, sklearn feature extraction for text, k-means clustering, and matplotlib 3D axes.
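
As a rough illustration of the cleaning, n-gram, and term-frequency steps, here is a minimal sketch that uses only the Python standard library instead of the NLTK and sklearn tools listed above. The function names (`clean_tweet`, `ngrams`, `term_frequencies`) and the exact cleaning rules are assumptions made for this example and may differ from what the notebook does.

```python
import re
from collections import Counter

# Assumed cleaning rules (hypothetical, not taken from the notebook):
# drop URLs and @mentions, strip Arabic diacritics and tatweel,
# then keep only Arabic letters and whitespace.
URLS_MENTIONS = re.compile(r"https?://\S+|@\w+")
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def clean_tweet(text):
    """Return the list of Arabic tokens in a raw tweet."""
    text = URLS_MENTIONS.sub(" ", text)
    text = DIACRITICS.sub("", text)    # remove marks without splitting words
    text = NON_ARABIC.sub(" ", text)   # Latin text, digits, punctuation, emoji
    return text.split()


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequencies(token_lists):
    """Count how often each token occurs across all tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


# Made-up sample tweets, for illustration only.
tweets = ["RT @user: الراجل ده كويس https://t.co/abc", "أيوا كويس"]
cleaned = [clean_tweet(t) for t in tweets]
bigrams = [ngrams(tokens, 2) for tokens in cleaned]
frequencies = term_frequencies(cleaned)
```

Here `frequencies.most_common(10)` would list the ten most frequent tokens, and the same `ngrams` helper works for trigrams by passing `n=3`.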