A notebook for cleaning Arabic tweets, feature extraction, term frequencies, and clustering words by topic

neongreen13/clustering_Arabic_tweets


Text Analysis in Arabic Tweets

6,000 tweets were downloaded from Twitter using Tweepy in three batches of 2,000, each batch retrieved by searching for one of three Egyptian Arabic keywords:

  • Man/guy - ragil - راجل
  • Yes - aywa - أيوا
  • Well - kuweiss - كويس

In this notebook, we clean Arabic tweets, extract n-grams, compute term frequencies and TF-IDF scores, segment common words found together, and cluster words by topic. Tools: NLTK, googletrans, scikit-learn feature extraction for text, k-means clustering, and matplotlib 3D axes.
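
The cleaning and counting steps described above could look roughly like the sketch below. It uses only the Python standard library as a stand-in, not the NLTK or scikit-learn calls the notebook relies on. The function names (`clean_tweet`, `ngrams`, `term_frequencies`, `tf_idf`) and the cleaning rules (dropping URLs, mentions, diacritics, and non-Arabic characters) are illustrative assumptions, not taken from the notebook.

```python
import math
import re
from collections import Counter

# Assumed cleaning rules; the notebook's actual rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Arabic diacritics (U+064B-U+0652), superscript alef (U+0670), tatweel (U+0640).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Anything that is not a basic Arabic letter (U+0621-U+064A) or whitespace.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")


def clean_tweet(text):
    """Strip URLs, mentions, diacritics, and non-Arabic characters."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = NON_ARABIC_RE.sub(" ", text)
    return " ".join(text.split())  # collapse runs of whitespace


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequencies(tweets, n=1):
    """Count n-grams across all cleaned tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(clean_tweet(tweet).split(), n))
    return counts


def tf_idf(tweets):
    """Plain tf * log(N / df) per tweet, without smoothing or normalization."""
    docs = [clean_tweet(t).split() for t in tweets]
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    return [
        {term: count / len(doc) * math.log(n_docs / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]
```

Calling `term_frequencies(tweets, n=2)` on a list of raw tweet strings would give bigram counts, and `tf_idf(tweets)` one score dictionary per tweet. Note that scikit-learn's `TfidfVectorizer` applies IDF smoothing and L2 normalization by default, so its scores will differ from this unsmoothed version.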
