disaster-tweets-nlp

Natural Language Processing with Disaster Tweets

Dataset: Kaggle

Predict which Tweets are about real disasters and which ones are not

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter (i.e. disaster relief organizations and news agencies).

But it’s not always clear whether a person’s words are actually announcing a disaster.

Steps:

analyzed dataset imbalance and compared per-label statistics such as count, word length, and character length
pre-processed the texts from the tweets
- converted to lowercase, stripped whitespace, and removed punctuation
- removed stopwords
- lemmatized the result, using a helper to map NLTK part-of-speech (POS) tags
split the data into train / test sets
extracted features using a TF-IDF vectorizer
initialized a default MLPClassifier
ran GridSearchCV to find the best values for a basic set of hyperparameters
tuned the MLPClassifier with the parameters found
obtained a result of 82% accuracy!
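
The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the code from main.py: the names `clean_tweet`, `penn_to_wordnet`, and `STOPWORDS` are hypothetical, the stopword set is a tiny stand-in for a full list, and the sketch assumes the helper maps Penn Treebank tags to WordNet part-of-speech codes. The WordNet codes are written as literal characters so the block runs without NLTK or its data.

```python
import string
from typing import List

# Hypothetical, tiny stopword set for illustration only; a real run would
# presumably use a full list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to"}

# WordNet part-of-speech codes as literals (assumption: the project's helper
# targets these): "a" adjective, "v" verb, "n" noun, "r" adverb.
WORDNET_ADJ, WORDNET_VERB, WORDNET_NOUN, WORDNET_ADV = "a", "v", "n", "r"


def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet POS code."""
    if tag.startswith("J"):
        return WORDNET_ADJ
    if tag.startswith("V"):
        return WORDNET_VERB
    if tag.startswith("R"):
        return WORDNET_ADV
    return WORDNET_NOUN  # fall back to noun for every other tag


def clean_tweet(text: str) -> List[str]:
    """Lowercase, strip, remove ASCII punctuation, and drop stopwords."""
    text = text.lower().strip()
    # string.punctuation covers ASCII punctuation only.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(clean_tweet("  The forest is on FIRE!!!  "))
    print(penn_to_wordnet("VBD"))
```

With NLTK installed and its data downloaded, tags from `nltk.pos_tag` could be passed through a helper like `penn_to_wordnet`, and the result supplied as the `pos` argument of `WordNetLemmatizer().lemmatize(word, pos)`.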

shynxe/disaster-tweets-nlp
