#### SENTIMENT ANALYSIS ASSIGNMENT
1. The folder named `ancient-` contains the Flask application. A version has been deployed to Heroku, but it times out there because of the model's complexity; it works fine locally. Run `python routes.py` to start it (a hypothetical minimal sketch of the app's shape appears after this list).
2. `sentimental_analysis.ipynb` is the IPython notebook version, which is fairly self-explanatory; it is trained on the STS-Gold dataset. Because that dataset is small, the model will not generalize well to new test samples. It does, however, reach 77% accuracy when elongated strings such as "happyyy" and "yesss" are added as features, as also noted in the notebook (see the regex sketch after this list).
3. `Sentimental_new.ipynb` is trained on the Stanford dataset. Since the training data contains more than 1.6 million samples, I was unable to run it on my system; I tried with 200,000 samples and it still would not run. All the model-creation steps are nonetheless documented in the notebook.
4. The Heroku link is https://ancient-peak-6490.herokuapp.com/. Enter a tweet query in the search box. By default the app waits 30 seconds; if it picks up no tweet in that time, the request is rejected. It uses the tweepy module and multithreading (a sketch of the timeout logic follows this list).
5. For feature extraction I used the following (a combined sketch of all three appears after this list):
   a) A count vectorizer, counting word occurrences after stemming and tokenization.
   b) TF-IDF weighting, which does not change the accuracy much.
   c) word2vec features from the gensim library, a deep-learning-based implementation of word-to-context features, which yields 71% accuracy.
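The actual `routes.py` is not reproduced in this README; the sketch below is only a hypothetical minimal shape such a Flask app could take, with `classify_tweet` standing in for the trained model:

```python
# Hypothetical minimal shape of routes.py; the real app loads the trained
# sentiment model and fetches tweets, which is not reproduced here.
from flask import Flask, request, render_template_string

app = Flask(__name__)

FORM = '<form method="post"><input name="query"><input type="submit"></form>{{ result }}'

def classify_tweet(text):
    # Placeholder: the real app runs the trained sentiment classifier here.
    return "positive" if "happy" in text.lower() else "negative"

@app.route("/", methods=["GET", "POST"])
def index():
    result = ""
    if request.method == "POST":
        result = classify_tweet(request.form["query"])
    return render_template_string(FORM, result=result)

if __name__ == "__main__":
    app.run(debug=True)
```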
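The elongated-string features from item 2 could be extracted with a regular expression along these lines; this is a minimal sketch of one plausible approach, not the notebook's exact code:

```python
import re

def has_elongation(token):
    """True if any character repeats 3+ times in a row, e.g. 'happyyy', 'yesss'."""
    return re.search(r"(.)\1\1", token) is not None

def normalize_elongation(token):
    """Collapse runs of 3+ repeated characters down to two: 'happyyy' -> 'happyy'."""
    return re.sub(r"(.)\1{2,}", r"\1\1", token)

tokens = ["happyyy", "yesss", "great"]
print(sum(has_elongation(t) for t in tokens))          # 2 elongated tokens
print([normalize_elongation(t) for t in tokens])       # ['happyy', 'yess', 'great']
```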
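The 30-second wait from item 4 can be implemented by running the tweepy lookup in a worker thread and abandoning it on timeout. A hedged sketch, assuming the tweepy REST API and placeholder credentials:

```python
import threading
import tweepy

# Placeholder credentials; supply your own Twitter API keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

def fetch_tweet(query, out):
    # search_tweets in tweepy >= 4.0; older versions call this api.search
    results = api.search_tweets(q=query, count=1)
    if results:
        out.append(results[0].text)

def fetch_with_timeout(query, timeout=30):
    out = []
    worker = threading.Thread(target=fetch_tweet, args=(query, out))
    worker.daemon = True
    worker.start()
    worker.join(timeout)  # wait at most `timeout` seconds for a tweet
    if not out:
        raise TimeoutError("no tweet picked up within %d seconds" % timeout)
    return out[0]
```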
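The three extractors from item 5 might look roughly like this with scikit-learn, NLTK, and gensim; parameter names such as `vector_size` assume gensim 4, and the notebooks' actual preprocessing may differ:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # may require nltk.download('punkt')
from gensim.models import Word2Vec
import numpy as np

stemmer = PorterStemmer()

def stem_tokenize(text):
    # Tokenize, lowercase, then stem each token.
    return [stemmer.stem(t) for t in word_tokenize(text.lower())]

docs = ["I am so happyyy today", "this is terrible", "yesss what a great day"]

# a) raw token counts after stemming and tokenization
counts = CountVectorizer(tokenizer=stem_tokenize).fit_transform(docs)

# b) TF-IDF weighting over the same vocabulary
tfidf = TfidfVectorizer(tokenizer=stem_tokenize).fit_transform(docs)

# c) word2vec: average the word vectors of each document
# (use size= instead of vector_size= on gensim < 4)
tokenized = [stem_tokenize(d) for d in docs]
w2v = Word2Vec(tokenized, vector_size=50, min_count=1, seed=1)
doc_vecs = np.array(
    [np.mean([w2v.wv[t] for t in toks], axis=0) for toks in tokenized]
)
```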
More could be done and experimented with on a more sophisticated system.
System used: Mac OS X Yosemite with 8 GB RAM.