In this project, I applied logistic regression to the DonorsChoose dataset to predict whether a given project will be approved for funding.
I created four datasets. Each dataset contains the text features encoded with a different encoding technique.
Set1 | Text features encoded with a simple Bag of Words vectorizer.
Set2 | Text features encoded with a TFIDF vectorizer.
Set3 | Text features encoded with an Average Word2Vec vectorizer.
Set4 | Text features encoded with a TFIDF Word2Vec vectorizer.
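
To make the difference between Set3 and Set4 concrete, here is a minimal sketch of the two Word2Vec-based encodings. The tiny embedding table, the two tokenized documents, and the smoothed IDF formula are all assumptions made up for illustration; they are not taken from this project, which would use a trained Word2Vec model and the real project essays.

```python
import math
from collections import Counter

# Hypothetical 2-dimensional word vectors (a real model has hundreds of dimensions).
toy_vectors = {"students": [0.0, 1.0], "need": [1.0, 1.0], "books": [1.0, 0.0]}

# Hypothetical tokenized documents; "laptops" has no vector and is skipped.
docs = [["students", "need", "books"], ["students", "need", "laptops"]]


def smoothed_idf(documents):
    """IDF per word: log((1 + N) / (1 + df)) + 1 (one common smoothed variant)."""
    n_docs = len(documents)
    df = Counter(word for doc in documents for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + c)) + 1 for w, c in df.items()}


def average_w2v(tokens, vectors):
    """Set3 style: plain mean of the vectors of all known words."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def tfidf_w2v(tokens, vectors, idf):
    """Set4 style: mean of word vectors weighted by each word's TF-IDF."""
    dim = len(next(iter(vectors.values())))
    counts = Counter(tokens)
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in counts.items():
        if word in vectors and word in idf:
            weight = (count / len(tokens)) * idf[word]
            weight_sum += weight
            for i in range(dim):
                total[i] += weight * vectors[word][i]
    if weight_sum == 0.0:
        return [0.0] * dim
    return [x / weight_sum for x in total]


idf = smoothed_idf(docs)
avg_vec = average_w2v(docs[0], toy_vectors)
weighted_vec = tfidf_w2v(docs[0], toy_vectors, idf)
print(avg_vec, weighted_vec)
```

In this toy example, "books" appears in only one document, so it gets a higher IDF and pulls the TFIDF-weighted vector toward its own direction, while the plain average treats all three words equally.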
Logistic regression (LR) is then applied to all four datasets.
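
As a rough illustration of this step, the sketch below fits LR on a TFIDF encoding (the Set2 style) and reports AUC. It assumes scikit-learn, which this summary does not name explicitly, and the six texts and labels are invented toy data rather than rows from the dataset. The score is computed on the training data only to keep the example short; a real run would evaluate on held-out data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline

# Hypothetical project essays and approval labels (1 = approved, 0 = not approved).
texts = [
    "students need books for reading",
    "need new laptops for coding class",
    "art supplies for creative students",
    "books and reading corner for library",
    "science kits for lab experiments",
    "funding for field trip to museum",
]
labels = [1, 0, 1, 1, 0, 0]

# TFIDF encoding followed by logistic regression, chained in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Probability of the positive class for each text, then AUC on those scores.
proba = model.predict_proba(texts)[:, 1]
train_auc = roc_auc_score(labels, proba)
print(f"training AUC: {train_auc:.2f}")
```

Swapping `TfidfVectorizer` for `CountVectorizer` would give the Bag of Words variant (Set1 style) with the same pipeline.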
Conclusion after applying LR to all datasets
LR predicts funding approval for a project with an AUC score of 0.72 on Set2. With Set2, 1/3 of the points are misclassified as false positives and 1/2 of the points are misclassified as false negatives.
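
For readers who want to reproduce this kind of summary, the sketch below computes the false positive rate, the false negative rate, and AUC from labels and predicted scores. Reading the fractions above as FP / (FP + TN) and FN / (FN + TP) is an assumption, and the labels, scores, and 0.5 threshold are toy values chosen only to illustrate the arithmetic; they are not the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def error_rates(y_true, y_pred):
    """False positive rate FP/(FP+TN) and false negative rate FN/(FN+TP)."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

fpr, fnr = error_rates(y_true, y_pred)
auc = auc_score(y_true, scores)
print(f"FPR={fpr:.2f} FNR={fnr:.2f} AUC={auc:.2f}")
```

Note that AUC depends only on how the scores rank the points, while the false positive and false negative rates also depend on the chosen threshold.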