Meet RoboMax! Your personal assistant for all your queries about the US election ¯\\_(ツ)_/¯
This repository contains self-contained Jupyter notebooks used to train our assistant RoboMax so that it can answer open-domain questions. We use tweets as the source for our knowledge base and attempt to reflect back the opinion of the world on your question of interest. At the moment, RoboMax is tuned to answer questions about the 2016 US election from the tweets graciously made available at https://www.kaggle.com/kinguistics/election-day-tweets/#election_day_tweets.csv
The notebook robomax-training-notebook.ipynb serves as the starting point for this project and covers the major data exploration and feature engineering tasks.
The notebook robomax-election-tweets-bot.ipynb adapts RoboMax to answer questions based on the election tweets.
Due to the unavailability of a Twitter-based question-answer dataset, we resorted to using the standard SQuAD reading comprehension dataset in a modified way: instead of predicting the exact answer span, we trained our model to identify the sentence containing the required answer.
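A rough sketch of this SQuAD-to-sentence-labels conversion is below; the local file name and the NLTK sentence tokenizer are assumptions, and the actual preprocessing lives in robomax-training-notebook.ipynb.

```python
# Rough sketch: turn SQuAD span annotations into sentence-level labels.
# The file name (train-v1.1.json) and NLTK tokenizer are assumptions.
import json
from nltk.tokenize import sent_tokenize

with open("train-v1.1.json") as f:
    squad = json.load(f)

examples = []  # (question, sentence, label) triples
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        sentences = sent_tokenize(context)
        # Character offsets of each sentence within the paragraph context.
        offsets, pos = [], 0
        for sent in sentences:
            start = context.find(sent, pos)
            offsets.append((start, start + len(sent)))
            pos = start + len(sent)
        for qa in paragraph["qas"]:
            answer_start = qa["answers"][0]["answer_start"]
            for sent, (begin, end) in zip(sentences, offsets):
                # Label 1 if the answer span begins inside this sentence.
                examples.append((qa["question"], sent, int(begin <= answer_start < end)))
```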
We built our model on a fairly simple set of features with a baseline Random Forest classifier, which leaves plenty of room for improvement. AUC served as the metric to optimize because of the usual class imbalance, and we aimed to improve recall on the sentences containing the correct answer over prediction precision.
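A minimal sketch of such a baseline follows; the feature matrix and labels below are random placeholders mimicking the class imbalance, standing in for the features engineered in the training notebook.

```python
# Minimal baseline sketch: a Random Forest scored with AUC on a held-out split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Placeholder data: ~10% positive sentences to mimic the imbalance.
X = np.random.rand(1000, 12)
y = (np.random.rand(1000) < 0.1).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight="balanced" is one simple way to counter the imbalance.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
clf.fit(X_train, y_train)

# Evaluate AUC using the positive-class probabilities.
print("AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))
```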
We use a combination of indexing, predicting and summarizing to formulate an answer to the given question. Whoosh serves as our go-to indexing library. Our pre-trained model scores the results returned by the indexer by how close each tweet is to the question, and the top-ranked results are then condensed with an Edmundson summarizer to finally bake up an answer.
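A hedged sketch of this index → score → summarize pipeline using Whoosh and sumy's Edmundson summarizer; the sample tweets, index directory and the make_features() re-ranking helper are hypothetical placeholders for what the notebooks actually build.

```python
import os
from whoosh import index
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser, OrGroup
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.edmundson import EdmundsonSummarizer

tweets = ["Long lines at the polls but spirits are high", "Election day is finally here"]

# 1. Index the tweets with Whoosh.
os.makedirs("tweet_index", exist_ok=True)
ix = index.create_in("tweet_index", Schema(tweet_id=ID(stored=True), text=TEXT(stored=True)))
writer = ix.writer()
for i, tweet in enumerate(tweets):
    writer.add_document(tweet_id=str(i), text=tweet)
writer.commit()

def answer(question, top_k=10, summary_sentences=2):
    # 2. Retrieve candidate tweets for the question.
    with ix.searcher() as searcher:
        query = QueryParser("text", ix.schema, group=OrGroup).parse(question.replace("?", ""))
        hits = [hit["text"] for hit in searcher.search(query, limit=top_k)]

    # 3. Re-rank with the pre-trained model (make_features is a hypothetical helper):
    # hits.sort(key=lambda t: clf.predict_proba(make_features(question, t))[0, 1], reverse=True)

    # 4. Cap the best results with an Edmundson summarizer to bake up the answer.
    parser = PlaintextParser.from_string(" ".join(hits), Tokenizer("english"))
    summarizer = EdmundsonSummarizer()
    summarizer.bonus_words = question.lower().split()
    summarizer.stigma_words = ["rt"]
    summarizer.null_words = ["the", "a", "an", "of", "to", "and", "is"]
    return " ".join(str(s) for s in summarizer(parser.document, summary_sentences))

print(answer("Who will win the election?"))
```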