An Arabic chatbot that detects the sentiment of a user's message and replies accordingly.
- Read the train and test datasets and merge them into one dataset.
- Group all context labels into two sentiment classes: positive and negative.
- Use the Arabic library qalsadi (`qalsadi.lemmatizer`) for tokenization, stop-word removal, and lemmatization.
- Copy the sentence column, shift the copy up by one row, and add it to the dataset, so that each sentence is paired with the sentence that follows it (its reply).
- Remove the last sentence of every conversation, since it has no reply (the next sentence belongs to another conversation).
- Split the dataset into training and testing sets.
- Train a logistic regression model on the training set to classify sentiment.
- Run the trained model on the entered query to classify its sentiment.
- Compute TF-IDF vectors for all sentences that have the same sentiment as the query.
- Compute the TF-IDF vector of the entered query, using the same vocabulary as those sentences.
- Calculate cosine similarity between the entered query and all sentences that have the same sentiment.
- Choose the sentence with the highest cosine similarity.
- Output the reply paired with that sentence, i.e. the sentence that followed it in its conversation.
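
The first two steps (merging the datasets and collapsing the context labels into two sentiments) can be sketched as below. This is a minimal, dependency-free illustration, not the project's code: the row layout (a `context` key per row) and the `POSITIVE_CONTEXTS` label set are hypothetical, and every label outside that set is treated as negative.

```python
from typing import Dict, Iterable, List

# Hypothetical set of "positive" context labels; the dataset's real labels may differ.
POSITIVE_CONTEXTS = {"joyful", "proud", "grateful", "hopeful", "excited"}


def merge_and_binarize(
    train: Iterable[Dict[str, str]], test: Iterable[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Concatenate train and test rows and add a binary 'sentiment' field.

    Each row is assumed to carry its label under a 'context' key (hypothetical).
    """
    merged = []
    for row in [*train, *test]:
        labeled = dict(row)  # copy so the input rows are not mutated
        # Any context not listed as positive is mapped to "negative".
        labeled["sentiment"] = (
            "positive" if row["context"] in POSITIVE_CONTEXTS else "negative"
        )
        merged.append(labeled)
    return merged
```

With a real CSV dataset, the same mapping would typically be applied to a loaded table instead of plain dictionaries.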
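
The reply-pairing steps (shifting the sentence column up by one and dropping each conversation's last sentence) amount to pairing every sentence with the next sentence of the same conversation. A minimal sketch, assuming the data is a list of hypothetical `(conversation_id, sentence)` tuples in dataset order:

```python
from typing import List, Tuple


def build_reply_pairs(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each sentence with the next sentence of the same conversation.

    rows: (conversation_id, sentence) tuples in dataset order (hypothetical layout).
    """
    pairs = []
    # rows[1:] plays the role of the sentence column "shifted up by one".
    for (conv, sentence), (next_conv, next_sentence) in zip(rows, rows[1:]):
        # The last sentence of a conversation has no reply: the next row
        # belongs to another conversation, so it is skipped.
        if conv == next_conv:
            pairs.append((sentence, next_sentence))
    return pairs
```

The final row of the whole dataset is dropped automatically, because `zip` stops at the shorter of the two sequences.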
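
The preprocessing step can be sketched without any dependencies, with the lemmatizer passed in as a function so that a qalsadi-based one can be plugged in. The regex tokenizer, the diacritic stripping, and the tiny `ARABIC_STOP_WORDS` set are assumptions for illustration; a real stop-word list is much longer, and the identity default for `lemmatize` is only a placeholder.

```python
import re
from typing import Callable, List

# Small illustrative stop-word list (assumption); a full Arabic list is much longer.
ARABIC_STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هل", "و"}

# Arabic diacritics (tashkeel, U+064B..U+0652) and tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def preprocess(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> List[str]:
    """Tokenize, drop stop words, and lemmatize each remaining token."""
    # Strip diacritics first so they do not split words during tokenization.
    text = DIACRITICS.sub("", text)
    # \w matches Unicode letters and digits, so Arabic words are kept whole.
    tokens = re.findall(r"\w+", text)
    return [lemmatize(tok) for tok in tokens if tok not in ARABIC_STOP_WORDS]
```

For example, `preprocess("هل أنت بخير؟")` drops the stop word `هل` and the question mark, leaving `["أنت", "بخير"]`.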
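
The split-and-train steps can be sketched with a small logistic regression trained by stochastic gradient descent on the log-loss. This is a stand-in for whatever library the project used: the binary bag-of-words features, the learning rate, the epoch count, and the 80/20 split ratio are all assumptions. Labels are `1` for positive and `0` for negative.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split(items: Sequence, test_ratio: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Shuffle deterministically, then cut into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def featurize(tokens: List[str]) -> Dict[str, float]:
    # Binary bag-of-words features (assumption; the project may use other features).
    return {tok: 1.0 for tok in tokens}


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    def __init__(self, lr: float = 0.5, epochs: int = 200) -> None:
        self.lr = lr
        self.epochs = epochs
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def _score(self, feats: Dict[str, float]) -> float:
        return self.bias + sum(self.weights.get(f, 0.0) * v for f, v in feats.items())

    def fit(self, docs: List[List[str]], labels: List[int]) -> "LogisticRegression":
        data = [(featurize(d), y) for d, y in zip(docs, labels)]
        for _ in range(self.epochs):
            for feats, y in data:
                # Gradient of the log-loss w.r.t. the score is (prediction - label).
                error = sigmoid(self._score(feats)) - y
                self.bias -= self.lr * error
                for f, v in feats.items():
                    self.weights[f] = self.weights.get(f, 0.0) - self.lr * error * v
        return self

    def predict_proba(self, tokens: List[str]) -> float:
        return sigmoid(self._score(featurize(tokens)))

    def predict(self, tokens: List[str]) -> int:
        return int(self.predict_proba(tokens) >= 0.5)
```

In practice a library classifier would normally replace this hand-written one; the sketch only shows what the training step computes.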
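
The reply-selection steps (filter by the predicted sentiment, build TF-IDF vectors, rank by cosine similarity, and return the stored reply) can be sketched as follows. It assumes the query has already been preprocessed and classified, and that the data is a list of hypothetical `(sentence_tokens, reply_text, sentiment)` triples. The IDF is the smoothed form that scikit-learn's `TfidfVectorizer` applies by default; its per-vector L2 normalization is skipped here because cosine similarity ignores vector length anyway.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# (sentence tokens, reply text, sentiment): a hypothetical row layout.
Pair = Tuple[List[str], str, str]


def idf_table(docs: Sequence[List[str]]) -> Dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; tokens outside the candidates' vocabulary are dropped."""
    counts = Counter(tokens)
    return {tok: c * idf[tok] for tok, c in counts.items() if tok in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reply_to(query_tokens: List[str], query_sentiment: str, pairs: Sequence[Pair]) -> Optional[str]:
    # Keep only sentences whose sentiment matches the classifier's label for the query.
    candidates = [p for p in pairs if p[2] == query_sentiment]
    if not candidates:
        return None
    # Fit IDF on the candidates so the query and the sentences share one vocabulary.
    idf = idf_table([p[0] for p in candidates])
    query_vec = tfidf(query_tokens, idf)
    # The most similar sentence wins (the first one on ties); its reply is returned.
    best = max(candidates, key=lambda p: cosine(query_vec, tfidf(p[0], idf)))
    return best[1]
```

Because the IDF table is rebuilt from the filtered candidates on every call, word weights depend on the query's sentiment class. Precomputing one table per class would avoid that repeated work.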
- Ahmed Osama Mohamed 40-9418
- Mostafa Walid 40-5470
- Omar Khaled Khairy 40-5535
- Malak Osama 40-1389