This is a pre-trained RoBERTa model for identifying hateful and offensive speech. The model is trained on the original dataset and achieves 94.5% accuracy on the test set. It classifies a given input text into one of three categories: hateful, offensive, or neither.
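As a rough sketch of how a model like this can be used for inference (the model identifier and the exact label strings below are placeholders, not the actual published names):

```python
from transformers import pipeline

# Placeholder model ID; replace with the actual model name from the model page.
classifier = pipeline("text-classification", model="your-username/roberta-hate-offensive")

# The model returns one of three labels: hateful, offensive, or neither.
print(classifier("I really enjoyed the concert last night!"))
# e.g. [{'label': 'neither', 'score': 0.98}]
```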
Before you run this project, install the packages listed in requirements.txt so you don't run into missing-package errors later.
Install all the required packages by running the following command:
pip install -r requirements.txt
Packages used by the model are:
- transformers
- datasets
- scikit-learn
- wandb
To train the model yourself, open the notebook text-multiclassification.ipynb. This notebook trains the model from scratch.
Feel free to run the notebook and train a model on your own; the comments in the file will guide you through the process.
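The notebook contains the full details; a minimal sketch of the fine-tuning approach looks roughly like this (the dataset ID and column names are placeholders, and hyperparameters are illustrative only):

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder dataset ID; use the dataset linked on the dataset page.
dataset = load_dataset("your-username/hate-offensive-speech")

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

def tokenize(batch):
    # Assumes the text column is named "text".
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="roberta-hate-offensive",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         evaluation_strategy="epoch")

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"])
trainer.train()
```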
If you want to run or use the model directly, head over to the application.py file. It provides a friendly interface for interacting with the model and displaying its predictions.
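The actual application.py may differ, but a minimal interface of this kind, assuming Gradio is used (as is common for hosted demos), could look like:

```python
import gradio as gr
from transformers import pipeline

# Placeholder model ID; replace with the actual model name from the model page.
classifier = pipeline("text-classification", model="your-username/roberta-hate-offensive")

def classify(text):
    # Return the predicted label and its confidence as a readable string.
    result = classifier(text)[0]
    return f"{result['label']} ({result['score']:.2%})"

demo = gr.Interface(fn=classify, inputs="text", outputs="text",
                    title="Hateful and Offensive Speech Classifier")
demo.launch()
```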
You can find the dataset used to train the model on this page.
The dataset is split into three parts: train, validation, and test.
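If you want to inspect the splits yourself, they can be loaded with the datasets library (the dataset ID below is a placeholder for the one linked above):

```python
from datasets import load_dataset

# Placeholder dataset ID; replace with the dataset linked above.
dataset = load_dataset("your-username/hate-offensive-speech")

# Print the size of each of the three splits.
print(dataset["train"].num_rows, dataset["validation"].num_rows, dataset["test"].num_rows)
```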
You can use this pre-trained model directly in your own projects; it is available from this page.
The model is currently live online. To interact with it and try it out, head over to this space and enjoy! =)
Contributions are always welcome!
Open a Pull Request for any appropriate model changes and feature updates.