This project fine-tunes the pre-trained DistilBERT model for sentiment classification on the IMDb movie-review dataset and evaluates its performance on the test set.
Prerequisites:
- Python 3.8 or later
- An NVIDIA GPU with CUDA installed (optional, but recommended for faster training)
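
To confirm the prerequisites above before installing anything else, a small check along these lines may help. It is only a sketch: `check_environment` and `MIN_PYTHON` are hypothetical names, not part of this project, and the CUDA check only works once `torch` is installed.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_environment():
    """Return a dict describing whether the prerequisites are met."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    try:
        import torch  # only importable after the packages are installed

        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        # torch is not installed yet, so GPU support cannot be checked
        report["cuda_available"] = None
    return report


print(check_environment())
```

A `cuda_available` value of `False` or `None` does not block training; it only means the script would run on the CPU, which is slower.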
- Clone the repository:

      git clone https://github.com/your-repository/imdb-sentiment-classification.git
      cd imdb-sentiment-classification

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt
To train and evaluate the model, run the following command:
python tfm_classifier.py
This script will:
- Load and preprocess the IMDb dataset.
- Fine-tune the DistilBERT model.
- Evaluate the model on the test set.
- Save the fine-tuned model and tokenizer.
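
The four steps above could be sketched roughly as follows. This is an illustration, not the contents of `tfm_classifier.py`: it assumes the Hugging Face `Trainer` API and the `distilbert-base-uncased` checkpoint, and the function names, the hyperparameters, and the `distilbert-imdb` output directory are all hypothetical.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that match the reference labels."""
    correct = sum(int(p == t) for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


def compute_metrics(eval_pred):
    """Metric callback for Trainer: logits -> class ids -> accuracy."""
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    return {"accuracy": accuracy(predictions.tolist(), labels.tolist())}


def run_pipeline(output_dir="distilbert-imdb"):  # hypothetical output path
    # Heavy imports stay local so the helpers above work without them.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    checkpoint = "distilbert-base-uncased"  # assumed checkpoint

    # 1. Load and preprocess the IMDb dataset.
    train_set = load_dataset("imdb", split="train")
    test_set = load_dataset("imdb", split="test")
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def tokenize(batch):
        # Fixed-length padding lets the default data collator stack batches.
        return tokenizer(
            batch["text"], truncation=True, padding="max_length", max_length=256
        )

    train_set = train_set.map(tokenize, batched=True)
    test_set = test_set.map(tokenize, batched=True)

    # 2. Fine-tune DistilBERT with a two-class classification head.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2
    )
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,  # illustrative values, not tuned
        per_device_train_batch_size=16,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_set,
        eval_dataset=test_set,
        compute_metrics=compute_metrics,
    )
    trainer.train()

    # 3. Evaluate on the test set.
    metrics = trainer.evaluate()

    # 4. Save the fine-tuned model and tokenizer.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return metrics
```

Calling `run_pipeline()` would download the dataset and checkpoint on first use and then train, so nothing heavy happens until the function is invoked explicitly.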
Dependencies:
- torch: PyTorch for model training and inference.
- transformers: Hugging Face Transformers library for using the DistilBERT model.
- datasets: Hugging Face Datasets library for loading and processing the IMDb dataset.
- pandas: Data manipulation library.
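
After `pip install -r requirements.txt`, a quick way to check that the packages listed above are importable might look like this. The helper name `missing_packages` is hypothetical, and the sketch only tests whether each package can be found, not which version is installed.

```python
import importlib.util

REQUIRED_PACKAGES = ["torch", "transformers", "datasets", "pandas"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```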
- Replace `your-repository` in the clone URL with your actual repository path, if you have one.
- Make sure the `tfm_classifier.py` script contains the training and evaluation code.
This project is licensed under the MIT License. See the LICENSE file for details.