"Caption It!" is a deep learning project focusing on the automation of image captioning. Utilizing the Flickr 8k Dataset and pre-trained CNN/ResNet models, this project compares two approaches: CNN+LSTM and ResNet+GRU, evaluating their performance using BLEU scores.
- Problem Statement
- Technical Approach
- Dataset Analysis
- Exploratory Data Analysis
- Deep Learning Approaches: VGG16+LSTM and ResNet50+GRU
- Performance Evaluation
- Conclusion
Our goal was to develop a model that automatically generates captions for images using advanced deep learning techniques.
- Import libraries and modules
- Load and preprocess dataset
- Perform feature extraction and caption tokenization
- Model building, training, and evaluation
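The loading and preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebooks' exact code: the `image.jpg#index<TAB>caption` line layout and the `startseq`/`endseq` boundary tokens are assumptions based on the common Flickr 8k setup.

```python
import string
from collections import defaultdict

def parse_captions(text):
    """Parse lines of the assumed form 'image.jpg#idx<TAB>caption' into {image: [captions]}."""
    mapping = defaultdict(list)
    for line in text.strip().splitlines():
        image_id, caption = line.split("\t", 1)
        mapping[image_id.split("#")[0]].append(caption.strip())
    return dict(mapping)

def clean_caption(caption):
    """Lowercase, strip punctuation, keep alphabetic words, add start/end tokens."""
    caption = caption.lower().translate(str.maketrans("", "", string.punctuation))
    words = [w for w in caption.split() if w.isalpha()]
    return "startseq " + " ".join(words) + " endseq"

sample = "1000268201.jpg#0\tA child in a pink dress .\n1000268201.jpg#1\tA girl going into a wooden building ."
captions = parse_captions(sample)
print(clean_caption(captions["1000268201.jpg"][0]))  # startseq a child in a pink dress endseq
```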
- For further exploration and model generalization, the Flickr 8k Images with Captions dataset is also available on Kaggle.
The Flickr 8k Dataset, comprising 8000 images each with 5 captions, was used. This dataset offers a diverse range of images and high-quality captions, ideal for training image captioning models. Below are some visuals from the dataset!
- VGG16: A pre-trained CNN for image classification.
- LSTM: A recurrent neural network excellent for capturing temporal dependencies.
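A minimal sketch of using VGG16 as a feature extractor: the classification head is dropped and the 4096-dimensional `fc2` activations serve as the image embedding. The real project would load `weights="imagenet"`; `weights=None` here only skips the large download, so the features below are not meaningful.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

# Drop the 1000-way classifier; use the 4096-d fc2 activations as the image embedding.
# The notebooks would use weights="imagenet"; None here only skips the download.
base = VGG16(weights=None)
extractor = Model(inputs=base.input, outputs=base.layers[-2].output)

image = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0  # stand-in for a real photo
features = extractor.predict(preprocess_input(image), verbose=0)
print(features.shape)  # (1, 4096)
```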
Output of trained VGG16 & LSTM model
- ResNet50: A deep residual network for image recognition.
- GRU: Efficient at capturing temporal relationships in sequence modeling.
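A hedged sketch of a "merge"-style decoder pairing ResNet50 features with a GRU: the 2048-d pooled image features and the partial caption are encoded separately, summed, and used to predict the next word. The layer sizes (256 units, 0.4 dropout) and the `vocab_size`/`max_len` values are illustrative assumptions, not the notebooks' exact configuration.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, GRU, add
from tensorflow.keras.models import Model

vocab_size, max_len = 5000, 34  # hypothetical values from tokenization

# Image branch: 2048-d pooled ResNet50 features projected to 256-d
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation="relu")(Dropout(0.4)(img_in))

# Text branch: partial caption -> embedding -> GRU summary vector
txt_in = Input(shape=(max_len,))
txt_vec = GRU(256)(Dropout(0.4)(Embedding(vocab_size, 256, mask_zero=True)(txt_in)))

# Merge both branches and predict the next word over the vocabulary
merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(vocab_size, activation="softmax")(merged)
model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
print(model.output_shape)  # (None, 5000)
```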
Output of trained ResNet50 & GRU model
- Model performance was evaluated using BLEU scores.
BLEU score comparison
- The VGG16+LSTM model exhibited higher BLEU scores, indicating its effectiveness in generating more accurate captions.
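BLEU scores of this kind can be computed with NLTK's `corpus_bleu`; the reference and candidate captions below are made-up examples, not outputs from the trained models.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One list of reference captions per image (here a single hypothetical image)
references = [[["a", "dog", "runs", "on", "grass"],
               ["a", "dog", "is", "running", "on", "the", "grass"]]]
candidates = [["a", "dog", "runs", "on", "the", "grass"]]

smooth = SmoothingFunction().method1  # guards against zero n-gram counts on tiny corpora
bleu1 = corpus_bleu(references, candidates, weights=(1, 0, 0, 0), smoothing_function=smooth)
bleu4 = corpus_bleu(references, candidates, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth)
print(f"BLEU-1: {bleu1:.3f}  BLEU-4: {bleu4:.3f}")
```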
The project reveals the impact of different pre-trained image feature extraction models and sequence models on the quality of generated captions. It opens pathways for further improvements in the field of automated image captioning.
This project requires the following libraries:
- TensorFlow
- Keras
- NumPy
- Pandas
- Matplotlib
- PIL
- NLTK
Install these using pip or conda, as shown in the provided Python notebooks.
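For example, with pip (assuming the standard PyPI package names; note that PIL is installed as `pillow`):

```shell
pip install tensorflow keras numpy pandas matplotlib pillow nltk
```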
To run this project, follow these steps:
- Clone the repository to your local machine.
- Ensure you have Jupyter Notebook installed.
- Open and run the `eda.ipynb` notebook for exploratory data analysis.
- Proceed with `vgg16_lstm.ipynb` for the VGG16+LSTM model training and evaluation.
- Finally, execute `resnet_gru.ipynb` for the ResNet50+GRU model training and evaluation.
- Compare the BLEU scores output by the notebooks to evaluate the models.
- Refer to the PPT for the project flow.
Please refer to each notebook for detailed instructions on the steps involved in the respective processes.