Image caption generation in PyTorch using an encoder-decoder architecture
This work implements a variant of the model described in the paper Show and Tell: A Neural Image Caption Generator. Given an image, the model describes its contents in natural language. It is composed of an encoder, a pretrained CNN that extracts high-level features from the image and feeds them to the decoder, an LSTM that generates the sequence of words.
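As a rough sketch of this kind of architecture (not the exact configuration used in this repository), the encoder-decoder pair below uses a ResNet-50 backbone and teacher forcing during training; the backbone choice and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Pretrained CNN that maps an image to a fixed-size feature vector."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(pretrained=True)
        # Drop the classification head, keep the convolutional backbone
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():  # keep the pretrained backbone frozen
            features = self.backbone(images)
        return self.fc(features.flatten(1))

class DecoderRNN(nn.Module):
    """LSTM that generates a caption conditioned on the image features."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: the image features act as the first "token"
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)  # vocabulary scores at each time step
```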
- Conda or Virtualenv
- Flickr8k dataset for training (downloadable here)
- Extract the images from the Flickr8k dataset under ./data/images
$ git clone https://github.com/nhabbash/autocaption
$ cd autocaption
$ conda env create -p .\cenv -f .\environment.yml # using conda
$ jupyter nbextensions_configurator enable --user # optional
Uses:
- PyTorch for deep learning
- Ax for hyperparameter tuning
- Weights and Biases for experiment tracking
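As a rough illustration of how these pieces could fit together (the actual training code lives in the notebooks), the sketch below runs Ax's managed optimization loop over a toy search space and logs each trial to Weights and Biases; the project name, metric name, and search space are assumptions.

```python
import wandb
from ax.service.managed_loop import optimize

def train_and_evaluate(params):
    """Train with the given hyperparameters and return validation BLEU.
    The body is a placeholder: the real loop would train the encoder-decoder
    on Flickr8k and evaluate on the validation split."""
    run = wandb.init(project="autocaption", config=params, reinit=True)
    val_bleu = 0.0  # placeholder for the real training/evaluation result
    wandb.log({"val_bleu": val_bleu})
    run.finish()
    return val_bleu

best_params, values, experiment, model = optimize(
    parameters=[
        {"name": "lr", "type": "range", "bounds": [1e-4, 1e-2], "log_scale": True},
        {"name": "hidden_size", "type": "choice", "values": [256, 512, 1024]},
    ],
    evaluation_function=train_and_evaluate,
    objective_name="val_bleu",
    total_trials=20,
)
```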
For a detailed example, check the training notebook under ./notebooks/training
- The best model obtained after training and hyperparameter tuning achieves an average BLEU score of 11 on the test split, compared to the 27.2 reported in the original paper. (See the report or the slides for more details on the performance; a minimal evaluation sketch follows this list.)
- The model works best with pictures similar to those it was trained on, which for Flickr8k means pictures with one or two subjects performing simple activities. It works particularly well with dogs playing around and people engaged in a couple of sports (e.g. surfing, trekking in the mountains).
- The demo uses Vue.js for the frontend and FastAPI for the backend. The backend is deployed on Heroku, so if it has not been run in a while it takes a couple of minutes to start up and generate the first caption; after that, a caption usually takes about a dozen seconds. If you run the demo locally (possible with Docker Compose), caption generation takes about 5 seconds.
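The sketch below shows one way such an average BLEU score over the test split could be computed with NLTK's corpus_bleu; the whitespace tokenization and the generate_caption callable are illustrative assumptions, not the evaluation code behind the reported number.

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(generate_caption, test_pairs):
    """Corpus-level BLEU over the test split.

    `generate_caption` is a hypothetical callable mapping an image to a caption
    string; `test_pairs` is assumed to be a list of (image, reference_captions)
    pairs, each image having several human-written reference captions.
    """
    references, hypotheses = [], []
    for image, ref_captions in test_pairs:
        # BLEU accepts multiple tokenized references per hypothesis
        references.append([ref.lower().split() for ref in ref_captions])
        hypotheses.append(generate_caption(image).lower().split())
    # Scale to the 0-100 convention used above
    return corpus_bleu(references, hypotheses) * 100
```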
- Nassim Habbash - nhabbash