This problem statement is an application of both deep learning (DL) and natural language processing (NLP). A model of this kind can serve as an assistive tool for blind users, letting them understand any image with the help of speech.
The features of an image are extracted by a CNN-based encoder and decoded into a caption by an RNN-based decoder. The caption generated by this CNN-RNN model is then converted to speech using a text-to-speech library.
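The source does not name a specific text-to-speech library; as one common choice, a minimal sketch using gTTS (Google Text-to-Speech) could look like this, where the caption string is a hypothetical model output:

```python
from gtts import gTTS

# Hypothetical caption produced by the CNN-RNN model.
caption = "a dog is running through the grass"

# Convert the caption to speech and save it as an MP3 file.
tts = gTTS(text=caption, lang="en")
tts.save("caption.mp3")
```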
The project is an extended application of the paper *Show, Attend and Tell: Neural Image Caption Generation with Visual Attention* (Xu et al., 2015).
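As a rough illustration of the paper's soft (additive) attention, here is a minimal PyTorch sketch; the framework choice and all dimension names are assumptions, not taken from the source:

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Additive (Bahdanau-style) soft attention over encoder feature maps,
    in the spirit of Show, Attend and Tell."""
    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        self.enc_proj = nn.Linear(encoder_dim, attention_dim)  # project image features
        self.dec_proj = nn.Linear(decoder_dim, attention_dim)  # project decoder state
        self.score = nn.Linear(attention_dim, 1)               # scalar alignment score

    def forward(self, features, hidden):
        # features: (batch, num_pixels, encoder_dim); hidden: (batch, decoder_dim)
        att = torch.tanh(self.enc_proj(features) + self.dec_proj(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(att).squeeze(2), dim=1)  # attention weights
        context = (features * alpha.unsqueeze(2)).sum(dim=1)      # weighted context vector
        return context, alpha
```

At each decoding step, the context vector is fed to the RNN together with the previous word, so the decoder can focus on different image regions for different words.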
The dataset is taken from Kaggle and consists of sentence-based image descriptions: 8,091 images, each paired with five different captions that describe the salient entities and events in the image. The link is: https://www.kaggle.com/adityajn105/flickr8k
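A minimal loading sketch, assuming the Kaggle file `captions.txt` is a CSV with an `image,caption` header and one row per caption (five rows per image):

```python
import csv
from collections import defaultdict

def load_captions(path="captions.txt"):
    """Map each image filename to its list of captions."""
    captions = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            captions[row["image"]].append(row["caption"])
    return captions

captions = load_captions()
print(len(captions))                          # should report 8091 images
image, caps = next(iter(captions.items()))
print(image, caps)                            # one image with its five captions
```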
The project pipeline consists of the following steps:

- Data Understanding: load the data and understand its representation.
- Data Preprocessing: preprocess both images and captions into the format the model expects (see the preprocessing sketch after this list).
- Train-Test Split: combine images and captions to create the train and test sets.
- Model Building: build the image captioning model from Encoder, Attention, and Decoder components (an encoder sketch follows the list; the attention sketch appears above).
- Model Evaluation: evaluate the model using greedy search and the BLEU score (a decoding-and-BLEU sketch follows the list).
- Model Testing: generate captions for unseen images and convert them to speech.
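For the preprocessing step, a minimal sketch covering both modalities; the 224x224 input size and the ImageNet normalization statistics are assumptions tied to the ResNet encoder sketched below:

```python
import numpy as np
from PIL import Image

def preprocess_image(path, size=(224, 224)):
    """Resize an image and normalize it with ImageNet statistics (assumed encoder input)."""
    img = np.asarray(Image.open(path).convert("RGB").resize(size), dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (img - mean) / std

def preprocess_caption(text):
    """Lowercase, keep alphabetic tokens, and wrap with start/end markers."""
    tokens = [w for w in text.lower().split() if w.isalpha()]
    return ["<start>"] + tokens + ["<end>"]
```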
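For the encoder, a common approach (assumed here, since the source does not specify the backbone) is a frozen ImageNet-pretrained CNN with the final pooling and classification layers removed, so the decoder can attend over a spatial grid of features:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    """CNN encoder: a pretrained ResNet-50 without its classification head,
    returning a grid of spatial features for the attention mechanism."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # keep conv layers only
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the encoder; only the decoder is trained

    def forward(self, images):
        # images: (batch, 3, H, W) -> features: (batch, num_pixels, 2048)
        feats = self.backbone(images)               # (batch, 2048, h, w)
        return feats.flatten(2).permute(0, 2, 1)    # (batch, h*w, 2048)
```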
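For the evaluation step, a sketch of greedy decoding followed by corpus-level BLEU; `decoder.step`, `decoder.init_hidden`, `vocab`, and `inv_vocab` are hypothetical placeholders for the trained model's interface, not part of the source:

```python
import torch
from nltk.translate.bleu_score import corpus_bleu

def greedy_decode(decoder, features, vocab, inv_vocab, max_len=30):
    """Greedy search: at each step emit the single most probable next word."""
    word = torch.tensor([vocab["<start>"]])
    hidden = decoder.init_hidden(features)                      # hypothetical helper
    caption = []
    for _ in range(max_len):
        logits, hidden = decoder.step(word, features, hidden)   # hypothetical interface
        word = logits.argmax(dim=-1)
        if word.item() == vocab["<end>"]:
            break
        caption.append(inv_vocab[word.item()])
    return caption

# references: one list of 5 tokenized reference captions per test image;
# hypotheses: the corresponding greedy-decoded captions.
# bleu4 = corpus_bleu(references, hypotheses)  # nltk's default weights give BLEU-4
```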