Comparative analysis of image captioning models using RNN, BiLSTM, and Transformer architectures on the Flickr8K dataset, with InceptionV3 for image feature extraction.
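A minimal sketch of the InceptionV3 feature-extraction step these captioning repos describe, assuming TensorFlow/Keras is installed. `weights=None` is used here only so the sketch runs without downloading weights; a real pipeline would pass `weights="imagenet"` and a real photo instead of the random array.

```python
# Sketch: encode an image into a single 2048-d feature vector with InceptionV3,
# the typical input to an RNN/BiLSTM/Transformer caption decoder.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# include_top=False drops the classifier head; pooling="avg" collapses the
# final convolutional feature maps into one 2048-d vector per image.
# Assumption: weights=None (random init) just to avoid a download in this sketch.
encoder = InceptionV3(weights=None, include_top=False, pooling="avg")

image = np.random.rand(1, 299, 299, 3).astype("float32") * 255.0  # stand-in photo
features = encoder.predict(preprocess_input(image), verbose=0)
print(features.shape)  # one 2048-d vector per input image
```

The caption decoder then conditions on this vector (e.g. as the initial RNN state or as a prepended token embedding) while generating words.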
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
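The bidirectional retrieval described above can be sketched as nearest-neighbor search in a shared embedding space. This is a minimal illustration with random vectors standing in for real embeddings (e.g. from CLIP); the dimension 512 and the index size are arbitrary assumptions.

```python
# Sketch: text -> image retrieval by cosine similarity in a shared embedding
# space. Image -> text retrieval is the same computation with roles swapped.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(5, 512))  # 5 indexed images (stand-in embeddings)
text_emb = rng.normal(size=(1, 512))    # one query caption (stand-in embedding)

scores = cosine_sim(text_emb, image_embs)  # shape (1, 5): query vs. each image
best_image = int(scores.argmax())          # index of the best-matching image
```

Because both modalities live in one space, the same similarity matrix serves both retrieval directions: rank columns for text-to-image, rank rows for image-to-text.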
Generate captions from images
Yet another im2txt (show and tell: A Neural Image Caption Generator)