Algorithms for Information Retrieval Project
Image-to-Text
The image-to-text retrieval system takes an image as input and passes it through the BLIP (Bootstrapping Language-Image Pre-training) model to generate a descriptive caption. Using semantic embeddings, the generated caption is then compared against a dataset of existing captions and their corresponding images. From this comparison, the system retrieves the five captions that are semantically most similar to the generated one, along with the image associated with each. A similarity score is also computed for every retrieved caption, indicating its degree of semantic similarity to the generated caption.
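A minimal sketch of the captioning step, using the Salesforce/blip-image-captioning-base checkpoint linked below (the input path query.jpg is a placeholder):

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("query.jpg").convert("RGB")  # placeholder input image
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)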
Text-to-Image
The text-to-image retrieval system receives a description of the desired image from the user. Using the pre-trained all-MiniLM-L6-v2 model, it encodes both the description and the preprocessed captions of the images in the dataset into semantic embeddings. The system then computes the cosine similarity between the description and each caption, applies a threshold of 0.5, ranks the similarities in descending order, and displays the top five images most closely aligned with the input description.
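A minimal sketch of this matching logic, with two placeholder captions standing in for the dataset:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
dataset_captions = ["a dog runs through the grass", "two children play in the snow"]  # placeholder captions
query = "a dog playing outside"

caption_embeddings = model.encode(dataset_captions, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every caption
scores = util.cos_sim(query_embedding, caption_embeddings)[0]
ranked = sorted(zip(dataset_captions, scores.tolist()), key=lambda pair: pair[1], reverse=True)
# Keep matches above the 0.5 threshold and show at most the top five
top_matches = [(caption, score) for caption, score in ranked if score >= 0.5][:5]
print(top_matches)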
- Vision Transformer model
- BlipProcessor, BlipForConditionalGeneration
- Semantic embeddings (TensorFlow Hub Universal Sentence Encoder)
- BLEU, similarity, and relevance scores (see the BLEU sketch after this list)
- all-MiniLM-L6-v2 model
- Streamlit (frontend)
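A hedged example of how a BLEU score between a generated caption and a reference caption can be computed with NLTK (the exact scoring code in the project may differ, and the two captions are placeholders):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "a dog runs through the grass".split()  # reference caption tokens
candidate = "a dog running in the grass".split()    # generated caption tokens
smooth = SmoothingFunction().method1  # avoids zero scores on short captions
bleu = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {bleu:.3f}")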
Dataset: https://www.kaggle.com/datasets/adityajn105/flickr8k (make sure to adjust the paths accordingly while running)
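Assuming the Flickr8k layout from the Kaggle page (an Images/ folder plus a captions.txt file with image and caption columns), the captions can be loaded as follows; the local path is a placeholder to adjust:

import pandas as pd

captions_df = pd.read_csv("flickr8k/captions.txt")  # placeholder path to the extracted dataset
print(captions_df.columns.tolist())  # expected: ['image', 'caption']
print(captions_df.head())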
- https://huggingface.co/Salesforce/blip-image-captioning-base - BLIP model
- https://huggingface.co/nlpconnect/vit-gpt2-image-captioning - ViT model
- https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 - all-MiniLM-L6-v2
Disclaimer:
- Streamlit sometimes needs a Python/conda virtual environment to run properly
- Make sure to run model2.py once before running main.py
- Runs on: http://localhost:8501/
https://drive.google.com/file/d/1MyHcYK7cAvOq3bxb3z2OGMfuNZS0xLyN/view?usp=drive_link
import os
import tarfile

import certifi
import requests
import tensorflow_hub as hub

MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder/4?tf-hub-format=compressed"
MODEL_DIR = "universal_sentence_encoder_4"

# Session with a small connection pool and retries for the large download
session = requests.Session()
session.mount("https://", requests.adapters.HTTPAdapter(pool_connections=1, pool_maxsize=1, max_retries=3))

# Verify TLS against certifi's CA bundle
response = session.get(MODEL_URL, verify=certifi.where())
if response.status_code == 200:
    # Save the compressed model to a local archive
    archive_path = "universal_sentence_encoder_4.tar.gz"
    with open(archive_path, "wb") as f:
        f.write(response.content)
    # The archive contains saved_model.pb and variables/ at its root,
    # so extract it into a dedicated directory
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(MODEL_DIR)
    # Load the extracted SavedModel
    model = hub.load(MODEL_DIR)
else:
    print("Failed to download the model:", response.status_code)
- Extracting the archive yields: saved_model.pb, variables/variables.data-00000-of-00001, and variables/variables.index
- Place all of them in a single directory and point the model path in app.py to that directory
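Once loaded, the encoder can be used for the semantic-similarity step. A minimal sketch with two placeholder sentences, reusing the model variable from the snippet above:

import numpy as np

sentences = ["a dog runs through the grass", "a puppy playing outdoors"]  # placeholder inputs
embeddings = model(sentences).numpy()  # Universal Sentence Encoder returns 512-d vectors
# Cosine similarity between the two sentence embeddings
similarity = np.dot(embeddings[0], embeddings[1]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
)
print(similarity)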