This repository contains a system for embedding, indexing, and semantically searching personal folders of text and image data.
Beyond search, the system can process, analyze, and visualize the data, and adds features such as clustering, image captioning, and retrieval-augmented generation.
Multi-modal Semantic Search:
- Embedding and indexing text data using the nli-mpnet-base-v2 model.
- Embedding and indexing image data using the CLIP model.
- Semantic search for both text and image data (images can be retrieved with either image or text queries); a minimal embedding sketch follows this list.
- An additional keyword text search to complement the semantic results.
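A minimal sketch of the two embedding paths using sentence-transformers (the text model name is the one listed above; the CLIP checkpoint and the overall structure are assumptions, and the actual pipeline in src/ may differ):

```python
# Sketch only: embeds text with nli-mpnet-base-v2 and images with a CLIP
# checkpoint via sentence-transformers; the repo's own code may differ.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

text_model = SentenceTransformer("sentence-transformers/nli-mpnet-base-v2")
clip_model = SentenceTransformer("clip-ViT-B-32")  # assumed CLIP variant

# Text-to-text search over documents.
docs = ["trip notes from 2021", "tax documents", "recipe collection"]
doc_emb = text_model.encode(docs)
print(util.cos_sim(text_model.encode("hiking holiday"), doc_emb))

# CLIP puts images and text in one space, so an image can be found
# with either an image query or a text query.
img_emb = clip_model.encode(Image.open("photo.jpg"))
print(util.cos_sim(clip_model.encode("a mountain at sunset"), img_emb))
```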
Clustering and Image Captioning:
- Clustering image embeddings with a GPU-capable PyTorch KMeans implementation (sketched after this list).
- Image captioning utilizing the BLIP model.
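A sketch of both steps, assuming the kmeans_pytorch package as the PyTorch KMeans implementation and the base BLIP captioning checkpoint (the repo's actual choices may differ):

```python
# Sketch only; kmeans_pytorch and the BLIP checkpoint name are assumptions.
import torch
from kmeans_pytorch import kmeans
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Cluster (synthetic) 512-d image embeddings into 8 groups on the GPU.
embeddings = torch.randn(1000, 512)
cluster_ids, centers = kmeans(X=embeddings, num_clusters=8,
                              distance="cosine", device=device)

# Caption a single image with BLIP.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)
inputs = processor(Image.open("photo.jpg").convert("RGB"),
                   return_tensors="pt").to(device)
print(processor.decode(blip.generate(**inputs)[0], skip_special_tokens=True))
```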
Retrieval-Augmented Generation (RAG):
- Running open-source LLMs on a local Ollama instance (started with docker-compose); a minimal request sketch follows this list.
- Answering questions based on search results.
- Summarizing search results.
- Generating topics for provided image captions.
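A minimal request sketch against Ollama's standard /api/generate endpoint; the prompt wording here is illustrative only, not the repo's actual templates for question answering, summarization, or topic generation:

```python
# Sketch: answer a question over retrieved snippets via the local Ollama API.
# The prompt template is illustrative, not the repo's actual one.
import requests

snippets = ["first retrieved paragraph ...", "second retrieved paragraph ..."]
prompt = ("Answer the question using only the context below.\n\n"
          "Context:\n" + "\n".join(snippets) +
          "\n\nQuestion: What are these notes about?")

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default port
    json={"model": "mistral:7b", "prompt": prompt, "stream": False},
    timeout=120,
)
print(resp.json()["response"])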
Web User Interface Using Gradio:
- Provides a user-friendly interface for interacting with the system (a minimal Gradio pattern is sketched below).
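The UI follows the usual Gradio pattern; this tiny example is not the repo's actual interface (see src/app.py for that):

```python
# Minimal Gradio pattern, not the repo's actual UI (see src/app.py).
import gradio as gr

def search(query: str) -> str:
    return f"results for: {query}"  # placeholder for the real search call

gr.Interface(fn=search, inputs="text", outputs="text").launch()  # port 7860
```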
Visualization (in the experiments directory):
- Visualizes data and results.
- Facilitates exploration of topic relationships through semantic graphs.
- Applies PCA dimensionality reduction for 2D and 3D views of cluster embeddings (a 2D sketch follows this list).
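For the PCA step, a self-contained sketch with synthetic data (the experiments use the real cluster embeddings):

```python
# Sketch: project (synthetic) cluster embeddings to 2D with PCA and plot them.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

embeddings = np.random.randn(200, 512)      # stand-in for image embeddings
labels = np.random.randint(0, 8, size=200)  # stand-in for KMeans cluster ids

points = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=12)
plt.title("Cluster embeddings (PCA, 2D)")
plt.show()
```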
Backend API Support:
- Offers a RESTful API for data retrieval and processing (a hypothetical client call is sketched below).
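A hypothetical client call; the endpoint name, parameters, and port below are illustrative only, so check src/api.py for the actual routes:

```python
# Hypothetical example: the /search route, its parameters, and the port
# are assumptions; consult src/api.py for the real API surface.
import requests

resp = requests.get("http://127.0.0.1:8000/search",
                    params={"query": "beach photos", "top_k": 5})
print(resp.json())
```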
A sample dataset for testing can be downloaded here.
Quick Start (Recommended):
cp .env.example .env
./start.sh
Access the web interface at http://127.0.0.1:7860/.
To run the backend API on its own:
python ./src/api.py
To run the tests:
cd src/tests
pytest
Please note that the system is primarily designed to run on Linux; running on Windows may require additional adjustments and is not guaranteed to work seamlessly. The following steps set it up on Windows:
:: Set environment variables (:: is the cmd comment marker; an inline
:: # comment on a set line would become part of the variable's value)
:: LLM model, default is mistral:7b
set OLLAMA_LLM_MODEL=your_model
:: optional dataset folder
set DEFAULT_SEARCH_FOLDER_PATH=\path\to\your\dataset\folder
:: Create a virtual environment and install dependencies
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
:: Start the Ollama API and pull the model
docker compose up -d
docker exec -it ollama-api ollama pull %OLLAMA_LLM_MODEL%
:: Start the application
python .\src\app.py
Access the web interface at http://127.0.0.1:7860/.