PaperMatch: arXiv Search with Embeddings and Milvus

This project allows users to search for arXiv papers either by ID or abstract. The search functionality is powered by a machine learning embedding model and Milvus, a vector database. Gradio is used to create a user-friendly web interface for interaction.

See the live demo at papermatch.mitanshu.tech

See full explanation at the corresponding blog post: mitanshu.tech/posts/papermatch

Features

Search by Abstract: Convert the abstract into a vector using the mixedbread-ai/mxbai-embed-large-v1 model and find similar papers based on cosine similarity.
Search by ID: Retrieve information directly by arXiv ID.
Top K Results: Display the top K results from Milvus based on similarity.
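
To illustrate the ranking idea behind the features above, here is a minimal sketch of cosine similarity and top-K selection in plain Python. It is not the project's code: in PaperMatch, Milvus performs the similarity search, and the function names, paper IDs, and 3-dimensional toy vectors below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Return the k (paper_id, score) pairs most similar to the query vector."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real embeddings have far more dimensions.
toy_corpus = {
    "paper-a": [1.0, 0.0, 0.0],
    "paper-b": [0.7, 0.7, 0.0],
    "paper-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], toy_corpus, k=2)
for paper_id, score in results:
    print(f"{paper_id}: {score:.3f}")
```

A vector database replaces the linear scan in `top_k` with an index, so the same ranking can be served over a large number of papers.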

Requirements

Python 3.7+
Gradio
Milvus
mixedbread-ai/mxbai-embed-large-v1 (or any compatible embedding model)

Installation

Clone the repository:

git clone https://github.com/mitanshu7/search_arxiv.git
cd search_arxiv

Create a virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required packages:
```
pip install -r requirements.txt
```
Set up Milvus:
- Follow the Milvus installation guide to get Milvus up and running.
- Configure Milvus with your preferred settings.
- Alternatively, use the standalone_embed.sh script in this repo, which has been adapted to work on Fedora.

Usage

Prepare Milvus:

# Command to prepare Milvus 
python prepare_milvus.py

Set up the API key: get your key from Mixedbread and paste it into the .env file. See .env.sample for the expected configuration.
Run the Gradio app:
```
python app.py
```
Interact with the web interface:
- Open your web browser and go to http://localhost:7860 to access the Gradio interface.
- Use the search bar to input arXiv ID or abstract and view the search results.
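
For readers curious how a key stored in .env can reach the app, the sketch below reads simple KEY=VALUE lines using only the standard library. This is an assumption about the mechanism, not a description of app.py; in particular, the variable name `MXBAI_API_KEY` is hypothetical, so check .env.sample for the name the project actually expects.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a file and return them as a dict."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and anything without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(name="MXBAI_API_KEY", path=".env"):
    """Look up the key in the process environment first, then in the .env file.

    The default name is hypothetical; .env.sample is the source of truth.
    """
    key = os.environ.get(name) or load_env_file(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {path}")
    return key
```

Failing early with a clear message is a deliberate choice in this sketch: a missing key is easier to diagnose at startup than at the first search request.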

Configuration

Embedding Model: The embedding model used is mixedbread-ai/mxbai-embed-large-v1.

Example

Here is a basic example of how to use the search feature:

Search by Abstract:
- Enter the abstract of the paper in the provided text box.
- The system will convert it to a vector, query Milvus, and return the most relevant papers.
Search by ID:
- Input an arXiv ID directly.
- Retrieve and display the corresponding paper details.
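
Since a single search bar accepts both kinds of input, the app has to decide which path to take. One possible way to do that, sketched below, is to test the text against the two arXiv identifier formats and treat everything else as an abstract. The function name and the regular expressions are illustrative assumptions; app.py may route queries differently.

```python
import re

# New-style identifiers such as 1706.03762 or 1706.03762v5.
NEW_STYLE_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old-style identifiers such as hep-th/9901001 or math.GT/0309136.
OLD_STYLE_ID = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def classify_query(text):
    """Return "id" if the text looks like an arXiv ID, otherwise "abstract"."""
    candidate = text.strip()
    if NEW_STYLE_ID.match(candidate) or OLD_STYLE_ID.match(candidate):
        return "id"
    return "abstract"


for query in ["1706.03762", "hep-th/9901001", "We study attention-based models."]:
    print(query, "->", classify_query(query))
```

An "id" result would lead to a direct lookup, while an "abstract" result would be embedded and sent to Milvus as a vector query.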

Run at startup (systemd):

Create the file ~/.config/systemd/user/search_arxiv.service (for example with nano ~/.config/systemd/user/search_arxiv.service) and give it the following contents. This example assumes the user is milvus and that the app runs in a conda environment named search_arxiv, installed via Miniforge:

[Unit]
Description=Search ArXiv Web App
After=network.target

[Service]
WorkingDirectory=/home/milvus/search_arxiv/
ExecStart=/bin/bash -c "source /home/milvus/miniforge3/bin/activate search_arxiv && python app.py"
Restart=always

[Install]
WantedBy=default.target

Run systemctl --user daemon-reload to reload systemd.
Run systemctl --user start search_arxiv.service to start the app.
Run systemctl --user enable search_arxiv.service to start the app automatically at startup.

To Do:

Automate embedding of new metadata each month.
Learn how to update the database incrementally.
Automate setting up of the app.
Find more sources to integrate.

Contributing

Feel free to contribute to the project by submitting issues, pull requests, or suggestions.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or feedback, please contact mitanshu.sukhwani@gmail.com.
