Athena is an AI-Assist prototype powered by Cohere AI and Embed v3 to facilitate scientific research. Its key differentiating features include:
- Advanced Semantic Search: Goes beyond traditional keyword search by using state-of-the-art embeddings, so that retrieval reflects the meaning of complex scientific queries rather than exact term matches.
- Human-AI Collaboration: Enables easier review of research literature, highlighting key topics, and augmenting human understanding.
- Admin Support: Provides assistance with tasks such as categorization of research articles, e-mail drafting, and tweet generation.
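As a rough illustration of the semantic search idea above, the following sketch ranks documents by cosine similarity between a query vector and document vectors. It is a minimal sketch, not the project's implementation: the three-dimensional vectors and titles are invented toy data standing in for real Embed v3 embeddings, the function names are hypothetical, and in practice a vector store would presumably perform this ranking.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, docs, top_k=3):
    """Return the top_k (title, score) pairs, most similar first.

    docs is a list of (title, vector) pairs.
    """
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors (assumption): real embeddings have far more dimensions.
DOCS = [
    ("neural machine translation", [0.9, 0.1, 0.0]),
    ("image segmentation", [0.0, 1.0, 0.0]),
    ("multilingual language models", [0.5, 0.5, 0.0]),
]
QUERY = [1.0, 0.0, 0.0]  # stands in for an embedded user query

results = rank_by_similarity(QUERY, DOCS, top_k=3)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

Because the score depends on vector direction rather than shared words, a query can match an article that uses different terminology for the same concept.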
As part of this project, we have created two datasets of 50,000 arXiv articles related to AI and NLP using Cohere Embed v3:
- https://huggingface.co/datasets/dcarpintero/arXiv.cs.AI.CL.CV.LG.MA.NE.embedv3
- https://huggingface.co/datasets/dcarpintero/arXiv.cs.CL.embedv3
Steps:
- Retrieve Articles' Metadata from arXiv. See ./data_pipeline/retrieve_arxiv.py
- Embed Articles' Title and Abstract using Embed v3. See ./data_pipeline/embed_arxiv.py
- Store Articles' Metadata and Embeddings in Weaviate. See ./data_pipeline/index_arxiv.py
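The first two steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the contents of the pipeline scripts: it assumes the metadata comes from the public arXiv API as an Atom feed (whether retrieve_arxiv.py does this is not confirmed here), the helper names are hypothetical, and the sample feed is a placeholder rather than a real article. The embedding and Weaviate indexing calls are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint
ATOM = {"atom": "http://www.w3.org/2005/Atom"}   # Atom XML namespace


def build_query_url(category, start=0, max_results=100):
    """Build an arXiv API URL listing articles in one category, e.g. 'cs.CL'."""
    params = {
        "search_query": f"cat:{category}",
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def _clean(text):
    """Collapse the line breaks and repeated spaces that appear in feed text."""
    return " ".join((text or "").split())


def parse_entries(atom_xml):
    """Extract id, title, and abstract from each entry of an Atom feed."""
    root = ET.fromstring(atom_xml)
    articles = []
    for entry in root.findall("atom:entry", ATOM):
        articles.append({
            "id": _clean(entry.findtext("atom:id", default="", namespaces=ATOM)),
            "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM)),
            "abstract": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM)),
        })
    return articles


def embedding_text(article):
    """Combine title and abstract into the single string that would be embedded."""
    return f"{article['title']}\n{article['abstract']}"


# Placeholder feed (assumption): mimics the shape of an arXiv API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>A Placeholder
      Title</title>
    <summary>  A placeholder abstract.  </summary>
  </entry>
</feed>"""

url = build_query_url("cs.CL", max_results=10)
articles = parse_entries(SAMPLE_FEED)
print(url)
print(embedding_text(articles[0]))
```

Each string returned by `embedding_text` would then be sent to the embedding model, and the resulting vector stored next to the article's metadata.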
Some of our tasks, such as enriching abstracts with Wikipedia links, crafting a glossary, composing e-mails, and tweeting, rely on a set of prompts.
Those prompts are then composed into a LangChain chain as in the following code snippets:
- Enrich Abstract
- Keywords
- E-mail Drafting w/ JSON Formatting
- Tweet Generation w/ JSON Formatting and Pydantic Validation
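To make the idea of JSON-formatted output with validation concrete, here is a minimal sketch of filling a prompt template and validating the model's JSON reply. It deliberately uses only the standard library (a dataclass) in place of LangChain and Pydantic, so it shows the pattern rather than the project's code. The prompt wording, the `text` field, the 280-character limit, and the names `Tweet` and `parse_tweet` are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

MAX_TWEET_LENGTH = 280  # assumed limit for this sketch

# Hypothetical prompt template; doubled braces are literal braces after format().
TWEET_PROMPT = (
    "Write a tweet announcing the article below.\n"
    'Answer with a JSON object of the form {{"text": "<tweet>"}}.\n'
    "Title: {title}\n"
    "Abstract: {abstract}"
)


@dataclass
class Tweet:
    text: str

    def __post_init__(self):
        # Validation runs whenever a Tweet is constructed.
        if not isinstance(self.text, str) or not self.text.strip():
            raise ValueError("tweet text must be a non-empty string")
        if len(self.text) > MAX_TWEET_LENGTH:
            raise ValueError(f"tweet exceeds {MAX_TWEET_LENGTH} characters")


def parse_tweet(raw_output):
    """Parse a model reply into a validated Tweet, or raise ValueError."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or "text" not in payload:
        raise ValueError("expected a JSON object with a 'text' field")
    return Tweet(text=payload["text"])


prompt = TWEET_PROMPT.format(title="A Placeholder Title",
                             abstract="A placeholder abstract.")
# In a real chain the prompt would be sent to the model; here the reply is hard-coded.
reply = '{"text": "New preprint: A Placeholder Title"}'
tweet = parse_tweet(reply)
print(tweet.text)
```

Rejecting malformed replies at this boundary means the rest of the application only ever sees well-formed objects, which is the same role a Pydantic model plays in a LangChain output parser.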
See ArxivArticle Class.
The coral.py class provides an abstraction layer over Cohere endpoints.
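An abstraction layer of this kind might look like the sketch below. It is not the contents of coral.py: the class name `CohereFacade` and its methods are hypothetical, the default model name `embed-english-v3.0` is an assumption about which Embed v3 variant is used, and the wrapped client is assumed to expose `embed(texts=..., model=..., input_type=...)` and `chat(message=...)` in the style of the Cohere Python SDK. A fake client is used so the example runs without an API key.

```python
from types import SimpleNamespace


class CohereFacade:
    """Hypothetical thin wrapper that hides endpoint details from the application."""

    def __init__(self, client, embed_model="embed-english-v3.0"):
        self._client = client            # any object with embed() and chat()
        self._embed_model = embed_model  # assumed model name

    def embed_query(self, text):
        """Embed a single search query and return its vector."""
        response = self._client.embed(
            texts=[text], model=self._embed_model, input_type="search_query"
        )
        return response.embeddings[0]

    def embed_documents(self, texts):
        """Embed a batch of documents and return one vector per text."""
        response = self._client.embed(
            texts=list(texts), model=self._embed_model, input_type="search_document"
        )
        return response.embeddings

    def chat(self, message):
        """Send one message to the chat endpoint and return the reply text."""
        return self._client.chat(message=message).text


class FakeClient:
    """Stand-in for a real client; returns deterministic dummy data."""

    def embed(self, texts, model, input_type):
        return SimpleNamespace(embeddings=[[float(len(t))] for t in texts])

    def chat(self, message):
        return SimpleNamespace(text=f"echo: {message}")


facade = CohereFacade(FakeClient())
print(facade.embed_query("semantic search"))
print(facade.chat("Summarize this abstract."))
```

Keeping the client behind one class means the rest of the application does not depend on endpoint parameters, and tests can substitute a fake client as shown.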
See app.py
- Clone the repository:
git clone git@github.com:dcarpintero/athena.git
- Create and Activate a Virtual Environment:
Windows:
py -m venv .venv
.venv\scripts\activate
macOS/Linux:
python3 -m venv .venv
source .venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Run Data Pipeline (optional; the scripts live in ./data_pipeline):
python retrieve_arxiv.py
python embed_arxiv.py
python index_arxiv.py
- Launch Web Application:
streamlit run ./app.py