This repository contains Python scripts that demonstrate a Neural Retriever in a Retrieval-Augmented Generation (RAG) pipeline. The scripts show three implementations of a Neural Retriever, using Apache Solr, Elasticsearch, and Watson Discovery as document stores.
- Elasticsearch: Demonstrates the use of Elasticsearch as a document store.
- Solr: Demonstrates the use of Apache Solr as a document store.
- Watson Discovery: Demonstrates the use of Watson Discovery as a document store.
- ProcessElastic.py: Reusable script to retrieve documents from an Elasticsearch instance.
- Clone this repository.
- Install the required dependencies (see the Dependencies section below).
- Modify config.yaml so that the retriever entry points to your service.
- Run ProcessElastic.py to see the neural retriever in action.
Each script defines a function for the information retriever (SolrRetriever, ESRetriever, or WDRetriever) that takes a query and returns the top matching documents from the respective document store.
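The shared contract described above (a query goes in, ranked documents come out) can be sketched as a small abstract base class. This is a minimal illustration and not the repository's code: `BaseRetriever`, `InMemoryRetriever`, and the `top_k` parameter are hypothetical names, and the in-memory scorer uses plain word overlap as a stand-in for a neural model backed by a real document store.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List


def _tokens(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class BaseRetriever(ABC):
    """Hypothetical common interface: a query in, ranked documents out."""

    @abstractmethod
    def retrieve(self, query: str, top_k: int = 3) -> List[Dict[str, str]]:
        """Return up to top_k documents, best match first."""


class InMemoryRetriever(BaseRetriever):
    """Toy stand-in that ranks documents by word overlap (not a neural model)."""

    def __init__(self, documents: List[Dict[str, str]]) -> None:
        self.documents = documents

    def retrieve(self, query: str, top_k: int = 3) -> List[Dict[str, str]]:
        query_terms = _tokens(query)
        scored = []
        for doc in self.documents:
            overlap = len(query_terms & _tokens(doc["text"]))
            if overlap > 0:
                scored.append((overlap, doc))
        # The sort is stable, so equally scored documents keep their input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]
```

A store-specific retriever would keep the same `retrieve` signature and replace the scoring loop with a call to Solr, Elasticsearch, or Watson Discovery.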
Here's a basic example of how you might use the SolrRetriever:
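# Assumes SolrRetriever is defined in solr_retriever.py from this repository.
from solr_retriever import SolrRetriever

retriever = SolrRetriever(solr_url='http://localhost:8983/solr', collection_name='my_collection')
results = retriever.retrieve('What is DataOps?')
print(results)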
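For readers curious about what sits behind a call like the one above, here is one way a class with that shape could be written on top of pysolr. It is a sketch under stated assumptions, not the repository's implementation: the optional `client` argument, the `top_k` parameter, and the 10-second timeout are hypothetical choices, and the query string is passed to Solr unchanged, without any neural re-ranking.

```python
from typing import Any, Dict, List, Optional


class SolrRetriever:
    """Hypothetical sketch of a Solr-backed retriever built on pysolr."""

    def __init__(self, solr_url: str, collection_name: str,
                 client: Optional[Any] = None) -> None:
        if client is None:
            # Imported lazily so the class can be exercised with a stub client.
            import pysolr
            # pysolr expects the full URL of the core or collection.
            core_url = f"{solr_url.rstrip('/')}/{collection_name}"
            client = pysolr.Solr(core_url, timeout=10)
        self.client = client

    def retrieve(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        # `rows` is the standard Solr parameter that caps the number of hits.
        results = self.client.search(query, rows=top_k)
        # Iterating over the results yields one dict per matching document.
        return [dict(doc) for doc in results]
```

Accepting an injected `client` keeps the class testable without a running Solr instance; when it is omitted, the constructor takes the same `solr_url` and `collection_name` keywords as the usage example.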
These scripts require Python 3.6 or later. They also require the following Python libraries:
- pysolr (for solr_retriever.py)
- elasticsearch (for es_retriever.ipynb)
- requests (for wd_retriever.py)
You can install these libraries using pip:
pip install pysolr elasticsearch requests