Query OpenSearch indices with natural language.
This is an example application that showcases LLMs' ability to translate natural language into OpenSearch/Elasticsearch queries, which are then used to return results to the user.
The application is written in Python and runs on AWS, with its infrastructure defined as code using the AWS CDK.
How it works:
- The user asks a question in natural language.
  - In this example, the question is hardcoded in the Lambda function.
- The Lambda function calls the OpenAI text-davinci model to convert the question into an OpenSearch query.
  - For security, the OpenAI API key is stored in AWS Secrets Manager rather than in environment variables.
- OpenSearch is queried using the response from text-davinci.
- The user receives the OpenSearch documents that match their question.
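The flow in the list above can be sketched as a few small functions. This is a minimal sketch, not the project's actual Lambda code: the names `get_api_key`, `build_prompt`, `answer_question`, `complete` and `search` are hypothetical, the prompt wording is an assumption, and the OpenAI and OpenSearch calls are passed in as plain callables so the logic can run without network access. Only `get_secret_value(SecretId=...)` and its `SecretString` field come from the real boto3 Secrets Manager client.

```python
import json
from typing import Any, Callable, Dict, List

# Fields of the sample movie index. Listing them in the prompt is an
# assumption of this sketch, so the model does not have to guess them.
FIELDS = ["title", "director", "year"]


def get_api_key(secrets_client: Any, secret_id: str) -> str:
    """Read the OpenAI API key from AWS Secrets Manager.

    `secrets_client` is expected to behave like boto3.client("secretsmanager").
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]


def build_prompt(question: str) -> str:
    """Ask the model to answer with a single OpenSearch query as JSON."""
    return (
        "Translate the question into an OpenSearch query DSL JSON object.\n"
        f"The index documents have the fields: {', '.join(FIELDS)}.\n"
        "Respond with JSON only.\n"
        f"Question: {question}\n"
        "Query:"
    )


def answer_question(
    question: str,
    complete: Callable[[str], str],
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Natural language -> LLM -> OpenSearch query -> matching documents."""
    raw = complete(build_prompt(question))  # model output as text
    query = json.loads(raw)                 # raises if the model did not return JSON
    response = search(query)                # run the generated query against the index
    # OpenSearch search responses carry the matching documents in hits.hits[]._source
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

With the opensearch-py client, `search` could be `lambda q: client.search(index="movies", body=q)` (the index name is an assumption), and `complete` would wrap whatever completion call the Lambda makes to text-davinci.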
OpenSearch movie data:
title | director | year |
---|---|---|
Moneyball | Bennett Miller | 2011 |
Star Wars: Episode I - The Phantom Menace | George Lucas | 1999 |
28 Days Later | Danny Boyle | 2002 |
Shaun of the Dead | Edgar Wright | 2004 |
The Grand Budapest Hotel | Wes Anderson | 2014 |
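One way to load the sample data in the table above is the OpenSearch `_bulk` API, which takes newline-delimited JSON: an action line followed by the document, with a trailing newline at the end of the body. This is a sketch under assumptions: the helper `bulk_body` and the index name `movies` are hypothetical and not taken from this project.

```python
import json
from typing import Dict, List, Union

Movie = Dict[str, Union[str, int]]

# The sample documents from the table above.
MOVIES: List[Movie] = [
    {"title": "Moneyball", "director": "Bennett Miller", "year": 2011},
    {"title": "Star Wars: Episode I - The Phantom Menace", "director": "George Lucas", "year": 1999},
    {"title": "28 Days Later", "director": "Danny Boyle", "year": 2002},
    {"title": "Shaun of the Dead", "director": "Edgar Wright", "year": 2004},
    {"title": "The Grand Budapest Hotel", "director": "Wes Anderson", "year": 2014},
]


def bulk_body(index: str, docs: List[Movie]) -> str:
    """Build an NDJSON body for the _bulk API.

    Each document is preceded by an "index" action line, and the whole
    body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

With opensearch-py this could be sent as `client.bulk(body=bulk_body("movies", MOVIES))`.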
Q: Find all movies that were made after 2000
A: Moneyball, 28 Days Later, Shaun of the Dead, The Grand Budapest Hotel
Q: Find all movies that were directed by George Lucas with Star Wars in the title
A: Star Wars: Episode I - The Phantom Menace
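For the George Lucas example above, a query the model might plausibly generate is a `bool` query with two `match_phrase` clauses. To check such queries against the sample table without a cluster, the sketch below evaluates a small subset of the query DSL in memory. This is a rough local approximation, not OpenSearch itself: `tokens`, `matches` and `search` are hypothetical helpers, and the lowercase word tokenizer only loosely imitates the standard analyzer.

```python
import operator
import re
from typing import Any, Dict, List

MOVIES: List[Dict[str, Any]] = [
    {"title": "Moneyball", "director": "Bennett Miller", "year": 2011},
    {"title": "Star Wars: Episode I - The Phantom Menace", "director": "George Lucas", "year": 1999},
    {"title": "28 Days Later", "director": "Danny Boyle", "year": 2002},
    {"title": "Shaun of the Dead", "director": "Edgar Wright", "year": 2004},
    {"title": "The Grand Budapest Hotel", "director": "Wes Anderson", "year": 2014},
]

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def tokens(text: str) -> List[str]:
    """Lowercase word tokens, a rough stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())


def matches(doc: Dict[str, Any], clause: Dict[str, Any]) -> bool:
    """Evaluate bool.must, match_phrase and range clauses against one document."""
    if "bool" in clause:
        return all(matches(doc, c) for c in clause["bool"].get("must", []))
    if "match_phrase" in clause:
        field, phrase = next(iter(clause["match_phrase"].items()))
        want, have = tokens(phrase), tokens(str(doc[field]))
        # The phrase matches if its tokens appear consecutively in the field.
        return any(have[i:i + len(want)] == want for i in range(len(have) - len(want) + 1))
    if "range" in clause:
        field, bounds = next(iter(clause["range"].items()))
        return all(RANGE_OPS[op](doc[field], value) for op, value in bounds.items())
    raise ValueError(f"unsupported clause: {clause}")


def search(query: Dict[str, Any]) -> List[str]:
    """Return the titles of the sample movies matched by query["query"]."""
    return [doc["title"] for doc in MOVIES if matches(doc, query["query"])]


LUCAS_QUERY = {
    "query": {
        "bool": {
            "must": [
                {"match_phrase": {"director": "George Lucas"}},
                {"match_phrase": {"title": "Star Wars"}},
            ]
        }
    }
}

print(search(LUCAS_QUERY))  # ['Star Wars: Episode I - The Phantom Menace']
```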
Possible improvements:
- Provide more data in the prompt: the model currently has to guess the structure of the documents' fields. If it were given the index schema, the approach could be made more generic and extensible.
- Validate input and output: check both the user's question and the model's output. We want to avoid sending malformed queries to OpenSearch, and passing raw user prompts to the language model leaves it open to prompt injection (hijacking).
- Model fine-tuning: a language model fine-tuned specifically for OpenSearch queries could yield better results.
- Testing, DevOps, etc.
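The input and output validation idea from the list above could start as a strict allow-list check on the model's JSON before it reaches OpenSearch, plus a basic length limit on the question. This is a minimal sketch, not part of the project: the function names, the allowed clause types, the field list and the 200-character limit are all hypothetical choices. The length check alone does not prevent prompt injection; it only limits how much raw user text reaches the model.

```python
import json
from typing import Any, Dict

# Hypothetical allow-lists; adjust to the index and the query types you expect.
ALLOWED_FIELDS = {"title", "director", "year"}
LEAF_CLAUSES = {"match", "match_phrase", "term", "range"}
BOOL_SECTIONS = {"must", "should", "filter", "must_not"}
MAX_QUESTION_CHARS = 200


def check_question(question: str) -> str:
    """Reject empty or overly long questions before they reach the model."""
    question = question.strip()
    if not question or len(question) > MAX_QUESTION_CHARS:
        raise ValueError("question is empty or too long")
    return question


def _check_clause(clause: Any) -> None:
    """Recursively allow only known clause types on known fields."""
    if not isinstance(clause, dict) or len(clause) != 1:
        raise ValueError(f"expected a single-key clause, got {clause!r}")
    kind, body = next(iter(clause.items()))
    if kind == "bool":
        if not isinstance(body, dict) or not set(body) <= BOOL_SECTIONS:
            raise ValueError("unsupported bool section")
        for section in body.values():
            for sub in section if isinstance(section, list) else [section]:
                _check_clause(sub)
    elif kind in LEAF_CLAUSES:
        if not isinstance(body, dict) or not set(body) <= ALLOWED_FIELDS:
            raise ValueError(f"unknown field in {kind} clause")
    else:
        raise ValueError(f"clause type not allowed: {kind}")


def parse_model_query(raw: str) -> Dict[str, Any]:
    """Parse the model's text output and accept only {"query": <allowed clause>}."""
    try:
        query = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(query, dict) or set(query) != {"query"}:
        raise ValueError("expected an object with a single 'query' key")
    _check_clause(query["query"])
    return query
```

Rejecting anything outside the allow-list (for example `script` clauses, unknown fields or extra top-level keys such as `size`) keeps a confused or manipulated model from issuing expensive or unexpected requests.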
This project is set up for CDK development with Python.
The cdk.json file tells the CDK Toolkit how to execute your app.
This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3 (or python for Windows) executable in your path with access to the venv package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.
To manually create a virtualenv on MacOS and Linux:
$ python3 -m venv .venv
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.
$ source .venv/bin/activate
If you are on a Windows platform, you would activate the virtualenv like this:
% .venv\Scripts\activate.bat
Once the virtualenv is activated, you can install the required dependencies.
$ pip install -r requirements.txt
At this point you can now synthesize the CloudFormation template for this code.
$ cdk synth
To add additional dependencies, for example other CDK libraries, just add them to your setup.py file and rerun the pip install -r requirements.txt command.
Useful commands:
- cdk ls: list all stacks in the app
- cdk synth: emit the synthesized CloudFormation template
- cdk deploy: deploy this stack to your default AWS account/region
- cdk diff: compare the deployed stack with the current state
- cdk docs: open the CDK documentation
- python -m pytest: run the unit tests
Docker must be installed to bundle the Python Lambda function with its dependencies.