- Features
- Architecture
- Precautions
- Preview Access and Service Quotas
- Model Providers
- RAG Sources
- Deploy
- Clean up
- Authors
- Credits
- License
This sample provides ready-to-use code so you can start experimenting with different LLMs and prompts.
Supported model providers:
- Amazon Bedrock (currently in preview)
- Amazon SageMaker self-hosted models from SageMaker Foundation Models, JumpStart, and Hugging Face.
- External providers via API, such as AI21 Labs, Cohere, OpenAI, etc. See the available LangChain integrations for a comprehensive list.
This sample comes with CDK constructs that allow you to optionally deploy one or more of:
Screenshots: example with Kendra as RAG source; example with Amazon OpenSearch Vector Search as RAG source.
The repository includes a CDK construct to deploy a full-fledged React UI for interacting with the deployed LLMs as chatbots. It is hosted on Amazon S3, distributed with Amazon CloudFront, and protected with Amazon Cognito authentication, helping you interact and experiment with multiple LLMs and multiple RAG sources, with conversational history support and document upload. The interface layer between the UI and the backend is built on top of Amazon API Gateway WebSocket APIs.
It is built on top of the AWS Cloudscape Design System.
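For a sense of how the UI and backend fit together, here is a minimal browser-side sketch of a WebSocket client; the endpoint URL and message schema are illustrative assumptions, not the sample's actual contract:

```typescript
// Minimal sketch of a WebSocket client. The wss:// URL and message shape
// below are assumptions for illustration; check the deployed stack outputs
// and the UI code for the real contract.
const ws = new WebSocket('wss://example.execute-api.us-east-1.amazonaws.com/prod');

ws.onopen = () => {
  // Send a chat message to the backend model interface.
  ws.send(JSON.stringify({ action: 'sendMessage', text: 'Hello, LLM!' }));
};

ws.onmessage = (event) => {
  // Responses (or streamed tokens) are pushed back over the same socket.
  console.log('received:', event.data);
};
```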
This repository comes with several reusable CDK constructs, giving you the freedom to decide what to deploy and what not.
Here's an overview:
This CDK construct provides the necessary Amazon Cognito resources to support user authentication.
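As a rough idea of what this construct provides, here is a minimal CDK sketch of the Cognito resources involved (the options shown are illustrative assumptions, not the construct's exact configuration):

```typescript
import * as cognito from 'aws-cdk-lib/aws-cognito';
import { Construct } from 'constructs';

// Illustrative sketch of the authentication resources; the actual
// construct may configure additional options.
export class AuthenticationSketch extends Construct {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    const userPool = new cognito.UserPool(this, 'UserPool', {
      selfSignUpEnabled: false, // users are added by an administrator
      autoVerify: { email: true },
    });

    userPool.addClient('UserPoolClient', {
      generateSecret: false, // browser clients can't keep a client secret
    });
  }
}
```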
This CDK construct deploys a WebSocket-based interface layer that allows two-way communication between the user interface and the model interface.
This is not a CDK construct, but it's important to note that messages are delivered via Amazon SQS FIFO queues and routed via an Amazon SNS FIFO topic.
FIFO is used to ensure the correct order of inbound and outbound messages, keeping a chatbot conversation consistent for both the user and the LLM, and to ensure that, where token streaming is used, the order of tokens is also respected.
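To make this concrete, a minimal CDK sketch of an SNS FIFO topic fanning out to an SQS FIFO queue might look like this (resource names are illustrative):

```typescript
import * as sns from 'aws-cdk-lib/aws-sns';
import * as subscriptions from 'aws-cdk-lib/aws-sns-subscriptions';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Construct } from 'constructs';

// Illustrative sketch: an SNS FIFO topic routing messages to an SQS FIFO
// queue. FIFO guarantees ordering per message group, which is what keeps
// a conversation (and its streamed tokens) in order.
export class MessagingSketch extends Construct {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    const topic = new sns.Topic(this, 'MessagesTopic', {
      fifo: true,
      contentBasedDeduplication: true,
    });

    const queue = new sqs.Queue(this, 'MessagesQueue', {
      fifo: true,
      contentBasedDeduplication: true,
    });

    topic.addSubscription(new subscriptions.SqsSubscription(queue));
  }
}
```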
CDK constructs that deploy the resources, dependencies, and data storage needed to integrate with multiple LLM sources and providers. To facilitate further integrations and future updates, and to reduce the amount of customization required, we provide code built on well-known LLM-oriented frameworks.
Pre-built model interfaces:
- LangchainModelInterface: Python-centric, built on top of the LangChain framework, and leveraging Amazon DynamoDB as LangChain memory.
The model interface is built around the concept of a ModelAdapter: a class that you can inherit from and override specific methods of, to integrate with models that have different requirements in terms of prompt structure or parameters.
It also natively supports subscribing to LangChain callback handlers.
This repository provides some sample adapters that you can take inspiration from to integrate with other models. Read more about it here.
A purpose-built CDK construct, SageMakerModel, facilitates the deployment of models to SageMaker. You can use this layer to deploy any of the following (a usage sketch follows the list):
- Models from SageMaker Foundation Models/JumpStart
- Models supported by the HuggingFace LLM Inference container
- Models from HuggingFace with custom inference code
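As an illustration only, usage could look roughly like the sketch below; the property names and values are hypothetical, so refer to the construct's code and documentation for its real API:

```typescript
// Hypothetical sketch only: the props below are illustrative assumptions,
// not the actual SageMakerModel API.
const model = new SageMakerModel(this, 'LLM', {
  vpc, // deploy the endpoint inside the sample's VPC
  model: {
    kind: 'huggingface-llm-container', // one of the three deployment types above
    modelId: 'tiiuae/falcon-40b-instruct',
    instanceType: 'ml.g5.12xlarge',
  },
});
```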
The Layer construct provides an easier mechanism to manage and deploy AWS Lambda layers. You specify dependencies and requirements in a local folder, and the construct packs, zips, and uploads the dependencies to S3 and generates the Lambda layer for you.
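Under the hood, this is broadly equivalent to the standard CDK Lambda layer primitive; a minimal sketch of the kind of resource it generates (the asset path and runtime are illustrative):

```typescript
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Roughly what the Layer construct automates: pack a local folder of
// dependencies, upload it to S3 as an asset, and expose it as a layer.
const commonLayer = new lambda.LayerVersion(this, 'CommonLayer', {
  code: lambda.Code.fromAsset('lib/layers/common'), // illustrative path
  compatibleRuntimes: [lambda.Runtime.PYTHON_3_9],
  description: 'Shared Python dependencies for the model interface',
});
```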
This CDK construct simply deploys public, private, and isolated subnets. Additionally, this stack deploys VPC endpoints for SageMaker endpoints, AWS Secrets Manager, S3, and Amazon DynamoDB, ensuring that traffic stays within the VPC when appropriate.
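A condensed CDK sketch of this networking setup (subnet sizing omitted, endpoints shown for the services mentioned above):

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Illustrative sketch of the VPC plus the endpoints described above.
const vpc = new ec2.Vpc(this, 'Vpc', {
  subnetConfiguration: [
    { name: 'public', subnetType: ec2.SubnetType.PUBLIC },
    { name: 'private', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
    { name: 'isolated', subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
  ],
});

// Gateway endpoints keep S3 and DynamoDB traffic inside the VPC.
vpc.addGatewayEndpoint('S3', { service: ec2.GatewayVpcEndpointAwsService.S3 });
vpc.addGatewayEndpoint('DynamoDB', { service: ec2.GatewayVpcEndpointAwsService.DYNAMODB });

// Interface endpoints for SageMaker Runtime and Secrets Manager.
vpc.addInterfaceEndpoint('SageMakerRuntime', {
  service: ec2.InterfaceVpcEndpointAwsService.SAGEMAKER_RUNTIME,
});
vpc.addInterfaceEndpoint('SecretsManager', {
  service: ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
});
```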
This repo also comes with CDK constructs to help you get started with pre-built RAG sources.
All RAG constructs follow the same implementation pattern (a sketch of the first step follows the list):
- An ingestion queue that receives upload/delete S3 events for documents
- An ingestion, conversion, and storage mechanism specific to the RAG source
- An API endpoint that exposes RAG results to consumers, in this case the model interface
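For instance, the first step of this pattern, wiring S3 document events into an ingestion queue, might look like the following CDK sketch (bucket and queue names are illustrative):

```typescript
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as s3n from 'aws-cdk-lib/aws-s3-notifications';
import * as sqs from 'aws-cdk-lib/aws-sqs';

// Illustrative sketch: document upload/delete events flow from S3 into
// the RAG source's ingestion queue.
const documentsBucket = new s3.Bucket(this, 'DocumentsBucket');
const ingestionQueue = new sqs.Queue(this, 'IngestionQueue');

documentsBucket.addEventNotification(s3.EventType.OBJECT_CREATED, new s3n.SqsDestination(ingestionQueue));
documentsBucket.addEventNotification(s3.EventType.OBJECT_REMOVED, new s3n.SqsDestination(ingestionQueue));
```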
In this sample, each RAG source exposes endpoints and formats results so that it can be used as a LangChain RemoteRetriever by the model interface as part of a ConversationalRetrievalChain.
This aims to allow seamless integration with LangChain chains and workflows.
This CDK construct deploys a vector database on Amazon Aurora PostgreSQL with pgvector and embeddings.
Embeddings model: sentence-transformers/all-MiniLM-L6-v2
Ranking model: cross-encoder/ms-marco-MiniLM-L-12-v2
Hybrid search is performed with a combination of the following (a conceptual sketch follows):
- Similarity search
- Full-text search
- Reranking of results
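Conceptually, the combination can be thought of as the sketch below; the types and the reranking callback are illustrative assumptions, not the sample's actual implementation:

```typescript
interface ScoredDocument {
  id: string;
  text: string;
  score: number;
}

// Illustrative only: take the union of candidates from both retrieval
// methods, then let a cross-encoder-style reranker produce the final order.
function hybridSearch(
  query: string,
  similarityResults: ScoredDocument[],
  fullTextResults: ScoredDocument[],
  rerank: (query: string, doc: ScoredDocument) => number,
): ScoredDocument[] {
  const candidates = new Map<string, ScoredDocument>();
  for (const doc of [...similarityResults, ...fullTextResults]) {
    candidates.set(doc.id, doc); // de-duplicate documents found by both methods
  }
  return [...candidates.values()]
    .map((doc) => ({ ...doc, score: rerank(query, doc) }))
    .sort((a, b) => b.score - a.score);
}
```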
Check here to learn how to enable it in the stack.
This CDK construct deploys Amazon OpenSearch Serverless (AOSS) vector database capabilities, including the required collection, VPC endpoints, data access and encryption policies, and an index that can be used with embeddings produced by Amazon Titan Embeddings.
Embeddings model: Amazon Titan Embeddings
Check here to learn how to enable it in the stack.
This CDK construct deploys an Amazon Kendra index and the necessary resources to ingest documents and search them via the LangChain Amazon Kendra Index Retriever.
Make sure to review Amazon Kendra Pricing before deploying it.
Check here to learn how to enable it in the stack.
Before you begin using the sample, there are certain precautions you must take into account:
- Cost management with self-hosted models: Be mindful of the costs associated with AWS resources, especially with SageMaker models, which are billed by the hour. While the sample is designed to be cost-effective, leaving serverful resources running for extended periods or deploying numerous LLMs can quickly lead to increased costs.
- Licensing obligations: If you choose to use any datasets or models alongside the provided samples, ensure you check the LLM code and comply with all licensing obligations attached to them.
- This is a sample: the code provided in this repository shouldn't be used for production workloads without further review and adaptation.
- Amazon Bedrock: If you are looking to interact with models from Amazon Bedrock FMs, you need to request preview access from the AWS console. Furthermore, check which regions are currently supported for Amazon Bedrock.
- Instance type quota increase: You might consider requesting an increase in service quota for specific SageMaker instance types, such as the `ml.g5` instance type, which gives access to the latest generation of GPU/multi-GPU instance types. You can do this from the AWS console.
- Foundation Models preview access: If you are looking to deploy models from SageMaker Foundation Models, you need to request preview access from the AWS console. Furthermore, check which regions are currently supported for SageMaker Foundation Models.
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from Amazon and leading AI startups available through an API, so you can choose from various FMs to find the model that's best suited for your use case. With the Amazon Bedrock serverless experience, you can quickly get started, easily experiment with FMs, privately customize FMs with your own data, and seamlessly integrate and deploy them into your applications using AWS tools and capabilities.
If your account has access to Amazon Bedrock, no additional action is required: you can deploy this sample as is, and Bedrock models will appear in your model list.
This sample comes with a purpose-built CDK construct, SageMakerModel, which helps abstract three different types of model deployments:
- Models from SageMaker Foundation Models/JumpStart.
- Models supported by the HuggingFace LLM Inference container.
- Models from HuggingFace with custom inference code.
Read more details here.
You can also interact with external providers via their API, such as AI21 Labs, Cohere, OpenAI, etc.
The provider must be supported in the model interface; see the available LangChain integrations for a comprehensive list of providers.
Usually an `API_KEY` is required to integrate with third-party model providers. To support this, the model interface deploys a secret in AWS Secrets Manager, initially containing an empty JSON `{}`, where you can add your API keys for one or more providers.
These keys will be injected at runtime into the Lambda function's environment variables; they won't be visible in the AWS Lambda console.
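For reference, creating a secret seeded with an empty JSON default might look like this in CDK (the construct id and description are illustrative):

```typescript
import { SecretValue } from 'aws-cdk-lib';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

// Illustrative sketch: a secret seeded with an empty JSON object, ready
// for provider API keys to be added from the console.
const apiKeysSecret = new secretsmanager.Secret(this, 'ApiKeysSecret', {
  description: 'Third-party model provider API keys',
  secretStringValue: SecretValue.unsafePlainText('{}'),
});
```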
For example, if you wish to interact with AI21 Labs, OpenAI, and Cohere endpoints:
- Open the Model Interface Keys Secret in Secrets Manager. You can find the secret name in the stack output too.
- Update the secret, adding a key for each provider to the JSON:
{
  "AI21_API_KEY": "xxxxx",
  "OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxx",
  "COHERE_API_KEY": "xxxxx"
}
N.B.: If no keys are needed, the secret value must be an empty JSON `{}`, NOT an empty string `''`.
Make sure that the environment variable name matches what is expected by the framework in use, such as LangChain (see the available LangChain integrations).
If you want to use Amazon Bedrock you must sign up for preview access from the AWS console.
If access is granted, you need to add the `region` and `endpoint_url` provided as part of the preview access in `lib/aws-genai-llm-chatbot-stack.ts`:
const bedrockRegion = 'region';
const bedrockEndpointUrl = 'https://endpoint-url';
After this, you can jump to the next step: Environment.
If you don't have access to Amazon Bedrock, you can choose to:
To facilitate these steps, there are 2 commented examples of how to deploy:
More instructions on how to deploy other models here.
You can find out how here.
If you'd like to use AWS Cloud9 to deploy the solution from, you will need the following before proceeding:
- at least `m5.large` as the instance type.
- `Amazon Linux 2` as the platform.
- the instance's EBS volume size increased to at least 100GB. To do this, run the following command from the Cloud9 terminal (see the documentation for more details here):
./assets/cloud9-resize.sh 100
Verify that your environment satisfies the following prerequisites:
You have:
- An AWS account
- The `AdministratorAccess` policy granted to your AWS account (for production, we recommend restricting access as needed)
- Both console and programmatic access
- NodeJS 18 installed
  - If you are using `nvm`, you can run the following before proceeding: `nvm install 18 && nvm use 18`
- AWS CLI installed and configured to use with your AWS account
- Typescript 3.8+ installed
- AWS CDK CLI installed
- Docker installed
  - N.B. `buildx` is also required. For Windows and macOS, `buildx` is included in Docker Desktop.
- Python 3+ installed
The solution will be deployed into your AWS account using infrastructure-as-code with the AWS Cloud Development Kit (CDK).
- Clone the repository:
git clone https://github.com/aws-samples/aws-genai-llm-chatbot.git
- Navigate to this project on your computer using your terminal:
cd aws-genai-llm-chatbot
- Install the project dependencies by running this command:
npm install
- (Optional) Bootstrap AWS CDK on the target account and region
Note: This is required if you have never used AWS CDK before on this account and region combination. (More information on CDK bootstrapping.)
npx cdk bootstrap aws://{targetAccountId}/{targetRegion}
- Verify that Docker is running with the following command:
docker version
Note: If you get an error like the one below, then Docker is not running and needs to be restarted:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
- Deploy the sample using the following CDK command:
npx cdk deploy
Note: The duration of this step can vary greatly depending on the constructs you are deploying, from about 6 minutes for basic usage with Amazon Bedrock to about 40 minutes when deploying all RAG sources and self-hosted models.
- You can view the progress of your CDK deployment in the CloudFormation console in the selected region.
- Once deployed, take note of the `User Interface` URL, the `User Pool`, and, if you plan to interact with third-party model providers, the `Secret` that will hold the various `API_KEYS`.
...
Outputs:
AwsGenaiLllmChatbotStack.WebInterfaceUserInterfaceUrlXXXXX = dxxxxxxxxxxxxx.cloudfront.net
AwsGenaiLllmChatbotStack.AuthenticationUserPoolLinkXXXXX = https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx
AwsGenaiLllmChatbotStack1.LangchainInterfaceKeysSecretsNameXXXX = LangchainInterfaceKeySecret-xxxxxx
...
- Open the generated Cognito User Pool link from the outputs above, i.e. https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx
- Add a user that will be used to log in to the web interface.
- Open the `User Interface` URL from the outputs above, i.e. dxxxxxxxxxxxxx.cloudfront.net
- Log in with the user created above; you will be asked to change the password, after which you'll land on the main page.
You can remove the stacks and all the associated resources created in your AWS account by running the following command:
npx cdk destroy
This sample was made possible thanks to the following libraries:
- langchain from LangChain AI
- unstructured from Unstructured-IO
- pgvector from Andrew Kane
This library is licensed under the MIT-0 License. See the LICENSE file.
- Changelog of the project.
- License of the project.
- Code of Conduct of the project.
- CONTRIBUTING for more information.