This application serves as a reverse proxy with caching capabilities, specifically tailored for language model API requests. Built with Go, it facilitates interactions with models hosted on platforms such as OpenAI by caching responses and minimizing redundant external API calls.
The goal is to let you develop against LLM APIs without running up a bill.
- Reverse Proxy Functionality: Directs API requests to the appropriate machine learning model service provider. Currently only supports OpenAI.
- Caching Mechanism: Stores successful responses to reduce API calls and improve performance. Cache hits serve responses directly from the cache.
- Token Counting: Leveraging `tiktoken-go`, the service estimates the number of tokens in each request's payload to keep track of usage.
- Dynamic Service Resolution: Looks up the configured model service provider based on the requested model in the API call.
- Extensibility: Supports registering multiple model providers through the `IModelProvider` interface, each with its own set of API models and endpoints (a sketch of this interface follows below).
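The interface is not reproduced here verbatim; the following is a minimal sketch of what an `IModelProvider` might look like, assuming methods for model support, upstream endpoint lookup, and token counting. Method names and signatures are assumptions, not the repository's exact definition.

```go
package main

import "net/url"

// Hypothetical sketch of the IModelProvider interface; the real definition in
// this repository may use different method names and signatures.
type IModelProvider interface {
	// SupportsModel reports whether this provider can serve the requested model.
	SupportsModel(model string) bool
	// TargetURL returns the upstream base URL to reverse proxy the request to.
	TargetURL() *url.URL
	// CountTokens estimates the number of tokens in the request content for a model.
	CountTokens(model, content string) (int, error)
}
```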
The entry point `main()` initiates the application by setting up a response cache and a service resolver that includes an `OpenAIProvider` responsible for handling OpenAI API requests. The HTTP server listens on port `8080` and processes incoming requests through a handler which (see the sketch after this list):
- Generates a cache key based on the request's path, body, and `Authorization` header.
- Checks the cache for a stored response corresponding to the cache key.
- If a cache hit occurs, it serves the response directly from the cache.
- On a cache miss, it determines the correct service provider and reverse proxies the request to the target machine learning model API.
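As a rough illustration of this flow (not the repository's exact code), a handler might look like the following. The in-memory map cache and the hardcoded OpenAI target are simplifications of the response cache and service resolver described above, and `generateCacheKey` is the helper covered later in this README.

```go
package main

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
)

// Simplified in-memory response cache; the real application may use a richer structure.
var (
	cacheMu sync.RWMutex
	cache   = map[string][]byte{}
)

func handler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Restore the body so it can still be forwarded upstream on a cache miss.
	r.Body = io.NopCloser(bytes.NewReader(body))

	// generateCacheKey is sketched later in this README.
	key := generateCacheKey(r.URL.Path, body, r.Header.Get("Authorization"))

	// Cache hit: serve the stored response directly.
	cacheMu.RLock()
	cached, ok := cache[key]
	cacheMu.RUnlock()
	if ok {
		w.Header().Set("Content-Type", "application/json")
		w.Write(cached)
		return
	}

	// Cache miss: reverse proxy to the provider. The target is hardcoded to
	// OpenAI here; the real code resolves it from the requested model via the
	// service resolver. Storing the upstream response back into the cache
	// (e.g. through the proxy's ModifyResponse hook) is omitted for brevity.
	target, _ := url.Parse("https://api.openai.com")
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.ServeHTTP(w, r)
}
```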
The application exposes a single HTTP endpoint `/` that accepts requests with model specifications in the body. Examples of supported model names include `"gpt-4"`, `"gpt-3.5-turbo"`, and `"text-embedding-3-large"`.
To run the service, ensure you have the following:
- Go installed and configured.
- The `tiktoken-go` library installed (`go get github.com/pkoukk/tiktoken-go`).
To start the server, execute:

```sh
go run main.go
```

The server will listen on `http://localhost:8080`.
Download the repository and install the dependencies:

```sh
go get
```

Then install via:

```sh
go install
```
Make an API request to the service with the desired machine learning model name and payload:
```sh
curl -X POST http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```
Replace the `model` value and the `messages` array content with your specific requirements.
You can drop the proxy into an existing client by pointing it at the local server via the `baseURL` option:
```js
import OpenAI from "openai";

const llm = new OpenAI({
  baseURL: "http://localhost:8080/v1",
});

const res = await llm.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  model: "gpt-3.5-turbo",
  temperature: 1,
});
```
After `go install`, the compiled binary is placed in `$GOPATH/bin`:

```sh
ls $GOPATH/bin
```
The `generateCacheKey()` function creates a unique cache key by hashing the request's path, body, and the `Authorization` bearer token if present.
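As a rough sketch of such a helper (the hash function and exact key layout in the repository may differ), SHA-256 over the path, body, and bearer token would look like this:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// generateCacheKey hashes the request path, body, and Authorization bearer
// token (if present) into a hex string. Illustrative sketch only; the real
// implementation may use a different hash or key layout.
func generateCacheKey(path string, body []byte, authorization string) string {
	h := sha256.New()
	h.Write([]byte(path))
	h.Write(body)
	h.Write([]byte(authorization))
	return hex.EncodeToString(h.Sum(nil))
}
```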
The `CountTokens()` method, part of the `OpenAIProvider` implementation of `IModelProvider`, counts tokens in the request content for a given model, aiding in managing token usage.
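Using `tiktoken-go`, a minimal version of such a method could look like the following; the empty `OpenAIProvider` struct and the method signature are assumptions for illustration, not the repository's exact code.

```go
package main

import (
	"github.com/pkoukk/tiktoken-go"
)

// OpenAIProvider is assumed to be an empty struct here; the real type carries
// its own configuration (API models, endpoints, and so on).
type OpenAIProvider struct{}

// CountTokens encodes the content with the tokenizer registered for the given
// model and returns the token count. Sketch only; signatures may differ.
func (p *OpenAIProvider) CountTokens(model, content string) (int, error) {
	enc, err := tiktoken.EncodingForModel(model)
	if err != nil {
		return 0, err
	}
	return len(enc.Encode(content, nil, nil)), nil
}
```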
This application serves as an example and may require additional security and error handling features to be production-ready.