Step 1: Put your API keys in .env

Copy `.env.template` to `.env` and fill in the relevant keys (e.g. `OPENAI_API_KEY="sk-..."`)
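A minimal sketch of what the filled-in `.env` might look like. Only `OPENAI_API_KEY` is shown in these docs; the other key names are assumptions for the providers listed below, so check `.env.template` for the exact names:

```shell
# .env - sketch; confirm exact key names against .env.template
OPENAI_API_KEY="sk-..."
# Assumed names for other providers (Anthropic, Replicate, ...):
ANTHROPIC_API_KEY="sk-ant-..."
REPLICATE_API_KEY="r8_..."
```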
Step 2: Test your proxy

Start your proxy server:

```shell
$ cd litellm-proxy && python3 main.py
```
Make your first call:

```python
import openai

openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey"}])
print(response)
```
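Because every model goes through the same `/chat/completions` route, switching providers only means changing the `model` string. A minimal sketch, assuming the matching provider key (e.g. for Anthropic) is set in your `.env`:

```python
# Same client, different backend - the proxy routes on the model name.
# Assumes the relevant provider key is configured in .env.
response = openai.ChatCompletion.create(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey"}],
)
print(response["choices"][0]["message"]["content"])
```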
- Make `/chat/completions` requests for 50+ LLM models: Azure, OpenAI, Replicate, Anthropic, Hugging Face. Example: for `model`, use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`

  ```json
  {
    "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
    "messages": [
      {
        "content": "Hello, whats the weather in San Francisco??",
        "role": "user"
      }
    ]
  }
  ```
- Consistent Input/Output Format
  - Call all models using the OpenAI format - `completion(model, messages)`
  - Text responses will always be available at `['choices'][0]['message']['content']`
- Error Handling - Using Model Fallbacks (if `gpt-4` fails, try `llama2`)
- Logging - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor`, `Traceloop`, `Helicone` (any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/)
- Token Usage & Spend - Track Input + Completion tokens used + Spend/model
- Caching - Implementation of Semantic Caching
- Streaming & Async Support - Return generators to stream text responses (see the sketch after this list)
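Since the proxy speaks the OpenAI chat format, streaming works through the standard `stream=True` flag of the (pre-1.0) `openai` client pointed at the proxy, as in the quick-start above. A minimal sketch, assuming the proxy streams chunks in the OpenAI delta format:

```python
import openai

openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

# Non-streaming: the reply text is always at ['choices'][0]['message']['content']
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
)
print(response["choices"][0]["message"]["content"])

# Streaming: stream=True returns a generator of chunks
stream = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
    stream=True,
)
for chunk in stream:
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
```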
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude2, etc.

This API endpoint accepts all inputs in raw JSON and expects the following inputs:
- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://litellm.readthedocs.io/en/latest/supported/), e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for the function role).
- Additional optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/ (a combined example sketch follows the `claude-2` payload below)
For `claude-2`:

```json
{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
```
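The optional parameters listed above go in the same request body. A hedged sketch (whether each parameter is honored for a given model is per the input docs linked above):

```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    { "role": "user", "content": "Hello, whats the weather in San Francisco??" }
  ],
  "temperature": 0.2,
  "top_p": 1,
  "n": 1,
  "stream": false
}
```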
```python
import requests
import json

# TODO: use your URL
url = "http://localhost:5000/chat/completions"

payload = json.dumps({
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
})
headers = {
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
```
All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
        "role": "assistant"
      }
    }
  ],
  "created": 1691790381,
  "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 16,
    "total_tokens": 57
  }
}
```
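Because every backend returns this same shape, pulling out the reply text and the token counts that feed Token Usage & Spend tracking is identical regardless of model. A minimal sketch, continuing the `requests` example above:

```python
data = response.json()

# Reply text is always at the same path, regardless of the underlying model
print(data["choices"][0]["message"]["content"])

# Token usage, useful for spend tracking per model
usage = data["usage"]
print(f"prompt={usage['prompt_tokens']} completion={usage['completion_tokens']} total={usage['total_tokens']}")
```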
- Clone the liteLLM repository to your local machine:
  `git clone https://github.com/BerriAI/liteLLM-proxy`
- Install the required dependencies using pip:
  `pip install -r requirements.txt`
- Set your LLM API keys:
  `os.environ['OPENAI_API_KEY'] = "YOUR_API_KEY"` or set `OPENAI_API_KEY` in your `.env` file
- Run the server:
  `python main.py`
- Quick Start: Deploy on Railway
- `GCP`, `AWS`, `Azure` - This project includes a `Dockerfile`, allowing you to build and deploy a Docker image on any of these providers
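A minimal sketch of building and running the image locally before pushing it to your provider. The image tag and the port mapping are assumptions; check the `Dockerfile` for the port `main.py` actually binds:

```shell
# Build the image ("litellm-proxy" is just an example tag)
docker build -t litellm-proxy .

# Run it with your .env and map the container port to the host.
# 8080 is assumed here to match the quick-start example above.
docker run --env-file .env -p 8080:8080 litellm-proxy
```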
- Our calendar 👋
- Community Discord 💭
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
- Support hosted db (e.g. Supabase)
- Easily send data to places like PostHog and Sentry.
- Add a hot-cache for project spend logs - enables fast checks for user + project limits
- Implement user-based rate-limiting
- Spending controls per project - expose key creation endpoint
- Store a keys db -> mapping created keys to their alias (i.e. project name)
- Easily add new models as backups / as the entry-point (add this to the available model list)