
[Feature]: S3 Scratch Bucket + 303 Redirect for responses #3684

Closed
Manouchehri opened this issue May 16, 2024 · 1 comment

Labels
enhancement New feature or request

@Manouchehri Collaborator

The Feature

Instead of returning the response body to the client directly, upload it to a fast S3-compatible bucket (e.g. GCS or R2) and return a 303 redirect to a presigned URL. This would only work for non-streaming responses.

Redirects have been supported in OpenAI's Python client since openai/openai-python#1100, and fetch in JavaScript follows redirects by default.

PoC:

```python
from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()

@app.post("/chat/completions")
async def redirect_to_webhook():
    # Stand-in for a presigned URL: the 303 tells the client to GET this location.
    return RedirectResponse(url="https://webhook.site/removed-removed-removed-removed-removed", status_code=303)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="localhost", port=8000)
```
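
The production version of that handler would presumably upload the finished response and presign a short-lived GET URL before redirecting. A minimal sketch of that flow, assuming boto3 against an S3-compatible endpoint (the endpoint URL, bucket name, and `scratch/` key prefix are all made up):

```python
import json
import uuid

import boto3
from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()

# Placeholders: any S3-compatible store (R2, GCS in interoperability
# mode, etc.) with its own endpoint and credentials would work here.
s3 = boto3.client("s3", endpoint_url="https://example-account.r2.cloudflarestorage.com")
BUCKET = "litellm-scratch"

@app.post("/chat/completions")
def chat_completions():
    # Sync handler on purpose: FastAPI runs it in a threadpool, so the
    # blocking boto3 calls don't stall the event loop.
    # Stand-in for the response LiteLLM would get back from the upstream model.
    response_body = {"id": "chatcmpl-example", "choices": []}

    key = f"scratch/{uuid.uuid4()}.json"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(response_body).encode(),
        ContentType="application/json",
    )

    # Short-lived presigned GET; the URL itself carries the credentials,
    # so the client needs no Authorization header to fetch it.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=300,
    )
    return RedirectResponse(url=url, status_code=303)
```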

Then, when running this client against it:

```python
#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

client = openai.AsyncOpenAI(
    api_key="FAKE",
    base_url="http://localhost:8000",
)

async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        temperature=0.0,
    )

    logger.info(response)


if __name__ == "__main__":
    asyncio.run(main())
```

Then this results in a GET request with these headers:

```
connection: close
x-stainless-async: async:asyncio
x-stainless-runtime-version: 3.11.9
x-stainless-runtime: CPython
x-stainless-arch: arm64
x-stainless-os: MacOS
x-stainless-package-version: 1.28.0
x-stainless-lang: python
user-agent: AsyncOpenAI/Python 1.28.0
content-type: application/json
accept: application/json
accept-encoding: gzip, deflate, br
host: webhook.site
content-length: 
Content-Type: application/json
```

Note to self: do not presign the GET URL expecting the Authorization header. As the capture above shows, the client does not forward Authorization across the cross-host redirect, so the presigned URL must be self-contained.
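
A quick way to sanity-check that constraint is to fetch the presigned URL with a bare HTTP client and no headers at all; if that fails, the OpenAI client's redirected GET will too. A sketch using httpx (the URL is a stand-in for a presigned URL produced by the handler above):

```python
import asyncio

import httpx

async def check(url: str) -> None:
    # Deliberately no Authorization header: the presigned URL has to be
    # self-contained, because the client drops Authorization when it
    # follows the redirect to a different host.
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        resp.raise_for_status()
        print(resp.json())

if __name__ == "__main__":
    asyncio.run(check("https://litellm-scratch.example.com/scratch/00000000.json"))
```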

Motivation, pitch

For large responses:

  1. This might reduce the load on LiteLLM.
  2. With a slow client, LiteLLM would not need to keep the connection open, which makes scaling on serverless platforms more efficient.

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

@Manouchehri Manouchehri added the enhancement New feature or request label May 16, 2024
@Manouchehri Manouchehri self-assigned this May 16, 2024
@Manouchehri Collaborator Author

Not worth it.

@Manouchehri Manouchehri closed this as not planned Jun 2, 2024