
[Feature]: S3 Scratch Bucket + 303 Redirect for responses #3684

Closed
Manouchehri opened this issue May 16, 2024 · 1 comment

Labels
enhancement New feature or request

@Manouchehri Collaborator

The Feature

Instead of returning the response body to the client directly, upload it to a fast S3-compatible bucket (e.g. GCS or R2) and return a 303 redirect to a presigned URL. This would only work for non-streaming responses.

Redirects have been supported in OpenAI's Python client since openai/openai-python#1100, and fetch in JavaScript follows redirects by default.

PoC:

```python
from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()

@app.post("/chat/completions")
async def redirect_to_webhook():
    # Stand-in for a presigned URL: the 303 tells the client to GET this location.
    return RedirectResponse(url="https://webhook.site/removed-removed-removed-removed-removed", status_code=303)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="localhost", port=8000)
```
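
The production version of that handler would presumably upload the finished response and presign a short-lived GET URL before redirecting. A minimal sketch of that flow, assuming boto3 against an S3-compatible endpoint (the endpoint URL, bucket name, and `scratch/` key prefix are all made up):

```python
import json
import uuid

import boto3
from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()

# Placeholders: any S3-compatible store (R2, GCS in interoperability
# mode, etc.) with its own endpoint and credentials would work here.
s3 = boto3.client("s3", endpoint_url="https://example-account.r2.cloudflarestorage.com")
BUCKET = "litellm-scratch"

@app.post("/chat/completions")
def chat_completions():
    # Sync handler on purpose: FastAPI runs it in a threadpool, so the
    # blocking boto3 calls don't stall the event loop.
    # Stand-in for the response LiteLLM would get back from the upstream model.
    response_body = {"id": "chatcmpl-example", "choices": []}

    key = f"scratch/{uuid.uuid4()}.json"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(response_body).encode(),
        ContentType="application/json",
    )

    # Short-lived presigned GET; the URL itself carries the credentials,
    # so the client needs no Authorization header to fetch it.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=300,
    )
    return RedirectResponse(url=url, status_code=303)
```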

Then, when running this client against it:

```python
#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

client = openai.AsyncOpenAI(
    api_key="FAKE",
    base_url="http://localhost:8000",
)

async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        temperature=0.0,
    )

    logger.info(response)


if __name__ == "__main__":
    asyncio.run(main())
```

Then this results in a GET request with these headers:

```
connection: close
x-stainless-async: async:asyncio
x-stainless-runtime-version: 3.11.9
x-stainless-runtime: CPython
x-stainless-arch: arm64
x-stainless-os: MacOS
x-stainless-package-version: 1.28.0
x-stainless-lang: python
user-agent: AsyncOpenAI/Python 1.28.0
content-type: application/json
accept: application/json
accept-encoding: gzip, deflate, br
host: webhook.site
content-length: 
Content-Type: application/json
```

Note to self: do not presign the GET URL expecting the Authorization header. As the capture above shows, the client does not forward Authorization across the cross-host redirect, so the presigned URL must be self-contained.
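
A quick way to sanity-check that constraint is to fetch the presigned URL with a bare HTTP client and no headers at all; if that fails, the OpenAI client's redirected GET will too. A sketch using httpx (the URL is a stand-in for a presigned URL produced by the handler above):

```python
import asyncio

import httpx

async def check(url: str) -> None:
    # Deliberately no Authorization header: the presigned URL has to be
    # self-contained, because the client drops Authorization when it
    # follows the redirect to a different host.
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        resp.raise_for_status()
        print(resp.json())

if __name__ == "__main__":
    asyncio.run(check("https://litellm-scratch.example.com/scratch/00000000.json"))
```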

Motivation, pitch

For large responses:

  1. This might reduce the load on LiteLLM.
  2. With a slow client, LiteLLM would not need to keep the connection open, which makes scaling on serverless platforms more efficient.

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

@Manouchehri Manouchehri added the enhancement New feature or request label May 16, 2024
@Manouchehri Manouchehri self-assigned this May 16, 2024
@Manouchehri Collaborator Author

Not worth it.

@Manouchehri Manouchehri closed this as not planned Jun 2, 2024