Introduction to the process of uploading up to 10,000 files to the Vector Store object in Azure OpenAI's Assistants API.

LazaUK/AOAI-Assistants-VectorStore

Azure OpenAI Assistants API: Creating your first 10K Vector Store

Vector Store is a new object in the Azure OpenAI (AOAI) Assistants API that makes uploaded files searchable by automatically parsing, chunking and embedding their content.

At the time of writing (October 2024), Vector Store supported the ingestion of up to 10,000 files.

Warning

Uploading thousands of files in a single operation may fail due to timeouts or other API disruptions. The upload process therefore enforces two file limits:

  • up to 100 files when creating a new Vector Store;
  • up to 500 files per batch when adding files to an existing Vector Store.
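
The two limits above can be turned into a simple batching plan. The following is a minimal sketch, not part of the repository's notebooks: `plan_batches`, `INITIAL_LIMIT` and `BATCH_LIMIT` are hypothetical names, and it assumes the first group of files is attached when the Vector Store is created while the rest are added afterwards.

```python
from typing import List, Sequence

# Limits taken from the warning above; the names are illustrative only.
INITIAL_LIMIT = 100  # files allowed when creating a new Vector Store
BATCH_LIMIT = 500    # files allowed per batch added to an existing Vector Store


def plan_batches(paths: Sequence[str],
                 initial_limit: int = INITIAL_LIMIT,
                 batch_limit: int = BATCH_LIMIT) -> List[List[str]]:
    """Split paths into one initial group plus follow-up batches."""
    paths = list(paths)
    batches: List[List[str]] = []
    if paths:
        # First group: what can be attached at creation time.
        batches.append(paths[:initial_limit])
    # Remaining files: fixed-size batches for the existing Vector Store.
    for start in range(initial_limit, len(paths), batch_limit):
        batches.append(paths[start:start + batch_limit])
    return batches


if __name__ == "__main__":
    demo = [f"doc_{i}.txt" for i in range(1250)]
    print([len(batch) for batch in plan_batches(demo)])  # [100, 500, 500, 150]
```

A full 10,000-file store would need one initial group of 100 files followed by 20 further batches under this plan.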

Pre-requisites

  1. Upgrade the openai Python package to its latest supported version:
pip install --upgrade openai
  2. Set the following 3 environment variables before running the notebooks:

| Environment Variable | Description |
| --- | --- |
| AZURE_OPENAI_API_BASE | Base URL of the AOAI endpoint |
| AZURE_OPENAI_API_VERSION | API version of the AOAI endpoint |
| AZURE_OPENAI_API_KEY | API key of the AOAI endpoint (required for Scenario 1 only) |
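
Before running either scenario, it can help to confirm that the variables are actually set. This is a small sketch under the assumption that an empty value counts as missing; `missing_variables` is a hypothetical helper, not something the notebooks define.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("AZURE_OPENAI_API_BASE", "AZURE_OPENAI_API_VERSION")
API_KEY_VAR = "AZURE_OPENAI_API_KEY"  # needed for Scenario 1 only


def missing_variables(env: Mapping[str, str], use_api_key: bool) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    names = list(REQUIRED_VARS)
    if use_api_key:
        names.append(API_KEY_VAR)
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ, use_api_key=True)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```

Pass `use_api_key=False` when authenticating with Entra ID, since the API key is not needed there.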

Scenario 1: Authenticating with API Key

  1. Retrieve values of environment variables:
import os

AOAI_API_BASE = os.getenv("AZURE_OPENAI_API_BASE")
AOAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AOAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
  2. Instantiate Azure OpenAI client:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint = AOAI_API_BASE,
    api_version = AOAI_API_VERSION,
    api_key = AOAI_API_KEY
)
  3. Instantiate new Vector Store:
vector_store = client.beta.vector_stores.create(
    name = "<VECTOR_STORE_NAME>"
)
  4. Populate the Vector Store with your files in batches, where file_streams is a list of files opened in binary mode:
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id = vector_store.id,
    files = file_streams
)
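
The snippet above leaves open how `file_streams` is built. One possible approach, sketched here with a hypothetical `upload_folder` helper, opens every file in a folder in binary mode, passes the handles to `upload_and_poll`, and closes them afterwards. The 500-file default mirrors the per-batch limit from the warning; the folder layout is an assumption.

```python
from contextlib import ExitStack
from pathlib import Path


def upload_folder(client, vector_store_id: str, folder: str, max_files: int = 500):
    """Open every file in `folder` and upload them as one batch."""
    paths = sorted(p for p in Path(folder).iterdir() if p.is_file())
    if len(paths) > max_files:
        raise ValueError(
            f"{folder} holds {len(paths)} files; the batch limit is {max_files}"
        )
    with ExitStack() as stack:
        # Keep all handles open until the upload returns, then close them.
        file_streams = [stack.enter_context(open(p, "rb")) for p in paths]
        return client.beta.vector_stores.file_batches.upload_and_poll(
            vector_store_id=vector_store_id,
            files=file_streams,
        )
```

Calling `upload_folder(client, vector_store.id, "folder1")` would then return the same kind of `file_batch` object as the direct call.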
  5. If successful, you should see a message like this:
Uploading files to the vector store from folder1...
Files upload status: completed
- cancelled: 0
- completed: 100
- failed: 0
- in progress: 0
----------------------------------------
Total: 100

Uploading files to the vector store from folder2...
Files upload status: completed
- cancelled: 0
- completed: 500
- failed: 0
- in progress: 0
----------------------------------------
Total: 500
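
A status report like the one above can be assembled from the batch object returned by `upload_and_poll`, which exposes `status` and `file_counts` (with `cancelled`, `completed`, `failed`, `in_progress` and `total`). The sketch below is one way to do it; `format_batch_report` is a hypothetical name, the 40-character separator is an assumption, and a `SimpleNamespace` stands in for a real batch so the example runs without a service call.

```python
from types import SimpleNamespace


def format_batch_report(folder: str, file_batch) -> str:
    """Build a per-folder upload summary from a file batch object."""
    counts = file_batch.file_counts
    lines = [
        f"Uploading files to the vector store from {folder}...",
        f"Files upload status: {file_batch.status}",
        f"- cancelled: {counts.cancelled}",
        f"- completed: {counts.completed}",
        f"- failed: {counts.failed}",
        f"- in progress: {counts.in_progress}",
        "-" * 40,
        f"Total: {counts.total}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for the object returned by upload_and_poll.
    demo_batch = SimpleNamespace(
        status="completed",
        file_counts=SimpleNamespace(
            cancelled=0, completed=100, failed=0, in_progress=0, total=100
        ),
    )
    print(format_batch_report("folder1", demo_batch))
```

A non-zero `failed` count is the signal to inspect or re-upload the affected files.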

Scenario 2: Authenticating with Entra ID

  1. Retrieve values of environment variables:
import os

AOAI_API_BASE = os.getenv("AZURE_OPENAI_API_BASE")
AOAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
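  2. Define Entra ID as a token provider:
# Requires the azure-identity package
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)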
  3. Instantiate Azure OpenAI client:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint = AOAI_API_BASE,
    api_version = AOAI_API_VERSION,
    azure_ad_token_provider = token_provider
)
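
The two scenarios differ only in which credential argument reaches `AzureOpenAI`. If both need to be supported from one code path, the choice can be isolated in a small helper. This is a sketch only: `build_client_kwargs` is a hypothetical function that returns the keyword arguments and leaves client construction to the caller.

```python
from typing import Any, Callable, Dict, Optional


def build_client_kwargs(api_base: str,
                        api_version: str,
                        api_key: Optional[str] = None,
                        token_provider: Optional[Callable[[], str]] = None
                        ) -> Dict[str, Any]:
    """Return AzureOpenAI keyword arguments for exactly one auth method."""
    if (api_key is None) == (token_provider is None):
        raise ValueError("Pass exactly one of api_key or token_provider")
    kwargs: Dict[str, Any] = {
        "azure_endpoint": api_base,
        "api_version": api_version,
    }
    if api_key is not None:
        kwargs["api_key"] = api_key  # Scenario 1
    else:
        kwargs["azure_ad_token_provider"] = token_provider  # Scenario 2
    return kwargs


# Usage (Scenario 2), assuming token_provider is defined as in the step above:
# client = AzureOpenAI(**build_client_kwargs(
#     AOAI_API_BASE, AOAI_API_VERSION, token_provider=token_provider))
```

Rejecting the case where both or neither credential is supplied keeps the authentication method explicit.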
  4. Instantiate new Vector Store:
vector_store = client.beta.vector_stores.create(
    name = "<VECTOR_STORE_NAME>"
)
  5. Populate the Vector Store with your files in batches, where file_streams is a list of files opened in binary mode:
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id = vector_store.id,
    files = file_streams
)
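
Since large uploads can hit timeouts, a batch upload may be worth retrying. The following sketch wraps any zero-argument callable in a retry loop with exponential backoff. `with_retries` and its parameters are hypothetical. Catching every `Exception` is a simplification; real code would more likely catch only the client's timeout and connection errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The wrapped operation should reopen its files on each call, because a failed attempt may already have read part of each stream.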
  6. If successful, you should see a message like this:
Uploading files to the vector store from folder1...
Files upload status: completed
- cancelled: 0
- completed: 100
- failed: 0
- in progress: 0
----------------------------------------
Total: 100

Uploading files to the vector store from folder2...
Files upload status: completed
- cancelled: 0
- completed: 500
- failed: 0
- in progress: 0
----------------------------------------
Total: 500
