This example shows how a multitenant service can distribute requests evenly among multiple Azure OpenAI Service instances and manage tokens per minute (TPM) for multiple tenants.
kubernetes
grafana
prometheus
openai
grafana-dashboard
tpm
load-balancing
aks
azure-kubernetes-service
azure-openai
azure-openai-service
tokens-per-minute
Updated Feb 26, 2024 - C#