AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports LLMs, embeddings, and speech-to-text.
Topics: kubernetes, ai, k8s, whisper, autoscaler, openai-api, llm, vllm, faster-whisper, ollama, vllm-operator, ollama-operator, inference-operator
Updated Dec 20, 2024 · Go
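
Given the openai-api topic above, clients would typically reach models served by the operator through an OpenAI-compatible HTTP endpoint. The sketch below, written in Go to match the repository's language, is a minimal hypothetical example of such a call; the in-cluster service URL and the model name are assumptions for illustration, not values taken from the project.

```go
// Minimal sketch of calling an OpenAI-compatible chat-completions endpoint
// exposed inside the cluster. The service hostname and model name are
// placeholders; adjust them to your actual deployment.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical in-cluster endpoint (assumed, not from the repo).
	url := "http://my-inference-service.default.svc.cluster.local/v1/chat/completions"

	// Standard OpenAI-style chat request body with a placeholder model name.
	reqBody, err := json.Marshal(map[string]any{
		"model": "my-llm",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello!"},
		},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}

	resp, err := http.Post(url, "application/json", bytes.NewReader(reqBody))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// Decode and print the raw JSON response.
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because the endpoint follows the OpenAI API shape, the same request could also be issued with any existing OpenAI client library pointed at the cluster-internal base URL.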