diff --git a/docs/how-to/install-models.md b/docs/how-to/install-models.md
index e1076346..f181e96d 100644
--- a/docs/how-to/install-models.md
+++ b/docs/how-to/install-models.md
@@ -54,10 +54,52 @@ kubectl explain models.spec
 kubectl explain models.spec.engine
 ```
 
+You can view all example manifests in the [GitHub repository](https://github.com/substratusai/kubeai/tree/main/manifests/models).
+
+Below are a few examples using various engines and resource profiles.
+
+### Example: Gemma 2 2B using Ollama on CPU
+
+```yaml
+apiVersion: kubeai.org/v1
+kind: Model
+metadata:
+  name: gemma2-2b-cpu
+spec:
+  features: [TextGeneration]
+  url: ollama://gemma2:2b
+  engine: OLlama
+  resourceProfile: cpu:2
+```
+
+### Example: Llama 3.1 8B using vLLM on an NVIDIA L4 GPU
+
+```yaml
+apiVersion: kubeai.org/v1
+kind: Model
+metadata:
+  name: llama-3.1-8b-instruct-fp8-l4
+spec:
+  features: [TextGeneration]
+  owner: neuralmagic
+  url: hf://neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
+  engine: VLLM
+  args:
+  - --max-model-len=16384
+  - --max-num-batched-tokens=16384
+  - --gpu-memory-utilization=0.9
+  - --disable-log-requests
+  resourceProfile: nvidia-gpu-l4:1
+```
+
 ## Programmatically installing models
 
 See the [examples](https://github.com/substratusai/kubeai/tree/main/examples/k8s-api-clients).
 
+## Calling a model
+
+You can run inference against a model by calling the KubeAI OpenAI-compatible API. The `model` field in the request should match the KubeAI Model name.
+
 ## Feedback welcome: A model management UI
 
 We are considering adding a UI for managing models in a running KubeAI instance. Give the [GitHub Issue](https://github.com/substratusai/kubeai/issues/148) a thumbs up if you would be interested in this feature.
diff --git a/docs/installation/any.md b/docs/installation/any.md
new file mode 100644
index 00000000..96555989
--- /dev/null
+++ b/docs/installation/any.md
@@ -0,0 +1,100 @@
+# Install on any Kubernetes Cluster
+
+KubeAI can be installed on any Kubernetes cluster and does not require GPUs.
+If you do have GPUs, KubeAI can take advantage of them.
+
+Follow the Installation using GPUs section below if you have GPUs available.
+
+## Prerequisites
+
+1. Add the KubeAI helm repository.
+
+```bash
+helm repo add kubeai https://www.kubeai.org
+helm repo update
+```
+
+2. (Optional) Set your Hugging Face token as an environment variable. This is only required if you plan to use Hugging Face models that require authentication.
+
+```bash
+export HF_TOKEN=
+```
+
+## Installation using only CPUs
+
+All engines supported by KubeAI can also run solely on CPU resources.
+
+Install KubeAI using the default values, which include pre-defined CPU resourceProfiles:
+
+```bash
+helm install kubeai kubeai/kubeai --wait \
+  --set secrets.huggingface.token=$HF_TOKEN
+```
+
+Optionally, inspect the values file to see the default resourceProfiles:
+
+```bash
+helm show values kubeai/kubeai > values.yaml
+```
+
+## Installation using GPUs
+
+This section assumes you have a Kubernetes cluster with GPU resources available and
+have installed the NVIDIA device plugin, which exposes GPU information labels on the nodes.
+
+This installation uses custom resourceProfiles that define the nodeSelectors
+for the different GPU types.
+
+Download the values file for the NVIDIA Kubernetes device plugin:
+
+```bash
+curl -L -O https://raw.githubusercontent.com/substratusai/kubeai/refs/heads/main/charts/kubeai/values-nvidia-k8s-device-plugin.yaml
+```
+
+You likely will not need to modify the `values-nvidia-k8s-device-plugin.yaml` file.
+However, do inspect it to ensure the GPU resourceProfile nodeSelectors match the
+labels on your nodes.
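+
+For example, you can list node labels and compare them with the nodeSelectors in the values file (`nvidia.com/gpu.product` is only an illustrative label; the exact labels depend on your device plugin setup):
+
+```bash
+# Show all labels per node; look for GPU-related labels such as nvidia.com/gpu.product.
+kubectl get nodes --show-labels
+```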
+
+Install KubeAI using the custom resourceProfiles:
+
+```bash
+helm upgrade --install kubeai kubeai/kubeai \
+  -f values-nvidia-k8s-device-plugin.yaml \
+  --set secrets.huggingface.token=$HF_TOKEN \
+  --wait
+```
+
+## Deploying models
+
+See the [How to install models guide](/how-to/install-models.md) for instructions and examples for deploying models.
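+
+As a quick smoke test, you can apply the Gemma 2 CPU example from that guide and call it through the OpenAI-compatible API (a sketch; it assumes the default CPU resourceProfiles and the default `kubeai` Service):
+
+```bash
+kubectl apply -f - <<EOF
+apiVersion: kubeai.org/v1
+kind: Model
+metadata:
+  name: gemma2-2b-cpu
+spec:
+  features: [TextGeneration]
+  url: ollama://gemma2:2b
+  engine: OLlama
+  resourceProfile: cpu:2
+EOF
+
+# Port-forward the KubeAI Service (run this in a separate terminal).
+kubectl port-forward svc/kubeai 8000:80
+
+# Send an OpenAI-style chat completion; the model name matches the Model above.
+curl http://localhost:8000/openai/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gemma2-2b-cpu", "messages": [{"role": "user", "content": "Hello!"}]}'
+```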