Add Configure Text Generation Models guide #313

Merged: 13 commits, Nov 27, 2024

160 changes: 160 additions & 0 deletions in docs/how-to/configure-text-generation-models.md

@@ -0,0 +1,160 @@
# Configure Text Generation Models

KubeAI supports the following engines for text generation models (LLMs, VLMs, etc.):

- vLLM (Recommended for GPU)
- Ollama (Recommended for CPU)
- Need something else? Please file an issue on [GitHub](https://github.com/substratusai/kubeai).

There are 2 ways to install a text generation model in KubeAI:
- Use Helm with the `kubeai/models` chart.
- Use `kubectl apply -f model.yaml` to install a Model CRD.
> **Contributor:** s/CRD/Custom Resource - CRD is the definition of a CR


KubeAI comes with pre-validated and optimized Model configurations for popular text generation models. These models are available in the `kubeai/models` Helm chart and are also published as raw manifests in the `manifests/model` directory.

> **Contributor:** Prefer links to GitHub directories.
>
> **Contributor Author:** done
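
For illustration, a pre-configured model could be applied straight from those raw manifests. The URL below is an assumption about the repository layout, so check the `manifests` directory on GitHub for the real path and file names:

```bash
# Apply a pre-configured model directly from the repository's raw manifests.
# NOTE: the exact path and file name are assumptions; verify them in the
# manifests directory on GitHub before running this.
kubectl apply -f https://raw.githubusercontent.com/substratusai/kubeai/main/manifests/models/gemma2-2b-cpu.yaml
```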

You can also define your own models, either with the Model CRD directly or through the `kubeai/models` Helm chart.

## Install a Text Generation Model using Helm

KubeAI provides a `kubeai/models` chart that contains the pre-configured models.
> **Contributor (@nstogner, Nov 23, 2024):** This is already stated above.
>
> **Contributor Author:** done


You can take a look at all the pre-configured models in the chart's default values file.
> **Contributor:** Link would be useful.
>
> **Contributor Author:** done


```bash
helm show values kubeai/models
```

> **Contributor:** They would need to have loaded the chart first, right?
>
> **Contributor Author:** That's a given if you're trying to configure models after you've installed KubeAI. Installing KubeAI requires adding the KubeAI Helm repo.

### Install Text Generation Model using CPU

Enable the `gemma2-2b-cpu` model using the Helm chart:

```bash
helm upgrade --install --reuse-values kubeai-models kubeai/models -f - <<EOF
catalog:
  gemma2-2b-cpu:
    enabled: true
    engine: OLlama
    resourceProfile: cpu:2
    minReplicas: 1 # by default this is 0
EOF
```
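
After the Helm release is updated, a quick sanity check is to list the Model resources (this assumes the Model CRD registers the plural name `models`):

```bash
# List the Model resources managed by KubeAI; gemma2-2b-cpu should appear.
kubectl get models
```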

### Install Text Generation Model using L4 GPU

Enable the Llama 3.1 8B model using the Helm chart:

```bash
helm upgrade --install --reuse-values kubeai-models kubeai/models -f - <<EOF
catalog:
  llama-3.1-8b-instruct-fp8-l4:
    enabled: true
    engine: VLLM
    resourceProfile: nvidia-gpu-l4:1
    minReplicas: 1 # by default this is 0
EOF
```
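
Because `minReplicas` is set to 1, a vLLM server pod should start right away. You can watch it come up; the label selector below is an assumption, so check the actual labels if it does not match:

```bash
# Watch the model server pod start. The label selector is an assumption;
# verify the real labels with `kubectl get pods --show-labels`.
kubectl get pods -l model=llama-3.1-8b-instruct-fp8-l4 --watch
```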

## Install a Text Generation Model using kubectl
You can use the Model CRD directly to install a model using `kubectl apply -f model.yaml`.

### Install Text Generation Model using CPU
> **Contributor:** I don't think we need to duplicate the steps for CPU... I think we could just point to an example model after defining the steps for installing models above.
>
> **Contributor Author:** As a user I prefer to have the full examples in the doc. We can link to more examples in the repo, but I want to at least show CPU and GPU for both the Helm and non-Helm flow.


Apply the following Model CRD to install the Gemma 2 2B model using Ollama on CPU:
```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: gemma2-2b-cpu
spec:
  features: [TextGeneration]
  url: ollama://gemma2:2b
  engine: OLlama
  resourceProfile: cpu:2
```
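
A minimal way to apply it, assuming you save the manifest above to a file named `gemma2-2b-cpu.yaml` (the file name is arbitrary):

```bash
# Create the Model resource and confirm it was registered.
kubectl apply -f gemma2-2b-cpu.yaml
kubectl get model gemma2-2b-cpu
```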

### Install Text Generation Model using L4 GPU

Apply the following Model CRD to install the Llama 3.1 8B model using vLLM on L4 GPU:
```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct-fp8-l4
spec:
  features: [TextGeneration]
  url: hf://neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
  engine: VLLM
  args:
    - --max-model-len=16384
    - --max-num-batched-tokens=16384
    - --gpu-memory-utilization=0.9
    - --disable-log-requests
  resourceProfile: nvidia-gpu-l4:1
```
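
Note that neither CRD example sets `minReplicas`, so the model may sit at zero replicas until the first request arrives. If you want the server warm ahead of time, one option is to patch the resource; this sketch assumes `minReplicas` is a `spec` field, mirroring the Helm values above:

```bash
# Pre-warm the model by raising minReplicas from its default of 0.
# Assumes spec.minReplicas exists on the Model resource, mirroring the Helm values.
kubectl patch model llama-3.1-8b-instruct-fp8-l4 --type merge -p '{"spec": {"minReplicas": 1}}'
```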

## Interact with the Text Generation Model
The KubeAI service exposes an OpenAI compatible API that you can use to query the available models and interact with them.

The KubeAI service is available at `http://kubeai/openai/v1` within the Kubernetes cluster.

You can also port-forward the KubeAI service to your local machine to interact with the models:

```bash
kubectl port-forward svc/kubeai 8000:80
```

You can now query the available models using curl:

```bash
curl http://localhost:8000/openai/v1/models
```
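
If you have `jq` installed, you can pull out just the model IDs from the list response; the `.data[].id` shape below assumes KubeAI mirrors the standard OpenAI `/v1/models` format:

```bash
# Print only the model IDs from the OpenAI-compatible /v1/models response.
curl -s http://localhost:8000/openai/v1/models | jq -r '.data[].id'
```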

### Using curl to interact with the model

Run the following curl command to interact with the model named `llama-3.1-8b-instruct-fp8-l4`:
```bash
curl "http://localhost:8000/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct-fp8-l4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Write a haiku about recursion in programming."
      }
    ]
  }'
```
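
The endpoint also accepts the standard OpenAI `stream` parameter, so a streaming variant of the same request is just one extra field (a sketch, assuming KubeAI passes the parameter through to the engine):

```bash
# Same chat request, but the response is streamed back as server-sent events.
curl "http://localhost:8000/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct-fp8-l4",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about recursion in programming."}
    ]
  }'
```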

### Using the OpenAI Python SDK to interact with the model
Once the pod is ready, you can use the OpenAI Python SDK to interact with the model. In fact, any OpenAI SDK works with KubeAI, since the KubeAI service is OpenAI-API compatible.

See the example below:
> **Contributor:** Alt: we could just link to different examples to keep the docs easier to maintain. This would have the added benefit of directing users to the repo, where they could add stars, issues, etc.
>
> **Contributor Author:** I think we should embed code from the repo in the docs in the future, but I do think that from a user's point of view, having it in the same doc is preferred.

```python
import os

from openai import OpenAI

# Assumes a port-forward of the kubeai service to localhost:8000.
kubeai_endpoint = "http://localhost:8000/openai/v1"
model_name = "llama-3.1-8b-instruct-fp8-l4"

# If running inside the Kubernetes cluster, use the kubeai service endpoint instead.
if os.getenv("KUBERNETES_SERVICE_HOST"):
    kubeai_endpoint = "http://kubeai/openai/v1"

# The API key is ignored by KubeAI but required by the SDK.
client = OpenAI(api_key="ignored", base_url=kubeai_endpoint)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model=model_name,
)
print(chat_completion.choices[0].message.content)
```
6 changes: 5 additions & 1 deletion in docs/installation/any.md

```diff
@@ -67,4 +67,8 @@ helm upgrade --install kubeai kubeai/kubeai \
 
 ## Deploying models
 
-See the [How to install models guide](/how-to/installing-models.md) for instructions on deploying models and examples.
+Take a look at the following how-to guides to deploy models:
+* [Configure Text Generation Models](../how-to/configure-text-generation-models.md)
+* [Configure Embedding Models](../how-to/configure-embedding-models.md)
+* [Configure Speech to Text Models](../how-to/configure-speech-to-text.md)
+
```
18 changes: 5 additions & 13 deletions in docs/installation/eks.md

````diff
@@ -145,17 +145,9 @@ helm upgrade --install kubeai kubeai/kubeai \
   --wait
 ```
 
-## 3. Optionally configure models
+## 4. Deploying models
 
-Optionally install preconfigured models.
-
-```bash
-cat <<EOF > kubeai-models.yaml
-catalog:
-  llama-3.1-8b-instruct-fp8-l4:
-    enabled: true
-EOF
-
-helm install kubeai-models kubeai/models \
-  -f ./kubeai-models.yaml
-```
+Take a look at the following how-to guides to deploy models:
+* [Configure Text Generation Models](../how-to/configure-text-generation-models.md)
+* [Configure Embedding Models](../how-to/configure-embedding-models.md)
+* [Configure Speech to Text Models](../how-to/configure-speech-to-text.md)
````
18 changes: 5 additions & 13 deletions in docs/installation/gke.md

````diff
@@ -62,17 +62,9 @@ helm upgrade --install kubeai kubeai/kubeai \
   --wait
 ```
 
-## 3. Optionally configure models
+## 4. Deploying models
 
-Optionally install preconfigured models.
-
-```bash
-cat <<EOF > kubeai-models.yaml
-catalog:
-  llama-3.1-8b-instruct-fp8-l4:
-    enabled: true
-EOF
-
-helm install kubeai-models kubeai/models \
-  -f ./kubeai-models.yaml
-```
+Take a look at the following how-to guides to deploy models:
+* [Configure Text Generation Models](../how-to/configure-text-generation-models.md)
+* [Configure Embedding Models](../how-to/configure-embedding-models.md)
+* [Configure Speech to Text Models](../how-to/configure-speech-to-text.md)
````