
Proposal: Mount a PVC in ReadManyOnly mode as model storage #311

Open

samos123 opened this issue Nov 19, 2024 · 6 comments

@samos123 (Contributor) commented Nov 19, 2024

Use case: Users have pre-provisioned PVs that already contain models and support the ReadOnlyMany access mode. The user would be responsible for ensuring a compatible model is stored on the PV and for creating a PVC bound to it.

Example:

model.url: "pvc://$PVC_NAME/$PVC_MODEL_PATH"

The $PVC_MODEL_PATH will always be an absolute path starting with a /.

Model stored in the PV under /llama: model.url: "pvc://123f124/llama" or model.url: "pvc://123f124//llama"

If the model is stored under the root directory of the PV, then both of these would be valid:

model.url: "pvc://123f124/"
model.url: "pvc://123f124"

The following would happen inside the model engine pod:

  1. Mount the PVC named $PVC_NAME into the engine container at /model, using $PVC_MODEL_PATH as the subPath
  2. The engine would always load the model from /model
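
Conceptually, the URL would map onto the engine pod spec something like this (a sketch; the concrete version appears in the user flow below, and the leading / is stripped because subPath must be a relative path):

      containers:
      - name: engine
        volumeMounts:
        - name: model
          mountPath: /model
          subPath: $PVC_MODEL_PATH # path part of the URL, leading "/" stripped
      volumes:
      - name: model
        persistentVolumeClaim:
          claimName: $PVC_NAME # host part of the pvc:// URL
          readOnly: true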

Open questions:

  1. Should we expose the mount access mode, ReadOnlyMany vs ReadWriteOnce?
    No, this won't be needed, since the user is responsible for creating the PVC and therefore controls the access mode. The only issue is that we would start encountering problems when trying to scale beyond 1 replica with ReadWriteOnce.

Example user flow:

  1. User creates a PVC using ReadOnlyMany and binds this PVC to an existing PV that has the model stored under /llama-3-8b. The PVC name is llama-3-8b.
  2. User creates a Model and specifies model.url as pvc://llama-3-8b/llama-3-8b.
  3. KubeAI configures the model engine pod to mount the PVC.

Step 1 PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama-3-8b
  labels:
    type: local
spec:
  storageClassName: robin
  resources:
    requests:
      storage: 10Gi
  accessModes:
    - ReadOnlyMany
  volumeName: name-of-pv-with-prepopulated-llama-3-8b
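
Step 2 Model (a sketch, assuming the usual KubeAI Model CRD fields; the resourceProfile value is just a placeholder):

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3-8b
spec:
  features: [TextGeneration]
  engine: VLLM
  url: pvc://llama-3-8b/llama-3-8b
  resourceProfile: nvidia-gpu-l4:1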

Step 3 pod spec:

      containers:
      - name: engine
        volumeMounts:
        - name: model-pvc
          mountPath: /model
          subPath: llama-3-8b
      volumes:
      - name: model-pvc
        persistentVolumeClaim:
          claimName: llama-3-8b
          readOnly: true

Why not have KubeAI manage the PVC and allow users to specify PV instead?

Letting the user manage the PVC allows them to specify more attributes that may be relevant on the PVC. One example that wouldn't be as easy is figuring out the resources.requests.storage capacity to request for the PVC. So it may make more sense to have the user control that.

Looking at both GCS Fuse and Azure Blob, the only way to easily support both is to let the user supply a PVC. In the Azure case there seems to be no need to create a PV.

Azure Blob storage with PVC only

Let's take the use case of Azure Blob Storage. You may have a storage account samos123, and in that storage account two different storage containers: llama-3.1-8b and qwen70b. So the storage looks like this:

samos123
   - llama-3.1-8b: (config.json, safetensors, etc.)
   - qwen70b: (config.json, safetensors, etc.)

The AKS cluster is configured with the storage account samos123.

Then the user would only create a PVC and have no need to create a PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-blob-storage
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azureblob-nfs-premium
  resources:
    requests:
      storage: 5Gi

In KubeAI the user would specify the following URL:
pvc://azure-blob-storage/llama-3.1-8b

GCS Fuse example

Assume I have a bucket named samos123 and in that bucket I have a directory llama-3-8b.

Create PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  storageClassName: example-storage-class
  mountOptions:
    - implicit-dirs
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: samos123
    volumeAttributes:
      gcsfuseLoggingSeverity: warning
  claimRef:
    name: gcs-fuse-csi-static-pvc
    namespace: NAMESPACE

Create PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
  namespace: NAMESPACE
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: example-storage-class

Then in KubeAI, the URL would be: pvc://gcs-fuse-csi-static-pvc/llama-3-8b

Source: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#provision-static

@samos123 samos123 changed the title Proposal: Mount a PV in ReadManyOnly mode as model storage Proposal: Mount a PVC in ReadManyOnly mode as model storage Nov 20, 2024
@SatyKrish

Supporting PVCs can make KubeAI agnostic to the storage provider; for example, existing volumes from Azure Files or Azure Blob can be used.

@samos123 (Contributor Author)

Would you prefer passing a PVC or a PV to the model spec? @SatyKrish, please share the reasoning for your preference as well.

@nstogner (Contributor)

One example that wouldn't be as easy is figuring out the resources.requests.storage capacity to request for the PVC.

The typical pattern here is to match what is on the PV:

spec.capacity.storage on the PersistentVolume manifest should match spec.resources.requests.storage on the PersistentVolumeClaim manifest. Since Cloud Storage buckets don't have size limits, you can put any number for capacity but it cannot be empty.

Src: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#provision-static

Then the user would only create a PVC and have no need to create a PV:

That is an example of dynamic provisioning. The PVC triggers the creation of a new bucket:

A persistent volume claim (PVC) uses the storage class object to dynamically provision an Azure Blob storage container.

Src: https://learn.microsoft.com/en-us/azure/aks/azure-csi-blob-storage-provision?tabs=mount-nfs%2Csecret#dynamically-provision-a-volume

In this case, the bucket would be empty unless you followed a process like:

1. Create PVC
2. Create Job with PVC to download
3. Wait for Job to succeed
4. Pass PVC to KubeAI
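
For step 2, the download Job could look something like this (illustrative only; the image, model name, and claim name are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: download-model
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: download
        image: python:3.11-slim
        command: ["sh", "-c"]
        args:
        - |
          pip install huggingface_hub &&
          huggingface-cli download meta-llama/Meta-Llama-3-8B --local-dir /model/llama-3-8b
        volumeMounts:
        - name: model
          mountPath: /model
      volumes:
      - name: model
        persistentVolumeClaim:
          claimName: model-pvc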

KubeAI already supports this flow naturally via the cache download functionality - and will soon support the specific bucket use case via a direct url to the bucket.

However, consider the following use case:

1. User has a preexisting NFS/EFS filesystem
2. User has downloaded a model to that filesystem outside of k8s
3. User wants to mount this filesystem inside of KubeAI

The way to represent that volume inside of k8s would be to create a PV with a reference to the preexisting NFS share.
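
For example, a static PV pointing at a preexisting NFS share could look something like this (server, path, and capacity are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-models
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com # placeholder NFS server
    path: /exports/models   # placeholder export path containing the model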

I would recommend supporting pv:// and pvc://, with PV being higher priority as it covers use cases that are not already covered today.
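
If a pv:// scheme mirrored the pvc:// proposal, the model URL for the NFS example above could then be something like model.url: "pv://nfs-models/llama-3-8b" (hypothetical; pv:// is not implemented).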

@samos123 samos123 changed the title Proposal: Mount a PVC in ReadManyOnly mode as model storage Proposal: Mount a PV or PVC in ReadManyOnly mode as model storage Dec 4, 2024
@SatyKrish

Would you prefer passing a PVC or PV to the model spec? @SatyKrish please share your reasoning for preference as well.

Apologies, couldn’t respond earlier (prod fire drill).

I’m currently loading large models from disk, and small models from an Azure file share on AKS. PVC support will make it easier to migrate my vLLM setup to KubeAI.

@samos123 (Contributor Author) commented Dec 6, 2024

Perfect. The PR adding support for vLLM to load models directly from a PVC is working; it will probably be merged tomorrow.

samos123 added a commit that referenced this issue Dec 6, 2024
@samos123 samos123 changed the title Proposal: Mount a PV or PVC in ReadManyOnly mode as model storage Proposal: Mount a PVC in ReadManyOnly mode as model storage Dec 6, 2024
@samos123 (Contributor Author) commented Dec 9, 2024

Storing models on a PVC is now supported with vLLM. Please update your Helm chart to v0.10.0 or later to try it out.

Other engines may follow later. Keeping this issue open until the Ollama and Infinity engines are updated to add support as well.

@samos123 samos123 reopened this Dec 23, 2024