
Proposal: Mount a PVC in ReadManyOnly mode as model storage #311

Open

samos123 opened this issue Nov 19, 2024 · 6 comments

@samos123 (Contributor) commented Nov 19, 2024

Use case: Users have pre-provisioned PVs that already contain models and support the ReadOnlyMany access mode. The user would be responsible for ensuring a compatible model is stored on the PV and for creating a PVC bound to it.

Example:

model.url: "pvc://$PVC_NAME/$PVC_MODEL_PATH"

The $PVC_MODEL_PATH will always be an absolute path starting with a /.

Model stored in the PV under /llama: model.url: "pvc://123f124/llama" or model.url: "pvc://123f124//llama"

If the model is stored under the root directory of the PV, then both of these would be valid:

model.url: "pvc://123f124/"
model.url: "pvc://123f124"

The following would happen inside the model engine pod:

  1. Mount the PVC named $PVC_NAME into the engine container at /model, using $PVC_MODEL_PATH as the subPath
  2. The engine would always load the model from /model
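
Conceptually, the URL would map onto the engine pod spec something like this (a sketch; the concrete version appears in the user flow below, and the leading / is stripped because subPath must be a relative path):

      containers:
      - name: engine
        volumeMounts:
        - name: model
          mountPath: /model
          subPath: $PVC_MODEL_PATH # path part of the URL, leading "/" stripped
      volumes:
      - name: model
        persistentVolumeClaim:
          claimName: $PVC_NAME # host part of the pvc:// URL
          readOnly: true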

Open questions:

  1. Should we expose the mount access mode, ReadOnlyMany vs ReadWriteOnce?
    No, this won't be needed, since the user is responsible for creating the PVC and therefore controls the access mode. The only issue is that we would start encountering problems when trying to scale beyond 1 replica with ReadWriteOnce.

Example user flow:

  1. User creates a PVC using ReadOnlyMany and binds this PVC to an existing PV that has the model stored under /llama-3-8b. The PVC name is llama-3-8b.
  2. User creates a Model and specifies model.url as pvc://llama-3-8b/llama-3-8b.
  3. KubeAI configures the model engine pod to mount the PVC.

Step 1 PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama-3-8b
  labels:
    type: local
spec:
  storageClassName: robin
  resources:
    requests:
      storage: 10Gi
  accessModes:
    - ReadOnlyMany
  volumeName: name-of-pv-with-prepopulated-llama-3-8b
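
Step 2 Model (a sketch, assuming the usual KubeAI Model CRD fields; the resourceProfile value is just a placeholder):

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3-8b
spec:
  features: [TextGeneration]
  engine: VLLM
  url: pvc://llama-3-8b/llama-3-8b
  resourceProfile: nvidia-gpu-l4:1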

Step 3 pod spec:

      containers:
      - name: engine
        volumeMounts:
        - name: model-pvc
          mountPath: /model
          subPath: llama-3-8b
      volumes:
      - name: model-pvc
        persistentVolumeClaim:
          claimName: llama-3-8b
          readOnly: true

Why not have KubeAI manage the PVC and allow users to specify PV instead?

Letting the user manage the PVC allows them to specify more attributes that may be relevant on the PVC. One example that wouldn't be as easy is figuring out the resources.requests.storage capacity to request for the PVC. So it may make more sense to have the user control that.

Looking at both GCS Fuse and Azure Blob, the only way to easily support both is to let the user supply a PVC. In the Azure case there seems to be no need to create a PV.

Azure Blob storage with PVC only

Let's take the use case of Azure Blob Storage. You may have a storage account samos123, and in that storage account two different storage containers: llama-3.1-8b and qwen70b. So the storage looks like this:

samos123
   - llama-3.1-8b: (config.json, safetensors, etc.)
   - qwen70b: (config.json, safetensors, etc.)

The AKS cluster is configured with the storage account samos123.

Then the user would only create a PVC and have no need to create a PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-blob-storage
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azureblob-nfs-premium
  resources:
    requests:
      storage: 5Gi

In KubeAI the user would specify the following URL:
pvc://azure-blob-storage/llama-3.1-8b

GCS Fuse example

Assume I have a bucket named samos123 and in that bucket I have a directory llama-3-8b.

Create PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  storageClassName: example-storage-class
  mountOptions:
    - implicit-dirs
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: samos123
    volumeAttributes:
      gcsfuseLoggingSeverity: warning
  claimRef:
    name: gcs-fuse-csi-static-pvc
    namespace: NAMESPACE

Create PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
  namespace: NAMESPACE
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: example-storage-class

Then in KubeAI, the URL would be: pvc://gcs-fuse-csi-static-pvc/llama-3-8b

Source: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#provision-static

@samos123 samos123 changed the title Proposal: Mount a PV in ReadManyOnly mode as model storage Proposal: Mount a PVC in ReadManyOnly mode as model storage Nov 20, 2024
@SatyKrish

Supporting PVCs can make KubeAI agnostic to the storage provider; for example, existing volumes from Azure Files or Azure Blob can be used.

@samos123 (Contributor Author)

Would you prefer passing a PVC or a PV to the model spec? @SatyKrish, please share the reasoning for your preference as well.

@nstogner (Contributor)

One example that wouldn't be as easy is figuring out the resources.requests.storage capacity to request for the PVC.

The typical pattern here is to match what is on the PV:

spec.capacity.storage on the PersistentVolume manifest should match spec.resources.requests.storage on the PersistentVolumeClaim manifest. Since Cloud Storage buckets don't have size limits, you can put any number for capacity but it cannot be empty.

Src: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#provision-static

Then the user would only create a PVC and have no need to create a PV:

That is an example of dynamic provisioning. The PVC triggers the creation of a new bucket:

A persistent volume claim (PVC) uses the storage class object to dynamically provision an Azure Blob storage container.

Src: https://learn.microsoft.com/en-us/azure/aks/azure-csi-blob-storage-provision?tabs=mount-nfs%2Csecret#dynamically-provision-a-volume

In this case, the bucket would be empty unless you followed a process like:

1. Create PVC
2. Create Job with PVC to download
3. Wait for Job to succeed
4. Pass PVC to KubeAI
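
For step 2, the download Job could look something like this (illustrative only; the image, model name, and claim name are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: download-model
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: download
        image: python:3.11-slim
        command: ["sh", "-c"]
        args:
        - |
          pip install huggingface_hub &&
          huggingface-cli download meta-llama/Meta-Llama-3-8B --local-dir /model/llama-3-8b
        volumeMounts:
        - name: model
          mountPath: /model
      volumes:
      - name: model
        persistentVolumeClaim:
          claimName: model-pvc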

KubeAI already supports this flow naturally via the cache download functionality - and will soon support the specific bucket use case via a direct url to the bucket.

However, consider the following use case:

1. User has a preexisting NFS/EFS filesystem
2. User has downloaded a model to that filesystem outside of k8s
3. User wants to mount this filesystem inside of KubeAI

The way to represent that volume inside of k8s would be to create a PV with a reference to the preexisting NFS share.
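
For example, a static PV pointing at a preexisting NFS share could look something like this (server, path, and capacity are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-models
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com # placeholder NFS server
    path: /exports/models   # placeholder export path containing the model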

I would recommend supporting pv:// and pvc://, with PV being higher priority as it covers use cases that are not already covered today.
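
If a pv:// scheme mirrored the pvc:// proposal, the model URL for the NFS example above could then be something like model.url: "pv://nfs-models/llama-3-8b" (hypothetical; pv:// is not implemented).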

@samos123 samos123 changed the title Proposal: Mount a PVC in ReadManyOnly mode as model storage Proposal: Mount a PV or PVC in ReadManyOnly mode as model storage Dec 4, 2024
@SatyKrish

Would you prefer passing a PVC or PV to the model spec? @SatyKrish please share your reasoning for preference as well.

Apologies, couldn’t respond earlier (prod fire drill).

I’m currently loading large models from disk, and small models from an Azure file share on AKS. PVC support will make it easier to migrate my vLLM setup to KubeAI.

@samos123 (Contributor Author) commented Dec 6, 2024

Perfect. The PR adding support for vLLM to load models directly from a PVC is working; it will probably be merged tomorrow.

samos123 added a commit that referenced this issue Dec 6, 2024
@samos123 samos123 changed the title Proposal: Mount a PV or PVC in ReadManyOnly mode as model storage Proposal: Mount a PVC in ReadManyOnly mode as model storage Dec 6, 2024
@samos123 (Contributor Author) commented Dec 9, 2024

Storing models on a PVC is now supported with vLLM. Please update your Helm chart to v0.10.0 or later to try it out.

Other engines may follow later. Keeping this issue open until the Ollama and Infinity engines are updated to add support as well.

@samos123 samos123 reopened this Dec 23, 2024