Skip to content

Commit

Permalink
Bug fixes and E2E tests for PVC storage (#340)
Browse files Browse the repository at this point in the history
Motivation

Address PVC follow-up work items outlined in #337 for PVC storage introduced in #267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves #337

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
  • Loading branch information
ckadner authored Mar 21, 2023
1 parent c666dfb commit 3f01857
Show file tree
Hide file tree
Showing 39 changed files with 1,405 additions and 446 deletions.
7 changes: 4 additions & 3 deletions .github/workflows/run-fvt.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: FVTs
name: FVT

on:
pull_request:
Expand Down Expand Up @@ -71,15 +71,16 @@ jobs:
run: |
docker pull nvcr.io/nvidia/tritonserver:21.06.1-py3
docker pull seldonio/mlserver:0.5.2
docker pull openvino/model_server:2022.1
docker pull openvino/model_server:2022.2
# docker pull pytorch/torchserve:0.6.0-cpu
docker pull kserve/modelmesh-runtime-adapter
docker pull kserve/rest-proxy
docker pull kserve/modelmesh
- name: Check installation
run: |
docker images
kubectl get pods
kubectl get servingruntimes
kubectl get clusterservingruntimes
- name: Run FVTs
run: |
go install github.com/onsi/ginkgo/v2/ginkgo
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ test:

# Run fvt tests. This requires an etcd, kubernetes connection, and model serving installation. Ginkgo CLI is used to run them in parallel
fvt:
ginkgo -v -p -progress --fail-fast fvt/predictor fvt/scaleToZero --timeout=40m
ginkgo -v -procs=2 --progress --fail-fast fvt/predictor fvt/scaleToZero fvt/storage --timeout=50m

# Command to regenerate the grpc go files from the proto files
fvt-protoc:
Expand Down
117 changes: 117 additions & 0 deletions config/dependencies/fvt.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -123,3 +123,120 @@ stringData:
"default_bucket": "modelmesh-example-models",
"region": "us-south"
}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: "models-pvc-1"
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: "models-pvc-2"
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: "models-pvc-3"
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
---
apiVersion: batch/v1
kind: Job
metadata:
name: "pvc-init"
spec:
template:
metadata:
name: "pvc-init-pod"
spec:
restartPolicy: OnFailure
containers:
- name: "copy-pod"
image: kserve/modelmesh-minio-examples:latest
securityContext:
runAsUser: 0
allowPrivilegeEscalation: false
command: ["/bin/sh", "-ex", "-c"]
args:
- echo copy model files ...;
whoami;
ls -al "${SRC_FOLDER}";
cp -r "${SRC_FOLDER}"/* "${DST_FOLDER_1}" &&
cp -r "${SRC_FOLDER}"/* "${DST_FOLDER_2}" &&
cp -r "${SRC_FOLDER}"/* "${DST_FOLDER_3}" &&
ls -al "${DST_FOLDER_1}" &&
ls -al "${DST_FOLDER_2}" &&
ls -al "${DST_FOLDER_3}" &&
echo done &&
exit 0;
env:
- name: SRC_FOLDER
value: "/data1/modelmesh-example-models"
- name: DST_FOLDER_1
value: "/mnt/pvc1"
- name: DST_FOLDER_2
value: "/mnt/pvc2"
- name: DST_FOLDER_3
value: "/mnt/pvc3"
volumeMounts:
- name: "pvc1"
mountPath: "/mnt/pvc1"
- name: "pvc2"
mountPath: "/mnt/pvc2"
- name: "pvc3"
mountPath: "/mnt/pvc3"
volumes:
- name: "pvc1"
persistentVolumeClaim:
claimName: "models-pvc-1"
- name: "pvc2"
persistentVolumeClaim:
claimName: "models-pvc-2"
- name: "pvc3"
persistentVolumeClaim:
claimName: "models-pvc-3"
backoffLimit: 4
---
apiVersion: v1
kind: Pod
metadata:
name: "pvc-reader"
spec:
containers:
- name: main
image: ubuntu
command: ["/bin/sh", "-ec", "sleep 10000"]
volumeMounts:
- name: "pvc1"
mountPath: "/mnt/pvc1"
- name: "pvc2"
mountPath: "/mnt/pvc2"
- name: "pvc3"
mountPath: "/mnt/pvc3"
volumes:
- name: "pvc1"
persistentVolumeClaim:
claimName: "models-pvc-1"
- name: "pvc2"
persistentVolumeClaim:
claimName: "models-pvc-2"
- name: "pvc3"
persistentVolumeClaim:
claimName: "models-pvc-3"
14 changes: 13 additions & 1 deletion config/dependencies/minio-storage-secret.yaml
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
# Copyright 2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v1

kind: Secret
metadata:
name: storage-config
Expand Down
1 change: 1 addition & 0 deletions config/rbac/cluster-scope/role.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ rules:
- ""
resources:
- endpoints
- persistentvolumeclaims
verbs:
- get
- list
Expand Down
1 change: 1 addition & 0 deletions config/rbac/namespace-scope/role.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ rules:
- ""
resources:
- endpoints
- persistentvolumeclaims
verbs:
- get
- list
Expand Down
5 changes: 2 additions & 3 deletions controllers/config/templating.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,6 @@ package config
import (
"bytes"
"io"
"io/ioutil"
"os"
"text/template"

Expand All @@ -43,9 +42,9 @@ func PathTemplateSource(path string, context interface{}) mf.Source {
return templateSource(f, context)
}

//A templating manifest source
// A templating manifest source
func templateSource(r io.Reader, context interface{}) mf.Source {
b, err := ioutil.ReadAll(r)
b, err := io.ReadAll(r)
if err != nil {
panic(err)
}
Expand Down
2 changes: 1 addition & 1 deletion controllers/modelmesh/etcd.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ const (
kvStoreEnvVar = "KV_STORE"
)

//mimics base/patches/etcd.yaml
// mimics base/patches/etcd.yaml
func (m *Deployment) configureMMDeploymentForEtcdSecret(deployment *appsv1.Deployment) error {
EtcdSecretName := m.EtcdSecretName

Expand Down
1 change: 1 addition & 0 deletions controllers/modelmesh/runtime.go
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ func (m *Deployment) addVolumesToDeployment(deployment *appsv1.Deployment) error
}

// need to mount storage config for built-in adapters and the scenarios where StorageHelper is not disabled
// TODO: what about allowAnyPVC only use case without MinIO
if rts.BuiltInAdapter != nil || useStorageHelper(rts) {
storageVolume := corev1.Volume{
Name: ConfigStorageMount,
Expand Down
13 changes: 0 additions & 13 deletions controllers/service_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,19 +11,6 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/*
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package controllers

Expand Down
Loading

0 comments on commit 3f01857

Please sign in to comment.