Add Canonical Logo #262

Open · wants to merge 14 commits into `main` · Changes from all commits
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -2,7 +2,7 @@ name: Mike deploy
 
 on:
   push:
-    branches: [ main ]
+    branches: [ v0.9 ]
 
 jobs:
   build_docs:
@@ -41,4 +41,4 @@ jobs:
 
       - name: Deploy with mike
         run: |
-          mike deploy --push $(cat version.txt)
+          mike deploy --push --update-aliases $(cat version.txt) latest
2 changes: 1 addition & 1 deletion build.sh
@@ -17,6 +17,6 @@ git branch gh-pages origin/gh-pages
 echo "Listing branches"
 git branch
 
-mike deploy $currentVersion
+mike deploy --update-aliases $(cat version.txt) latest
 
 git checkout gh-pages
3 changes: 2 additions & 1 deletion docs/admin/kubernetes_deployment.md
@@ -7,9 +7,10 @@ Kubernetes version.
 ## Recommended Version Matrix
 | Kubernetes Version | Recommended Istio Version |
 | :---------- | :------------ |
-| 1.20 | 1.9, 1.10, 1.11 |
 | 1.21 | 1.10, 1.11 |
 | 1.22 | 1.11, 1.12 |
+| 1.23 | 1.12, 1.13 |
+| 1.24 | 1.13, 1.14 |
 
 ## 1. Install Istio
 
3 changes: 2 additions & 1 deletion docs/admin/serverless.md
@@ -8,9 +8,10 @@ Kubernetes version.
 ## Recommended Version Matrix
 | Kubernetes Version | Recommended Istio Version | Recommended Knative Version |
 | :---------- | :------------ | :------------|
-| 1.20 | 1.9, 1.10, 1.11 | 0.25, 0.26, 1.0 |
 | 1.21 | 1.10, 1.11 | 0.25, 0.26, 1.0 |
 | 1.22 | 1.11, 1.12 | 0.25, 0.26, 1.0 |
+| 1.23 | 1.12, 1.13 | 1.0-1.4 |
+| 1.24 | 1.13, 1.14 | 1.0-1.4 |
 
 ## 1. Install Istio
 Please refer to the [Istio install guide](https://knative.dev/docs/admin/install/installing-istio).
152 changes: 152 additions & 0 deletions docs/blog/articles/2022-07-21-KServe-0.9-release.md
@@ -0,0 +1,152 @@
# Announcing: KServe v0.9.0

Today, we are pleased to announce the v0.9.0 release of KServe! [KServe](https://github.com/kserve) has now fully onboarded to [LF AI & Data Foundation](https://lfaidata.foundation) as an [Incubation Project](https://lfaidata.foundation/projects/kserve)!

In this release we are excited to introduce the new `InferenceGraph` feature, which has long been requested by the community. Continuing the effort from the last release to unify the InferenceService API for deploying models on KServe and ModelMesh, ModelMesh is now fully compatible with the KServe InferenceService API!


## Introducing InferenceGraph

ML inference systems are getting bigger and more complex; they often consist of many models working together to make a single prediction.
Common use cases are image classification and multi-stage natural language processing pipelines. For example, an image classification pipeline needs to run a top-level classification first and then perform further downstream classification based on the previous prediction results.

KServe has the unique strength of building distributed inference graphs through its native integration of InferenceServices, its standard inference protocol for chaining models, and its serverless auto-scaling capabilities. KServe leverages these strengths to build the InferenceGraph and enable users to deploy complex ML inference pipelines to production in a declarative and scalable way.


**InferenceGraph** is made up of a list of routing nodes, with each node consisting of a set of routing steps. Each step can route either to an InferenceService or to another node defined in the graph, which makes the InferenceGraph highly composable.
The graph router is deployed behind an HTTP endpoint and can be scaled dynamically based on request volume. The InferenceGraph supports four different types of routing nodes: **Sequence**, **Switch**, **Ensemble**, and **Splitter**.

![InferenceGraph](../../modelserving/inference_graph/images/inference_graph.png)

- **Sequence Node**: It allows users to define multiple `Steps` with `InferenceServices` or `Nodes` as routing targets in a sequence. The `Steps` are executed in sequence, and the request/response from the previous step can be passed to the next step as input, depending on the configuration.
- **Switch Node**: It allows users to define routing conditions and select a `Step` to execute if it matches the condition. The response is returned as soon as the first step matching a condition is found. If no condition is matched, the graph returns the original request.
- **Ensemble Node**: A model ensemble requires scoring each model separately and then combining the results into a single prediction response. Different combination methods can then be used to produce the final result. Multiple classification trees, for example, are commonly combined using a "majority vote" method, while multiple regression trees are often combined using various averaging techniques.
- **Splitter Node**: It allows users to split the traffic to multiple targets using a weighted distribution (a minimal Splitter sketch follows the Sequence example below).

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "cat-dog-classifier"
spec:
  predictor:
    pytorch:
      resources:
        requests:
          cpu: 100m
      storageUri: gs://kfserving-examples/models/torchserve/cat_dog_classification
---
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "dog-breed-classifier"
spec:
  predictor:
    pytorch:
      resources:
        requests:
          cpu: 100m
      storageUri: gs://kfserving-examples/models/torchserve/dog_breed_classification
---
apiVersion: "serving.kserve.io/v1alpha1"
kind: "InferenceGraph"
metadata:
  name: "dog-breed-pipeline"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
      - serviceName: cat-dog-classifier
        name: cat_dog_classifier # step name
      - serviceName: dog-breed-classifier
        name: dog_breed_classifier
        data: $request
        condition: "[@this].#(predictions.0==\"dog\")"
```
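
For comparison, a **Splitter** node distributes traffic across its targets by weight. The following is a minimal, illustrative sketch only; the service names are placeholders and the field layout should be checked against the InferenceGraph reference:

```yaml
apiVersion: "serving.kserve.io/v1alpha1"
kind: "InferenceGraph"
metadata:
  name: "traffic-splitter"       # hypothetical graph name
spec:
  nodes:
    root:
      routerType: Splitter
      steps:
      - serviceName: model-a     # placeholder InferenceService
        weight: 80               # share of traffic routed to model-a
      - serviceName: model-b     # placeholder InferenceService
        weight: 20               # remaining share routed to model-b
```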

Currently `InferenceGraph` is supported with the `Serverless` deployment mode. You can try it out by following the [tutorial](https://kserve.github.io/website/master/modelserving/inference_graph/image_pipeline/).


## InferenceService API for ModelMesh


The InferenceService CRD is now the primary interface for interacting with ModelMesh. Some changes were made to the InferenceService spec to better facilitate ModelMesh’s needs.

### Storage Spec

To unify how model storage is defined for both single and multi-model serving, a new storage spec was added to the predictor model spec. With this storage spec, users can specify a key inside a common secret holding config/credentials for each of the storage backends from which models can be loaded. Example:

```yaml
storage:
  key: localMinIO # Credential key for the destination storage in the common secret
  path: sklearn # Model path inside the bucket
  # schemaPath: null # Optional schema files for payload schema
  parameters: # Parameters to override the default values inside the common secret.
    bucket: example-models
```
Learn more [here](https://github.com/kserve/kserve/tree/release-0.9/docs/samples/storage/storageSpec).



### Model Status

For further alignment between ModelMesh and KServe, some additions to the InferenceService status were made. There is now a `Model Status` section which contains information about the model loaded in the predictor. New fields include the following (a hypothetical status snippet is sketched after the list):

- `states` - State information of the predictor's model.
- `activeModelState` - The state of the model currently being served by the predictor's endpoints.
- `targetModelState` - This will be set only when `transitionStatus` is not `UpToDate`, meaning that the target model differs from the currently-active model.
- `transitionStatus` - Indicates state of the predictor relative to its current spec.
- `modelCopies` - Model copy information of the predictor's model.
- `lastFailureInfo` - Details about the most recent error associated with this predictor. Not all of the contained fields will necessarily have a value.
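
As an illustration only, these fields could surface in an InferenceService status roughly as in the following sketch (the nesting and values are assumptions for readability, not output copied from a cluster):

```yaml
status:
  modelStatus:
    transitionStatus: UpToDate
    states:
      activeModelState: Loaded
      targetModelState: ""
    modelCopies:
      failedCopies: 0
      totalCopies: 1
```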

### Deploying on ModelMesh

For deploying InferenceServices on ModelMesh, the ModelMesh and KServe controllers still require the user to specify the `serving.kserve.io/deploymentMode: ModelMesh` annotation.
A complete example of an InferenceService with the new storage spec is shown below:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-tensorflow-mnist
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storage:
        key: localMinIO
        path: tensorflow/mnist.savedmodel
```

## Other New Features

- Support [serving MLFlow model format](https://kserve.github.io/website/0.9/modelserving/v1beta1/mlflow/v2/) via MLServer serving runtime.
- Support [unified autoscaling target and metric fields](https://kserve.github.io/website/0.9/modelserving/autoscaling/autoscaling/) for InferenceService components with both Serverless and RawDeployment mode (see the illustrative sketch after this list).
- Support [InferenceService ingress class and url domain template configuration](https://kserve.github.io/website/0.9/admin/kubernetes_deployment/) for RawDeployment mode.
- ModelMesh now has a default [OpenVINO Model Server](https://github.com/openvinotoolkit/model_server) ServingRuntime.
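
As a small orientation for the unified autoscaling fields above, a predictor spec could look roughly like the following sketch (the model and values are illustrative; refer to the linked autoscaling guide for the authoritative field reference):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-autoscale        # hypothetical service name
spec:
  predictor:
    scaleMetric: concurrency     # metric to scale on
    scaleTarget: 2               # target value per replica for that metric
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```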


## What’s Changed?

- The KServe controller manager is changed from a StatefulSet to a Deployment to support HA mode.
- Fixed the log4j security vulnerability.
- Upgraded the TorchServe serving runtime to 0.6.0.
- Updated the MLServer serving runtime to 1.0.0.

Check out the full release notes for [KServe](https://github.com/kserve/kserve/releases/tag/v0.9.0) and
[ModelMesh](https://github.com/kserve/modelmesh-serving/releases/tag/v0.9.0) for more details.

## Join the community

- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://kubeflow.slack.com/archives/CH6E58LNP))
- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars).
- View our [community GitHub repository](https://github.com/kserve/community) to learn how to contribute. We are excited to work with you to make KServe better and to promote its adoption!

Thank you for contributing or checking out KServe!

– The KServe Working Group
2 changes: 1 addition & 1 deletion docs/get_started/README.md
@@ -19,6 +19,6 @@ The [Kubernetes CLI (`kubectl`)](https://kubernetes.io/docs/tasks/tools/install-
 You can get started with a local deployment of KServe by using _KServe Quick installation script on Kind_:
 
 ```bash
-curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.8/hack/quick_install.sh" | bash
+curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.9/hack/quick_install.sh" | bash
 ```

4 changes: 2 additions & 2 deletions docs/modelserving/v1beta1/custom/custom_model/README.md
@@ -33,7 +33,7 @@ if __name__ == "__main__":
     model = AlexNetModel("custom-model")
     kserve.ModelServer().start([model])
 ```
-The full code example can be found [here](https://github.com/kserve/kserve/tree/master/python/custom_model/model.py).
+The full code example can be found [here](https://github.com/kserve/kserve/blob/release-0.9/python/custom_model/model.py).
 
 ## Build the custom image with Buildpacks
 [Buildpacks](https://buildpacks.io/) allows you to transform your inference code into images that can be deployed on KServe without
@@ -75,7 +75,7 @@ class AlexNetModel(kserve.Model):
 if __name__ == "__main__":
     kserve.ModelServer().start({"custom-model": AlexNetModel})
 ```
-The full code example can be found [here](https://github.com/kserve/kserve/tree/master/python/custom_model/model_remote.py).
+The full code example can be found [here](https://github.com/kserve/kserve/blob/release-0.9/python/custom_model/model_remote.py).
 
 Modify the `Procfile` to `web: python -m model_remote` and then run the above `pack` command; it builds the serving image, which launches
 each model as a separate Python worker, and the Tornado web server routes requests to the model workers by name.
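
As a sketch of that workflow (the builder and image tag below are placeholders; use the `pack` command shown earlier in this guide):

```bash
# Procfile for the multi-model variant contains a single line:
#   web: python -m model_remote

# then rebuild and push the image; the builder and tag are illustrative placeholders
pack build --builder=heroku/buildpacks:20 ${DOCKER_USER}/custom-model:v2
docker push ${DOCKER_USER}/custom-model:v2
```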
2 changes: 1 addition & 1 deletion docs/modelserving/v1beta1/serving_runtime.md
@@ -25,7 +25,7 @@ After models are deployed with InferenceService, you get all the following serve
 | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
 | [Triton Inference Server](https://github.com/triton-inference-server/server) | [TensorFlow,TorchScript,ONNX](https://github.com/triton-inference-server/server/blob/r21.09/docs/model_repository.md)| v2 | :heavy_check_mark: | :heavy_check_mark: | [Compatibility Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)| [Torchscript cifar](triton/torchscript) |
 | [TFServing](https://www.tensorflow.org/tfx/guide/serving) | [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) | v1 | :heavy_check_mark: | :heavy_check_mark: | [TFServing Versions](https://github.com/tensorflow/serving/releases) | [TensorFlow flower](./tensorflow) |
-| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1/v2 REST | :heavy_check_mark: | :heavy_check_mark: | 0.5.3 | [TorchServe mnist](./torchserve) |
+| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1/v2 REST | :heavy_check_mark: | :heavy_check_mark: | 0.6.0 | [TorchServe mnist](./torchserve) |
 | [SKLearn MLServer](https://github.com/SeldonIO/MLServer) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | 1.0.1 | [SKLearn Iris V2](./sklearn/v2) |
 | [XGBoost MLServer](https://github.com/SeldonIO/MLServer) | [Saved Model](https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | 1.5.0 | [XGBoost Iris V2](./xgboost) |
 | [SKLearn ModelServer](https://github.com/kserve/kserve/tree/master/python/sklearnserver) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v1 | :heavy_check_mark: | -- | 1.0.1 | [SKLearn Iris](./sklearn/v1) |
@@ -76,7 +76,7 @@ class ImageTransformer(kserve.Model):
         return {"predictions": response.as_numpy("OUTPUT__0").tolist()}
 ```
 
-Please see the code example [here](https://github.com/kserve/kserve/tree/release-0.8/python/custom_transformer).
+Please see the code example [here](https://github.com/kserve/kserve/tree/release-0.9/python/custom_transformer).
 
 ### Transformer Server Entrypoint
 For a single model, you just create a transformer object and register it with the model server.
117 changes: 109 additions & 8 deletions docs/modelserving/v1beta1/triton/torchscript/README.md
@@ -219,6 +219,106 @@ Apply the gRPC `InferenceService` yaml and then you can call the model with `tri
kubectl apply -f torchscript_grpc.yaml
```


### Run a prediction with grpcurl

After the gRPC `InferenceService` becomes ready, [grpcurl](https://github.com/fullstorydev/grpcurl) can be used to send gRPC requests to the `InferenceService`.

```bash
# download the proto file
curl -O https://raw.githubusercontent.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto

# download the input json file
curl -O https://raw.githubusercontent.com/kserve/website/triton-grpc/docs/modelserving/v1beta1/triton/torchscript/input-grpc.json

INPUT_PATH=input-grpc.json
PROTO_FILE=grpc_predict_v2.proto
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchscript-cifar10 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```

The gRPC APIs follow the KServe [prediction V2 protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2).

For example, the `ServerReady` API can be used to check if the server is ready:

```bash
grpcurl \
-plaintext \
-proto ${PROTO_FILE} \
-H "Host: ${SERVICE_HOSTNAME}" \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ServerReady
```

Expected Output
```json
{
"ready": true
}
```

The `ModelInfer` API takes input following the `ModelInferRequest` schema defined in the `grpc_predict_v2.proto` file. Notice that the input file differs from the one used in the previous `curl` example.
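
For orientation, a `ModelInferRequest` body in grpcurl's JSON form is structured roughly as follows. This is an abridged, illustrative sketch (the tensor values are placeholders); use the downloaded `input-grpc.json` for the actual request:

```json
{
  "model_name": "cifar10",
  "inputs": [
    {
      "name": "INPUT__0",
      "shape": [1, 3, 32, 32],
      "datatype": "FP32",
      "contents": {
        "fp32_contents": [0.23, 0.24, 0.25]
      }
    }
  ]
}
```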

```bash
grpcurl \
-vv \
-plaintext \
-proto ${PROTO_FILE} \
-H "Host: ${SERVICE_HOSTNAME}" \
-d @ \
${INGRESS_HOST}:${INGRESS_PORT} \
inference.GRPCInferenceService.ModelInfer \
<<< $(cat "$INPUT_PATH")
```

Expected Output

```
Resolved method descriptor:
// The ModelInfer API performs inference using the specified model. Errors are
// indicated by the google.rpc.Status returned for the request. The OK code
// indicates success and other codes indicate failure.
rpc ModelInfer ( .inference.ModelInferRequest ) returns ( .inference.ModelInferResponse );

Request metadata to send:
host: torchscript-cifar10.default.example.com

Response headers received:
accept-encoding: identity,gzip
content-type: application/grpc
date: Fri, 12 Aug 2022 01:49:53 GMT
grpc-accept-encoding: identity,deflate,gzip
server: istio-envoy
x-envoy-upstream-service-time: 16

Response contents:
{
"modelName": "cifar10",
"modelVersion": "1",
"outputs": [
{
"name": "OUTPUT__0",
"datatype": "FP32",
"shape": [
"1",
"10"
]
}
],
"rawOutputContents": [
"wCwGwOJLDL7icgK/dusyQAqAD799KP8/In2QP4zAs7+WuRk/2OoHwA=="
]
}

Response trailers received:
(empty)
Sent 1 request and received 1 response
```

The content of the output tensor is encoded in the `rawOutputContents` field. It can be `base64`-decoded and loaded into a NumPy array with the given datatype and shape.
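
For example, the response above can be decoded with a few lines of Python. This is a minimal sketch, assuming the base64 string, datatype, and shape are copied from the `ModelInfer` response:

```python
import base64

import numpy as np

# raw base64 payload copied from the "rawOutputContents" field above
raw = "wCwGwOJLDL7icgK/dusyQAqAD799KP8/In2QP4zAs7+WuRk/2OoHwA=="

# decode to bytes, interpret as float32 (the reported FP32 datatype),
# and reshape to the reported output shape (1, 10)
logits = np.frombuffer(base64.b64decode(raw), dtype=np.float32).reshape(1, 10)
print(logits)
```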

Alternatively, Triton also provides a [Python client library](https://pypi.org/project/tritonclient/) which has many [examples](https://github.com/triton-inference-server/client/tree/main/src/python/examples) showing how to interact with the KServe V2 gRPC protocol.


## Add Transformer to the InferenceService

`Triton Inference Server` expects tensors as input data; oftentimes a pre-processing step is required before making the prediction call
@@ -227,9 +327,10 @@ User is responsible to create a python class which extends from KServe `Model` b
format to tensor format according to the V2 prediction protocol; the `postprocess` handler converts the raw prediction response to a more user-friendly response.

 ### Implement pre/post processing functions
-```python
+
+```python title="image_transformer_v2.py"
 import kserve
-from typing import List, Dict
+from typing import Dict
 from PIL import Image
 import torchvision.transforms as transforms
 import logging
@@ -253,10 +354,11 @@ def image_transform(instance):
     return res.tolist()
 
 
-class ImageTransformer(kserve.Model):
-    def __init__(self, name: str, predictor_host: str):
+class ImageTransformerV2(kserve.Model):
+    def __init__(self, name: str, predictor_host: str, protocol: str):
         super().__init__(name)
         self.predictor_host = predictor_host
+        self.protocol = protocol
 
     def preprocess(self, inputs: Dict) -> Dict:
         return {
@@ -271,11 +373,10 @@ class ImageTransformer(kserve.Model):
         }
 
     def postprocess(self, results: Dict) -> Dict:
-        # Here we reshape the data because triton always returns the flatten 1D array as json if not explicitly requesting binary
-        # since we are not using the triton python client library which takes care of the reshape it is up to user to reshape the returned tensor.
-        return {output["name"] : np.array(output["data"]).reshape(output["shape"]) for output in results["outputs"]}
+        return {output["name"]: np.array(output["data"]).reshape(output["shape"]).tolist()
+                for output in results["outputs"]}
 ```
-Please find [the code example](https://github.com/kserve/kserve/tree/release-0.8/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.8/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).
+Please find [the code example](https://github.com/kserve/kserve/tree/release-0.9/docs/samples/v1beta1/triton/torchscript/image_transformer_v2) and [Dockerfile](https://github.com/kserve/kserve/blob/release-0.9/docs/samples/v1beta1/triton/torchscript/transformer.Dockerfile).
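
To round out the example, a transformer like `ImageTransformerV2` above is typically started from a small entrypoint appended to `image_transformer_v2.py`. The following is a sketch only; the CLI flag names are assumptions for illustration, not the flags used by the linked sample:

```python
import argparse

import kserve

# ImageTransformerV2 is the class defined in the snippet above.
# Hypothetical CLI; flag names are illustrative assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", default="cifar10",
                    help="Name the transformer is registered under")
parser.add_argument("--predictor_host", required=True,
                    help="Host of the Triton predictor service")
parser.add_argument("--protocol", default="v2",
                    help="Inference protocol used to call the predictor")
args, _ = parser.parse_known_args()

if __name__ == "__main__":
    transformer = ImageTransformerV2(args.model_name,
                                     predictor_host=args.predictor_host,
                                     protocol=args.protocol)
    kserve.ModelServer().start([transformer])
```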

### Build Transformer docker image