# Announcing: KServe v0.10.0

We are excited to announce the KServe 0.10 release. In this release we have enabled more KServe networking options,
improved metrics instrumentation for the supported serving runtimes, and increased support coverage for the [Open (aka v2) inference protocol](https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/) for both standard and ModelMesh InferenceServices.

## KServe Networking Options

Istio is now optional for both `Serverless` and `RawDeployment` modes. Please see the [alternative networking guide](https://kserve.github.io/website/0.10/admin/serverless/kourier_networking/) to learn how you can enable other ingress options supported by Knative in Serverless mode.
For Istio users, if you want to turn on full service mesh mode to secure InferenceServices with mutual TLS and enable traffic policies, please read the [service mesh setup guide](https://kserve.github.io/website/0.10/admin/serverless/servicemesh/).

## KServe Telemetry for Serving Runtimes

We have instrumented additional latency metrics in the KServe Python ServingRuntimes for the `preprocess`, `predict` and `postprocess` handlers.
In Serverless mode we have extended the Knative `queue-proxy` to aggregate the metrics exposed by both `queue-proxy` and the `kserve-container` of each `ServingRuntime`.
Please read the [Prometheus metrics setup guide](https://kserve.github.io/website/0.10/modelserving/observability/prometheus_metrics/) to learn how to enable metrics scraping and aggregation.
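For a quick local check, you can dump the runtime's Prometheus endpoint and look for the handler latency histograms. The sketch below is illustrative only: the port (`8080`) and the `request_*_seconds` metric names are assumptions, not taken from this post.

```python
import urllib.request

# Fetch the Prometheus metrics exposed by a locally running Python serving
# runtime (assumed port and metric names; see the setup guide for specifics).
text = urllib.request.urlopen("http://localhost:8080/metrics").read().decode()
for line in text.splitlines():
    if line.startswith(("request_preprocess_seconds",
                        "request_predict_seconds",
                        "request_postprocess_seconds")):
        print(line)
```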

## Open(v2) Inference Protocol Support Coverage
For example, a custom transformer implements its `preprocess` handler with the new `headers` argument and returns an `InferRequest`:

```python
class CustomTransformer(Model):
    def preprocess(self, payload: InferRequest, headers: Dict[str, str]) -> InferRequest:
        # ... build the transformed model inputs from the payload (elided here) ...
        return infer_request
```

You can use the same Python API types `InferRequest` and `InferResponse` for both the REST and gRPC protocols; KServe handles the underlying decoding and encoding according to the protocol.
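For illustration, a minimal sketch of building such a protocol-agnostic request (the tensor name, shape and model name are made up; it assumes `InferInput` and `InferRequest` are importable from the top-level `kserve` package):

```python
from kserve import InferInput, InferRequest

# One FP32 tensor; KServe encodes the same object as JSON for REST
# or as protobuf for gRPC.
infer_input = InferInput(name="INPUT__0", datatype="FP32",
                         shape=[1, 3], data=[[1.0, 2.0, 3.0]])
infer_request = InferRequest(model_name="my-model", infer_inputs=[infer_input])
```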

!!! Warning
    A new `headers` argument has been added to the custom handlers to pass HTTP/gRPC headers or other metadata. You can also use it as a context dict to pass data between handlers.
    If you have an existing custom transformer or predictor, you must now add the `headers` argument to your `preprocess`, `predict` and `postprocess` handlers.
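For an existing model class, the updated handlers look roughly like the sketch below (the class name and handler logic are illustrative, not from the release notes):

```python
from typing import Dict

from kserve import Model, InferRequest, InferResponse


class MyModel(Model):  # hypothetical class for illustration
    def preprocess(self, payload: InferRequest, headers: Dict[str, str]) -> InferRequest:
        # Inspect incoming HTTP/gRPC headers, or stash values for later handlers.
        headers["x-stage"] = "preprocessed"  # using headers as a context dict
        return payload

    def predict(self, payload: InferRequest, headers: Dict[str, str]) -> InferResponse:
        ...  # call the model or downstream predictor here

    def postprocess(self, response: InferResponse, headers: Dict[str, str]) -> InferResponse:
        return response
```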


Please check the following matrix for supported ServingRuntimes and ModelFormats.
KServe images are now published for multiple architectures: `ppc64le`, `arm64`, `amd64`, `s390x`.
## ModelMesh updates
ModelMesh has continued to integrate itself as KServe's multi-model serving backend, introducing improvements and features that better align the two projects. For example, it now supports ClusterServingRuntimes, allowing the use of cluster-scoped ServingRuntimes, a feature originally introduced in KServe 0.8.

Additionally, ModelMesh introduced support for TorchServe, enabling users to serve arbitrary PyTorch models (e.g. eager-mode models) in the context of distributed multi-model serving.

Other limitations have been addressed as well, such as adding support for BYTES/string type tensors in the REST inference API for requests that require them.
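As a sketch of what such a request looks like over REST (the endpoint, port and model name here are hypothetical), a v2 payload carries the strings directly in JSON:

```python
import requests  # third-party HTTP client, assumed available

# v2 REST inference request with a BYTES/string tensor.
payload = {
    "inputs": [
        {"name": "text", "shape": [2], "datatype": "BYTES",
         "data": ["hello", "world"]}
    ]
}
resp = requests.post(
    "http://localhost:8008/v2/models/my-model/infer",  # hypothetical endpoint
    json=payload,
)
print(resp.json())
```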
