Releases: kserve/kserve
v0.14.0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for mutiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bufix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion wehbook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM0.6.1post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
- add a new API for multi-node/multi-gpu by @Jooho in #3871
- Fix update-openapigen.sh that can be executed from kserve dir by @Jooho in #3924
- Add python 3.12 support and remove python 3.8 support by @sivanantha321 in #3645
- Fix openssl vulnerability CWE-1395 by @sivanantha321 in #3975
- Fix Kubernetes Doc Links by @jyono in #3670
- Fix kserve local testing env by @yuzisun in #3981
- Fix streaming response not working properly with logger by @sivanantha321 in #3847
- Add a flag for automount serviceaccount token by @greenmoon55 in https://github.com/kserve/ks...
v0.14.0-rc1
What's Changed
- Publish 0.14.0-rc0 release by @yuzisun in #3867
- Use API token for publishing package to PyPI by @sivanantha321 in #3896
- Fix sdlc broken when kserve installed using helm by @sivanantha321 in #3890
- Add Security Context and Resources to RBAC Proxy by @HotsauceLee in #3898
- Remove unwanted cluster scope secret permissions by @sivanantha321 in #3893
- bump to vllm 0.5.5 by @lizzzcai in #3911
- pin gosec to 2.20.0 by @greenmoon55 in #3921
- add a new doc 'common issues and solutions' by @Jooho in #3878
- Implement health endpoint for vLLM backend by @sivanantha321 in #3850
- Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in #3917
- Bump Go to 1.22 by @sivanantha321 in #3912
- bump to vllm 0.6.0 by @hustxiayang in #3934
- Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in #3885
- mount /dev/shm volume to huggingfaceserver by @lizzzcai in #3910
- Fix permission error in snyk scan by @sivanantha321 in #3889
- Cluster Local Model CR by @greenmoon55 in #3839
- added http headers to inbound request by @andyi2it in #3895
- Add prow-github-action by @sivanantha321 in #3888
- Add TLS support for Inference Loggers by @ruivieira in #3863
- Fix explainer endpoint not working with path based routing by @sivanantha321 in #3257
- Fix ingress configuration for path based routing and update go mod by @sivanantha321 in #3944
- Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in #3943
- remove conversion wehbook part from self-signed-ca.sh by @Jooho in #3941
- update fluid kserve sample to use huggingface servingruntime by @lizzzcai in #3907
- bump to vLLM0.6.1post2 by @hustxiayang in #3948
- Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in #3955
- add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in #3954
- Implement Huggingface model download in storage initializer by @andyi2it in #3584
- Update OWNERS file by @yuzisun in #3966
- Cluster local model controller by @greenmoon55 in #3860
- Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in #3970
New Contributors
- @HotsauceLee made their first contribution in #3898
- @hustxiayang made their first contribution in #3934
- @hdefazio made their first contribution in #3885
- @ruivieira made their first contribution in #3863
- @gfkeith made their first contribution in #3954
Full Changelog: v0.14.0-rc0...v0.14.0-rc1
v0.14.0-rc0
What's Changed
- Prevent the PassthroughCluster for clients/workloads in the service mesh by @israel-hdez in #3711
- Extract openai predict logic into smaller methods by @grandbora in #3716
- Bump MLServer to 1.5.0 by @sivanantha321 in #3740
- Refactor storage initializer to log model download time for all storage types by @sivanantha321 in #3735
- inferenceservice controller: fix error check in Serverless mode by @dtrifiro in #3753
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in #3723
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo in #3729
- Fix dead links on PyPI by @kevinbazira in #3754
- Fix model is ready even if there is no model by @HAO2167 in #3275
- Fix No model ready error in multi model serving by @sivanantha321 in #3758
- Initial implementation of Inference client by @sivanantha321 in #3401
- Fix logprobs for vLLM by @sivanantha321 in #3738
- Fix model name not properly parsed by inference graph by @sivanantha321 in #3746
- pillow - Buffer Overflow by @spolti in #3598
- Use add_generation_prompt while creating chat template by @Datta0 in #3775
- Deduplicate the names for the additional domain names by @houshengbo in #3773
- Make Virtual Service case-insensitive by @andyi2it in #3779
- Install packages needed for vllm model load by @gavrissh in #3802
- Make gRPC max message length configurable by @sivanantha321 in #3741
- Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in #3789
- Several bug fixes for vLLM completion endpoint by @sivanantha321 in #3788
- Increase timeout to make unit test stable by @Jooho in #3808
- Upgrade CI deps by @sivanantha321 in #3822
- Add tests for vLLM by @sivanantha321 in #3771
- Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in #3812
- Bump vLLM to 0.5.3.post1 by @sivanantha321 in #3828
- Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for mutiprocessing by @sivanantha321 in #3757
- Update golang for docs/Dockerfile to 1.21 by @spolti in #3761
- Make ray an optional dependency by @sivanantha321 in #3834
- Update aif example by @spolti in #3765
- Use helm for quick installation by @sivanantha321 in #3813
- Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in #3737
- Add support for Azure DNS zone endpoints by @tjandy98 in #3819
- Fix failed build for knativeLocalGatewayService by @yuzisun in #3866
- Add logging request feature for vLLM backend by @sivanantha321 in #3849
- Bump vLLM to 0.5.4 by @sivanantha321 in #3874
- Fix: Add workaround for snyk image scan failure by @sivanantha321 in #3880
- Fix trust_remote_code not working with huggingface backend by @sivanantha321 in #3879
- Update KServe 2024-2025 Roadmap by @yuzisun in #3810
- Configurable image pull secrets in Helm charts by @saileshd1402 in #3838
- Fix issue with rolling update behavior by @andyi2it in #3786
- Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in #3886
- Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in #3685
- Protobuf version upgrade 4.25.4 by @andyi2it in #3881
- Adds optional labels and annotations to the controller by @guitouni in #3366
- Enable Server-Side Apply for Kustomize Overlays in Test Environment by @Jooho in #3877
- bufix: update image_transformer.py to handle changes in input structure by @zwong91 in #3830
- support text embedding task in hugging face server by @kevinmingtarja in #3743
- Rename max_length parameter to max_model_len to be in sync with vLLM by @Datta0 in #3827
- [Upstream] - Update-istio version based on go version 1.21 by @mholder6 in #3825
- Enrich isvc NotReady events for failed conditions by @asdqwe123zxc in #3303
- adding metadata on requests by @gcemaj in #3635
New Contributors
- @calwoo made their first contribution in #3729
- @guitouni made their first contribution in #3366
- @zwong91 made their first contribution in #3830
- @mholder6 made their first contribution in #3825
- @asdqwe123zxc made their first contribution in #3303
- @gcemaj made their first contribution in #3635
Full Changelog: v0.13.0...v0.14.0-rc0
v0.13.1
What's Changed
- Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 (#3723)
- Propagate trust_remote_code flag throughout vLLM startup by @calwoo (#3729)
- Use add_generation_prompt while creating chat template by @Datta0 (#3775)
- Fix logprobs for vLLM by @sivanantha321 (#3738)
- Install packages needed for vllm model load by @gavrissh (#3802)
- Publish 0.13.1 Release by @johnugeorge in #3824
Full Changelog: v0.13.0...v0.13.1
v0.13.0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enabled the multiple domains support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
- Add FP16 datatype support for OIP grpc by @sivanantha321 in #3695
- Add option for returning probabilities in huggingface server by @andyi2it in #3607
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
🐛 What's Fixed
- Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decode-only architecture models. by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install does not cleans up Istio installer by @sivanantha321 in #3660
- fix for extract zip from gcs by @andyi2it in #3510
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Fix kserve version is not updated properly by python-release.sh by @sivanantha321 in #3707
- Add precaution again running v1 endpoints on openai models by @grandbora in #3694
- Typos and minor fixes by @alpe in #3429
- Fix model_id and model_dir precedence for vLLM by @yuzisun in #3718
- Fixup max_length for HF and model info for vLLM by @Datta0 in #3715
- Fix prompt token count and provide completion usage in OpenAI response by @sivanantha321 in #3712
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
- upgrade vllm/transformers version by @johnugeorge in #3671
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- fix the quote format same with others and docstrings by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Copy generated CRDs by kustomize to Helm by @Jooho in #3392
...
v0.13.0-rc1
What's Changed
- upgrade vllm/transformers version by @johnugeorge in #3671
- Add openai models endpoint by @cmaddalozzo in #3666
- feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in #3603
- Enable dtype support for huggingface server by @Datta0 in #3613
- Add method for checking model health/readiness by @cmaddalozzo in #3673
- fix for extract zip from gcs by @andyi2it in #3510
- Update Dockerfile and Readme by @gavrishp in #3676
- Update huggingface readme by @alexagriffith in #3678
- fix: HPA equality check should include annotations by @terrytangyuan in #3650
- Fix: huggingface runtime in helm chart by @yuzisun in #3679
- Fix: model id and model dir check order by @yuzisun in #3680
- Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in #3688
- Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in #3684
- Unify the log configuration using kserve logger by @sivanantha321 in #3577
- Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in #3700
- Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in #3705
Full Changelog: v0.13.0-rc0...v0.13.0-rc1
v0.13.0-rc0
🌈 What's New?
- add support for async streaming in predict by @alexagriffith in #3475
- Fix: Support model parallelism in HF transformer by @gavrishp in #3459
- Support model revision and tokenizer revision in huggingface server by @lizzzcai in #3558
- OpenAI schema by @tessapham in #3477
- Support OpenAIModel in ModelRepository by @grandbora in #3590
- updated xgboost to support json and ubj models by @andyi2it in #3551
- Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in #3582
- VLLM support for OpenAI Completions in HF server by @gavrishp in #3589
- Add a user friendly error message for http exceptions by @grandbora in #3581
- feat: Provide minimal distribution of CRDs by @terrytangyuan in #3492
- set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in #3594
- Enabled the multiple domains support on an inference service by @houshengbo in #3615
- Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in #3621
- Add headers to predictor exception logging by @grandbora in #3658
- Enhance controller setup based on available CRDs by @israel-hdez in #3472
⚠️ What's Changed
- Remove conversion webhook from manifests by @Jooho in #3476
- Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in #3469
- chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in #3443
- docs: Move Alibi explainer to docs by @terrytangyuan in #3579
- Remove generate endpoints by @cmaddalozzo in #3654
🐛 What's Fixed
- Fix:Support Parallelism in vllm runtime by @gavrishp in #3464
- fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in #3424
- Fix isADirectoryError in Azure blob download by @tjandy98 in #3502
- Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in #3481
- Make the modelcar injection idempotent by @rhuss in #3517
- Only pad left for decode-only architecture models. by @sivanantha321 in #3534
- fix lint typo on Makefile by @spolti in #3569
- fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in #3576
- Fix model unload in server stop method by @sivanantha321 in #3587
- Fix golint errors by @andyi2it in #3552
- Fix make deploy-dev-storage-initializer not working by @sivanantha321 in #3617
- Fix Pydantic 2 warnings by @cmaddalozzo in #3622
- build: Fix CRD copying in generate-install.sh by @terrytangyuan in #3620
- Only load from model repository if model binary is not found under model_dir by @sivanantha321 in #3559
- build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in #3641
- Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in #3657
- Fix Huggingface server stopping criteria by @cmaddalozzo in #3659
- Explicitly specify pad token id when generating tokens by @sivanantha321 in #3565
- Fix quick install does not cleans up Istio installer by @sivanantha321 in #3660
⬆️ Version Upgrade
- Upgrade orjson to version 3.9.15 by @spolti in #3488
- feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in #3374
- Update cert manager version in quick install script by @shauryagoel in #3496
- ci: Bump minikube version to work with newer K8s version by @terrytangyuan in #3498
- upgrade knative to 1.13 by @andyi2it in #3457
- Upgrade istio to 1.20 works for the Github Actions by @houshengbo in #3529
- chore: Bump ModelMesh version to v0.12.0-rc0 in Helm chart by @terrytangyuan in #3642
🔨 Project SDLC
- Enhance CI environment by @sivanantha321 in #3440
- Fixed go lint error using golangci-lint tool. by @andyi2it in #3378
- chore: Update list of reviewers by @ckadner in #3484
- build: Add helm docs update to make generate command by @terrytangyuan in #3437
- Added v2 infer test for supported model frameworks. by @andyi2it in #3349
- fix the quote format same with others and docstrings by @leyao-daily in #3490
- remove unnecessary Istio settings from quick_install.sh by @peterj in #3493
- Remove GOARCH by @mkumatag in #3523
- GH Alert: Potential file inclusion via variable by @spolti in #3520
- Update codeQL to v3 by @spolti in #3548
- switch e2e test inference graph to raw mode by @andyi2it in #3511
- Black lint by @cmaddalozzo in #3568
- Fix python linter by @sivanantha321 in #3571
- build: Add flake8 and black to pre-commit hooks by @terrytangyuan in #3578
- build: Allow pre-commit to keep changes in reformatted code by @terrytangyuan in #3604
- Allow rerunning failed workflows by comment by @andyi2it in #3550
- add re-run info in the PR templates by @spolti in #3633
- Add e2e tests for huggingface by @sivanantha321 in #3600
- Test image builds for ARM64 arch in CI by @sivanantha321 in #3629
- workflow file for cherry-pick on comment by @andyi2it in #3653
CVE patches
- CVE-2024-24762 - update fastapi to 0.109.1 by @spolti in #3556
- golang.org/x/net Allocation of Resources Without Limits or Throttling by @spolti in #3596
- Fix CVE-2023-45288 for qpext by @sivanantha321 in #3618
- Security fix - CVE 2024 24786 by @andyi2it in #3585
📝 Documentation Update
- qpext: fix a typo in qpext doc by @daixiang0 in #3491
- Update KServe project description by @yuzisun in #3524
- Update kserve cake diagram by @yuzisun in #3530
- Remove white background for the kserve diagram by @yuzisun in #3531
- fix a typo in OPENSHIFT_GUIDE.md by @marek-veber in #3544
- Fix typo in README.md by @terrytangyuan in #3575
New Contributors
- @leyao-daily made their first contribution in #3490
- @peterj made their first contribution in #3493
- @timothyjlaurent made their first contribution in #3374
- @shauryagoel made their first contribution in #3496
- @mkumatag made their first contribution in #3523
- @marek-veber made their first contribution in #3544
- @trojaond made their first contribution in #3481
- @grandbora made their first contribution in #3590
- @saileshd1402 made their first contribution in #3657
Full Changelog: v0.12.1...v0.13.0-rc0
v0.12.1
What's Changed
- [release-0.12] Update fastapi to 0.109.1 and Support ray 2.10 by @sivanantha321 in #3609
- [release-0.12] Pydantic 2 support by @cmaddalozzo in #3614
- [release-0.12] Make the modelcar injection idempotent by @sivanantha321 in #3612
- Prepare for release 0.12.1 by @sivanantha321 in #3610
- release-0.12 pin back ray to 2.10 by @yuzisun in #3616
- [release-0.12] Fix docker build failure for ARM64 by @sivanantha321 in #3627
Full Changelog: v0.12.0...v0.12.1
v0.12.0
🌈 What's New?
Core Inference & Serving Runtimes
- Implement HuggingFace model server by @yuzisun in #3334
- feat: Add HuggingFace runtime out-of-the-box support by @terrytangyuan in #3395
- Implement support for vllm as alternative backend by @gavrishp in #3415
- Torchserve grpc v2 by @andyi2it in #3247
- feat: CA bundle mount options for storage initializer by @Jooho in #3250
- Add support for modelcars by @rhuss in #3110
- Add compatibility for Istio CNI plugin by @israel-hdez in #3316
- feat: Allow to disable ingress creation for raw deployment mode by @terrytangyuan in #3436
Advanced Inference
- RawDeployment support for Inference Graph by @bmopuri in #3199, @bmopuri in #3194
- Added custom request timeout for inferencegraph. by @andyi2it in #3173
- Add regex support for propagating IG headers by @sivanantha321 in #3178
KServe Python SDK, Storage
- Unpack archive files for hdfs by @sivanantha321 in #3093
- feat: Support S3 transfer acceleration by @terrytangyuan in #3305
⚠️ What's Changed
- Change the default value for enableDirectPvcVolumeMount to true by @Jooho in #3371
- Add model arguments to API and update BERT inference example by @yuzisun in #3332
  --model_name, --predictor_host, --predictor_use_ssl, and --predictor_request_timeout_seconds are added to the kserve model server and no longer need to be defined in the custom predictor or transformer. --protocol is deprecated and superseded by --predictor_protocol. More details can be found in the API reference doc.
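The argument change above can be sketched with the standard library alone. This is a minimal illustration, not KServe's actual parser: the function names `build_parser` and `parse_server_args`, the default values, and the fallback rule (the deprecated flag is used only when the new one is absent) are all assumptions made for the example. Only the flag names come from the release note.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts the flags named in the release note."""
    parser = argparse.ArgumentParser(
        description="illustrative model server arguments",
        allow_abbrev=False,  # require full flag names so nothing matches by prefix
    )
    parser.add_argument("--model_name", default="model")  # placeholder default
    parser.add_argument("--predictor_host", default=None)
    parser.add_argument("--predictor_use_ssl", action="store_true")
    # 600 is a placeholder value, not necessarily KServe's default.
    parser.add_argument("--predictor_request_timeout_seconds", type=int, default=600)
    parser.add_argument("--predictor_protocol", default=None)
    # Deprecated spelling, still accepted so that old command lines keep parsing.
    parser.add_argument("--protocol", default=None)
    return parser


def parse_server_args(argv):
    """Parse argv and fold the deprecated --protocol into --predictor_protocol."""
    args = build_parser().parse_args(argv)
    if args.protocol is not None:
        warnings.warn(
            "--protocol is deprecated; use --predictor_protocol",
            DeprecationWarning,
        )
        # Assumed rule: the new flag wins when both are given.
        if args.predictor_protocol is None:
            args.predictor_protocol = args.protocol
    if args.predictor_protocol is None:
        args.predictor_protocol = "v1"  # assumed default for this sketch
    return args
```

With a shared parser like this, a custom predictor or transformer would read `args.model_name` or `args.predictor_host` from the parsed result instead of declaring those flags itself, and an old command line that still passes `--protocol v2` would keep working while emitting a deprecation warning.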
🐛 What's Fixed
- Removing update op from pod-mutator webhook by @rachitchauhan43 in #3163
- Fix quick install script by @dtrifiro in #3164
- Fix self-signed-ca installation by @sivanantha321 in #3165
- Add S3_VERIFY_SSL to storage.py for S3 by @Jooho in #3172
- Fix runtime not found for triton due to wrong default protocolVersion by @sivanantha321 in #3177
- Make ModelServer to stop correctly when using more than 1 worker by @andyi2it in #3174
- Fix serving runtime webhook cert namespace for kubeflow installation by @sivanantha321 in #3188
- Fix knative config-defaults values overrided by kserve by @sivanantha321 in #3130
- Fix qpext metrics port by @yuzisun in #3209
- Added async with postprocess method. by @andyi2it in #3204
- Fix lightgbm model input conversion when input is list of lists by @sivanantha321 in #3226
- Validation added for ensuring same model format has same priority for runtime by @andyi2it in #3181
- Fix: Unexpected Panic in Inference graph when it fails to create http request by @HAO2167 in #3079
- Support verify variable with storage-config json style (fix-3263) by @Jooho in #3267
- s3 storage initializer: only set environment variables if variables are set in storage secret json by @dtrifiro in #3259
- Fix tensorflow e2e test fails due to OOM error by @sivanantha321 in #3293
- fix: Properly handle the creation and closure of success file in DownloadModel() by @terrytangyuan in #3295
- fix: Surface errors when writing graphHandler response by @terrytangyuan in #3308
- Fix qpext hangs during shutdown by @sivanantha321 in #3268
- fix: Check if HPA has the same scaleTargetRef and behavior by @terrytangyuan in #3294
- Updated quick_install script to temporarily fix 0.11.2 release install by @andyi2it in #3311
- image_patch_dev.sh: set pipefail by @dtrifiro in #3274
- Move pmml worker validation to runtime by @sivanantha321 in #3182
- Introduce retry on resource conflict by @sivanantha321 in #3240
- Fix inference request fails when sending with less number of features than the total model features on lightgbm by @sivanantha321 in #3313
- Fix raw deployment service points to predictor container port instead of transformer container port in transformer collocation by @sivanantha321 in #3318
- Restrict storage uri to predictor only in collocation of transformer and predictor by @sivanantha321 in #3280
- feat: Expose defaults for several batcher handler parameters by @terrytangyuan in #3301
- fix: Properly close resources and handle errors in agent and storage. Fixes #3323 by @terrytangyuan in #3321
- Handles s3 download for object name starts with folder name. by @andyi2it in #3205
- chore: Remove unused timeout annotation and flag in batcher by @terrytangyuan in #3341
- Pass missing infer parameters during conversion by @sivanantha321 in #3368
- Add exception handler for model server and Add ability to specify custom handler by @sivanantha321 in #3405
- fix: Add missing volume mount to transformer container when using modelcars by @rhuss in #3384
- fix: Add 'model_version' to InferResponse in python library by @ajstewart in #3466
- Fix v2 model ready url in kserve client by @sivanantha321 in #3403
- Fix parameters value type conversion by pydantic by @sivanantha321 in #3430
- Fix Raw Logger E2E by @israel-hdez in #3434
- Expose qpext aggregate metrics port on container by @sivanantha321 in #3291
- Fix dup metrics aggr port by @yuzisun in #3447
- fix: HuggingFace predictor should not be recognized as multi-model server by @terrytangyuan in #3449
- Fix: bugs for huggingface runtime template by @yuzisun in #3448
- Fix: Add padding and truncation in huggingface tokenizer by @kevinmingtarja in #3450
- Fix: vllm backend does not work with model_dir for huggingface runtime by @yuzisun in #3456
- Fix azure workload identity federation by excluding azure client secret by @robbertvdg in #3390
- Change `certificate` to `ca_bundle` in json style of s3 storageSecret by @Jooho in #3463
⬆️ Version Upgrade
- Upgrade istio Api and migrate to v1beta1 Api version by @sivanantha321 in #3150
- Bump torchserve version to 0.9.0 by @gavrishp in #3217
- Allow ray >=2.7,<3 by @ddelange in #3075
- Bump istio version to 1.19.4 by @sivanantha321 in #3258
- Updated ray to 2.8.0 and removed detached flag to avoid deprecation error in future by @andyi2it in #3272
- chore: Upgrade to XGBoost v2.0.2. Fixes #3310 by @terrytangyuan in #3309
- chore: Upgrade Go to v1.21 by @terrytangyuan in #3296
- Added 3.11 support for paddle in workflow. by @andyi2it in #3246
- Upgraded poetry version to 1.7.1 by @andyi2it in #3271
- Upgrade cloudevent to v2 by @homily707 in #3255
- Update knative-serving by @spolti in #3362
- Update google-cloud-storage dependency to >=2.3.0,<3.0.0 and ray dependency to >=2.8.1,<3.0.0 by @sivanantha321 in #3389
🔨 Project SDLC
- chore: Add design doc template links to feature request template by @ckadner in #3155
- Make storage initializer image configurable by @yuzisun in #3145
- Increase pytest workers for kourier e2e test by @sivanantha321 in #3151
- Restrict workflow concurrency by @vignesh-murugani2i in #3167
- Generate client-go for StorageContainer CR by @sivanantha321 in #3152
- Refactor v1 vs. v2 endpoint unit tests in kserve/test/test_server.py… by @guohaoyu110 in #3158
- Verify codegen in CI by @sivanantha321 in...
v0.12.0-rc1
What's Changed
- docs: Corrections and edits on release process document by @terrytangyuan in #3326
- build: Switch to use kustomize in kubectl to simplify build process. Fixes #3314 by @terrytangyuan in #3315
- feat: Expose defaults for several batcher handler parameters by @terrytangyuan in #3301
- fix: Properly close resources and handle errors in agent and storage. Fixes #3323 by @terrytangyuan in #3321
- Add model arguments to API and update BERT inference example by @yuzisun in #3332
- chore: Update generated APIs and check generated manifests by @terrytangyuan in #3335
- Update python model serving runtime API docstring by @yuzisun in #3338
- Handles s3 download for object name starts with folder name. by @andyi2it in #3205
- chore: Remove unused timeout annotation and flag in batcher by @terrytangyuan in #3341
- ci: Automate release process by @terrytangyuan in #3345
- fixes critical vulnerabilities on ray by @spolti in #3285
- chore: Bump versions to prepare v0.12.0-rc1 release by @terrytangyuan in #3352
- Change version for helm charts in README by @gawsoftpl in #3353
- Fixes CVE-2023-48795 by @spolti in #3354
- Fix Stack-based Buffer Overflow on protobuf by @spolti in #3358
- Update knative-serving by @spolti in #3362
- Fixes vulnerabilities on the otelhttp dependency by @spolti in #3361
- Change the default value for enableDirectPvcVolumeMount to true by @Jooho in #3371
- feat: Automatically generate Helm Chart docs. Fixes #3356 by @terrytangyuan in #3363
- Modified script for include all kserve poetry projects. by @andyi2it in #3350
- RawDeployment support for Inference Graph by @bmopuri in #3199
- Add compatibility for Istio CNI plugin by @israel-hdez in #3316
- Pass missing infer parameters during conversion by @sivanantha321 in #3368
- feat: Support S3 transfer acceleration by @terrytangyuan in #3305
- Implement HuggingFace model server by @yuzisun in #3334
- fix: Add missing volume mount to transformer container when using modelcars by @rhuss in #3384
- align cloudevents/sdk-go dependency by @spolti in #3387
New Contributors
- @gawsoftpl made their first contribution in #3353
Full Changelog: v0.12.0-rc0...v0.12.0-rc1