v0.14 release blog #431
Conversation
Contribute blog article for v0.14 release.

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
@yuzisun @greenmoon55 Could you please review?
Thanks @israel-hdez!!
```yaml
modelSize: 1Gi
nodeGroup: nodegroup1
sourceModelUri: gs://kfserving-examples/models/sklearn/1.0/model
```
Maybe add the InferenceService along with the example, explaining how to use this.
AFAIK, the InferenceService doesn't change and you use it normally (i.e. you would still use `gs://kfserving-examples/models/sklearn/1.0/model` for `storageUri`). The difference you would notice is that the model is fetched/mounted from the cache instead of being downloaded.
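For illustration, a minimal sketch of such an InferenceService, assuming a standard sklearn predictor (the resource name is illustrative):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris   # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Same storageUri as the cached model; no cache-specific field is needed here.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```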
I think I can add a brief note about what I just wrote.
This release also includes several enhancements and changes:
### What's New?
@sivanantha321 @andyi2it Would it also be good to add the binary extension support and response header support?
#419
I somehow thought that the binary extension was an enhancement of the Inference Client. So, to better understand: should I add it under the inference client heading, or is it good here as a bullet under What's New?
The binary extension is not part of the inference client effort; it implements the binary extension as part of the Open Inference Protocol, along with FP16 support.
Maybe you can link the documentation as well. https://kserve.github.io/website/latest/modelserving/data_plane/binary_tensor_data_extension/
* Allow PVC storage to be mounted in ReadWrite mode via an annotation [#3687](https://github.com/kserve/kserve/issues/3687) (see the sketch below)
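A minimal sketch of how this might look. Note that the annotation key `serving.kserve.io/readonly` and the PVC path are my assumptions based on the linked issue; double-check them against the final docs:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-pvc   # illustrative name
  annotations:
    # Assumed annotation key/value for read-write mounting; verify against the KServe docs.
    serving.kserve.io/readonly: "false"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: pvc://model-pvc/model   # illustrative PVC name and path
```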
### What's Changed?
* Added `hostIPC` field to `ServingRuntime` CRD, for supporting more than one GPU in Serverless mode [#3791](https://github.com/kserve/kserve/issues/3791) (see the sketch below)
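A minimal sketch of a `ServingRuntime` using the new field; the runtime name, image, and GPU count are illustrative:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-multi-gpu-runtime   # illustrative name
spec:
  hostIPC: true   # new field: share the host IPC namespace, needed for multi-GPU communication
  supportedModelFormats:
    - name: huggingface
  containers:
    - name: kserve-container
      image: kserve/huggingfaceserver:latest   # illustrative image
      resources:
        limits:
          nvidia.com/gpu: "2"
```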
I think it is good to add a section for LLM runtime support to include the changes that are part of the 0.14 release (a sketch follows this list):
- vLLM 0.6.x support
- Add health endpoint for the vLLM backend
- Support shared memory volume for the vLLM backend
- Support chat completion template file
- Support trust_remote_code for the vLLM and HF backends
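For instance, a minimal sketch of an InferenceService exercising the last item. The model ID is illustrative, and the `--trust_remote_code` flag spelling is my assumption from the change list, so verify it against the runtime docs:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-example   # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_id=meta-llama/Llama-3.1-8B-Instruct   # illustrative model
        - --trust_remote_code   # assumed flag spelling; verify before publishing
```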
Do you mean a dedicated section with LLM-related enhancements? Or should those be listed here under What's Changed?
Yes, it is worth calling it out separately. cc @sivanantha321 to check if the list of changes is correct.
Ray is now an optional dependency, and the way it is implemented has changed. It is worth mentioning this as a breaking change. kserve/kserve#3834
Notes
I haven't tried the model cache, so I'm not sure the provided YAML is correct.