v0.14 release blog #431

Open · wants to merge 2 commits into main

Conversation

israel-hdez (Contributor)

Proposed Changes

Notes

I haven't tried the model cache. So, I'm not sure the provided YAML is correct.

Contribute blog article for v0.14 release

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>

netlify bot commented Dec 9, 2024

Deploy Preview for elastic-nobel-0aef7a ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 05f73b6 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/67588b55b8da2a0009890a7b |
| 😎 Deploy Preview | https://deploy-preview-431--elastic-nobel-0aef7a.netlify.app |

@israel-hdez (Contributor Author)

@yuzisun @greenmoon55 Could you please review?

@yuzisun (Member) commented Dec 10, 2024

Thanks @israel-hdez !!

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
```yaml
modelSize: 1Gi
nodeGroup: nodegroup1
sourceModelUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Maybe add the isvc with an example explaining how to use this?

Contributor Author

AFAIK, the InferenceService doesn't change and you use it normally (i.e., you would still use gs://kfserving-examples/models/sklearn/1.0/model for storageUri).

The difference you would notice is that the model will be fetched/mounted from the cache instead of being downloaded.
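
For reference, a minimal InferenceService using the same URI might look like the sketch below (the name and model format are illustrative, not taken from the blog draft):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Same URI as the LocalModelCache's sourceModelUri; when they match,
      # the model is served from the node-local cache instead of being
      # downloaded again.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```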

Contributor Author

I think I can add a brief note about what I just wrote.


This release also includes several enhancements and changes:

### What's New?
Member

@sivanantha321 @andyi2it Would it also be good to add the binary extension support and response header support?
#419

Contributor Author

I somehow thought that the binary extension was an enhancement of the Inference Client. So, to better understand: should I add it under the inference client heading, or is it good here as a bullet under What's New?

Member

The binary extension is not part of the inference client effort; it implements the binary extension as part of the Open Inference Protocol, along with FP16 support.

Member

* Allow PVC storage to be mounted in ReadWrite mode via an annotation [#3687](https://github.com/kserve/kserve/issues/3687)

### What's Changed?
* Added `hostIPC` field to `ServingRuntime` CRD, for supporting more than one GPU in Serverless mode [#3791](https://github.com/kserve/kserve/issues/3791)
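
(For illustration only, the new field might be set on a ServingRuntime like the sketch below; the top-level placement of `hostIPC` and all names/images are assumptions for this example, not taken from the release notes.)

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: multi-gpu-runtime                       # illustrative name
spec:
  hostIPC: true                                 # assumed placement of the new field
  supportedModelFormats:
    - name: huggingface
  containers:
    - name: kserve-container
      image: example.registry/llm-runtime:latest  # illustrative image
```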
Member

I think it is good to add a section for LLM runtime support to cover the changes that are part of the 0.14 release:

* vLLM 0.6.x support
* Health endpoint for the vLLM backend
* Shared memory volume support for the vLLM backend
* Chat completion template file support
* `trust_remote_code` support for the vLLM and HF backends

Contributor Author

Do you mean a dedicated section with LLM-related enhancements, or should those be listed here under What's Changed?

Member

Yes, it's worth calling it out separately. cc @sivanantha321 to check whether the list of changes is correct.

Member

Ray is now an optional dependency and the way it is integrated has changed. It is worth mentioning this as a breaking change. kserve/kserve#3834
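
For SDK users this presumably means installing Ray explicitly as an extra, e.g. something like `pip install kserve[ray]`; the exact extra name is an assumption here and should be confirmed against kserve/kserve#3834.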
