How to use ServingRuntime autoscaling? #374

Open
andreapairon opened this issue May 17, 2023 · 8 comments

@andreapairon

Hi all,

What is the correct way to use HPA autoscaling on a ServingRuntime?
Should I remove the replicas property under spec?
Should I update all the YAML files involved in the "Enable HPA..." commit, or use another version of the ModelMesh controller image?

It's not easy to work out how to use autoscaling from the scaling documentation page.

@ckadner ckadner added the question Further information is requested label May 24, 2023
@ckadner
Member

ckadner commented May 24, 2023

@Jooho -- there may be a need to update our docs to clear up some of the confusion :-)

@andreapairon
Author

Thank you. In the meantime, can you tell me how to activate autoscaling for the serving runtimes? :D

@Jooho
Contributor

Jooho commented May 25, 2023

@andreapairon

I'm sorry for the confusion, and thank you for raising these questions. I will answer each of them in turn.

which is the way to correctly use the HPA autoscaling on ServingRuntime?

To enable HPA, add this annotation to the specific ServingRuntime.
For example:

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    serving.kserve.io/autoscalerClass: hpa

Should I remove the replicas property under spec?

Correct. To enable HPA, you have to remove replicas from the ServingRuntime spec.
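
Putting the two answers above together, a ServingRuntime prepared for HPA might look roughly like the sketch below. Only the apiVersion, kind, and the autoscalerClass annotation come from this thread; the runtime name, model format, and container are hypothetical placeholders, so treat this as an illustration rather than a tested manifest.

```yaml
# Sketch only: name, model format, and container below are hypothetical.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-runtime                        # hypothetical name
  annotations:
    serving.kserve.io/autoscalerClass: hpa     # from this thread: enables HPA
spec:
  # Note: no "replicas" field. It is removed so that HPA controls the pod count.
  multiModel: true
  supportedModelFormats:
    - name: example-format                     # hypothetical model format
      autoSelect: true
  containers:
    - name: example-server                     # hypothetical container
      image: example.com/model-server:latest   # placeholder image
```

If an existing runtime already sets replicas under spec, the same idea applies: delete that one field and add the annotation, leaving the rest of the spec as it is.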

Should I update all the YAML files involved in the "Enable HPA..." commit or use another version of the ModelMesh controller image?

HPA uses a webhook, so you have to update all of the YAML files.

Additional Comments:
By default, HPA-specific features are managed through annotations on the ServingRuntime. This differs from kserve/kserve, where they are managed through annotations or predictor specs on the InferenceService. The reason is that, by design, multiple models in kserve/modelmesh share a single ServingRuntime. HPA is a standard object provided by Kubernetes, and ModelMesh relies on this HPA object to autoscale the ServingRuntime pods.

If you have further questions, please let me know.

@ckadner
Member

ckadner commented May 26, 2023

@andreapairon -- once you get a chance to try it out, would you be willing to open a PR to update our docs?

@andreapairon
Author

@ckadner --- yeah, I will.

@Jooho --- But to enable HPA, is a Knative installation necessary as well, or is the standalone installation of KServe ModelMesh Serving enough?

@Jooho
Contributor

Jooho commented Jun 1, 2023

@andreapairon No, it does not need a Knative installation, but the cluster has to support metrics.

@ckadner
Member

ckadner commented Jun 20, 2023

We have to update the Go code for the latest Kubernetes versions. OpenShift v4.13 and K8s v1.26 no longer serve the v2beta2 version of the HPA API (deprecated):

W0713 21:57:33.276930       1 warnings.go:70] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
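
For reference, an HPA object written against the autoscaling/v2 API that the warning recommends could look like the sketch below. In ModelMesh the HPA is presumably generated by the controller rather than written by hand, so this only illustrates the shape of the newer API. The object name, target Deployment name, replica bounds, and CPU target are all hypothetical.

```yaml
# Sketch only: every name and number here is a hypothetical placeholder.
apiVersion: autoscaling/v2           # replaces the removed autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-runtime-hpa          # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-runtime-deployment # hypothetical Deployment backing the runtime
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80     # scale out above 80% average CPU utilization
```

A resource metric such as cpu depends on the cluster exposing pod metrics, which matches the earlier note that the cluster has to support metrics.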

@ckadner ckadner assigned ckadner and unassigned Jooho Jul 11, 2023
@ckadner ckadner added this to the v0.11.0 milestone Jul 11, 2023
@ckadner ckadner added bug Something isn't working and removed question Further information is requested labels Jul 13, 2023
@rafvasq
Member

rafvasq commented Jul 20, 2023

We have to update the Go code for the latest Kubernetes versions. OpenShift v4.13 and K8s v1.26 no longer serve the v2beta2 version of the HPA API (deprecated):

W0713 21:57:33.276930       1 warnings.go:70] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler

That's completed now with #403, but this issue is more about better documenting how to use autoscaling, right? Maybe we can either open a new issue about improving the autoscaling documentation, or repurpose/rename this one to better capture that.
