How to use ServingRuntime autoscaling? #374

andreapairon · 2023-05-17T13:24:50Z

Hi all,

which is the way to correctly use the HPA autoscaling on ServingRuntime?
Should I remove the replicas property under spec ?
Should I update all the YAML files involved in the "Enable HPA..." commit or use another version of the ModelMesh controller image?

It's not very easy to understand how to use the autoscaling from the scaling documentation page.


ckadner · 2023-05-24T02:16:41Z

@Jooho -- there may be a need to update our docs to clear up some of the confusion :-)

andreapairon · 2023-05-24T08:51:48Z

Thank you. In the meantime...can you tell me how to activate the serving runtimes autoscaling? :D

Jooho · 2023-05-25T13:47:01Z

@andreapairon

I'm sorry for causing confusion and thank you for providing the questions. I will answer each of the questions you have raised.

which is the way to correctly use the HPA autoscaling on ServingRuntime?

To enable HPA, add the following annotation to the specific ServingRuntime.
For example:

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    serving.kserve.io/autoscalerClass: hpa

Should I remove the replicas property under spec ?

Correct. If you want to enable HPA, you have to remove the replicas property from the ServingRuntime spec.
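Putting the two answers above together, a ServingRuntime with HPA enabled might look roughly like the sketch below. The runtime name `example-runtime` is hypothetical, and the rest of the spec (containers, supported model formats, and so on) is elided and stays whatever your runtime already defines. Only the annotation and the absence of replicas come from this thread.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-runtime          # hypothetical name, for illustration only
  annotations:
    # Enables HPA for this runtime, as described above
    serving.kserve.io/autoscalerClass: hpa
spec:
  multiModel: true
  # replicas: 2                  # removed: leave the replica count to the HPA
  # ... the rest of your existing runtime spec stays unchanged ...
```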

Should I update all the YAML files involved in the "Enable HPA..." commit or use another version of the ModelMesh controller image?

The HPA feature uses a webhook, so you have to update all the YAML files.

Additional Comments:
By default, HPA-specific features are managed through annotations on the ServingRuntime. This differs from kserve/kserve, where they are managed through annotations or the predictor spec of the InferenceService. The difference is by design, because multiple models in kserve/modelmesh share a single ServingRuntime. HPA is a standard object provided by Kubernetes, and ModelMesh relies on it to autoscale the ServingRuntime pods.

If you have further questions, please let me know.

ckadner · 2023-05-26T00:35:03Z

@andreapairon -- when you get a chance to try it out, would you be willing to open a PR to update our docs?

andreapairon · 2023-05-30T14:19:05Z

@ckadner --- yeah, I will.

@Jooho --- But to enable HPA, is a Knative installation necessary as well? Or is the standalone installation of KServe ModelMesh Serving enough?

Jooho · 2023-06-01T13:23:53Z

@andreapairon No, it does not need a Knative installation, but the cluster has to support metrics.

ckadner · 2023-06-20T20:16:18Z

We have to update the Go code for the latest Kubernetes versions. OpenShift v4.13 and K8s v1.26 no longer serve the v2beta2 version of the HPA API (deprecated):

W0713 21:57:33.276930       1 warnings.go:70] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
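For reference, this is roughly what a HorizontalPodAutoscaler looks like under the `autoscaling/v2` API that the warning points to. It is only a sketch of the v2 schema, not the object the ModelMesh controller generates: the HPA name, the target Deployment name, the replica bounds, and the CPU target are all made-up values for illustration.

```yaml
apiVersion: autoscaling/v2        # replaces the removed autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-runtime-hpa       # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-runtime-deployment   # hypothetical Deployment name
  minReplicas: 1                  # illustrative bounds
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # illustrative target
```

Running `kubectl get hpa -o yaml` in the namespace of the runtime should show which API version the cluster actually reports for the generated object.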

rafvasq · 2023-07-20T16:54:08Z

We have to update the Go code for the latest Kubernetes versions. OpenShift v4.13 and K8s v1.26 no longer serve the v2beta2 version of the HPA API (deprecated):
W0713 21:57:33.276930       1 warnings.go:70] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler

That's completed now with #403, but this issue is more about better documenting how to use autoscaling, right? Maybe we can either open a new issue about improving the autoscaling documentation, or repurpose/rename this one to better capture that.

ckadner assigned Jooho May 24, 2023

ckadner added the question Further information is requested label May 24, 2023

heyselbi mentioned this issue Jul 11, 2023: Update HPA version in kserve/modelmesh opendatahub-io/modelmesh-serving#143 (Closed)

ckadner assigned ckadner and unassigned Jooho Jul 11, 2023

ckadner added this to the v0.11.0 milestone Jul 11, 2023

ckadner added bug Something isn't working and removed question Further information is requested labels Jul 13, 2023

This was referenced Jul 18, 2023:

Update deprecated API version for HPA #402 (Closed)

chore: Bump to use autoscaling/v2 #403 (Merged)

ckadner modified the milestones: v0.11.0, v0.11.1 Aug 29, 2023

ckadner mentioned this issue Sep 22, 2023: No information about how Predictor autoscaling works? #434 (Open)
