
Load Balancing not happening in Multiple Replicas !!! #330

Closed
MLHafizur opened this issue Feb 16, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@MLHafizur

1. We deployed a PyTorch model using the MLServer serving runtime. Our goal is to get faster predictions under higher load, so we created 5 replicas for each runtime (see the configuration sketch below):

[Screenshot: MicrosoftTeams-image]

But when we send multiple parallel or sequential inference requests, all of the requests are handled by a single pod out of the 5 replicas, which delays the results. We also noticed that after a few successful requests, the requests start failing.

2. Keeping 5 replicas up and running is expensive. Is there any way to scale them down to zero or one when there are no inference requests?
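
For context, a minimal sketch of the kind of configuration that controls how many pods each runtime gets in ModelMesh Serving, assuming the user ConfigMap is named model-serving-config and lives in the namespace where ModelMesh Serving is installed (the name, namespace, and value below are illustrative; check the configuration docs for your install):

```yaml
# Sketch only: set the number of pods started for each ServingRuntime.
# The ConfigMap name and namespace are assumptions and may differ per install.
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
  namespace: modelmesh-serving
data:
  config.yaml: |
    # Global default number of pods per ServingRuntime deployment.
    podsPerRuntime: 5
```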
@MLHafizur MLHafizur added the bug Something isn't working label Feb 16, 2023
@njhill
Member

njhill commented Feb 18, 2023

1. We deployed a PyTorch model using the MLServer serving runtime. Our goal is to get faster predictions under higher load, so we created 5 replicas for each runtime:

@MLHafizur if there's sufficient load, the model should get loaded in additional replicas up to the total number you have for that runtime. There will be a bit of a delay while this scale-up happens, depending on how long your models take to load.

We also noticed that after a few successful requests, the requests start failing.

You'll have to provide more detail about the kind of failure: what you see from the client side, and also logs from the containers.

2. Keeping 5 replicas up and running is expensive. Is there any way to scale them down to zero or one when there are no inference requests?

There is a plan to allow it to work with HPA, see #329.

@rafvasq
Member

rafvasq commented Feb 16, 2024

Closing due to inactivity. HPA support was introduced in #342. Please feel free to reopen with more information about the failing requests, or open a new issue.
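
For anyone landing here later, a rough sketch of what opting a runtime into the HPA-based autoscaling could look like, written as a merge patch against an existing ServingRuntime; the runtime name, namespace, and annotation names/values here are assumptions and should be verified against the ModelMesh Serving autoscaling documentation:

```yaml
# runtime-hpa-patch.yaml -- sketch of a merge patch that hands pod scaling to an HPA.
# Apply with (runtime name and namespace are illustrative):
#   kubectl patch servingruntime mlserver-1.x -n modelmesh-serving \
#     --type merge --patch-file runtime-hpa-patch.yaml
metadata:
  annotations:
    serving.kserve.io/autoscalerClass: hpa              # delegate replica count to an HPA
    serving.kserve.io/targetUtilizationPercentage: "75" # target average CPU utilization
```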

@rafvasq rafvasq closed this as completed Feb 16, 2024