We deployed a PyTorch model using the MLServer serving runtime. Our goal is to get faster predictions under higher load.
So we created 5 replicas for each runtime:
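Roughly like this (the runtime name, image, and other fields below are illustrative placeholders, not our exact manifest):

```yaml
# Illustrative ServingRuntime sketch only: name, image, and model format
# are assumptions, not the exact manifest from this deployment.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: mlserver-1.x
spec:
  replicas: 5          # one runtime pod per replica
  multiModel: true
  supportedModelFormats:
    - name: pytorch
      autoSelect: true
  containers:
    - name: mlserver
      image: seldonio/mlserver:1.3.5
```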
But when we send multiple parallel or sequential inference requests, all of them are handled by a single pod out of the 5 replicas, which delays the results. We also noticed that after a few successful requests, the requests start failing.
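For reference, the request pattern looks roughly like this (the endpoint URL, model name, and tensor shape are placeholders, not our exact values; MLServer speaks the KServe V2 REST inference protocol):

```python
# Sketch of the parallel-request pattern described above. URL, model name,
# and input tensor are placeholders; adjust to your deployment.
import concurrent.futures

import requests

URL = "http://modelmesh-serving:8008/v2/models/my-pytorch-model/infer"  # placeholder
PAYLOAD = {
    "inputs": [
        {"name": "input-0", "shape": [1, 3], "datatype": "FP32", "data": [0.1, 0.2, 0.3]}
    ]
}

def infer(_: int) -> int:
    # Each call is an independent V2 inference request; returns the HTTP status.
    resp = requests.post(URL, json=PAYLOAD, timeout=30)
    return resp.status_code

# Fire 20 requests concurrently to exercise more than one runtime replica.
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    statuses = list(pool.map(infer, range(20)))
print(statuses)
```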
Keeping 5 replicas up and running is expensive. Is there any way to scale them down to zero or one when there are no inference requests?
> We deployed a PyTorch model using the MLServer serving runtime. Our goal is to get faster predictions under higher load. So we created 5 replicas for each runtime.
@MLHafizur if there's sufficient load, the model should get loaded in additional replicas up to the total number you have for that runtime. There will be a bit of a delay while this scale-up happens, depending on how long your models take to load.
> We also noticed that after a few successful requests, the requests start failing.
You'll have to provide more detail about the kind of failure: errors from the client side as well as logs from the containers.
> Keeping 5 replicas up and running is expensive. Is there any way to scale them down to zero or one when there are no inference requests?
There is a plan to allow this to work with HPA; see #329.
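Once that support lands, the scaling policy itself would be a standard Kubernetes HorizontalPodAutoscaler along these lines (a generic sketch only: the target Deployment name is a placeholder, and the modelmesh-side wiring is what #329 tracks):

```yaml
# Generic HPA sketch (autoscaling/v2). Note that HPA cannot scale to zero by
# default, so minReplicas: 1 covers the "scale to one" part of the question.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mlserver-runtime-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: modelmesh-serving-mlserver-1.x  # placeholder deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```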
Closing due to inactivity. HPA support was introduced in #342. Please feel free to reopen with more information about the failing requests, or open a new issue.