ClusterServingRuntime support #81

Closed
pvaneck opened this issue Dec 3, 2021 · 4 comments

@pvaneck
Member

pvaneck commented Dec 3, 2021

KServe supports cluster-scoped ServingRuntimes called ClusterServingRuntimes. These act as the built-in or default serving runtimes accessible to any user/namespace in the cluster. Currently ModelMesh-Serving only considers the namespace-scoped ServingRuntimes. Let's think about how ModelMesh-Serving can handle these cluster-level resources.

@njhill
Member

njhill commented Sep 7, 2022

Summary of design discussion with @chinhuang007:

  • A ServingRuntime in a given namespace will "hide" a ClusterServingRuntime with the same name for any inference services in that namespace
  • Deployments derived from ClusterServingRuntimes won't have an owner set; otherwise the ServingRuntime will be set as the owner, as it is now
  • When installing in cluster scope mode (default), the built-in runtimes will be created as ClusterServingRuntimes, otherwise they will be created as regular ServingRuntimes in the target namespace
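A minimal sketch of the first two rules above, using simplified stand-in types rather than the real KServe API types or a Kubernetes client. `Store`, `RuntimeSpec`, `Resolved`, and `Resolve` are hypothetical names chosen for illustration only:

```go
package main

// RuntimeSpec is a hypothetical stand-in for the spec shared by
// ServingRuntime (SR) and ClusterServingRuntime (CSR).
type RuntimeSpec struct {
	Image string
}

// Store is a hypothetical in-memory view of the cluster, not a real client.
type Store struct {
	// Namespaced maps namespace -> runtime name -> spec (SRs).
	Namespaced map[string]map[string]RuntimeSpec
	// Cluster maps runtime name -> spec (CSRs).
	Cluster map[string]RuntimeSpec
}

// Resolved describes which runtime backs a deployment in a namespace.
type Resolved struct {
	Spec        RuntimeSpec
	FromCluster bool
	// OwnerName is the SR to set as deployment owner; it is left empty
	// when the deployment is derived from a CSR (no owner set).
	OwnerName string
}

// Resolve applies the "hide" rule: an SR in the namespace wins over a
// CSR with the same name; the CSR is used only when no such SR exists.
func (s Store) Resolve(namespace, name string) (Resolved, bool) {
	if sr, ok := s.Namespaced[namespace][name]; ok {
		return Resolved{Spec: sr, FromCluster: false, OwnerName: name}, true
	}
	if csr, ok := s.Cluster[name]; ok {
		return Resolved{Spec: csr, FromCluster: true}, true
	}
	return Resolved{}, false
}
```

With an SR named `triton` in one namespace and a CSR named `triton` at cluster level, `Resolve` would return the SR for that namespace and the CSR for every other namespace.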

Code changes required:

  • Have the SR reconciler also watch CSRs, with a handler function that maps CSR events to multiple SR requests, one per "modelmesh-enabled" namespace
  • In the Reconcile function, always Get the SR as we do now. If it is not found, attempt to Get a CSR with the same name
  • Refactor the rest of the Reconcile function to work with *ServingRuntimeSpec (which exists in both SR and CSR) rather than SR directly
  • Where list of runtimes is retrieved, instead create merged list of SR and CSR with SRs taking precedence
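The event fan-out and the merged list could look roughly like the sketch below. It deliberately avoids controller-runtime and the real API types; `Request`, `RuntimeSpec`, `RequestsForClusterRuntime`, and `MergeRuntimes` are hypothetical names, and the list of "modelmesh-enabled" namespaces is assumed to be supplied by the caller:

```go
package main

import "sort"

// Request is a hypothetical stand-in for a reconcile request key.
type Request struct {
	Namespace string
	Name      string
}

// RequestsForClusterRuntime fans a single CSR event out to one SR-style
// request per modelmesh-enabled namespace.
func RequestsForClusterRuntime(csrName string, enabledNamespaces []string) []Request {
	reqs := make([]Request, 0, len(enabledNamespaces))
	for _, ns := range enabledNamespaces {
		reqs = append(reqs, Request{Namespace: ns, Name: csrName})
	}
	return reqs
}

// RuntimeSpec is a hypothetical stand-in for the shared runtime spec.
type RuntimeSpec struct {
	Image string
}

// MergeRuntimes merges the SRs of one namespace with the CSRs, keyed by
// name, with SRs taking precedence. It also returns the merged names in
// sorted order so callers get a deterministic iteration order.
func MergeRuntimes(srs, csrs map[string]RuntimeSpec) (map[string]RuntimeSpec, []string) {
	merged := make(map[string]RuntimeSpec, len(srs)+len(csrs))
	for name, spec := range csrs {
		merged[name] = spec
	}
	// SRs are written last so they overwrite CSRs with the same name.
	for name, spec := range srs {
		merged[name] = spec
	}
	names := make([]string, 0, len(merged))
	for name := range merged {
		names = append(names, name)
	}
	sort.Strings(names)
	return merged, names
}
```

In a real controller the fan-out would presumably live in the watch handler for CSRs, and the merge would replace the existing "list SRs" call sites; both placements are assumptions here.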

We should make the CSR parts conditional on a flag so that things work the same as they do now if that CRD does not exist (or the controller does not have permission to list/read them). We may also want to expose an option to explicitly enable/disable use of CSRs in the namespace-scope case.
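One possible shape for that gating, as a sketch only. The field names and the policy for namespace-scope mode (off unless explicitly opted in) are assumptions, not something decided in this thread:

```go
package main

// CSRSettings is a hypothetical set of inputs for deciding whether the
// controller should consider ClusterServingRuntimes at all.
type CSRSettings struct {
	// CRDAvailable is true when the CSR CRD exists and the controller
	// has permission to list/read it.
	CRDAvailable bool
	// ClusterScope is true when the controller runs in cluster-scope mode.
	ClusterScope bool
	// NamespaceScopeOptIn is an assumed explicit option to enable CSRs
	// when running in namespace-scope mode.
	NamespaceScopeOptIn bool
}

// UseClusterRuntimes reports whether CSR handling should be enabled.
// When it returns false, behavior stays as it is today (SRs only).
func (c CSRSettings) UseClusterRuntimes() bool {
	if !c.CRDAvailable {
		return false
	}
	if c.ClusterScope {
		return true
	}
	return c.NamespaceScopeOptIn
}
```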

@chinhuang007
Contributor

chinhuang007 commented Sep 8, 2022

I have shared the discussions, design, and potential implementation/changes here. It is a bit more elaborate than what @njhill summarized above. Feel free to make comments!

kserve-oss-bot pushed a commit that referenced this issue Sep 22, 2022
Add ClusterServingRuntime (CSR) support so that the CSRs can be shared between namespaces and overridden by namespace-level ServingRuntimes (SR).

#### Motivation
This PR provides support for ClusterServingRuntime (CSR), addressing [issue #81](#81). The use cases and design can be found [here](https://docs.google.com/document/d/1lSqwqmiOeS7rJTtxfSdvxuTK_GzaOwTV5BXfnw2IEJs/edit#heading=h.x1r6y17xd8u7).

#### Modifications

- ServingRuntime controller watches CSRs
- ServingRuntime controller determines which CSRs can be used to create deployments in ModelMesh enabled namespaces
- ServingRuntime controller creates deployments using CSRs
- ServingRuntime controller decides whether to use an SR or a CSR for the deployment
- ServingRuntime controller sets deployment owner=CSR for CSRs
- Refactor code to take SR.spec instead of SR and use the spec from either a SR or a CSR.
- Update unit tests to work with refactored code

Note that changes to install the CSR CRD and required controller roles, and change built-in SRs to be CSRs will be done in a follow-on PR.

#### Result
ModelMesh supports ClusterServingRuntimes while namespace level ServingRuntimes have overriding power.

Contributes to #81

Signed-off-by: Chin Huang <chhuang@us.ibm.com>
@njhill
Member

njhill commented Sep 23, 2022

Required changes to the controller have now been completed in #241. Remaining tasks:

  • Update config/install scripts to install CSR CRD and built-in runtimes as CSRs for cluster-scope mode
  • Ensure FVT coverage of both SRs and CSRs

@rafvasq
Member

rafvasq commented Jan 19, 2024

Closed by #241.

Install and FVT coverage introduced in #252 and improved in #448.

@rafvasq rafvasq closed this as completed Jan 19, 2024