
[Component Guide] KFServing #82

Closed
1 of 2 tasks
Tracked by #27
surajkota opened this issue Feb 9, 2022 · 10 comments · Fixed by #173
Assignees
Labels
enhancement New feature or request work in progress Has been assigned and is in progress

Comments

@surajkota
Contributor

surajkota commented Feb 9, 2022

Is your feature request related to a problem? Please describe.
Existing samples/tutorials in this repository focus on how to install Kubeflow on EKS.
The KServe docs show how to create an InferenceService and send requests through the ingress gateway if the user has a real DNS, but they do not cover auth or how to set up a real DNS.

Describe the solution you'd like

  • E2e tutorials for users on how to use inference service in production on AWS with an ALB endpoint/custom domain and auth
  • How to use a model in S3
@surajkota surajkota added the enhancement New feature or request label Feb 9, 2022
@surajkota surajkota changed the title Guide to using KFServing Guide on using KFServing Feb 9, 2022
@surajkota surajkota changed the title Guide on using KFServing Guide for KFServing Feb 9, 2022
@surajkota surajkota changed the title Guide for KFServing [Component Guide] KFServing Feb 15, 2022
@surajkota surajkota self-assigned this Feb 18, 2022
@surajkota surajkota added the work in progress Has been assigned and is in progress label Feb 23, 2022
@surajkota
Contributor Author

surajkota commented Feb 25, 2022

Working e2e POC:

In this tutorial we will create a TLS enabled load balancer endpoint for serving prediction requests over HTTPS.

Pre-requisites:

  1. Kubeflow deployment
  2. Configure and install the AWS Load Balancer Controller (installed by default for the Cognito-based deployment; refer to step 1 in the #67 comment for the Dex-based deployment)
  3. Cluster subnets tagged according to the Prerequisites section in this document for ALB controller to work

Background:

  • Currently, it is not possible to programmatically authenticate a request through an ALB that is using Cognito authentication. (In other words, you cannot generate the AWSELBAuthSessionCookie cookies yourself using the tokens from Cognito.)
  • Certificates for ALB public DNS names are not supported. Instead, you must create a custom domain.

We will be creating an ALB endpoint which authorizes requests based on a token in a predefined header. This will enable service-to-service communication.

Step 1: Register a domain, or reuse the domain if you are using the Cognito-based deployment. You can get a domain using any domain registration service, such as Route53 or godaddy.com. Suppose you already have a registered domain example.com and are using platform.example.com for hosting Kubeflow.

Step 2: Modify the Knative domain configuration to use your custom domain.
Note that the Knative default domain is in the format {route}.{namespace}.{default-domain}. Let's assume your domain name is platform.example.com and you will create the resource in the kserve namespace. We will use this information in the next step.
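For reference, a minimal sketch of the edited config-domain ConfigMap, assuming the example domain above (you can edit it with kubectl edit configmap config-domain -n knative-serving, as shown in a later comment in this thread):

```
# Sketch only: replace platform.example.com with your actual custom domain.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  platform.example.com: ""
```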

Step 3: To get TLS support from the ALB, you need to request a certificate in AWS Certificate Manager. Follow this tutorial to create a certificate for *.example.com and *.kserve.example.com (both domains in the same certificate) in the region where your cluster exists.
After successful validation, you will get a certificate ARN to use with the Ingress ALB endpoint.
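As an illustrative sketch (not part of the original guide; the domain names, namespace, and region are placeholders from the examples above), the same certificate request can be built with boto3 instead of the console:

```python
# Builds the parameters for the ACM certificate request described in Step 3.
# The real boto3 call is shown commented out because it creates an actual
# certificate request in your AWS account:
#
#   import boto3
#   acm = boto3.client("acm", region_name="ca-central-1")  # your cluster's region
#   response = acm.request_certificate(**build_cert_request("example.com", "kserve"))
#   print(response["CertificateArn"])  # ARN to use in the Ingress annotation

def build_cert_request(base_domain: str, knative_namespace: str) -> dict:
    """Kwargs for acm.request_certificate covering both wildcard names
    from Step 3: *.<base_domain> and *.<knative_namespace>.<base_domain>."""
    return {
        "DomainName": f"*.{base_domain}",
        "SubjectAlternativeNames": [f"*.{knative_namespace}.{base_domain}"],
        "ValidationMethod": "DNS",  # DNS validation, as in the linked tutorial
    }
```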

Step 4: Create an ingress with the following config, substituting the values of randomTokenxxx and randomTokenyyy. You can also change the HttpHeaderName from x-api-key to a header of your choice. These are the header and tokens you will pass in your request. The tokens are static strings, and you only need to pass one of them in a request.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=180
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/certificate-arn: 'arn:aws:acm:ca-central-1:123456789012:certificate/xxxxxx-xxxx-xxxx-xxxx-xxxxxxx'
    alb.ingress.kubernetes.io/conditions.istio-ingressgateway: '[{"Field":"http-header","HttpHeaderConfig":{"HttpHeaderName": "x-api-key", "Values":["randomTokenxxx", "randomTokenyyy"]}}]'
    alb.ingress.kubernetes.io/actions.istio-ingressgateway: '{"Type":"forward","ForwardConfig":{"TargetGroups":[{"ServiceName":"istio-ingressgateway","ServicePort":"80","Weight":100}]}}'
    kubernetes.io/ingress.class: alb
  name: istio-ingress-api
  namespace: istio-system
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: istio-ingressgateway
          servicePort: 80
        path: /*
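Since the tokens are just static shared secrets, one way to generate suitably random values (our suggestion, not part of the original guide) is Python's secrets module:

```python
import secrets


def generate_api_tokens(count: int = 2, nbytes: int = 32) -> list:
    """Generate URL-safe random strings to use as the static token values
    in the alb.ingress.kubernetes.io/conditions annotation."""
    return [secrets.token_urlsafe(nbytes) for _ in range(count)]


# Paste these in place of randomTokenxxx / randomTokenyyy in the Ingress.
print(generate_api_tokens())
```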

Step 5: Wait for the ALB to get created (this takes about 3-5 minutes)

$ kubectl get ingress -n istio-system istio-ingress-api
NAME            CLASS    HOSTS   ADDRESS                                                                     PORTS   AGE
istio-ingress   <none>   *       956f2953-istiosystem-istio-2gh2-1234567890.ca-central-1.elb.amazonaws.com   80      14m

Step 6: When the ALB is ready, copy its DNS name and create a CNAME entry pointing to it in Route53 under the subdomain (platform.example.com) for *.kserve.platform.example.com.
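Step 6 can be sketched as code as well (a hedged illustration; the hosted-zone ID is a placeholder and the ALB DNS name is the example value from Step 5):

```python
# Builds the Route 53 ChangeBatch for the wildcard CNAME record. The commented
# boto3 call would apply it to your hosted zone:
#
#   import boto3
#   r53 = boto3.client("route53")
#   r53.change_resource_record_sets(
#       HostedZoneId="ZXXXXXXXXXXXXX",  # placeholder: your hosted zone ID
#       ChangeBatch=build_cname_change(
#           "*.kserve.platform.example.com",
#           "956f2953-istiosystem-istio-2gh2-1234567890.ca-central-1.elb.amazonaws.com"))

def build_cname_change(record_name: str, alb_dns_name: str, ttl: int = 300) -> dict:
    """ChangeBatch that UPSERTs a CNAME from record_name to the ALB DNS name."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": alb_dns_name}],
                },
            }
        ]
    }
```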

Step 7: Create a sample sklearn InferenceService and wait for it to be READY

apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: kserve
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"

$ kubectl get inferenceservice -A
NAMESPACE    NAME           URL                                                  READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
kserve       sklearn-iris   http://sklearn-iris.kserve.platform.example.com       True           100                              sklearn-iris-predictor-default-00001   2d

Step 8: Your endpoint is ready to serve! Send a request using the sample script below, substituting the values for url and Host according to your endpoint:

import requests

data = {
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}

url = "https://sklearn-iris.kserve.platform.example.com/v1/models/sklearn-iris:predict"
headers = {
  "Host" : "sklearn-iris.kserve.platform.example.com",
  "x-api-key": "randomTokenxxx"
}

response = requests.post(url, headers=headers, json=data)

print("Status Code", response.status_code)
print("JSON Response ", response.json())

Output would look like:

Status Code 200
JSON Response  {'predictions': [1, 1]}

@ryansteakley
Contributor

Looks fine to me. In the pre-requisites, when you direct them towards https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html to have the cluster subnets tagged properly, perhaps we should also link to the command we have in the Cognito readme to make it easier for the user? Specifically, section 4 of https://github.com/awslabs/kubeflow-manifests/tree/main/docs/deployment/cognito#40-configure-ingress

@ryansteakley
Contributor

Additionally, can you give context on what randomTokenxxx and randomTokenyyy should be? Are these just random string values the user creates, and do they need to remember them for future use? Or is it a one-time token that will expire, similar to the way the AWSELBAuthSessionCookie has a predetermined lifespan?

@surajkota
Contributor Author

For 1: I will work on kubeflow/kubeflow#67 first so the prerequisites are reduced.
For 2: the tokens are static strings. They are not linked to any session and hence have no expiration time. I will add this to the documentation or think of better wording.

@AlexandreBrown
Contributor

AlexandreBrown commented Feb 28, 2022

@surajkota I see that we specify 2 values for the tokens when defining the Ingress but we only specify one when doing a test request.
Should we use 1 or 2 values?
My intuition is that we have the freedom to define 1 or 2 in the Ingress but if we define 2 then we must use 2 when doing a request, is that correct?

surajkota added a commit that referenced this issue Mar 8, 2022
**Description of your changes:**
- Bring in changes from #114:
  - TODO item from #109 regarding detailed documentation for telemetry component
  - Changed the name from AWS distribution of Kubeflow to Kubeflow on AWS to be consistent with website and usage tracking documentation
  - Added a section in vanilla Kubeflow readme: `Exposing Kubeflow over Load Balancer` to this [#67](#67) to expose deployment over LoadBalancer.
- adds fixes for a few broken links
- Sync the knative manifest for other deployment options with [vanilla](https://github.com/awslabs/kubeflow-manifests/blob/14c17ff16689dbf70af7fb7971deb7da63105690/docs/deployment/vanilla/kustomization.yaml#L17), corresponding to this [change](kubeflow/manifests#1966). This was missed in the initial PR because of looking at 2 branches to create this one

**Testing**
- links working as expected
- tested kfserving model using steps from #82 for the knative overlay change
@AlexandreBrown
Contributor

AlexandreBrown commented Mar 15, 2022

Hello, we followed the guide but unfortunately could not get an inference to return a result.

Effect of this issue

This issue is critical for us since it blocks us from deploying any model to production.

Issue

Performing an inference to a model server returns a 504 Gateway Timeout Error

Steps to reproduce

  1. Create a cluster on EKS 1.21
    Here is the cluster definition I used
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mycluster-1
  version: "1.21"
  region: us-east-2

availabilityZones: ["us-east-2a", "us-east-2b"]

managedNodeGroups:
  - name: ai-platform-1
    minSize: 3
    maxSize: 5
    desiredCapacity: 3
    instanceType: m6i.xlarge
    iam:
      withAddonPolicies:
        autoScaler: true
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/mycluster-1: "owned"
  2. Install Kubeflow according to the RDS-S3-Cognito setup (I used the rds-s3 script for the RDS-S3 part, then manually set up Cognito following the doc)
  • Branch : main + upstream v1.4.1
  3. Create a profile/namespace prod
apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: prod
  4. Create a default-editor service account for prod
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-editor-access-binding-prod
subjects:
- kind: ServiceAccount
  name: default-editor
  namespace: prod
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
  5. Authorize yourself to the namespace (follow the Kubeflow doc steps)
  6. Set up KFServing
  7. Perform an inference.
  8. Notice the 504 error
    Note: The predictor logs do not show any activity when performing a request.
    The request hangs for whatever time the ELB timeout is set to and then returns a 504.
    I tried changing the timeout to 5 minutes and it would hang for 5 minutes and then return a 504.

Context

  • Cluster region : us-east-2
  • Cognito setup
    • Root domain (mydomain.com) is under a different domain registry (not AWS Route53)
      • I created a Hosted Zone for the subdomain ai.mydomain.com to allow subdomain delegation.
      • I created 4 NS records on my root domain registry with the nameservers from the subdomain hosted zone
  • KFServing setup
    • Modified the config-domain config map to use my sub domain.
      • kubectl edit configmap config-domain -n knative-serving
apiVersion: v1
data:
  ai.mydomain.com: ""
kind: ConfigMap
[...]
  • Certificates
    • I successfully created a certificate in us-east-2 (my cluster region) using AWS Certificate Manager.
      • The certificate has 2 domain values : *.prod.ai.mydomain.com and *.ai.mydomain.com
  • Created the ingress using a copy paste from the procedure and I entered the certificate ARN in place of arn:aws:acm:ca-central-1:123456789012:certificate/xxxxxx-xxxx-xxxx-xxxx-xxxxxxx
  • Created a CNAME for the ELB
*.prod.ai.mydomain.com | CNAME | Simple | - | 0dfceb22-istiosystem-istio-a50d-1499903191.us-east-2.elb.amazonaws.com

Any help on this would be extremely appreciated.

@surajkota
Contributor Author

Thanks for reporting the issue and the detailed steps to reproduce. My first thought is you might be hitting these open issues where the auth policy is incomplete in the profile namespace: kserve/kserve#1558, kubeflow/dashboard#13. I can confirm this issue exists in 1.4, since I was also not able to get the request served in a profile namespace.

I created a plain k8s namespace using: kubectl create namespace kserve. Can you try this as well?

I understand from our discussion that this is not ideal, since these models won't be visible in the web app and will not be accessible from pipelines or notebooks, but let's try to break this into two parts: getting the inference request served without a profile, and getting it to work in a profile namespace.

@AlexandreBrown
Contributor

AlexandreBrown commented Mar 16, 2022

Thank you @surajkota, I was able to still use profiles and get the inference working by applying the right auth policy (mine is slightly modified from the one linked, to add v2 protocol support):

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-predictor
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: predictor
  action: ALLOW
  rules:
  - to:
    - operation:
        paths:
        - /metrics
        - /healthz
        - /ready
        - /wait-for-drain
        - /v1/models/*
        - /v2/models/*

If someone uses a transformer, they would need to apply it for the transformer component as well:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-transformer
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: transformer
  action: ALLOW
  rules:
  - to:
    - operation:
        paths:
        - /metrics
        - /healthz
        - /ready
        - /wait-for-drain
        - /v1/models/*
        - /v2/models/*

This is not an AWS issue; it should be added to vanilla Kubeflow in my opinion.
Until it is added to vanilla Kubeflow, we could mention it in the doc so that people on KF 1.4 don't face the issue.

@surajkota
Contributor Author

The above problem is being addressed in this PR: kubeflow/kubeflow#6013

surajkota added a commit that referenced this issue Apr 19, 2022
**Which issue is resolved by this Pull Request:**
Resolves #82 

**Description of your changes:**
- Guide for serving prediction request over load balancer on AWS

**Testing:**
- Tested the README manually

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@surajkota
Contributor Author

kubeflow/kubeflow#6627
