
[Component Guide] KFServing #82

Closed
1 of 2 tasks
Tracked by #27
surajkota opened this issue Feb 9, 2022 · 10 comments · Fixed by #173
Assignees
Labels
enhancement New feature or request work in progress Has been assigned and is in progress

Comments

@surajkota
Contributor

surajkota commented Feb 9, 2022

Is your feature request related to a problem? Please describe.
Existing samples/tutorials in this repository focus on how to install Kubeflow on EKS.
The KServe docs show how to create an InferenceService and send requests through the ingress gateway if the user has a real DNS, but they do not cover auth or how to set up a real DNS.

Describe the solution you'd like

  • E2e tutorials for users on how to use inference service in production on AWS with an ALB endpoint/custom domain and auth
  • How to use a model in S3
@surajkota surajkota added the enhancement New feature or request label Feb 9, 2022
@surajkota surajkota changed the title Guide to using KFServing Guide on using KFServing Feb 9, 2022
@surajkota surajkota changed the title Guide on using KFServing Guide for KFServing Feb 9, 2022
@surajkota surajkota changed the title Guide for KFServing [Component Guide] KFServing Feb 15, 2022
@surajkota surajkota self-assigned this Feb 18, 2022
@surajkota surajkota added the work in progress Has been assigned and is in progress label Feb 23, 2022
@surajkota
Contributor Author

surajkota commented Feb 25, 2022

Working e2e POC:

In this tutorial we will create a TLS enabled load balancer endpoint for serving prediction requests over HTTPS.

Pre-requisites:

  1. Kubeflow deployment
  2. Configure and install the AWS Load Balancer Controller (installed by default for the Cognito-based deployment; refer to step 1 in the #67 comment for the Dex-based deployment)
  3. Cluster subnets tagged according to the Prerequisites section in this document for ALB controller to work

Background:

  • Currently, it is not possible to programmatically authenticate a request through an ALB that is using Cognito authentication. (In other words, you cannot generate the AWSELBAuthSessionCookie cookies yourself using the tokens from Cognito.)
  • Certificates for ALB public DNS names are not supported. Instead, you must create a custom domain.

We will be creating an ALB endpoint which authorizes requests based on a token in a predefined header. This will enable service-to-service communication.

Step 1: Register a domain, or reuse the domain if you are using the Cognito-based deployment. You can get a domain using any domain registration service, such as Route53 or godaddy.com. Suppose you already have a registered domain example.com and are using platform.example.com for hosting Kubeflow.

Step 2: Modify the Knative domain configuration to use your custom domain.
Note that the Knative default domain is in the format {route}.{namespace}.{default-domain}. Let's assume your domain name is platform.example.com and you will create the resource in the kserve namespace. We will use this information in the next step.
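For reference, a minimal sketch of the edited config-domain ConfigMap, assuming the example domain above (you can edit it with kubectl edit configmap config-domain -n knative-serving, as shown in a later comment in this thread):

```
# Sketch only: replace platform.example.com with your actual custom domain.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  platform.example.com: ""
```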

Step 3: To get TLS support from the ALB, you need to request a certificate in AWS Certificate Manager. Follow this tutorial to create a certificate for *.example.com and *.kserve.example.com (both domains in the same certificate) in the region where your cluster exists.
After successful validation, you will get a certificate ARN to use with the Ingress ALB endpoint.
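As an illustrative sketch (not part of the original guide; the domain names, namespace, and region are placeholders from the examples above), the same certificate request can be built with boto3 instead of the console:

```python
# Builds the parameters for the ACM certificate request described in Step 3.
# The real boto3 call is shown commented out because it creates an actual
# certificate request in your AWS account:
#
#   import boto3
#   acm = boto3.client("acm", region_name="ca-central-1")  # your cluster's region
#   response = acm.request_certificate(**build_cert_request("example.com", "kserve"))
#   print(response["CertificateArn"])  # ARN to use in the Ingress annotation

def build_cert_request(base_domain: str, knative_namespace: str) -> dict:
    """Kwargs for acm.request_certificate covering both wildcard names
    from Step 3: *.<base_domain> and *.<knative_namespace>.<base_domain>."""
    return {
        "DomainName": f"*.{base_domain}",
        "SubjectAlternativeNames": [f"*.{knative_namespace}.{base_domain}"],
        "ValidationMethod": "DNS",  # DNS validation, as in the linked tutorial
    }
```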

Step 4: Create an ingress with the following config, substituting the values of randomTokenxxx and randomTokenyyy. You can also change the HttpHeaderName from x-api-key to a header of your choice. These are the header and tokens you will pass in your request. The tokens are static strings, and you only need to pass one of them in a request.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=180
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/certificate-arn: 'arn:aws:acm:ca-central-1:123456789012:certificate/xxxxxx-xxxx-xxxx-xxxx-xxxxxxx'
    alb.ingress.kubernetes.io/conditions.istio-ingressgateway: '[{"Field":"http-header","HttpHeaderConfig":{"HttpHeaderName": "x-api-key", "Values":["randomTokenxxx", "randomTokenyyy"]}}]'
    alb.ingress.kubernetes.io/actions.istio-ingressgateway: '{"Type":"forward","ForwardConfig":{"TargetGroups":[{"ServiceName":"istio-ingressgateway","ServicePort":"80","Weight":100}]}}'
    kubernetes.io/ingress.class: alb
  name: istio-ingress-api
  namespace: istio-system
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: istio-ingressgateway
          servicePort: 80
        path: /*
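Since the tokens are just static shared secrets, one way to generate suitably random values (our suggestion, not part of the original guide) is Python's secrets module:

```python
import secrets


def generate_api_tokens(count: int = 2, nbytes: int = 32) -> list:
    """Generate URL-safe random strings to use as the static token values
    in the alb.ingress.kubernetes.io/conditions annotation."""
    return [secrets.token_urlsafe(nbytes) for _ in range(count)]


# Paste these in place of randomTokenxxx / randomTokenyyy in the Ingress.
print(generate_api_tokens())
```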

Step 5: Wait for the ALB to get created (this takes about 3-5 minutes)

$ kubectl get ingress -n istio-system istio-ingress-api
NAME            CLASS    HOSTS   ADDRESS                                                                     PORTS   AGE
istio-ingress   <none>   *       956f2953-istiosystem-istio-2gh2-1234567890.ca-central-1.elb.amazonaws.com   80      14m

Step 6: When the ALB is ready, copy its DNS name and create a CNAME entry pointing to it in Route53 under the subdomain (platform.example.com) for *.kserve.platform.example.com.
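Step 6 can be sketched as code as well (a hedged illustration; the hosted-zone ID is a placeholder and the ALB DNS name is the example value from Step 5):

```python
# Builds the Route 53 ChangeBatch for the wildcard CNAME record. The commented
# boto3 call would apply it to your hosted zone:
#
#   import boto3
#   r53 = boto3.client("route53")
#   r53.change_resource_record_sets(
#       HostedZoneId="ZXXXXXXXXXXXXX",  # placeholder: your hosted zone ID
#       ChangeBatch=build_cname_change(
#           "*.kserve.platform.example.com",
#           "956f2953-istiosystem-istio-2gh2-1234567890.ca-central-1.elb.amazonaws.com"))

def build_cname_change(record_name: str, alb_dns_name: str, ttl: int = 300) -> dict:
    """ChangeBatch that UPSERTs a CNAME from record_name to the ALB DNS name."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": alb_dns_name}],
                },
            }
        ]
    }
```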

Step 7: Create a sample sklearn InferenceService and wait for it to be READY

apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  namespace: kserve
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"

$ kubectl get inferenceservice -A
NAMESPACE    NAME           URL                                                  READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
kserve       sklearn-iris   http://sklearn-iris.kserve.platform.example.com       True           100                              sklearn-iris-predictor-default-00001   2d

Step 8: Your endpoint is ready to serve! Send a request using the sample script below, substituting the values for url and Host according to your endpoint:

import requests

data = {
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}

url = "https://sklearn-iris.kserve.platform.example.com/v1/models/sklearn-iris:predict"
headers = {
  "Host" : "sklearn-iris.kserve.platform.example.com",
  "x-api-key": "randomTokenxxx"
}

response = requests.post(url, headers=headers, json=data)

print("Status Code", response.status_code)
print("JSON Response ", response.json())

Output would look like:

Status Code 200
JSON Response  {'predictions': [1, 1]}

@ryansteakley
Contributor

Looks fine to me. In the pre-requisites, when you direct them towards https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html to have the cluster subnets tagged properly, perhaps we should also link to the command we have in the Cognito readme to make it easier for the user? Specifically, section 4 of https://github.com/awslabs/kubeflow-manifests/tree/main/docs/deployment/cognito#40-configure-ingress

@ryansteakley
Contributor

Additionally, can you give context on what randomTokenxxx and randomTokenyyy should be? Are these just random string values the user creates, and do they need to remember them for future use? Or is it a one-time token that will expire, similar to the way the AWSELBAuthSessionCookie has a predetermined lifespan?

@surajkota
Contributor Author

For 1: I will work on kubeflow/kubeflow#67 first so the prerequisites are reduced.
For 2: the tokens are static strings. They are not linked to any session and hence have no expiration time. I will add this to the documentation or think of better wording.

@AlexandreBrown
Contributor

AlexandreBrown commented Feb 28, 2022

@surajkota I see that we specify 2 values for the tokens when defining the Ingress but we only specify one when doing a test request.
Should we use 1 or 2 values?
My intuition is that we have the freedom to define 1 or 2 in the Ingress but if we define 2 then we must use 2 when doing a request, is that correct?

surajkota added a commit that referenced this issue Mar 8, 2022
**Description of your changes:**
- Bring in changes from #114:
  - TODO item from #109 regarding detailed documentation for telemetry component
  - Changed the name from AWS distribution of Kubeflow to Kubeflow on AWS to be consistent with website and usage tracking documentation
  - Added a section in vanilla Kubeflow readme: `Exposing Kubeflow over Load Balancer` to this [#67](#67) to expose deployment over LoadBalancer.
- adds fixes for a few broken links
- Sync the knative manifest for other deployment options with [vanilla](https://github.com/awslabs/kubeflow-manifests/blob/14c17ff16689dbf70af7fb7971deb7da63105690/docs/deployment/vanilla/kustomization.yaml#L17), corresponding to this [change](kubeflow/manifests#1966). This was missed in the initial PR because of looking at 2 branches to create this one

**Testing**
- links working as expected
- tested kfserving model using steps from #82 for the knative overlay change
@AlexandreBrown
Contributor

AlexandreBrown commented Mar 15, 2022

Hello, we followed the guide but unfortunately could not get an inference to return a result.

Effect of this issue

This issue is critical for us since it blocks us from deploying any model to production.

Issue

Performing an inference to a model server returns a 504 Gateway Timeout Error

Steps to reproduce

  1. Create a cluster on EKS 1.21
    Here is the cluster definition I used
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: mycluster-1
  version: "1.21"
  region: us-east-2

availabilityZones: ["us-east-2a", "us-east-2b"]

managedNodeGroups:
  - name: ai-platform-1
    minSize: 3
    maxSize: 5
    desiredCapacity: 3
    instanceType: m6i.xlarge
    iam:
      withAddonPolicies:
        autoScaler: true
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/mycluster-1: "owned"
  2. Install Kubeflow according to the RDS-S3-Cognito setup (I used the rds-s3 script for the RDS-S3 part, then manually set up Cognito following the doc)
  • Branch : main + upstream v1.4.1
  3. Create a profile/namespace prod
apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: prod
  4. Create a default-editor service account for prod
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-editor-access-binding-prod
subjects:
- kind: ServiceAccount
  name: default-editor
  namespace: prod
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
  5. Authorize yourself to the namespace (follow the Kubeflow doc steps)
  6. Set up KFServing
  7. Perform an inference.
  8. Notice the 504 error
    Note: The predictor logs do not show any activity when performing a request.
    The request hangs for whatever time the ELB timeout is set to and then returns a 504.
    I tried changing the timeout to 5 minutes and it would hang for 5 minutes and then return a 504.

Context

  • Cluster region : us-east-2
  • Cognito setup
    • Root domain (mydomain.com) is under a different domain registry (not AWS Route53)
      • I created a Hosted Zone for the subdomain ai.mydomain.com to allow subdomain delegation.
      • I created 4 NS records on my root domain registry with the nameservers from the subdomain hosted zone
  • KFServing setup
    • Modified the config-domain config map to use my sub domain.
      • kubectl edit configmap config-domain -n knative-serving
apiVersion: v1
data:
  ai.mydomain.com: ""
kind: ConfigMap
[...]
  • Certificates
    • I successfully created a certificate in us-east-2 (my cluster region) using AWS Certificate Manager.
      • The certificate has 2 domain values : *.prod.ai.mydomain.com and *.ai.mydomain.com
  • Created the ingress using a copy paste from the procedure and I entered the certificate ARN in place of arn:aws:acm:ca-central-1:123456789012:certificate/xxxxxx-xxxx-xxxx-xxxx-xxxxxxx
  • Created a CNAME for the ELB
*.prod.ai.mydomain.com | CNAME | Simple | - | 0dfceb22-istiosystem-istio-a50d-1499903191.us-east-2.elb.amazonaws.com

Any help on this would be extremely appreciated.

@surajkota
Contributor Author

Thanks for reporting the issue and the detailed steps to reproduce. My first thought is you might be hitting these open issues where the auth policy is incomplete in the profile namespace: kserve/kserve#1558, kubeflow/dashboard#13. I can confirm this issue exists in 1.4, since I was also not able to get the request served in a profile namespace.

I created a plain k8s namespace using: kubectl create namespace kserve. Can you try this as well?

I understand from our discussion that this is not ideal, since these models won't be visible in the web app and will not be accessible from pipelines or notebooks, but let's try to break this into two parts: getting the inference request served without a profile, and getting it to work in a profile namespace.

@AlexandreBrown
Contributor

AlexandreBrown commented Mar 16, 2022

Thank you @surajkota, I was able to still use profiles and get the inference working by applying the right auth policy (mine is slightly modified from the one linked, to add v2 protocol support):

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-predictor
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: predictor
  action: ALLOW
  rules:
  - to:
    - operation:
        paths:
        - /metrics
        - /healthz
        - /ready
        - /wait-for-drain
        - /v1/models/*
        - /v2/models/*

If someone uses a transformer, they would need to apply it for the transformer component as well:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-transformer
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: transformer
  action: ALLOW
  rules:
  - to:
    - operation:
        paths:
        - /metrics
        - /healthz
        - /ready
        - /wait-for-drain
        - /v1/models/*
        - /v2/models/*

This is not an AWS issue; it should be added to vanilla Kubeflow in my opinion.
Until it is added to vanilla Kubeflow, we could mention it in the doc so that people on KF 1.4 don't face the issue.

@surajkota
Contributor Author

The above problem is being addressed in this PR: kubeflow/kubeflow#6013

surajkota added a commit that referenced this issue Apr 19, 2022
**Which issue is resolved by this Pull Request:**
Resolves #82 

**Description of your changes:**
- Guide for serving prediction request over load balancer on AWS

**Testing:**
- Tested the README manually

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@surajkota
Contributor Author

kubeflow/kubeflow#6627
