[Component Guide] KFServing #82
Working e2e POC: In this tutorial we will create a TLS-enabled load balancer endpoint for serving prediction requests over HTTPS.

Pre-requisites:

Background:
We will be creating an ALB endpoint which authorizes requests based on a token in a predefined header. This will enable service-to-service communication.

Step 1: Register a domain, or reuse your domain if you are using the Cognito-based deployment. You can get a domain using any domain registration service, such as Route53 or godaddy.com. Suppose you already have a registered domain.

Step 2: Modify the Knative domain configuration to use your custom domain (a sketch follows this list).

Step 3: To get TLS support from the ALB, you need to request a certificate in AWS Certificate Manager. Follow this tutorial to create a certificate for

Step 4: Create an ingress with the following config by substituting the value of
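Step 2 touches the Knative `config-domain` ConfigMap in the `knative-serving` namespace. A minimal sketch, assuming `example.com` stands in for your registered domain:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  # Replace with your registered domain; the empty value tells Knative
  # to use this domain for all of its services.
  example.com: ""

The exact ingress config for Step 4 was not captured above. The following is only a sketch of what such a config could look like, assuming the AWS Load Balancer Controller is installed and the ALB fronts the istio-ingressgateway service; the header name `x-api-key`, the token values, and `<certificate-arn>` are placeholders:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: istio-ingress
  namespace: istio-system
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    # Substitute the ARN of the ACM certificate created in Step 3.
    alb.ingress.kubernetes.io/certificate-arn: <certificate-arn>
    # Only requests carrying one of these tokens in the predefined
    # header match this rule and get forwarded to the gateway.
    alb.ingress.kubernetes.io/conditions.istio-ingressgateway: >
      [{"field":"http-header","httpHeaderConfig":{"httpHeaderName":
      "x-api-key","values":["randomTokenxxx","randomTokenyyy"]}}]
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: istio-ingressgateway
                port:
                  number: 80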
Step 5: Wait for the ALB to get created (this takes about 3-5 minutes)
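One way to tell when the ALB is ready (an assumption on my part, not part of the original steps): watch the ingress with `kubectl get ingress -n istio-system -w` until the ADDRESS column shows the load balancer's DNS name; that value is what Step 6 refers to.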
Step 6: When the ALB is ready, copy the DNS name of that load balancer and create a CNAME entry to it in Route53 under subdomain (

Step 7: Create a sample sklearn InferenceService and wait for it to be READY (a hedged example follows).
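For Step 7, here is a minimal sketch of a sample sklearn InferenceService using the KFServing v1beta1 API and the public iris sample model; the name and namespace are placeholders:

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: prod
spec:
  predictor:
    sklearn:
      # Publicly hosted sample model from the KFServing examples.
      storageUri: gs://kfserving-samples/models/sklearn/iris

You can check readiness with `kubectl get inferenceservice sklearn-iris -n prod` and wait for READY to become True.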
Step 8: Your endpoint is ready to serve! Send a request using the sample script below by substituting the values for
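The original sample script is not reproduced above; the following Python sketch shows the general shape of such a request, where the host, model name, namespace, header name, and token are all placeholder values:

import requests

# All values below are placeholders -- substitute your own domain,
# model name, namespace, header name, and token.
host = "sklearn-iris.prod.example.com"  # <isvc-name>.<namespace>.<your-domain>
url = f"https://{host}/v1/models/sklearn-iris:predict"
headers = {"x-api-key": "randomTokenxxx"}  # one of the tokens configured on the ALB

# Two iris examples in the request format the sklearn server expects.
payload = {"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}

resp = requests.post(url, json=payload, headers=headers)
print(resp.status_code, resp.text)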
Output would look like:
Looks fine to me. In the pre-requisites, when you direct them towards https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html to have the cluster subnets tagged properly, perhaps we should also link to the command we have in the Cognito readme to make it easier for the user? Specifically, section 4 of https://github.com/awslabs/kubeflow-manifests/tree/main/docs/deployment/cognito#40-configure-ingress
Additionally, can you give context on what the randomTokenxxx and randomTokenyyy should be? Are these just random string values the user will create, and do they need to remember them for future use? Or are they one-time tokens, since I assume they will expire similar to the way the AWSELBAuthSessionCookie has a pre-determined lifespan?
For 1: I will work on kubeflow/kubeflow#67 first so that the prerequisites are reduced.
@surajkota I see that we specify 2 values for the tokens when defining the Ingress, but we only specify one when doing a test request.
**Description of your changes:**
- Bring in changes from #114:
  - TODO item from #109 regarding detailed documentation for the telemetry component
  - Changed the name from "AWS distribution of Kubeflow" to "Kubeflow on AWS" to be consistent with the website and the usage tracking documentation
  - Added a section in the vanilla Kubeflow readme, `Exposing Kubeflow over Load Balancer`, linked to [#67](#67), to expose the deployment over a LoadBalancer
- Adds fixes for a few broken links
- Sync the knative manifest for other deployment options with [vanilla](https://github.com/awslabs/kubeflow-manifests/blob/14c17ff16689dbf70af7fb7971deb7da63105690/docs/deployment/vanilla/kustomization.yaml#L17), corresponding to this [change](kubeflow/manifests#1966). This was missed in the initial PR because two branches were used to create this one.

**Testing**
- Links working as expected
- Tested a kfserving model using the steps from #82 for the knative overlay change
Hello, we followed the guide but unfortunately could not get an inference to return a result.

Effect of this issue
This issue is critical for us since it blocks us from deploying any model to production.

Issue
Performing an inference against a model server returns a 504 Gateway Timeout error.

Steps to reproduce
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: mycluster-1
version: "1.21"
region: us-east-2
availabilityZones: ["us-east-2a", "us-east-2b"]
managedNodeGroups:
- name: ai-platform-1
minSize: 3
maxSize: 5
desiredCapacity: 3
instanceType: m6i.xlarge
iam:
withAddonPolicies:
autoScaler: true
tags:
k8s.io/cluster-autoscaler/enabled: "true"
k8s.io/cluster-autoscaler/mycluster-1: "owned"
---
apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
name: prod
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: default-editor-access-binding-prod
subjects:
- kind: ServiceAccount
name: default-editor
namespace: prod
roleRef:
kind: ClusterRole
name: cluster-admin
apiGroup: rbac.authorization.k8s.io
Context
apiVersion: v1
data:
ai.mydomain.com: ""
kind: ConfigMap
[...]
Any help on this would be extremely appreciated.
Thanks for reporting the issue and the detailed steps to reproduce. My first thought is you might be hitting these open issues where the auth policy is incomplete in the profile namespace: kserve/kserve#1558, kubeflow/dashboard#13. I can confirm this issue exists in 1.4, since I was also not able to get the request served in a profile namespace. I created a plain k8s namespace using:

I understand from our discussion that this is not ideal, since these models won't be visible in the web app and will not be accessible from pipelines or notebooks, but let's break this into two parts: getting the inference request served without a profile, and getting it to work in a profile namespace.
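(The exact command was not captured in this thread; presumably something along the lines of `kubectl create namespace <name>`.)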
Thank you @surajkota, I was able to still use profiles and get the inference working by applying the right auth policy (mine is slightly modified from the one linked, to add v2 protocol support):

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: allow-predictor
namespace: istio-system
spec:
selector:
matchLabels:
component: predictor
action: ALLOW
rules:
- to:
- operation:
paths:
- /metrics
- /healthz
- /ready
- /wait-for-drain
- /v1/models/*
- /v2/models/*

If someone uses transformers, they would need to apply it for transformers as well:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: allow-transformer
namespace: istio-system
spec:
selector:
matchLabels:
component: transformer
action: ALLOW
rules:
- to:
- operation:
paths:
- /metrics
- /healthz
- /ready
- /wait-for-drain
- /v1/models/*
- /v2/models/*

This is not an AWS issue; it should be added to vanilla Kubeflow, in my opinion.
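Assuming the two policies above are saved to local files, they can be applied with kubectl, e.g. `kubectl apply -f allow-predictor.yaml -f allow-transformer.yaml` (hypothetical filenames).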
The above problem is being addressed in this PR: kubeflow/kubeflow#6013
**Which issue is resolved by this Pull Request:**
Resolves #82

**Description of your changes:**
- Guide for serving prediction requests over a load balancer on AWS

**Testing:**
- Tested the README manually

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Is your feature request related to a problem? Please describe.
Existing samples/tutorials in this repository focus on how to install Kubeflow on EKS.
The kserve docs show how to create an inference service and send requests from the ingress gateway if the user has a real DNS, etc. They do not talk about auth or how to set up a real DNS.
Describe the solution you'd like