katib ui: adapt environment in which cluster role is unavailable #1141

Merged Apr 16, 2020 (6 commits)

Conversation

sperlingxx
Member

Katib UI fails to get the experiment list when a cluster role is unavailable. This PR fixes the problem by adding an environment variable that lists the available namespaces.

@sperlingxx
Member Author

sperlingxx commented Apr 13, 2020

@andreyvelich @johnugeorge Could you help review this PR?

availableNameSpaces, hasClusterRole = func() ([]string, bool) {
    ns := env.GetEnvOrDefault(AvailableNameSpaceEnvName, ClusterRoleKey)
    if ns == ClusterRoleKey {
        return []string{""}, true

Member

I'm not certain, but did you mean []string{} here?

Also, I think you could use an init() function, like:

var (
    availableNameSpaces []string
    hasClusterRole      bool
)

func init() {
    ns := env.GetEnvOrDefault(AvailableNameSpaceEnvName, ClusterRoleKey)
    ...
}

Member Author

@c-bata Thanks for your suggestions!

  1. It should be []string{""}, which means no namespace restriction. With []string{}, the default namespace defined by controller.v1alpha3/consts.DefaultKatibNamespace would be used instead.
  2. I have replaced it with an init function.
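
The difference between the two slices in point 1 can be sketched as below. This is only an illustration, not Katib's client code: `resolveNamespace` and `defaultNamespace` are hypothetical names, the value "kubeflow" is assumed, and the logic assumes (as described in this thread) that the client uses the first namespace in the list and falls back to a default when the list is empty.

```go
package main

import "fmt"

// defaultNamespace is a hypothetical stand-in for consts.DefaultKatibNamespace;
// the value is assumed for illustration only.
const defaultNamespace = "kubeflow"

// resolveNamespace mimics the behaviour described in the thread: an empty
// list falls back to the default namespace, otherwise the first entry is used.
// An empty string means "no namespace restriction", i.e. cluster scope.
func resolveNamespace(namespaces ...string) string {
	if len(namespaces) == 0 {
		return defaultNamespace
	}
	return namespaces[0]
}

func main() {
	// []string{} has no elements, so the default namespace is chosen.
	fmt.Printf("%q\n", resolveNamespace([]string{}...))
	// []string{""} has one element, the empty string, so no restriction applies.
	fmt.Printf("%q\n", resolveNamespace([]string{""}...))
}
```

Under these assumptions the two values are not interchangeable: one selects a single default namespace, the other requests a cluster-wide list.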

Member

I see. Thank you for your explanation!

)

func init() {
    ns := env.GetEnvOrDefault(AvailableNameSpaceEnvName, ClusterRoleKey)

Member

A couple of questions here:

  1. Is it right to read the namespaces in the init function? What happens if the env variable is changed later?
  2. GetEnvOrDefault takes a fallback value as its second argument. Did you mean to keep ClusterRoleKey as the default value?

Member Author

@johnugeorge

  1. In my view, this env variable should be set in the manifest YAML file. I can't see any reason to modify it later.
  2. Yes.
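
The two answers above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: the env variable name, the sentinel value, the comma-separated format, and the helpers `getEnvOrDefault` and `parseNamespaces` are all hypothetical. It shows the sentinel acting as the fallback value and the variable being read once in `init`, so a later change is only picked up when the process restarts.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Hypothetical names and values; the PR's real constants are not shown in this thread.
const (
	availableNamespacesEnv = "EXAMPLE_AVAILABLE_NAMESPACES"
	clusterRoleSentinel    = "__cluster_role__"
)

var (
	availableNamespaces []string
	hasClusterRole      bool
)

// getEnvOrDefault is a stand-in helper: it returns the variable's value,
// or fallback when the variable is not set.
func getEnvOrDefault(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

// parseNamespaces interprets the raw value. The sentinel means "cluster role
// available", which maps to []string{""} (no restriction). Otherwise the value
// is assumed to be a comma-separated namespace list.
func parseNamespaces(raw string) ([]string, bool) {
	if raw == clusterRoleSentinel {
		return []string{""}, true
	}
	var namespaces []string
	for _, ns := range strings.Split(raw, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			namespaces = append(namespaces, ns)
		}
	}
	return namespaces, false
}

// init runs once at program start, before main; the environment is not re-read later.
func init() {
	availableNamespaces, hasClusterRole = parseNamespaces(
		getEnvOrDefault(availableNamespacesEnv, clusterRoleSentinel))
}

func main() {
	fmt.Println(availableNamespaces, hasClusterRole)
}
```

Because the sentinel is the fallback, leaving the variable unset keeps the previous cluster-scope behaviour in this sketch.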

- // Use "" to get experiments in all namespaces.
- jobs, err := k.getExperimentList("", JobTypeNAS)
+ // Get NAS-related experiments in all available namespaces.
+ jobs, err := k.getExperimentList(availableNameSpaces, JobTypeNAS)

Member

What do you expect to happen when multiple namespaces are passed?

Member

Currently, the Katib client (https://github.com/kubeflow/katib/blob/master/pkg/util/v1alpha3/katibclient/katib_client.go#L210) extracts only the first namespace from the list, so this will not cover all available namespaces.

Member Author

@andreyvelich Yes, but the namespace value is currently fixed to "" (https://github.com/kubeflow/katib/blob/master/pkg/ui/v1alpha3/hp.go#L22), which lists experiments at cluster scope.

Member Author

@johnugeorge It seems only the first namespace will be taken. That is a problem.

@johnugeorge
Member

Btw, what is your use case? Can you explain the deployment requirement?

@sperlingxx
Member Author

> Btw, what is your use case? Can you explain the deployment requirement?

In Antfin's internal k8s clusters, service accounts of third-party operators (such as the Kubeflow operators) have no access to ClusterRole and ClusterRoleBinding, for security reasons.

@andreyvelich
Member

> > Btw, what is your use case? Can you explain the deployment requirement?
>
> In Antfin's internal k8s clusters, service accounts of third-party operators (such as the Kubeflow operators) have no access to ClusterRole and ClusterRoleBinding, for security reasons.

So do you only have access to the namespace where Katib UI is deployed?
Also, how does the Katib controller work in your case, since it also requires a ClusterRole to deploy Jobs outside DefaultNamespace?

@sperlingxx
Member Author

> > > Btw, what is your use case? Can you explain the deployment requirement?
> >
> > In Antfin's internal k8s clusters, service accounts of third-party operators (such as the Kubeflow operators) have no access to ClusterRole and ClusterRoleBinding, for security reasons.
>
> So do you only have access to the namespace where Katib UI is deployed?
> Also, how does the Katib controller work in your case, since it also requires a ClusterRole to deploy Jobs outside DefaultNamespace?

In my company, all machine-learning-related resources share the same namespace, so a cluster role is not required.

@andreyvelich
Member

If Katib UI can't fetch all namespaces, it could just get Experiments and Namespaces from the namespace where Katib UI is deployed.
What do you think, @sperlingxx?

@sperlingxx
Member Author

> If Katib UI can't fetch all namespaces, it could just get Experiments and Namespaces from the namespace where Katib UI is deployed.
> What do you think, @sperlingxx?

Good idea!

@johnugeorge
Member

> > If Katib UI can't fetch all namespaces, it could just get Experiments and Namespaces from the namespace where Katib UI is deployed.
> > What do you think, @sperlingxx?
>
> Good idea!

In this case, do you really need the extra environment variable? If you do not have permissions, get experiments from your own namespace; otherwise, do it the current way.

@sperlingxx
Member Author

sperlingxx commented Apr 14, 2020

@johnugeorge I think we need to know from an env variable whether the cluster role is permitted. Alternatively, Katib could first try to get objects at cluster scope and, if that fails, fall back to getting objects from its own namespace.

@johnugeorge
Member

> @johnugeorge I think we need to know from an env variable whether we have permission. Alternatively, Katib could first try to get objects at cluster scope and, if that fails, fall back to getting objects from its own namespace.

+1 to this.

@andreyvelich
Member

> Alternatively, Katib could first try to get objects at cluster scope and, if that fails, fall back to getting objects from its own namespace.

I agree. Katib UI should try to get objects from all namespaces and, if that fails, just get objects from its own namespace.
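
The fallback agreed on here could look roughly like the sketch below. It is an illustration under assumptions, not the merged implementation: `listFunc`, `listWithFallback`, `restrictedLister`, and `errForbidden` are hypothetical names, and the lister is a fake that simulates a service account without a ClusterRole. For simplicity, any cluster-scope failure triggers the fallback.

```go
package main

import (
	"errors"
	"fmt"
)

// errForbidden is a hypothetical stand-in for an RBAC "forbidden" error.
var errForbidden = errors.New("forbidden: cluster-scoped list not permitted")

// listFunc lists experiment names in a namespace; "" means all namespaces.
type listFunc func(namespace string) ([]string, error)

// listWithFallback tries a cluster-scoped list first. If that fails for any
// reason, it retries in ownNamespace and reports both errors if that fails too.
func listWithFallback(list listFunc, ownNamespace string) ([]string, error) {
	items, err := list("")
	if err == nil {
		return items, nil
	}
	items, fallbackErr := list(ownNamespace)
	if fallbackErr != nil {
		return nil, fmt.Errorf("cluster-scope list failed (%v); list in namespace %q failed: %w",
			err, ownNamespace, fallbackErr)
	}
	return items, nil
}

// restrictedLister simulates a service account that may only list
// within a namespace, never at cluster scope.
func restrictedLister(namespace string) ([]string, error) {
	if namespace == "" {
		return nil, errForbidden
	}
	return []string{namespace + "/experiment-a"}, nil
}

func main() {
	items, err := listWithFallback(restrictedLister, "katib")
	fmt.Println(items, err)
}
```

A real implementation might fall back only on permission errors rather than on every failure, so that transient API errors are not silently masked by a narrower result.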

@sperlingxx
Member Author

@johnugeorge @andreyvelich I think the current PR is ready.

Member

@andreyvelich andreyvelich left a comment

@sperlingxx Thanks!
/lgtm
/cc @johnugeorge

@johnugeorge
Member

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

@k8s-ci-robot k8s-ci-robot merged commit 109d8f8 into kubeflow:master Apr 16, 2020
sperlingxx added a commit to sperlingxx/katib that referenced this pull request Jul 9, 2020
…eflow#1141)

* katib ui: adapt environments which cluster role is unavailable

* use init_fn

* fix gofmt

* update

* fix error message

* fallback plan