Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Azure support for kfctl.sh #2557

Merged
merged 3 commits into from
Mar 8, 2019
Merged

Conversation

ritazh
Copy link
Contributor

@ritazh ritazh commented Feb 26, 2019

kfctl.sh will be going away eventually, so this is just a stop gap to support Azure until the go client kfctl is ready for other cloud providers.

@jlewi @kkasravi PTAL

cc @sozercan @aronchick


This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Hi @ritazh. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jlewi
Copy link
Contributor

jlewi commented Feb 27, 2019

/ok-to-test
/assign @jlewi

scripts/azure/util.sh Outdated Show resolved Hide resolved
@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2019

/ok-to-test
/lgtm
/hold to address @sozercan 's comments

@ritazh
Copy link
Contributor Author

ritazh commented Feb 28, 2019

comment has been addressed in new commits

@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2019

The test failure looks like #2574

@jlewi
Copy link
Contributor

jlewi commented Feb 28, 2019

I think #2555 might provide a temporary fix. You might just need to rebase on the latest master to pull that in.

@ritazh
Copy link
Contributor Author

ritazh commented Feb 28, 2019

@jlewi thanks for the recommendation! Looks like it's still failing after the rebase.

@kkasravi
Copy link
Contributor

We can port this to kfctl (golang) based on kfctl.sh or @ritazh you could do the same.
Another possibility is I could create the Azure plugin with hooks and specific azure API calls could be done by you. It looks like we would use the azure-sdk-for-go?

@ritazh
Copy link
Contributor Author

ritazh commented Feb 28, 2019

@kkasravi I think adding azure under https://github.com/kubeflow/kubeflow/blob/master/bootstrap/pkg/client makes sense to me. we should use azure-sdk-for-go.

@ritazh
Copy link
Contributor Author

ritazh commented Mar 1, 2019

bump
is there anything else I need to do?

@jlewi
Copy link
Contributor

jlewi commented Mar 1, 2019

Most recent test failure:
http://testing-argo.kubeflow.org/workflows/kubeflow-test-infra/kubeflow-presubmit-kfctl-2557-72ba595-5954-ecd6?tab=workflow

I had to fetch the logs from stackdriver.

POD=kubeflow-presubmit-kfctl-2557-72ba595-5954-ecd6-625146308
gcloud --project=${PROJECT} logging read --format="table(timestamp, resource.labels.container_name, textPayload)" --freshness=24h      --order asc         "resource.type=\"k8s_container\" resource.labels.pod_name=\"${POD}\"  " > ~/tmp/${POD}.log.txt

The apply step failure is

ERROR: (gcloud.container.clusters.get-credentials) cluster kfctl-ecd6 is missing endpoint. Is it still PROVISIONING?

This is most likely because the cluster couldn't be created because us-east1-d was low on capacity.

So its a test flake.
/test all

@ritazh
Copy link
Contributor Author

ritazh commented Mar 6, 2019

/retest

@ritazh
Copy link
Contributor Author

ritazh commented Mar 6, 2019

@jlewi can you PTAL at the failed job? This PR shouldnt impact any of the tests that run on GCP.

@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

There was some problems with the test-infra see #324

/test all

@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

/lgtm
/approve

@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

latest failure looks like a flake with katib.
/test all

@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

Most recent failure is

level=error msg="no prototype names matched 'spark-operator'"

Not sure where that would be coming from.

@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

/test all

1 similar comment
@ritazh
Copy link
Contributor Author

ritazh commented Mar 7, 2019

/test all

@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

Latest failures

Traceback (most recent call last):
 File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
   "__main__", fname, loader, pkg_name)
 File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
   exec code in run_globals
 File "/mnt/test-data-volume/kubeflow-presubmit-kfctl-2557-72ba595-6107-bdc7/src/kubeflow/kubeflow/testing/tf_job_simple_test.py", line 30, in <module>
   from py import tf_job_client
ImportError: cannot import name tf_job_client

This looks like #2574.

@ritazh sorry this is turning into such a marathon. We're hoping to make a big push on our test-infra in Q2/0..

Can you try rebasing on the latest master to pull in the fixes?

@ritazh ritazh force-pushed the feat-kfctl-azure branch from 72ba595 to 99b1aba Compare March 7, 2019 19:21
@k8s-ci-robot k8s-ci-robot removed the lgtm label Mar 7, 2019
@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jlewi
Copy link
Contributor

jlewi commented Mar 7, 2019

/test all

@ritazh
Copy link
Contributor Author

ritazh commented Mar 8, 2019

/retest

@k8s-ci-robot
Copy link
Contributor

@ritazh: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
kubeflow-presubmit 99b1aba link /test kubeflow-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jlewi
Copy link
Contributor

jlewi commented Mar 8, 2019

Merging manually.

Sorry @ritazh

@jlewi jlewi merged commit 74e4b67 into kubeflow:master Mar 8, 2019
@ritazh
Copy link
Contributor Author

ritazh commented Mar 8, 2019

Thanks @jlewi!

kkasravi pushed a commit to kkasravi/kubeflow that referenced this pull request Mar 8, 2019
* Add Azure support for kfctl.sh

* pass in sku

* update ds
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
* Add Azure support for kfctl.sh

* pass in sku

* update ds
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 12, 2021
* Add Azure support for kfctl.sh

* pass in sku

* update ds
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants