Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade ks app for E2E test to 0.3.0 #2338

Merged
merged 1 commit into from
Feb 13, 2019

Conversation

stpabhi
Copy link
Member

@stpabhi stpabhi commented Jan 26, 2019

This PR updates api version to 0.3.0 so that test infra ksonnet version is upgraded to v0.13.1 as we are recommending the same in the docs.

Signed-off-by: Abhilash Pallerlamudi stp.abhi@gmail.com


This change is Reviewable

@stpabhi
Copy link
Member Author

stpabhi commented Jan 26, 2019

/assign @jlewi
/assign @r2d4

@stpabhi
Copy link
Member Author

stpabhi commented Jan 26, 2019

/test kubeflow-presubmit

@jlewi
Copy link
Contributor

jlewi commented Jan 28, 2019

It looks like there was a problem starting the tests.

You can follow these instructions
https://github.com/kubeflow/testing#no-results-show-up-in-gubernator

To use the prow jobs UI to get pod logs to see what went wrong.

@stpabhi
Copy link
Member Author

stpabhi commented Jan 28, 2019

/test kubeflow-presubmit

@stpabhi
Copy link
Member Author

stpabhi commented Jan 28, 2019

@jlewi tried to get pod logs but not enough information. I tried to setup a local kubeflow-test-infra and ran a debug pod to check and found that ks-13 binary is missing in /usr/local/bin. Do you have access to debug pod on prow?

Update: found that ks-13 tarball was not downloaded. Will check the Dockerfile and open a fix.

@stpabhi
Copy link
Member Author

stpabhi commented Jan 28, 2019

gcr.io/kubeflow-ci/test-worker:latest is old and doesn't have the ks-13 binaries. Should we update the tag?

@jlewi
Copy link
Contributor

jlewi commented Jan 29, 2019

@stpabhi You can update the image. You should pin to a specific image like we do here
https://github.com/kubeflow/examples/blob/89e960202abf9b065c67bf96b99894564fdb7f37/test/workflows/components/mnist.jsonnet#L18

As opposed to using a relative tag.

Did you run ks upgrade to upgrade the app?

The reason I ask is because I would have thought upgrading to 0.13 made other changes such as the location of .libsonnet files.

@stpabhi
Copy link
Member Author

stpabhi commented Jan 29, 2019

Yes, I did run ks upgrade. It complains that upgraded lib already exists which is go/src/github.com/kubeflow/kubeflow/testing/workflows/lib/ksonnet-lib/v1.7.0. I will update the testing images to one that has ks-13 bin.

@stpabhi
Copy link
Member Author

stpabhi commented Jan 29, 2019

@jlewi Image has to be updated and merged before updating app?

@jlewi
Copy link
Contributor

jlewi commented Jan 30, 2019

@stpabhi It shouldn't. The presubmits will use whatever workflow you have defined on the PR. So if your PR updates the image used for the worker then that's what should get used.

@stpabhi
Copy link
Member Author

stpabhi commented Jan 30, 2019

@jlewi Got it. I’m not sure what else I have missed. Is it possible to run on Minikube without gcloud services so that it’s easy to debug?

@jlewi
Copy link
Contributor

jlewi commented Jan 30, 2019

Your test failed.

Traceback (most recent call last): File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/src/kubeflow/testing/py/kubeflow/testing/run_e2e_workflow.py", line 386, in <module> final_result = main() File "/src/kubeflow/testing/py/kubeflow/testing/run_e2e_workflow.py", line 376, in main return run(args, file_handler) File "/src/kubeflow/testing/py/kubeflow/testing/run_e2e_workflow.py", line 168, in run util.run([ks_cmd, "version"]) File "/src/kubeflow/testing/py/kubeflow/testing/util.py", line 58, in run command, cwd=cwd, env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) File "/usr/lib/python2.7/subprocess.py", line 711, in __init__ errread, errwrite) File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child raise child_exceptionOSError: [Errno 2] No such file or directory

I think its refering to binary ks-13 not being found.

I think your image is too old. try using
gcr.io/kubeflow-ci/test-worker/test-worker:v20190116-b7abb8d-e3b0c4

@stpabhi
Copy link
Member Author

stpabhi commented Jan 30, 2019

@jlewi I did use the new images. Let me try again using tags instead of sha checksum

@jlewi
Copy link
Contributor

jlewi commented Jan 30, 2019

I think the sha you were using was pointing to a different image from the one I pasted. The one I pasted should be used by our mnist example which is why I have confidence its up to date.

@stpabhi stpabhi force-pushed the ISSUE-2263 branch 2 times, most recently from c43ea3d to 5e9cf13 Compare January 30, 2019 05:45
@jlewi
Copy link
Contributor

jlewi commented Jan 31, 2019

Still seems like its not running. I think to debug it you should submit the test; and then use the prow jobs dashboard to fetch pod logs to see what's failing.

I'm going to mark this WIP while you work on fixing the test; when its fixed please remove WIP from the title.

If you need help lets chat on kubeflow.slack.com

@jlewi jlewi changed the title upgrade ks app for E2E test to 0.3.0 [wip] upgrade ks app for E2E test to 0.3.0 Jan 31, 2019
@stpabhi
Copy link
Member Author

stpabhi commented Jan 31, 2019

/test kubeflow-presubmit

@stpabhi
Copy link
Member Author

stpabhi commented Feb 7, 2019

@jlewi App version 0.2.0 worked. I tried on my gke instance with ks-13 but ran into another issue.

@stpabhi
Copy link
Member Author

stpabhi commented Feb 9, 2019

@jlewi Weird, App version 0.3.0 works now. Could you please review and add the appropriate labels?

@jlewi jlewi changed the title [wip] upgrade ks app for E2E test to 0.3.0 upgrade ks app for E2E test to 0.3.0 Feb 12, 2019
Copy link
Contributor

@jlewi jlewi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @abhi-g and @ellis-bigelow)

@jlewi
Copy link
Contributor

jlewi commented Feb 12, 2019

/lgtm
/approve

@stpabhi
Copy link
Member Author

stpabhi commented Feb 12, 2019

/test kubeflow-presubmit

Signed-off-by: Abhilash Pallerlamudi <stp.abhi@gmail.com>
@jlewi
Copy link
Contributor

jlewi commented Feb 13, 2019

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stpabhi
Copy link
Member Author

stpabhi commented Feb 13, 2019

/test kubeflow-presubmit

@k8s-ci-robot k8s-ci-robot merged commit 4ca7cc2 into kubeflow:master Feb 13, 2019
@lluunn
Copy link
Contributor

lluunn commented Feb 13, 2019

Why do we change back to old test-worker image?

@lluunn
Copy link
Contributor

lluunn commented Feb 13, 2019

It breaks to deployapp test since it needs the new change in test worker

lluunn added a commit that referenced this pull request Feb 13, 2019
@stpabhi
Copy link
Member Author

stpabhi commented Feb 14, 2019

@lluunn Is this specific to #2463? Notebooks test did not fail in the presubmit test for this PR https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/kubeflow_kubeflow/2338/kubeflow-presubmit/5676/

@lluunn
Copy link
Contributor

lluunn commented Feb 14, 2019

saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
Signed-off-by: Abhilash Pallerlamudi <stp.abhi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants