Get the mnist notebook tests to pass #65
Comments
https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/kf-ci/pipelineruns/mnist-wc6fq |
Error is:
|
The Python path is wrong. It should be /srcCache. |
* Python path should be /srcCache, not /src
* Related to GoogleCloudPlatform/kubeflow-distribution#65
In the latest run, the copy buckets command ends up being:
The copy artifacts step is also failing.
|
Logs for the papermill job. Running the notebook fails with an import error:
|
The path to upload the notebook to GCS doesn't look wrong, but there is an exception when trying to upload it:
|
* Related GoogleCloudPlatform/kubeflow-distribution#51: get-credentials isn't finding any clusters because, when using Fire, the parameter should be --pattern, not --location.
* Related GoogleCloudPlatform/kubeflow-distribution#65: when copying the bucket output in the notebook tests, the parameter should be params.notebook-output, not params.output.
Latest run Latest failure
|
Here is the command that was executed:
So it looks like a bug in the command. |
…ploy a stateful set that is in sync with the worker image (#713)
* To support debugging of the test worker image, deploy a stateful set that is in sync with the worker image
* Fix get-credentials in notebook tasks
* Related to GoogleCloudPlatform/kubeflow-distribution#65
Latest run: copying the papermill output failed.
Are we not setting the workload identity correctly? No results are reported in TestGrid.
Looks like the directory for junit_notebook.xml is wrong. Should be in the artifacts subdirectory. |
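A minimal sketch of the layout described above, assuming the test writes its junit file locally before it is copied to GCS. The `write_junit` helper and its arguments are hypothetical and are not taken from the kubeflow/testing code; only the `artifacts` subdirectory and the `junit_notebook.xml` name come from this thread.

```python
import os
import xml.etree.ElementTree as ET


def write_junit(artifacts_dir, name, failure_message=None):
    """Write a single-test-case junit file to <artifacts_dir>/junit_<name>.xml.

    Hypothetical helper: it only illustrates placing junit_* files inside
    the artifacts directory rather than next to it.
    """
    os.makedirs(artifacts_dir, exist_ok=True)
    suite = ET.Element(
        "testsuite",
        name=name,
        tests="1",
        failures="1" if failure_message else "0",
    )
    case = ET.SubElement(suite, "testcase", classname=name, name=name)
    if failure_message:
        # A <failure> child marks the test case as failed.
        failure = ET.SubElement(case, "failure", message=failure_message)
        failure.text = failure_message
    path = os.path.join(artifacts_dir, "junit_" + name + ".xml")
    ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)
    return path
```

Calling `write_junit(os.path.join(output_root, "artifacts"), "notebook")`, where `output_root` is whatever local directory the step later uploads, would produce `artifacts/junit_notebook.xml` under that root.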
* We need to upload the junits to the artifacts/junit_* directory
* Related to GoogleCloudPlatform/kubeflow-distribution#65
Filed kubeflow/examples#806 about the actual error in the notebook |
I made a few small attempts to also test this on the mpi-operator repo, just triggering the mnist notebook, but it seems to fail as well. I don't fully understand whether it is due to the same issue. I'm keeping track of this so I can fix it. |
kubeflow/testing#716 fixed the issue with the junit file not being copied to GCS. |
I have tried it out and, as I understand the logs, there are some issues with junit: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_mpi-operator/244/kubeflow-mpi-operator-presubmit/1279419183024574464/ . I am currently testing with a dummy notebook that only prints "Done". |
kubeflow/examples#807 should fix the KFServing error.
This is most likely due to problems with workload identity, which is being tracked in #61. |
@NikeNano Your logs indicate that your test harness was unable to find the HTML file on GCS containing the rendered notebook. I suspect it's because the job running on the KF cluster didn't have permission to write the rendered notebook to GCS. As of right now, the test infra for the notebooks appears to be working. If you are having additional problems with your notebook test, please file a separate issue specific to your test and let's use that to track it. |
Thanks for the help @jlewi. |
https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/kubeflow-gcp-blueprints-master-periodic/1280696285153726467 Looks like a 404 reading image file
The image file doesn't exist
|
The image step writes the image file to: |
So artifacts-gcs isn't set consistently across the steps. |
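One way to avoid this kind of mismatch is to derive every per-step location from a single artifacts root, so the step that writes a file and the step that reads it cannot disagree. The sketch below is only an illustration under that assumption: the `artifact_paths` function and the `image.yaml` and `notebook.html` file names are hypothetical and do not come from the actual Tekton tasks.

```python
import posixpath


def artifact_paths(artifacts_gcs):
    """Build every per-step GCS path from one artifacts root.

    The file names are placeholders, except junit_notebook.xml, which is
    the junit file mentioned earlier in this thread.
    """
    root = artifacts_gcs.rstrip("/")
    return {
        # Written by the image step, read by later steps.
        "image_file": posixpath.join(root, "image.yaml"),
        # Rendered notebook produced by papermill/nbconvert.
        "notebook_html": posixpath.join(root, "notebook.html"),
        # Test result picked up by the test dashboard.
        "junit": posixpath.join(root, "junit_notebook.xml"),
    }
```

If each step looked up its path through one function like this, passing the same `artifacts-gcs` value everywhere would be the only thing left to get right.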
Success! We still need to fix kubeflow/testing#715; otherwise we will clobber the HTML file. |
Split off from #42
#63 added the notebook tests. The Tekton workflows are being fired off:
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-gcp-blueprints-master-periodic
https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/kf-ci/pipelineruns/mnist-22vjq
The copy buckets step is failing, though, and that causes the task to abort before it runs the step that copies the test artifacts.