Setup CI against blueprint deployments #42

Closed
jlewi opened this issue Jun 8, 2020 · 6 comments

@jlewi
Contributor

jlewi commented Jun 8, 2020

We should run the example tests continuously against the blueprint deployments.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.97


@jlewi
Contributor Author

jlewi commented Jun 9, 2020

I have started working on

kubeflow/testing#676 - This should set up ACM on the kf-ci-v1 cluster so we can use it to synchronize the Tekton resources we will use. It will also include some fixes to the Tekton resources.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/engprod 0.73


jlewi pushed a commit to jlewi/testing that referenced this issue Jun 9, 2020
…ployments.

* Update get_kf_testing_cluster to work with blueprints
  * With GCP blueprints we won't have deployments so we need to
    list clusters by name and find cluster with matching regex.

* Create an ACM repo for kf-ci-v1 cluster and hydrate it with the appropriate
  manifests for the auto-deploy and tektoncd namespaces.

  * kubeflow#677 is tracking using ACM with the cluster kf-ci-v1
  * Note this PR is only using ACM to sync tekton resources; we still need
    to sync the rest of the auto-deployment infrastructure like the reconciler
    and webserver.

  * Remove ACM cluster selector; ACM complains because it isn't actually being applied.

* Override notebook-test-task.yaml with nb-test-task.yaml. The latter should include the latest changes from Gabriel's PR.

* When deploying the blueprint we need to do `kpt cfg set email` to
  set the email for the default profile. Without this change
  the deployment won't include the namespace we need to run
  the tests in

GoogleCloudPlatform/kubeflow-distribution#42 is tracking CI for the blueprints.
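
The cluster lookup described in the first bullet of the commit message above (list clusters by name, then keep the ones matching a regex) can be sketched roughly as follows. This is only an illustration, not the actual `get_kf_testing_cluster` code: the function name `find_matching_clusters`, the example cluster names, and the pattern are all hypothetical, and the list of names is assumed to have already been fetched from GCP.

```python
import re


def find_matching_clusters(cluster_names, pattern):
    """Return the cluster names that fully match `pattern`, sorted by name.

    Hypothetical helper: `cluster_names` is assumed to be a plain list of
    strings obtained elsewhere (for example, by listing GKE clusters).
    """
    regex = re.compile(pattern)
    # fullmatch avoids accidentally accepting names that merely start
    # with something the pattern matches.
    return sorted(name for name in cluster_names if regex.fullmatch(name))


if __name__ == "__main__":
    # Made-up names, for illustration only.
    names = ["kf-bp-0610-def", "kf-ci-v1", "kf-bp-0609-abc"]
    print(find_matching_clusters(names, r"kf-bp-.*"))
```

Matching on names is what replaces the old deployment-based lookup here, since blueprints do not create deployments to query.
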
k8s-ci-robot pushed a commit to kubeflow/testing that referenced this issue Jun 9, 2020
…ployments. (#678)

* Update get_kf_testing_cluster to work with blueprints
  * With GCP blueprints we won't have deployments so we need to
    list clusters by name and find cluster with matching regex.

* Create an ACM repo for kf-ci-v1 cluster and hydrate it with the appropriate
  manifests for the auto-deploy and tektoncd namespaces.

  * #677 is tracking using ACM with the cluster kf-ci-v1
  * Note this PR is only using ACM to sync tekton resources; we still need
    to sync the rest of the auto-deployment infrastructure like the reconciler
    and webserver.

  * Remove ACM cluster selector; ACM complains because it isn't actually being applied.

* Override notebook-test-task.yaml with nb-test-task.yaml. The latter should include the latest changes from Gabriel's PR.

* When deploying the blueprint we need to do `kpt cfg set email` to
  set the email for the default profile. Without this change
  the deployment won't include the namespace we need to run
  the tests in

GoogleCloudPlatform/kubeflow-distribution#42 is tracking CI for the blueprints.
jlewi pushed a commit to jlewi/testing that referenced this issue Jun 10, 2020
* Use ACM to deploy the auto-deploy infrastructure

  * Previously we were using ACM to manage Tekton resources but not
    the auto-deploy server and reconciler

  * Move the Makefile with the hydration rules to the root of the repository

Fix double deployments of blueprints kubeflow#666
  * We were reading pipelineruns twice because of symbolic links

* Related to GoogleCloudPlatform/kubeflow-distribution#42 CI for blueprint deployments

* Update OWNERs file.
  * Add Reming and Yuan; remove folks no longer actively working on Kubeflow
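
The double-deployment fix mentioned in the commit message above (pipelineruns read twice because of symbolic links) could be approached along the lines of the sketch below. It is an assumption about the general technique, not the code from the actual change: the function name `unique_real_files` and the `.yaml` suffix filter are hypothetical. The idea is to resolve every path with `os.path.realpath` and skip any file whose resolved path has already been seen.

```python
import os


def unique_real_files(root, suffix=".yaml"):
    """Walk `root` and return each matching file once, by resolved path.

    Hypothetical helper: a symlink and its target resolve to the same
    real path, so only the first one encountered is kept.
    """
    seen = set()
    unique = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for filename in sorted(filenames):
            if not filename.endswith(suffix):
                continue
            real = os.path.realpath(os.path.join(dirpath, filename))
            if real in seen:
                continue  # same underlying file reached via another path
            seen.add(real)
            unique.append(real)
    return unique
```

With this kind of deduplication, a pipelinerun file that is also reachable through a symlink would be loaded, and therefore deployed, only once.
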
@jlewi
Contributor Author

jlewi commented Jun 10, 2020

With kubeflow/testing#679 and kubeflow/examples#803 I'm able to run the tekton pipeline to run mnist.
https://github.com/kubeflow/testing/blob/master/tekton/testing/nb-test-run.yaml

Running mnist is currently failing:

  • It looks like the example notebook is installing a version of fairing that has a dependency on the Azure SDK that is broken

    • This looks like a related issue with the examples that should have been fixed.
  • It looks like there might be an issue uploading the notebook to GCS.

k8s-ci-robot pushed a commit to kubeflow/testing that referenced this issue Jun 10, 2020
)

* Use ACM to deploy the auto-deploy infrastructure

  * Previously we were using ACM to manage Tekton resources but not
    the auto-deploy server and reconciler

  * Move the Makefile with the hydration rules to the root of the repository

Fix double deployments of blueprints #666
  * We were reading pipelineruns twice because of symbolic links

* Related to GoogleCloudPlatform/kubeflow-distribution#42 CI for blueprint deployments

* Update OWNERs file.
  * Add Reming and Yuan; remove folks no longer actively working on Kubeflow
@jlewi
Contributor Author

jlewi commented Jun 10, 2020

I think we need to solve:
kubeflow/testing#613 (comment)

k8s-ci-robot pushed a commit to kubernetes/test-infra that referenced this issue Jun 12, 2020
jlewi pushed a commit to jlewi/gcp-blueprints that referenced this issue Jun 15, 2020
* Write a golang test to verify we can hydrate the manifests

* Trigger go tests under prow

* Related to GoogleCloudPlatform#42
k8s-ci-robot pushed a commit that referenced this issue Jun 17, 2020
* Add CI to verify hydration works.

* Write a golang test to verify we can hydrate the manifests

* Trigger go tests under prow

* Related to #42

* Latest.

* Latest.

* Fix labels.

* Fix the tests by moving go.mod to the root of the repo. It looks
  like that is what the go unittests task currently requires.
@jlewi
Contributor Author

jlewi commented Jun 26, 2020

CI is largely set up. Follow-on work is covered by the more specific issues that have been linked to this issue.

@jlewi jlewi closed this as completed Jun 26, 2020