
Push the server images from the crossbuild CI #1400

Closed
luxas opened this issue Dec 18, 2016 · 39 comments

luxas (Member) commented Dec 18, 2016

Hi,

Can we make the cross-build push the kube-apiserver, controller-manager, scheduler, proxy and hyperkube images on every run? I think it should be pretty straightforward code-wise.

They could, and should, live in a repository other than gcr.io/google_containers, for instance gcr.io/kubernetes-ci. We should soon start migrating from gcr.io/google_containers to something more Kubernetes-specific where we can set different ACLs anyway.

Many users have asked for this, and it would make it much more convenient to test k8s against HEAD.

Can we make this happen soon? Thoughts?
cc @ixdy @spxtr @rmmh @jessfraz

ixdy (Member) commented Dec 20, 2016

cc @david-mcmahon

This is a good idea, but I'm not sure it will happen in the near future.

One challenge: how do we garbage collect old images? We'll build a lot of these images very quickly.

krzyzacy (Member) commented:

You can use gcr.io/k8s-testimages if you want; that's where we put our test images.

kad (Member) commented Dec 21, 2016

In fact, the images are already built and published as tarballs to gs://kubernetes-release-dev/ci/, including ones for pull requests. Example:
gs://kubernetes-release-dev/ci/v1.6.0-alpha.0.353+ce14c180806534-pull-gke-gci/bin/linux/amd64/kube-scheduler.tar

What's missing is publishing those images to some registry (something like gcr.io/kubernetes-dev-ci/?).
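
For anyone wanting to consume those tarballs today, a minimal sketch (assuming `gsutil` and `docker` are available and you have push access; the destination registry here is the hypothetical one mentioned above):

```bash
# Fetch one of the CI tarballs named above and load it into the local daemon.
VERSION="v1.6.0-alpha.0.353+ce14c180806534-pull-gke-gci"
gsutil cp "gs://kubernetes-release-dev/ci/${VERSION}/bin/linux/amd64/kube-scheduler.tar" .

# `docker load` prints the tag embedded in the tarball; retag it for the
# proposed registry (docker tags use "_" where the version has "+").
LOADED=$(docker load -i kube-scheduler.tar | sed -n 's/^Loaded image: //p')
docker tag "${LOADED}" "gcr.io/kubernetes-dev-ci/kube-scheduler:${VERSION//+/_}"
docker push "gcr.io/kubernetes-dev-ci/kube-scheduler:${VERSION//+/_}"
```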

ixdy (Member) commented Dec 21, 2016

How are you planning to use the (hypothetical) images in the registry?

kad (Member) commented Dec 21, 2016

One example could be validating PRs locally: kubeadm can be instructed to use images from a different registry with a specific version tag.
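
As a rough illustration only, using present-day kubeadm flags (the exact mechanism available at the time may have been different, and the registry is the hypothetical one from the previous comment):

```bash
# Point kubeadm at a CI registry and a specific CI version instead of the
# default gcr.io/google_containers images.
kubeadm init \
  --image-repository=gcr.io/kubernetes-dev-ci \
  --kubernetes-version=v1.6.0-alpha.0.353+ce14c180806534
```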

luxas (Member, Author) commented Dec 22, 2016

Yes, in order to test k8s against master using these images.

We could recycle old ones if we have to.

luxas (Member, Author) commented Jan 3, 2017

Ping @ixdy: could we push the images generated by the crossbuild CI to gcr.io/kubernetes-release-dev and have a CI job that removes all images older than one or two releases? (For instance, if latest.txt is v1.8.0-alpha.0+aaa, there would be v1.7.x CI images but not v1.6.x images.)

WDYT? I could probably implement it; I think it's quite straightforward code-wise.

ixdy (Member) commented Jan 3, 2017

Sounds good to me. I don't have time to work on this right now, but I can review PRs.

luxas (Member, Author) commented Jan 3, 2017

@ixdy I'm not sure I can write the cleanup part, but it's not high priority because it wouldn't need to do anything until v1.8 anyway... or it could be a Job instead.

Anyway, I can make it so the cross-build pushes the images, sure :)

ixdy self-assigned this Mar 9, 2017

ixdy (Member) commented Mar 9, 2017

not yet, but we'd like to start doing this.

mikedanese (Member) commented:

https://github.com/bazelbuild/rules_docker#docker_push would do this for our bazel builds.

ixdy (Member) commented Jun 20, 2017

The federation projects push the hyperkube image from CI builds, but it seems like only the 10 or so most recent tags are there. I'll check with some of those folks to see how they're doing image lifecycle management.

ixdy (Member) commented Jun 20, 2017

Oh, never mind: gcloud just limits listing to the 10 most recent tags by default.
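
In other words, the 10-tag cutoff is just the default page size; a larger `--limit` shows the full history. A quick check might look like this (the exact image path is illustrative):

```bash
# List far more than the default 10 tags for one of the federation CI images.
gcloud container images list-tags gcr.io/k8s-jkns-e2e-gce-federation/hyperkube-amd64 --limit=1000
```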

roberthbailey (Contributor) commented:

Where do the hyperkube images get pushed to? We should consolidate on a single place for pushing all CI images.

ixdy (Member) commented Jun 20, 2017

the federation hyperkube images are pushed to gcr.io/k8s-jkns-e2e-gce-federation, gcr.io/k8s-jkns-e2e-gce-f8n-1-6, and gcr.io/k8s-jkns-e2e-gce-f8n-1-7.

roberthbailey (Contributor) commented:

@ixdy, can you modify the builders to push images? @luxas and I talked about creating a reaper to delete images that are too old (say 30-60 days), but we could probably get by for longer than that if need be. Hopefully we can find a volunteer who wants a bounded starter project to build the reaper.
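
A minimal sketch of what such a reaper could look like, one image at a time, with an illustrative registry/image, a 60-day cutoff, GNU `date`, and delete permission on the repository all assumed:

```bash
# Delete all digests older than the cutoff for a single image; run per image.
IMAGE="gcr.io/kubernetes-ci-images/hyperkube-amd64"
CUTOFF=$(date -d "-60 days" +%Y-%m-%d)
for digest in $(gcloud container images list-tags "${IMAGE}" \
    --filter="timestamp.datetime < '${CUTOFF}'" --format='get(digest)' --limit=999999); do
  gcloud container images delete -q --force-delete-tags "${IMAGE}@${digest}"
done
```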

ixdy (Member) commented Jun 20, 2017

I don't have bandwidth right now to plumb through all of the changes needed.

Things I imagine are necessary:

  1. Update the kubernetes_build scenario (kubernetes/test-infra/scenarios/kubernetes_build.py) to take a flag indicating where images should be pushed.
  2. Based on this flag, set KUBE_DOCKER_REGISTRY when running the build.
  3. Either update the scenario to also set KUBE_DOCKER_IMAGE_TAG (based on the version), or update the logic in kubernetes/kubernetes/build/lib/release.sh to keep images if KUBE_DOCKER_REGISTRY is set. (Currently it requires both env vars to keep images.)
  4. Update kubernetes/release/push-build.sh to additionally push the images. I think this logic is mostly already in the release repo, but I think it's only used by anago right now. (We'd probably need to add a new flag, set by the scenario.)
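
For illustration, steps 2 and 3 might boil down to something like this when invoking the build by hand (the registry and tag values here are made up; as noted in step 3, release.sh only keeps the images when both variables are set):

```bash
# Build with the registry and tag set so the images are kept for pushing.
KUBE_DOCKER_REGISTRY=gcr.io/kubernetes-ci-images \
KUBE_DOCKER_IMAGE_TAG=v1.8.0-alpha.0 \
  build/release.sh
```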

Other thoughts:

  • The federation builds push images right now, but in a separate code path with scripts in federation/cluster/. I imagine they could instead use the "normal" build/push setup as described above if implemented. @madhusudancs @shashidharatd
  • There's yet another way of doing builds/pushes, kubetest, which wraps some of these steps and replaces others. I'm not sure whether to focus effort there or in the scenario.
  • Bazel requires different (fewer) changes, though Bazel doesn't support non-amd64 yet. :(

madhusudancs (Contributor) commented:

@ixdy we are ready to move to whatever the "normal" way is. So yeah, I am fine moving to the approach you described above, if it is built.

luxas (Member, Author) commented Jun 21, 2017

@madhusudancs Would you have time to implement what @ixdy described above?

roberthbailey (Contributor) commented:

Alternatively, maybe @fejta can suggest another assignee from the engprod team to help with this task?

madhusudancs (Contributor) commented:

@luxas No. But happy to delegate :)

ixdy (Member) commented Jun 21, 2017

Thinking about this a bit more, I'm concerned about cleanup of docker images on development machines. In the normal (non-release) build workflow, we build the images, save them out to tarfiles, and then delete them. With my proposal in #1400 (comment), we would eventually end up with 10s-100s of images with no clear cleanup mechanism. (I'm not sure `docker images -aq | xargs docker rmi` counts.)

Which leads me to a different proposal, similar to what Bazel does, and hearkening back to @luxas' suggestion in kubernetes/kubeadm#309:

  1. In the kubernetes build scripts (kubernetes/kubernetes/build/lib/common.sh), save an images manifest file along with the images being built, basically mapping the docker tarfiles to tags.
  2. Otherwise, continue as before: delete the images locally after they've been saved.
  3. Add functionality to kubernetes/release/push-build.sh:
    a. read the manifest file
    b. docker load each tarfile listed and retag if needed
    c. docker push the new tag
    d. delete the loaded docker images
  4. Update the test-infra build scenario to add necessary flags to push-build.sh.

I think this is overall less work than my earlier suggestion.

This would also probably help prevent something like kubernetes/kubernetes#47307 from happening again, if we add the push-build functionality to anago, too.
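
A sketch of what step 3 could look like inside push-build.sh, assuming a simple whitespace-separated `<tarfile> <tag>` manifest (the real file name and format are still up for grabs) and an illustrative registry:

```bash
REGISTRY="gcr.io/kubernetes-ci-images"
while read -r tarfile tag; do
  # 3b: load the tarfile and capture the tag it was saved with.
  loaded=$(docker load -i "${tarfile}" | sed -n 's/^Loaded image: //p')
  # 3b/3c: retag for the target registry and push.
  docker tag "${loaded}" "${REGISTRY}/${tag}"
  docker push "${REGISTRY}/${tag}"
  # 3d: clean up the local docker daemon again.
  docker rmi "${loaded}" "${REGISTRY}/${tag}"
done < images.manifest
```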

ixdy (Member) commented Jun 22, 2017

I have a basic POC of my last proposal in ixdy/kubernetes@93fcdc1 and ixdy/kubernetes-release@f251200.

It doesn't handle hyperkube, though, since hyperkube has a different and inconsistent workflow - we don't save it as a tarfile anywhere, and I'm not sure where we would save it. We certainly don't want to bundle it in the server tarball.

ixdy (Member) commented Jun 23, 2017

kubernetes/kubernetes#47939 and kubernetes/release#355 seem to work in local testing. If those get merged, next steps would be to update the various build jobs to set KUBE_BUILD_HYPERKUBE=y before builds (if needed) and pass --docker-registry to release/push-build.sh.
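
Once those merge, the job changes described here would amount to roughly the following (registry value illustrative, plus whatever other flags the jobs already pass):

```bash
# In kubernetes/kubernetes: build everything, including hyperkube.
KUBE_BUILD_HYPERKUBE=y build/release.sh

# In kubernetes/release: push the release artifacts plus the docker images.
./push-build.sh --docker-registry=gcr.io/kubernetes-ci-images
```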

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 24, 2017
Automatic merge from submit-queue (batch tested with PRs 47650, 47936, 47939, 47986, 48006)

Save docker image tarfiles in _output/release-images/$arch/

Additionally, add option `KUBE_BUILD_HYPERKUBE` to build hyperkube
outside of the release flow.

**What this PR does / why we need it**: Saves all of the docker tarfiles in a separate directory that the release scripts can use to push to a docker registry. This is easier than trying to guess which images should be pushed from the local docker engine, and supports work in kubernetes/test-infra#1400. 

If we eventually use this for the official release workflow (`anago`) this may prevent something like #47307 from happening again.

**Release note**:

```release-note
NONE
```

/release-note-none
/assign @luxas @david-mcmahon 
cc @madhusudancs @roberthbailey

ixdy (Member) commented Jun 26, 2017

we probably need to shave the #3207 yak before this will work for the CI cross build.

ixdy (Member) commented Jun 26, 2017

another thing I'm trying to figure out: which builds should produce a hyperkube image? Seems like at least ci-cross and federation (ci and pull) need it. Any others?

roberthbailey (Contributor) commented:

which builds should produce a hyperkube image?

I thought we were set up so that for each PR we did a single "build" step and then let e2e tests run against that build. In that case, only the build step would need to build & push images (including hyperkube), and all per-PR tests could be run assuming that the images already exist in gcr.io.

ixdy (Member) commented Jun 27, 2017

For CI testing we have a single build job which produces the binaries that all of the e2e test jobs use.
For PR testing, we currently build and then test in each job. We have plans to fix this (build only once), but we're not there yet.
Since building hyperkube adds a bit of time, and is potentially flaky (lots of dependencies to download in the image), I'm worried about enabling it on all PR jobs if it's not needed.

luxas (Member, Author) commented Jun 28, 2017

For CI testing we have a single build job which produces the binaries that all of the e2e test jobs use.

Perfect!

For PR testing, we currently build and then test in each job. We have plans to fix this (build only once), but we're not there yet.

So basically the bazel or kubernetes-build will be the only "real" presubmits, and then we'll hook up a lot of other tests in run_after_completition?

Since building hyperkube adds a bit of time, and is potentially flaky (lots of dependencies to download in the image), I'm worried about enabling it on all PR jobs if it's not needed.

Indeed. But I think the federation job depends on hyperkube...
First of all, we could make the hyperkube image build more flake-proof by retrying 5 times or so if it fails (see the sketch below).
What about starting by migrating to one or two "build" jobs that ignore hyperkube, hooking up everything but federation in run_after_completition, and leaving federation special for now?
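
A sketch of that retry idea; the exact hyperkube build invocation below is a placeholder, and whatever command the job actually runs today would go inside the loop:

```bash
# Retry the flaky hyperkube image build up to 5 times before giving up.
for attempt in 1 2 3 4 5; do
  if make -C cluster/images/hyperkube build VERSION="${VERSION}" ARCH=amd64; then
    break
  fi
  echo "hyperkube image build failed (attempt ${attempt}), retrying..." >&2
  if [[ "${attempt}" -eq 5 ]]; then
    exit 1
  fi
done
```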

With those improvements we would still be much better off than we are now.

And then the ci-cross job would actually push some builds. Due to its periodic nature, it wouldn't run for every single commit, but I don't think that's a problem.

It would be cool to have tags like :latest, :latest-1.7 etc. automatically populated for the pushed images as well 👍

ixdy (Member) commented Jun 28, 2017

So basically the bazel or kubernetes-build will be the only "real" presubmits, and then we'll hook up a lot of other tests in run_after_completition?

That's the idea, yes.

Re: hyperkube: I'm considering creating a debian-hyperkube-base-$arch image that installs all of the necessary dependencies, and then using that as a base when building the hyperkube image.

It'll solve the flakiness and slowness concerns about building the hyperkube image every time, and it'll make building the hyperkube image in bazel (something I haven't done yet) much easier. (You can manage a bunch of deb dependencies in bazel, but it's a pain.)

I'm planning to get the ci-cross job building everything (including hyperkube) and pushing to gcr.io very soon. That should at least enable downstream testing and integration of the artifacts.

I'll tackle the hyperkube image and other jobs (e.g. pull jobs, eliminating federation redundancy, etc) after that.

Re: tags, I worry about that enabling an anti-pattern we want to discourage. I don't think we want to support loading a cluster from arbitrary moving tags, since that may result in different components having different versions, especially if nodes are added later.

If the tags are fully resolved at cluster start time that might be OK, but I worry that's not how they'd be used.

In any case, that's a whole different discussion. :)

ixdy (Member) commented Jun 29, 2017

ARGH:

W0629 15:32:49.013] /var/lib/jenkins/workspace/ci-kubernetes-cross-build/go/src/k8s.io/release/lib/releaselib.sh: line 895: jq: command not found
W0629 15:32:49.013] tar: manifest.json: Not found in archive
W0629 15:32:49.013] tar: Exiting with failure status due to previous errors

ixdy (Member) commented Jun 29, 2017

this is a double-fail:

  • The docker daemon on the Jenkins VMs is so old (1.9.1) that it doesn't embed manifest.json.
  • The Jenkins VMs don't have jq installed.

ixdy (Member) commented Jun 30, 2017

First set of images have appeared:

$ gcloud container images list --repository gcr.io/kubernetes-ci-images 
NAME
gcr.io/kubernetes-ci-images/cloud-controller-manager-amd64
gcr.io/kubernetes-ci-images/cloud-controller-manager-arm
gcr.io/kubernetes-ci-images/cloud-controller-manager-arm64
gcr.io/kubernetes-ci-images/cloud-controller-manager-ppc64le
gcr.io/kubernetes-ci-images/cloud-controller-manager-s390x
gcr.io/kubernetes-ci-images/cloud-controller-manager
gcr.io/kubernetes-ci-images/hyperkube-amd64
gcr.io/kubernetes-ci-images/hyperkube-arm
gcr.io/kubernetes-ci-images/hyperkube-arm64
gcr.io/kubernetes-ci-images/hyperkube-ppc64le
gcr.io/kubernetes-ci-images/hyperkube-s390x
gcr.io/kubernetes-ci-images/hyperkube
gcr.io/kubernetes-ci-images/kube-aggregator-amd64
gcr.io/kubernetes-ci-images/kube-aggregator-arm
gcr.io/kubernetes-ci-images/kube-aggregator-arm64
gcr.io/kubernetes-ci-images/kube-aggregator-ppc64le
gcr.io/kubernetes-ci-images/kube-aggregator-s390x
gcr.io/kubernetes-ci-images/kube-aggregator
gcr.io/kubernetes-ci-images/kube-apiserver-amd64
gcr.io/kubernetes-ci-images/kube-apiserver-arm
gcr.io/kubernetes-ci-images/kube-apiserver-arm64
gcr.io/kubernetes-ci-images/kube-apiserver-ppc64le
gcr.io/kubernetes-ci-images/kube-apiserver-s390x
gcr.io/kubernetes-ci-images/kube-apiserver
gcr.io/kubernetes-ci-images/kube-controller-manager-amd64
gcr.io/kubernetes-ci-images/kube-controller-manager-arm
gcr.io/kubernetes-ci-images/kube-controller-manager-arm64
gcr.io/kubernetes-ci-images/kube-controller-manager-ppc64le
gcr.io/kubernetes-ci-images/kube-controller-manager-s390x
gcr.io/kubernetes-ci-images/kube-controller-manager
gcr.io/kubernetes-ci-images/kube-proxy-amd64
gcr.io/kubernetes-ci-images/kube-proxy-arm
gcr.io/kubernetes-ci-images/kube-proxy-arm64
gcr.io/kubernetes-ci-images/kube-proxy-ppc64le
gcr.io/kubernetes-ci-images/kube-proxy-s390x
gcr.io/kubernetes-ci-images/kube-proxy
gcr.io/kubernetes-ci-images/kube-scheduler-amd64
gcr.io/kubernetes-ci-images/kube-scheduler-arm
gcr.io/kubernetes-ci-images/kube-scheduler-arm64
gcr.io/kubernetes-ci-images/kube-scheduler-ppc64le
gcr.io/kubernetes-ci-images/kube-scheduler-s390x
gcr.io/kubernetes-ci-images/kube-scheduler
$ gcloud container images list-tags gcr.io/kubernetes-ci-images/hyperkube-amd64
DIGEST        TAGS                               TIMESTAMP
1c31b4837f61  v1.8.0-alpha.1.602_231c0783ed97bc  2017-06-29T21:46:20

luxas (Member, Author) commented Jun 30, 2017

@ixdy WOOOOT 🎉!!!
Thank you!

I'll go ahead and make kubeadm able to pick those up 👍

ixdy (Member) commented Jun 30, 2017

FYI gs://kubernetes-release-dev/ci-cross/latest.txt will contain the latest version (it is published after everything has been pushed), but you'll need to convert the + in the version to a _ for the docker tag.
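
In shell terms, consuming that looks something like this (registry and image name taken from the listing above):

```bash
# Resolve the newest CI version and convert "+" to "_" for the docker tag.
VERSION=$(gsutil cat gs://kubernetes-release-dev/ci-cross/latest.txt)
TAG="${VERSION//+/_}"
docker pull "gcr.io/kubernetes-ci-images/hyperkube-amd64:${TAG}"
```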

luxas (Member, Author) commented Jul 5, 2017

@ixdy I think this can be closed now, right?
Or is there anything more you're planning to implement?

ixdy (Member) commented Jul 6, 2017

I'd like to get federation builds using these images, and we might want to have some PR jobs (e.g. the kubeadm one) uploading their images, but those can probably be separate efforts.

luxas (Member, Author) commented Jul 6, 2017

You choose ;)
