
Add ability to include images in custom node build #3634

Open · wants to merge 1 commit into base: main

Conversation

@stmcginnis (Contributor) commented May 31, 2024

Updated / alternate take on #2303.

This adds a new kind build add-image command that provides the ability to "preload" container images in a custom node image.

Closes: #2160

Help output

Updated kind build output:

$ kind build
Build a kind node image by creating a new image from source, or combining existing images in a new image.

Usage:
  kind build [command]

Available Commands:
  add-image   Update node image with extra images preloaded
  node-image  Build the node image

And add-image help output:

$ kind build add-image -h
Use an existing node image as a base and preload extra container images.

Usage:
  kind build add-image <IMAGE> [IMAGE...] [flags]

Flags:
      --arch string         architecture to build for, defaults to the host architecture
      --base-image string   name:tag of the base image to use for the build (default "kindest/node:v1.30.0@sha256:047357ac0cfea04663786a612ba1eaba9702bef25227a794b52890dd8bcd692e")
  -h, --help                help for add-image
      --image string        name:tag of the resulting image to be built (default "kindest/custom-node:latest")

Command execution

The new command can be run to preload more container images into the new node image, using the default base image and target image:

$ kind build add-image nginx postgres golang:1.22
Adding [nginx postgres golang:1.22] images to base image
Creating build container based on "kindest/node:v1.30.0@sha256:047357ac0cfea04663786a612ba1eaba9702bef25227a794b52890dd8bcd692e"
Importing images into build container kind-build-1717165213-626335550
Saving new image kindest/custom-node:latest
sha256:38ad8e7460ce80324d4a9dbb76a94f7c5434f7ea1d6260c8566f9c797933c6c3
Add image build completed.

New cluster creation

The custom node image can then be used to create a new cluster:

$ cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/custom-node:latest
EOF
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/custom-node:latest) 🖼
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋

Verification

We can see that the cluster created with our modified node image has the expected images included:

$ docker exec -it kind-control-plane crictl images
IMAGE                                           TAG                  IMAGE ID            SIZE
...
docker.io/library/golang                        1.22                 5905f95343e84       846MB
docker.io/library/nginx                         latest               4f67c83422ec7       192MB
docker.io/library/postgres                      latest               cff6b68a194a6       439MB
...

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 31, 2024
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 31, 2024
@BenTheElder (Member)

We have a real problem with cross-architecture, there's going to be really confusing results if it's from local side-loading instead of pulling. I'm wondering if we should just only permit pull-based and require using a local or remote registry for adding additional pre-baked images.

@BenTheElder (Member)

(the docker/podman local store doesn't really have a good way to keep an image at both architectures, they have to have unique names)
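The unique-names constraint mentioned above can be illustrated with plain docker commands (an illustrative sketch, not from this PR; alpine:3.19 is an arbitrary multi-arch image used as an example):

```shell
# Docker's local store keys images by name:tag (or name@sha), not by
# (name:tag, arch): pulling a second architecture under the same tag
# replaces the first. The usual workaround is a unique tag per arch.
docker pull --platform linux/amd64 alpine:3.19
docker tag alpine:3.19 alpine:3.19-amd64

docker pull --platform linux/arm64 alpine:3.19
docker tag alpine:3.19 alpine:3.19-arm64
```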

@BenTheElder (Member)

(also don't take that as a given and rewrite everything yet ... just a thought .... this is kind of tricky, there's a lot of confusing behaviors and we should do our best to avoid confusion)

@stmcginnis (Contributor, Author)

(the docker/podman local store doesn't really have a good way to keep an image at both architectures, they have to have unique names)

I've been doing some additional testing, and it looks like even the existing kind build node-image --arch $ARCH doesn't work. Should we just drop this flag and require building on a host that matches your desired target arch?

@BenTheElder (Member)

I've been doing some additional testing, and it looks like even the existing kind build node-image --arch $ARCH doesn't work. Should we just drop this flag and require building on a host that matches your desired target arch?

That's a bug, because I use this to publish multi-arch images on every release:
https://github.com/kubernetes-sigs/kind/blob/main/hack/release/build/push-node.sh

@BenTheElder (Member) commented May 31, 2024

(For kind build node-image we pull images OR we side-load architecture specific locally built images from k8s builds, it's possible we regressed after the recent work to add new types other than source builds, or that there's a bug that happens in your environment but not mine)

@stmcginnis (Contributor, Author)

or that there's a bug that happens in your environment but not mine

That may very well be the case - I would not be surprised at all. If you (or someone else that has this working) are able to test this change in your environment, that would make me more comfortable.

So I'm not clear from the comments above if something more is needed here. We are pulling only, not side-loading from local files. But do you mean not even that? That we should spin up the builder container, then pull from within there directly?

@stmcginnis (Contributor, Author)

OK, I played around a bit, and it's actually a little cleaner doing it the way I think you were suggesting. Updated to pull directly from the builder container. Just let me know if I'm off track here. ;)

@xlf12 commented Jun 13, 2024

Thanks for this PR!

I tried it on my ARM Mac and built a kind node image twice, once for amd64 and once for arm64. That part worked very well. Afterwards I created a multi-arch manifest by hand, done :-)

A small issue popped up:

In this two-step approach the base image kindest/node:v1.30.0@sha256:047357... is used. Docker Desktop locally pulls one archive for amd64 and then another archive for arm64 to do the build for each architecture.

Both archives resolve to the same multi-arch digest, so the second step failed; it seems we can't have two different kindest/node:v... images under one (multi-arch) sha256 hash on the local machine.

No real problem, though. After each build of a custom node image I pushed it to my registry and cleared everything locally, and the way was free to build the custom node image again for the other arch. In a final step I pulled the custom images from the registry again and created the multi-arch manifest by hand.
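The push-and-stitch workflow described above can be sketched with `docker manifest` (a rough sketch under assumptions: the registry name and per-arch tags are placeholders, and both per-arch custom node images already exist locally):

```shell
# Push each per-arch custom node image under a unique tag, then combine
# them into a single multi-arch manifest list and push that too.
docker push registry.example.com/custom-node:amd64
docker push registry.example.com/custom-node:arm64

docker manifest create registry.example.com/custom-node:latest \
  registry.example.com/custom-node:amd64 \
  registry.example.com/custom-node:arm64
docker manifest push registry.example.com/custom-node:latest
```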

@stmcginnis (Contributor, Author)

Awesome, thanks for trying things out and reporting back @xlf12! Great to hear it's (mostly) working for someone else too.

Both archives resolve to the same multi-arch digest, so the second step failed; it seems we can't have two different kindest/node:v... images under one (multi-arch) sha256 hash on the local machine.

I believe this is a known issue with image building today. IIRC @BenTheElder has mentioned something similar, so maybe he can confirm or give some helpful hints on that.

@xlf12 commented Jun 14, 2024

The feature itself is so valuable to us that this multi-arch issue could, IMHO, simply be ignored for now and passed on to a future hero (my Go skills are too poor to tackle it myself).

It's just a bit of a usability issue, I'd say. It doesn't prevent us from creating images for multiple architectures; you can always work around it with a script, which works fine. Not everyone will have a use case like that, either.

@BenTheElder (Member)

I've been doing some additional testing, and it looks like even the existing kind build node-image --arch $ARCH doesn't work. Should we just drop this flag and require building on a host that matches your desired target arch?

I'd like to track down this bug, which might inform how we approach adding more image building commands ... I don't think what we have currently is very nice.

The feature itself is so valuable to us that this multi-arch issue could, IMHO, simply be ignored for now and passed on to a future hero (my Go skills are too poor to tackle it myself).

I'd rather not take the "ignore it until later and then need to majorly redesign and introduce breaking changes" approach. There are already alternatives to this entire feature; I recommend https://kind.sigs.k8s.io/docs/user/local-registry/

Both archives resolve to the same multi-arch digest, so the second step failed; it seems we can't have two different kindest/node:v... images under one (multi-arch) sha256 hash on the local machine.

This is a docker limitation. Docker doesn't actually store images by (image:tag, arch), only by (image:tag/image@sha).

Previously we were forced to use the run + commit approach because we needed --privileged to get containerd running, but we might be able to come up with another workaround today, either by getting image imports to work without privileges or maybe via https://docs.docker.com/reference/cli/docker/buildx/build/#allow

If we built images in buildkit mode, it can handle this, and it can natively handle producing a multi-arch artifact (though again only if you push it to a registry, it cannot store it in the local docker storage that you can docker run / kind create cluster directly from, docker's runtime storage is not really multi arch, you need distinct image names)

This is why building the node images is clunky and looks like this:
https://github.com/kubernetes-sigs/kind/blob/0a7403e49c529d22cacd3a3f3606b9d8a5c16ae7/hack/release/build/push-node.sh

(basically how you described building and pushing your image)

@BenTheElder BenTheElder added this to the v0.24.0 milestone Jun 20, 2024
@BenTheElder BenTheElder self-assigned this Jul 9, 2024
@BenTheElder (Member)

[sorry we are deep in the infra migration stuff, I'm going to dive into this soonish ...]

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 6, 2024
@BenTheElder BenTheElder modified the milestones: v0.24.0, v0.25.0 Aug 14, 2024
This adds a new `kind build add-image` command that provides the ability
to "preload" container images in a custom node image.

Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
Co-authored-by: Curt Bushko <cbushko@gmail.com>
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stmcginnis
Once this PR has been reviewed and has the lgtm label, please ask for approval from bentheelder. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 14, 2024
@rakeshayola

When I tested this branch to add images from our local Artifactory, I saw the error below. Any idea why I get this error? The same works fine if I use public images.

host="": failed to do request: Head "*": tls: failed to verify certificate: x509: certificate signed by unknown authority

Stack Trace:
sigs.k8s.io/kind/pkg/errors.WithStack
sigs.k8s.io/kind/pkg/errors/errors.go:59
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
sigs.k8s.io/kind/pkg/exec/local.go:124
sigs.k8s.io/kind/pkg/build/internal/container/docker.(*containerCmd).Run
sigs.k8s.io/kind/pkg/build/internal/container/docker/exec.go:111
sigs.k8s.io/kind/pkg/build/internal/build.(*ContainerdImporter).Pull
sigs.k8s.io/kind/pkg/build/internal/build/imageimporter.go:52
sigs.k8s.io/kind/pkg/build/addimage.(*buildContext).Build
sigs.k8s.io/kind/pkg/build/addimage/buildcontext.go:85
sigs.k8s.io/kind/pkg/build/addimage.Build
sigs.k8s.io/kind/pkg/build/addimage/build.go:50
sigs.k8s.io/kind/pkg/cmd/kind/build/addimage.runE
sigs.k8s.io/kind/pkg/cmd/kind/build/addimage/addimage.go:68
sigs.k8s.io/kind/pkg/cmd/kind/build/addimage.NewCommand.func1
sigs.k8s.io/kind/pkg/cmd/kind/build/addimage/addimage.go:45
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.8.0/command.go:1039
sigs.k8s.io/kind/cmd/kind/app.Run
sigs.k8s.io/kind/cmd/kind/app/main.go:53
sigs.k8s.io/kind/cmd/kind/app.Main
sigs.k8s.io/kind/cmd/kind/app/main.go:35
main.main
sigs.k8s.io/kind/main.go:25
runtime.main
runtime/proc.go:271
runtime.goexit
runtime/asm_amd64.s:1695

@BenTheElder (Member)

host="": failed to do request: Head "*": tls: failed to verify certificate: x509: certificate signed by unknown authority

This is one of the problems with the current approach -- pulling happens inside the node containers, and the node containers do not have the additional certs that are installed on your host (most likely that's what's happening here, anyhow).

This is not very easy to solve.

@stmcginnis (Contributor, Author)

Hmm, what if we mount /etc/ssl/certs when building? I haven't tried much here, but it seems like we could support this and just document some "caveats".

@BenTheElder (Member)

Hmm, what if we mount /etc/ssl/certs when building? I haven't tried much here, but it seems like we could support this and just document some "caveats".

We haven't currently settled on a solution to this with the running nodes either, something like that might work but ... that only solves this particular variation on the problem even if it works. (And ... it has portability issues)

The user may also e.g. have docker credential plugins enabled for image pulling (and containerd doesn't even have these, there are CRI auth plugins but that's different).

What would be ideal is if we had a solution to pull images in an architecture specific way without conflicts and then import those into the node, but we haven't yet (and we also have to consider nerdctl and podman these days).

Also, once we commit to either pulling in the node or pulling on the host, users will depend on it, so we will have to decide though.

@BenTheElder (Member)

We've kinda punted on these with kind build node-image, because interacting with third-party image hosts simply isn't supported except for the base image, and that's good enough for building images from upstream releases.

One possible answer is requiring adding via tarballs, but that's less convenient.

On the other hand, if we attempt to support pulling from third-party hosts and we don't pull via the host runtime, we're going to get a LOT of support issues from users confused when auth / networking doesn't work.

But if we pull to the host, we're back to the problems we ran into before with architectures, and I don't think we want to build new features that aren't multi-arch aware. That one might be solvable, it needs revisiting.

@BenTheElder (Member)

I think maybe we should start by just supporting the equivalent of image-archive, and let the caller be responsible for supplying the images with their preferred tooling.

I think this will fall into three use cases:

  1. Locally built images to test
  2. Optimizing download of ancillary images in a test environment
  3. Creating an artifact for airgapped testing

Use case 1) already has robust alternatives in kind load and the local registry, but I'm sure people would like other options. For 1) I would encourage decoupling from this PR and using a local registry or kind load instead.

Those could also work for 2) and 3), but this could certainly be more convenient, especially because the local registry requires replacing the manifests with your local image in the case of third-party workloads you are not actively developing, and because loading at runtime is not optimally efficient.

In a subset of those cases, this may be a better option, in even more cases it may just feel more convenient.

In all cases, the actual pull part is painful to do portably, and an image archive will contain the intended tags and also give the user the ability to rewrite tags (pull, retag, save, then run this command).

Maybe we start there and then based on user feedback we can revisit kind handling the pulling?

The user has to identify the set of images anyhow, I don't think telling them to perform the pull and export first is a huge burden and it has efficiency wins (may be cached on the host at their discretion)

Then it's up to the user to handle authentication and the architecture issues (some users may be better off using something like crane for multi-arch pull and export, YMMV).
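The "pull, retag, save, then run this command" flow suggested above might look roughly like this (a sketch only: the --image-archive flag shown at the end is hypothetical and not part of this PR as written, and the registry name is a placeholder):

```shell
# Pull on the host, where normal docker auth and certs apply; optionally retag.
docker pull nginx:latest
docker tag nginx:latest registry.example.com/nginx:pinned
docker save -o images.tar registry.example.com/nginx:pinned

# Architecture-specific alternative: crane can fetch a tarball for one platform
# without going through the docker image store.
crane pull --platform linux/arm64 nginx:latest nginx-arm64.tar

# Hypothetical archive-based invocation (flag name is illustrative only):
kind build add-image --image-archive images.tar --image kindest/custom-node:latest
```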

@BenTheElder (Member)

cc @aojea

@BenTheElder (Member)

Maybe we start there and then based on user feedback we can revisit kind handling the pulling?

That is, take image archives as the input, and avoid being responsible for the image pulling mess; let the user decide between ease and caching (docker pull; docker save ...) versus cross-platform flexibility (crane pull --platform) or local builds (bazel or ko or ...)

@stmcginnis (Contributor, Author)

Just a note: the latest responses have been on my queue to get back to and address, just haven't had the time to lately. I should be able to get to it soon. At first glance, just supporting image archives seems like a reasonable first step to avoid some of the complexity and would add value.

@BenTheElder BenTheElder removed this from the v0.25.0 milestone Oct 16, 2024
@micahhausler (Member)

FWIW I would love to see this merged. I have a use case where I'm running an authorizing webhook as a static pod and want the webhook image pre-loaded in the kind node image. My workaround today is to run kind create ... and while that spins, run kind load image-archive in another terminal. It's ok for local development, but not easy to automate. Something like this PR would let me serially run kind build add-image authz-webhook:latest my-kind-node:latest and kind create -f kind.yaml without any dancing, and easy to add to CI.
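Based on the add-image help output earlier in this PR, the serial flow described above would presumably look something like this (image and config names are taken from the comment, not verified; note the resulting node image name goes in the --image flag, and kind.yaml is assumed to reference it):

```shell
# Build a custom node image with the webhook image preloaded, then create
# the cluster from it, with no parallel "dancing" required.
kind build add-image authz-webhook:latest --image my-kind-node:latest
kind create cluster --config kind.yaml
```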

@aojea (Contributor) commented Oct 25, 2024

FWIW I would love to see this merged. I have a use case where I'm running an authorizing webhook as a static pod and want the webhook image pre-loaded in the kind node image. My workaround today is to run kind create ... and while that spins, run kind load image-archive in another terminal. It's ok for local development, but not easy to automate. Something like this PR would let me serially run kind build add-image authz-webhook:latest my-kind-node:latest and kind create -f kind.yaml without any dancing, and easy to add to CI.

Dockerfile

FROM kindest/node:v1.31.0

RUN echo "Installing my stuff ..." \

@BenTheElder (Member)

I think we're going to rewrite it, pending bandwidth, to work around image archives rather than pulling images, for the reasons discussed above.

@BenTheElder (Member)

Current suggested alternative is to use a local registry for the image. Though Antonio's suggestion is also possible if trickier.

@BenTheElder (Member) commented Oct 25, 2024

Though Antonio's suggestion is also possible if trickier.

Also, there's no guarantee that we won't make changes to the images that break this in the future; we only support that the node images can create Kubernetes clusters. Depending on specific tools being installed on them is not supported and subject to change. We've changed distros and container runtimes before, and we're always looking to minimize the base image size.

So it's workable, but may not be stable ...

The registry approach is pretty stable, just make sure to use the current version that avoids depending on deprecated containerd config :|

Successfully merging this pull request may close these issues: ability to include additional images in the kindest/node image

7 participants