Fix CI create multi node cluster failures by nerdctl #3534

Merged
merged 1 commit into from
Mar 3, 2024

Conversation

yankay
Member

@yankay yankay commented Feb 28, 2024

The CI uses the containerd installed by apt-get install docker-ce, and that version may not be the right one.
So the CI now installs the "nerdctl full package" instead, which fixes the nerdctl test failures in CI.

Fixes #3533

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 28, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 28, 2024
@yankay
Member Author

yankay commented Feb 28, 2024

The error log has changed to:

Command Output: time="2024-02-28T06:55:49Z" level=fatal msg="failed to verify networking settings: failed to create default network: subnet 10.4.0.0/24 overlaps with other one on this address space"

It may be related to: containerd/nerdctl#1371
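
To make the "overlaps" message concrete, here is a minimal sketch of a CIDR overlap check. The helper names (ip_to_int, cidr_range, cidrs_overlap) are hypothetical and are not part of nerdctl or kind; the sketch only illustrates what it means for 10.4.0.0/24 to overlap another subnet in the same address space.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nerdctl or kind): IPv4 CIDR overlap check.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "start end" (as integers) for a CIDR such as 10.4.0.0/24.
cidr_range() {
  addr=${1%/*}
  prefix=${1#*/}
  base=$(ip_to_int "$addr")
  size=$(( 1 << (32 - prefix) ))
  start=$(( base / size * size ))   # round down to the network address
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit 0) when the two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, cidrs_overlap 10.4.0.0/24 10.4.0.0/16 succeeds, while cidrs_overlap 10.4.0.0/24 10.4.1.0/24 fails because the two /24 ranges are adjacent but disjoint.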

@yankay yankay closed this Feb 28, 2024
@yankay yankay reopened this Feb 28, 2024
@yankay yankay changed the title [WIP]Fix CI create multi node cluster failures by nerdctl Fix CI create multi node cluster failures by nerdctl Feb 28, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 28, 2024
@yankay yankay changed the title Fix CI create multi node cluster failures by nerdctl [wip]Fix CI create multi node cluster failures by nerdctl Feb 28, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 28, 2024
@yankay yankay changed the title [wip]Fix CI create multi node cluster failures by nerdctl Fix CI create multi node cluster failures by nerdctl Feb 28, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 28, 2024
# Install nerdctl full package
sudo curl -sSL https://github.com/containerd/nerdctl/releases/download/v${NERDCTL_VERSION}/nerdctl-full-${NERDCTL_VERSION}-linux-amd64.tar.gz | sudo tar -xvz -C /usr/local
# Start Containerd
sudo curl -sSL https://raw.githubusercontent.com/containerd/containerd/main/containerd.service > containerd.service
Contributor

I prefer to have pinned dependencies in ci and not a moving target

Member Author

Thanks @aojea

It has been changed to use the containerd.service included in the nerdctl-full package.
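
As a rough illustration of the "pinned, not a moving target" point, the sketch below builds the download URL from an explicit version and rejects floating references such as main. The function names are hypothetical; the URL pattern is the one used in the CI script quoted in this review.

```shell
#!/bin/sh
# Hypothetical helpers, shown only to illustrate pinning a CI dependency.

# Build the release URL for a pinned nerdctl-full version (e.g. 1.7.4).
nerdctl_full_url() {
  version="$1"
  printf 'https://github.com/containerd/nerdctl/releases/download/v%s/nerdctl-full-%s-linux-amd64.tar.gz\n' \
    "$version" "$version"
}

# Loose check: accept strings shaped like X.Y.Z that start with a digit,
# and reject floating references such as "main" or "latest".
is_pinned_version() {
  case "$1" in
    [0-9]*.[0-9]*.[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A CI step could then run something like is_pinned_version "$NERDCTL_VERSION" && curl -sSL "$(nerdctl_full_url "$NERDCTL_VERSION")" | sudo tar -xz -C /usr/local. The assumption here is that the archive ships its own systemd unit files, so nothing needs to be fetched from the containerd main branch.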

sudo ctr version
# Install CNI
Contributor

why are these no longer required?

Member Author

@yankay yankay Mar 3, 2024

Hi @aojea

They are no longer required because the nerdctl-full package already contains the CNI plugins. It includes:

# nerdctl (full distribution)
- nerdctl: v1.7.4
- containerd: v1.7.13
- runc: v1.1.12
- CNI plugins: v1.4.0
- BuildKit: v0.12.5
- Stargz Snapshotter: v0.15.1
- imgcrypt: v1.1.9
- RootlessKit: v2.0.1
- slirp4netns: v1.2.2
- bypass4netns: v0.4.0
- fuse-overlayfs: v1.13
- containerd-fuse-overlayfs: v1.0.8
- Kubo (IPFS): v0.26.0
- Tini: v0.19.0
- buildg: v0.4.1

Reference: https://github.com/containerd/nerdctl/releases/tag/v1.7.4

Using nerdctl-full instead of the minimal package makes the CI easier to maintain, because the nerdctl release manages the mapping between dependency versions.
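
A small sanity check along these lines could confirm that the bundled tools are on PATH after extraction. This is only a sketch: missing_tools is a hypothetical helper, and the tool names passed to it are assumptions about what the full package installs.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list that is
# not found on PATH, one per line. Prints nothing when all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

In CI this might be called as missing_tools nerdctl containerd ctr runc, failing the job if the output is non-empty.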

@yankay yankay force-pushed the fix-ci branch 4 times, most recently from 0b5a969 to 7c0b28d Compare March 3, 2024 04:40
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
@aojea
Contributor

aojea commented Mar 3, 2024

/lgtm
/approve

Thanks for keeping the CI healthy

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, yankay

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2024
@k8s-ci-robot k8s-ci-robot merged commit 6c8c0f0 into kubernetes-sigs:main Mar 3, 2024
21 checks passed
@aojea
Contributor

aojea commented Mar 5, 2024

I've seen this failure: https://github.com/kubernetes-sigs/kind/actions/runs/8067092392/job/22036693761?pr=3530

ERROR: failed to create cluster: command "nerdctl run --name kind-worker2 --hostname kind-worker2 --label io.x-k8s.kind.role=worker --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false kindest/node:v1.29.2@sha256:51a1434a5397193442f0be2a297b488b6c919ce8a3931be0ce822606ea5ca245" failed with error: exit status 1

Command Output: time="2024-02-27T15:19:47Z" level=fatal msg="failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time="2024-02-27T15:19:42Z" level=fatal msg="failed to call cni.Setup: plugin type=\"firewall\" failed (add): running [/usr/sbin/iptables -t filter -N CNI-FORWARD --wait]: exit status 4: iptables v1.8.7 (nf_tables): CHAIN_USER_ADD failed (File exists): chain CNI-FORWARD\n"\nFailed to write to log, write /var/lib/nerdctl/1935db59/containers/default/6d22f642f1a5ec76f7a3b0ad7d1852ba8002b5572215f8713e4b95f3bc62ed7a/oci-hook.createRuntime.log: file already closed: unknown"

@aojea
Contributor

aojea commented Apr 1, 2024

This is still failing

https://github.com/kubernetes-sigs/kind/actions/runs/8505888448/job/23295140055?pr=3563

ERROR: failed to create cluster: command "nerdctl run --name kind-worker2 --hostname kind-worker2 --label io.x-k8s.kind.role=worker --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false kindest/node:v1.29.2@sha256:51a1434a5397193442f0be2a297b488b6c919ce8a3931be0ce822606ea5ca245" failed with error: exit status 1

Command Output: time="2024-04-01T08:34:37Z" level=fatal msg="failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time=\"2024-04-01T08:34:34Z\" level=fatal msg=\"failed to call cni.Setup: plugin type=\\\"firewall\\\" failed (add): running [/usr/sbin/iptables -t filter -N CNI-ISOLATION-STAGE-2 --wait]: exit status 4: iptables v1.8.7 (nf_tables):  CHAIN_USER_ADD failed (File exists): chain CNI-ISOLATION-STAGE-2\\n\"\nFailed to write to log, write /var/lib/nerdctl/1935db59/containers/default/18e88fcb538d49417539810b2567922886e120771be00f165b5d64cf41a381f5/oci-hook.createRuntime.log: file already closed: unknown"

Stack Trace: 
sigs.k8s.io/kind/pkg/errors.WithStack
	sigs.k8s.io/kind/pkg/errors/errors.go:59
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
	sigs.k8s.io/kind/pkg/exec/local.go:124
sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl.createContainerWithWaitUntilSystemdReachesMultiUserSystem
	sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl/provision.go:383
sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl.planCreation.func3
	sigs.k8s.io/kind/pkg/cluster/internal/providers/nerdctl/provision.go:123
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
	sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
	runtime/asm_amd64.s:1598
Error: Process completed with exit code 1.
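
Both failures share the same signature, CHAIN_USER_ADD failed (File exists), and the stack trace shows nodes being created through UntilErrorConcurrent. One possible reading, which is an assumption and not confirmed in this thread, is that concurrent container creation races on creating the same iptables chain. The sketch below shows how a CI script might detect that signature and retry a command; both helper names are hypothetical, and this is a workaround sketch, not what kind does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of kind or nerdctl.

# Succeed when the log text on stdin contains the chain-exists signature.
is_cni_chain_race() {
  grep -q 'CHAIN_USER_ADD failed (File exists)'
}

# Run a command up to N times, sleeping one second between attempts.
# Usage: retry 3 some-command arg1 arg2
retry() {
  attempts="$1"
  shift
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep 1
  done
}
```

A job could pipe the captured output through is_cni_chain_race to decide whether a retry is worthwhile, rather than retrying on every failure.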

Successfully merging this pull request may close these issues.

flaky test: Create multi node cluster error with nerdctl
3 participants