
Migrate to runtime.v1 for CRI #125

Closed
endocrimes opened this issue Nov 5, 2022 · 32 comments · Fixed by #139
@endocrimes

endocrimes commented Nov 5, 2022

👋 cri-dockerd friends,

Unfortunately it was brought to our attention today that cri-dockerd still exclusively supports v1alpha2. The Kubelet is removing support for v1alpha2 in 1.26 (kubernetes/kubernetes#110618).

The technical changes required in cri-dockerd should be minimal, depending on whether you plan to continue supporting v1alpha2 or to support only v1 (cri-o and containerd have either dropped v1alpha2 or are in the process of dropping it).

For end users of cri-dockerd, assuming v1 support is added, nothing changes: just make sure you're running the correct release of cri-dockerd. If v1 support is not added, kubelets will fail to start after an upgrade to 1.26.
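For anyone wondering what dual support looks like mechanically: below is a minimal Go sketch (not cri-dockerd's actual code; serveCRI and its two service arguments are hypothetical stand-ins) of registering both generated cri-api RuntimeService versions on one gRPC server, so a single socket answers either API version. The ImageService registrations would be analogous.

```go
package main

import (
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
	runtimev1 "k8s.io/cri-api/pkg/apis/runtime/v1"
	runtimev1alpha2 "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

// serveCRI registers both generated RuntimeService versions on a single
// grpc.Server. v1Impl and alphaImpl are hypothetical stand-ins for the
// real backend exposed through each API version's generated interface.
func serveCRI(socket string,
	v1Impl runtimev1.RuntimeServiceServer,
	alphaImpl runtimev1alpha2.RuntimeServiceServer) error {

	_ = os.Remove(socket) // clear a stale socket from a previous run
	lis, err := net.Listen("unix", socket)
	if err != nil {
		return err
	}

	srv := grpc.NewServer()
	// A 1.23+ kubelet probes runtime.v1 first; older kubelets ask for
	// runtime.v1alpha2. Registering both lets one endpoint serve every
	// CRI-speaking kubelet.
	runtimev1.RegisterRuntimeServiceServer(srv, v1Impl)
	runtimev1alpha2.RegisterRuntimeServiceServer(srv, alphaImpl)
	return srv.Serve(lis)
}

func main() {
	// nil implementations make this a compile-and-run skeleton only;
	// real service objects would be passed here.
	log.Fatal(serveCRI("/var/run/cri-dockerd.sock", nil, nil))
}
```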

evol262 changed the title from "cri-api v1alpha2 is being removed from Kubernetes" to "Migrate to runtime.v1 for CRI" on Nov 6, 2022
@evol262
Contributor

evol262 commented Nov 6, 2022

I renamed this issue, to start. The issue description reads much more like a release note, which is somewhat bizarre considering that 1.26 is not GA yet. Yes, v1 support will be added by then.

@afbjorklund
Contributor

afbjorklund commented Nov 6, 2022

This change went in after alpha.3, so I guess beta.0 will be the first release including this kubelet commit?

@endocrimes: If you "upgrade" the API support in cri-dockerd, wouldn't that break Kubernetes 1.25 and earlier?

It would break 1.22; v1 was added in 1.23...

https://kubernetes.io/blog/2021/12/07/kubernetes-1-23-release-announcement/#container-runtime-interface-cri-v1-is-default

But that is not an issue, since 1.22 still has dockershim.

i.e. only 1.24 and up will use cri-dockerd.

@evol262
Contributor

evol262 commented Nov 6, 2022

This is, honestly, something I'll probably broach with upstream also, though I don't expect to get much traction.

I completely understand that upstream wants to move fast and that they'd like CRI/CNI/CSI/CXI maintainers to implement APIs rather than keeping everything in core, and I don't expect upstream to have any kind of "LTS" release, but a one year window which includes enough major releases to take something from "deprecation warning" to "this will break now" is really rough for the rest of the ecosystem.

Kubeflow only recently caught up with 1.22, still doesn't work on 1.25, and will probably have even more problems on 1.26. It's likely that cri-dockerd will need to keep supporting v1alpha2 (and I'm amazed that cri-o/containerd are not) just so it isn't a hard break in the codebase.

Even though I don't think we'd diverge much, most CRIs aren't "big enough" to warrant multiple supported releases just so users can have a sane experience rather than adding yet another column to their deployment matrix (k8s 1.24 -> kubeflow 1.6 -> istio >= 1.10 -> ...). Sure, the release cadence slowed from 4 per year to 3, but the ecosystem is starting to look worryingly balkanized for vendors who need to support customers who only upgrade, say, every 18-24 months.

@afbjorklund

This comment was marked as outdated.

@endocrimes
Author

> The issue description reads much more like a release note, which is somewhat bizarre considering that 1.26 is not GA yet. Yes, v1 support will be added by then.

I was mostly trying to preempt any issues or confusion with folks who aren't maintainers and might see this issue (we all remember the... confusion and storm around removing dockershim from core 😅, and I wanted to avoid it again).

@endocrimes
Author

> I completely understand that upstream wants to move fast and that they'd like CRI/CNI/CSI/CXI maintainers to implement APIs rather than keeping everything in core, and I don't expect upstream to have any kind of "LTS" release, but a one year window which includes enough major releases to take something from "deprecation warning" to "this will break now" is really rough for the rest of the ecosystem.

I don't disagree that it's rough, but we also... don't have enough maintainers, and keeping alpha APIs around forever isn't something we can really do (especially when something has reached maturity), because it's super expensive in both maintainer time and test-infra costs.

We do encourage folks from the ecosystem to show up to the relevant SIGs and discuss their concerns and timelines. For SIG Node, that's #sig-node in the Kubernetes Slack and https://github.com/kubernetes/community/blob/master/sig-node/README.md#meetings for longer-term discussion. I hope to see you there in the future.

@BenTheElder
Contributor

If you "upgrade" the API support in cri-dockerd, wouldn't that break kubernetes 1.25 and earlier ?

No, you can support both APIs, as Kubernetes did for some time.

There are CRI implementations that support both and work across all Kubernetes versions with CRI.

For example you can see the approach to support both in containerd at containerd/containerd#5619
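For completeness, here is a rough Go sketch of the negotiation from the client side: probe runtime.v1 first and fall back to v1alpha2 on Unimplemented, which is roughly what the kubelet's "validate service connection" step does. This is illustrative only, not the kubelet's actual code, and the socket path and timeout are example values.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/status"

	runtimev1 "k8s.io/cri-api/pkg/apis/runtime/v1"
	runtimev1alpha2 "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, "unix:///var/run/cri-dockerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Probe runtime.v1 first; a v1alpha2-only runtime (e.g. cri-dockerd
	// before this issue was fixed) answers with codes.Unimplemented.
	if resp, err := runtimev1.NewRuntimeServiceClient(conn).
		Version(ctx, &runtimev1.VersionRequest{}); err == nil {
		log.Printf("using runtime.v1: %s %s", resp.RuntimeName, resp.RuntimeVersion)
		return
	} else if status.Code(err) != codes.Unimplemented {
		log.Fatalf("validate service connection: %v", err)
	}

	// Fall back to runtime.v1alpha2, which pre-1.26 kubelets still speak.
	resp, err := runtimev1alpha2.NewRuntimeServiceClient(conn).
		Version(ctx, &runtimev1alpha2.VersionRequest{})
	if err != nil {
		log.Fatalf("neither CRI version is served: %v", err)
	}
	log.Printf("using runtime.v1alpha2: %s %s", resp.RuntimeName, resp.RuntimeVersion)
}
```

Failing the v1 probe with Unimplemented is exactly the "unknown service runtime.v1.RuntimeService" error that shows up in the kubelet logs further down this thread.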

@BenTheElder
Contributor

Kubernetes API support timelines are not owned by SIG Node, and they are well defined. By moving to v1, there will be a longer support timeline than alpha: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

At this time there are no plans to remove GA APIs or to move to a v2 of any API.

@evol262
Contributor

evol262 commented Nov 6, 2022

I'm not in any way suggesting that the API support timeline is owned by any SIG or that it isn't well-defined. And admittedly, this single piece is easy enough for CRI maintainers to do (we can support both APIs in parallel without too much work).

I am saying, instead, that well-defined or not, the window between the graduation of any given feature and the deprecation of v1alphaX or v1betaX is frequently short enough that downstream consumers (whether that's Docker users, cri-o in OpenShift, containerd in Charmed K8s, etc.) may never upgrade during the period where a feature has graduated but the v1[alpha|beta] API is still available as a bridge.

I've been there, and I understand that the infrastructure is expensive, that backporting bugfixes/changes to deprecated APIs takes a lot of maintainer time, and that if you never remove it, some consumers will never bite the bullet. A "carrot and stick" approach (new features/fixes vs. life support only) works sometimes, but it's a question for the broader ecosystem rather than just any given SIG. For cri-dockerd, containerd, cri-o, or any individual piece, it's fine. Larger projects which cross boundaries (Kubeflow, for example) are often holding on by their fingertips.

For "customers" running k8s distros or major projects, it frequently means maintainers are spending their time on two completely different k8s releases so their customer base isn't left high and dry. That isn't strictly an upstream problem, except insofar as upstream is at least nominally composed partly of those maintainers. I'll join the meetings.

To be clear, I'm not expecting upstream to change the workflow or timelines. I've been in your position. I also don't think I'd be the first one to bring this up (especially considering the cadence slowed not that long ago), but I'd be remiss if I didn't say something, because this is reminding me of the, uh, good old days of OpenStack, where the need to graduate incubating changes meant the project moved so fast that almost any large-scale environment's deployment windows meant a full-bore reinstall: the "bridge" functionality was introduced, deprecated, and removed before they ever had a chance to upgrade at all.

@afbjorklund

This comment was marked as resolved.

@afbjorklund
Contributor

afbjorklund commented Nov 14, 2022

The 1.26 beta.0 is out now, making it easier to test this:

minikube start --kubernetes-version=v1.26.0-beta.0 --container-runtime=docker

Currently the kubelet is failing to start, with the error:

"command failed" err="failed to run Kubelet: validate service connection: CRI v1 runtime API is not implemented for endpoint \"/var/run/cri-dockerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"


The actual error shown to the user is slightly less helpful:

💢 initialization failed, will try again: wait: /bin/bash -c "sudo env PATH="/var/lib/minikube/binaries/v1.26.0-beta.0:$PATH" kubeadm init --config /var/tmp/minikube/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,DirAvailable--var-lib-minikube,DirAvailable--var-lib-minikube-etcd,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml,Port-10250,Swap,Mem,SystemVerification,FileContent--proc-sys-net-bridge-bridge-nf-call-iptables": Process exited with status 1

❌ Exiting due to K8S_KUBELET_NOT_RUNNING: wait: /bin/bash -c "sudo env PATH="/var/lib/minikube/binaries/v1.26.0-beta.0:$PATH" kubeadm init --config /var/tmp/minikube/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,DirAvailable--var-lib-minikube,DirAvailable--var-lib-minikube-etcd,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml,Port-10250,Swap,Mem,SystemVerification,FileContent--proc-sys-net-bridge-bridge-nf-call-iptables": Process exited with status 1

(trying again didn't exactly help here, but it tries twice...)

minikube ssh -- sudo journalctl -xeu kubelet shows the log.

evol262 added this to the v0.3.0 milestone on Nov 14, 2022
@evol262
Contributor

evol262 commented Nov 14, 2022

Patch will be going up this week. We've sat still for a little bit since 1.25, but for 0.3.0 we'll target this, customization via a config file instead of cmdline args only, and re-syncing with the Rancher changes to cri-dockerd. I'm definitely expecting the "support both runtime variants" work to be available in main this week, though.

@afbjorklund
Contributor

afbjorklund commented Nov 20, 2022

It's on the blog now that containerd 1.5 (and cri-dockerd 0.2) will not be supported for 1.26:

https://kubernetes.io/blog/2022/11/18/upcoming-changes-in-kubernetes-1-26/#cri-api-removal

Not yet updated in the documentation, but I think that will happen when it goes from RC to GA:

https://github.com/containerd/containerd/blob/main/RELEASES.md#kubernetes-support

@medyagh

medyagh commented Dec 6, 2022

> Patch will be going up this week. We've sat still for a little bit since 1.25, but for 0.3.0 we'll target this, customization via a config file instead of cmdline args only, and re-syncing with the Rancher changes to cri-dockerd. I'm definitely expecting the "support both runtime variants" work to be available in main this week, though.

Is the patch merged into the codebase? Do you mind referencing the PR in this issue?

@shubham-yewalekar

When can we expect this patch to be merged into the codebase, now that k8s 1.26 GA is released?

@evol262
Contributor

evol262 commented Dec 9, 2022

Sorry, I didn't forget about this issue, but I got pulled off into customer cases for the past couple of weeks. I expect a patch for this to land today.

@camphor-networks

Hi, will there be a release soon with a fix for this, so we can deploy K8s 1.26+ with Docker?

@evol262
Contributor

evol262 commented Dec 12, 2022

Yes, today.

@camphor-networks

Thanks evol262. I really appreciate your kind support!

@evol262
Contributor

evol262 commented Dec 13, 2022

Sorry, it's gonna be one more day until I finish cleaning up the tests. Dual support is a surprisingly large diff...

@camphor-networks

camphor-networks commented Dec 13, 2022 via email

@mko237

mko237 commented Dec 14, 2022

I'm also running into this issue and hoping for this release. Thanks for your support!

@shanghaojia-1

Thank you for your work.

@alexmasi

+1

@shubham-yewalekar

Any updates on this? When can we expect the fix to be merged?

@camphor-networks

Thanks! I guess v0.2.7 with this fix will be released soon! Thanks once again and happy holidays! @evol262

@evol262
Contributor

evol262 commented Dec 20, 2022

Actually released with 0.3.0. It just went up after the packages built.

@camphor-networks

camphor-networks commented Dec 20, 2022 via email

@shubham-yewalekar

> Actually released with 0.3.0. It just went up after the packages built.

Thanks a lot for the release @evol262

@evol262
Contributor

evol262 commented Dec 20, 2022

> Great. Will deploy tomorrow and let you know if I face any issue. I hope this release is good and set for years to come :-)

Considering how fast upstream changes the spec, I don't know about *years*, but maybe a year ;)

@afbjorklund
Contributor

There is a bug with the fallback endpoints in crictl version 1.26.0, but if you patch that it works OK:

$ sudo ./build/bin/crictl version
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
ERRO[0000] validate service connection: CRI v1 runtime API is not available for endpoint "unix:///var/run/dockershim.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory" 
ERRO[0000] validate service connection: CRI v1 runtime API is not implemented for endpoint "unix:///run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService 
ERRO[0000] validate service connection: CRI v1 runtime API is not available for endpoint "unix:///run/crio/crio.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/crio/crio.sock: connect: no such file or directory" 
Version:  0.1.0
RuntimeName:  docker
RuntimeVersion:  20.10.12
RuntimeApiVersion:  v1

The workaround, until it is fixed, is to set up the /etc/crictl.yaml config file to point at cri-dockerd:

runtime-endpoint: unix:///var/run/cri-dockerd.sock

$ sudo crictl version
Version:  0.1.0
RuntimeName:  docker
RuntimeVersion:  20.10.12
RuntimeApiVersion:  v1

(Tested with cri-dockerd 0.3.0 (0de30fc) and crictl version 1.26.0.)

