Antrea no longer works with Kind after upgrading to Docker Desktop 4.27.0 #5939

Closed
antoninbas opened this issue Jan 29, 2024 · 3 comments · Fixed by #5979

@antoninbas

Describe the bug
After upgrading to Docker Desktop 4.27.0 (the latest version as of the time of writing), Antrea no longer installs successfully on a Kind cluster. The install-cni container fails with:

modprobe: FATAL: Module openvswitch not found in directory /lib/modules/6.6.12-linuxkit
Failed to load the OVS kernel module from the container, try running 'modprobe openvswitch' on your Nodes

To Reproduce
Download and install Docker Desktop 4.27.0: https://docs.docker.com/desktop/release-notes/#4270
Install Antrea with helm install -n kube-system antrea antrea/antrea
Look at Pods with kubectl get pods -A

Expected
Antrea should install successfully

Actual behavior
Antrea Agent Pods cannot be created.

Versions:
Tested with Antrea v1.15.0 and v1.14.2, but this is independent of the Antrea version (specific to the Docker Desktop version).

Additional context
There were some major changes to Docker Desktop (and more precisely the LinuxKit VM) in 4.26 and 4.27.
The 2 breaking changes in our case were introduced in 4.27.0:

  1. The openvswitch kernel module is now built-in and no longer a "loadable" module.
  2. The kernel configuration was changed, and in particular support for conntrack zones was disabled.
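The second change can be checked from inside the VM or a Kind node. The following is a minimal sketch, assuming the kernel exposes its configuration at /proc/config.gz (not every kernel does) and that conntrack zone support corresponds to the CONFIG_NF_CONNTRACK_ZONES option. The helper name kernel_config_enabled is hypothetical and is not part of Antrea or Docker Desktop.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads kernel config text on stdin and returns 0
# if option $1 is enabled (set to "y" or "m"), non-zero otherwise.
kernel_config_enabled() {
    local option="$1"
    grep -q -E "^${option}=(y|m)\$"
}

# Example usage (assumes /proc/config.gz exists on the Node):
#   if zcat /proc/config.gz | kernel_config_enabled CONFIG_NF_CONNTRACK_ZONES; then
#       echo "conntrack zones are supported"
#   else
#       echo "conntrack zones are NOT supported"
#   fi
```

Reading the config from stdin keeps the helper independent of where the configuration is stored, so it can also be pointed at a /boot/config-* file on distributions that ship one.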
@antoninbas antoninbas added the kind/bug Categorizes issue or PR as related to a bug. label Jan 29, 2024
@dgageot

dgageot commented Jan 30, 2024

Hi @antoninbas, Docker Desktop maintainer typing here! We're so sorry for breaking your workflow.

We've indeed recently made big changes to Docker Desktop's kernel config in order to make it smaller/faster. Sorry that this has a negative impact on you.

  • Support for conntrack zones will be back soon: as early as 4.27.1 if we manage to include it in the hotfix we are preparing. Otherwise, it will certainly be part of 4.28.
  • The openvswitch module will probably stay as it is, as a static module. That's how we hope to install all the kernel modules in the future. Do you think it's possible to change your check to recognise such a setup?

@antoninbas

@dgageot Thanks for the quick reply.
I also commented at docker/for-mac#7151 (comment).
Yes, we will enhance our implementation to handle that case and skip calling modprobe when the module is already built into the kernel.

@antoninbas antoninbas self-assigned this Jan 30, 2024
@antoninbas

Small update: the conntrack kernel configuration should be fixed in 4.27.2. When that is released (later this month?), we can implement "auto-detection of built-in kernel modules" in the Antrea Agent (for openvswitch) and validate that we can run with Docker Desktop 4.27.2.

antoninbas added a commit to antoninbas/antrea that referenced this issue Feb 10, 2024
If a module is built-in, trying to load the module with modprobe inside
a container may fail (instead of just being a no-op). This will cause
Antrea initialization to fail unless agent.dontLoadKernelModules is
explicitly set to true.

Now that the Docker Desktop LinuxKit VM comes with openvswitch built-in
(starting with 4.27.0), trying to install "default" Antrea (i.e.,
without setting agent.dontLoadKernelModules) in a Kind cluster running
with Docker Desktop on macOS will fail. To make sure that users will not
run into this issue, we add logic to the install_cni script to skip the
modprobe call if the module is built-in.

After this change, there should be very limited use cases for the
agent.dontLoadKernelModules parameter, but there is no harm in keeping
it in case it is needed in the future or for some corner cases.

I also realized that the "--skip-kmod" flag for the start_ovs script did
not provide any value. Either the openvswitch module needs to be
explicitly loaded, in which case the install_cni script will take care
of it anyway, or it should not be loaded at all (e.g., because it is
built-in). Additionally, because we do not mount the host's /lib/modules
to the antrea-ovs container, it is not possible to load any kernel
module from the container.

Fixes antrea-io#5939

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
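
The "skip the modprobe call if the module is built-in" logic described in this commit message could look roughly like the sketch below. This is not the actual install_cni code: the function names are hypothetical, and the detection method (looking the module up in the modules.builtin file under /lib/modules/<kernel release>) is an assumption about one common way to detect built-in modules. It also assumes that file is visible inside the container.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 if module $1 is listed in the
# modules.builtin file under directory $2 (defaults to the modules
# directory of the running kernel), non-zero otherwise.
module_is_builtin() {
    local module="$1"
    local moddir="${2:-/lib/modules/$(uname -r)}"
    local builtin_file="${moddir}/modules.builtin"
    # Without a readable modules.builtin we cannot tell, so report "not built-in".
    [ -r "${builtin_file}" ] || return 1
    # Entries look like "kernel/net/openvswitch/openvswitch.ko".
    grep -q -E "(^|/)${module}\.ko\$" "${builtin_file}"
}

# Hypothetical helper: loads module $1 with modprobe unless it is built-in.
load_module_unless_builtin() {
    local module="$1"
    local moddir="${2:-/lib/modules/$(uname -r)}"
    if module_is_builtin "${module}" "${moddir}"; then
        echo "Module ${module} is built-in, skipping modprobe"
        return 0
    fi
    modprobe "${module}"
}

# Example usage:
#   load_module_unless_builtin openvswitch
```

The lookup matches the exact file name, so a module whose name uses underscores where modules.builtin uses hyphens (or the reverse) would not be detected; openvswitch is not affected. For Antrea versions without such a check, the commit message indicates that setting agent.dontLoadKernelModules to true avoids the failing modprobe call, for example by adding --set agent.dontLoadKernelModules=true to the helm install command.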
antoninbas added a commit that referenced this issue Feb 21, 2024
antoninbas added a commit to antoninbas/antrea that referenced this issue Feb 21, 2024
antoninbas added a commit that referenced this issue Feb 22, 2024